I only had a vague image of what reinforcement learning is, so I decided to actually try it. In this post, we run a reinforcement learning algorithm on a toolkit called OpenAI Gym.
Since the goal is just to get something running, I do not explain the reinforcement learning algorithm itself in detail.
――I want to actually run reinforcement learning
Reinforcement learning is a mechanism that **learns actions that maximize reward through trial and error in a given environment**. With the advent of deep learning it can handle more complex tasks, and the famous AlphaGo also uses reinforcement learning.
OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. You can try reinforcement learning algorithms in various environments such as pole balancing (CartPole), driving a car up a mountain (MountainCar), and Space Invaders.
Reference: List of environments that can be used with Gym
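If you want to check which environments your installed version of Gym provides, something like the following works (a minimal sketch against the classic gym API; newer versions of gym/gymnasium organize the registry differently):
import gym
from gym import envs

#List the IDs of the registered environments (classic gym API)
print(sorted(spec.id for spec in envs.registry.all())[:20])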
First, let's just run Gym. Here we use the pole-balancing environment (CartPole).
The action (whether to move the cart on which the pole stands to the left or to the right) is chosen at random.
import gym
#Environment generation
env = gym.make('CartPole-v0')
for i_episode in range(20):
    #Initialize the environment and get the observation
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        #Action decision (random)
        action = env.action_space.sample()
        #Get data after the action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()
Since the actions are chosen at random, the pole falls over almost immediately.
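As a rough sanity check (my own addition, not in the original article), you can measure how long a purely random policy survives; in CartPole-v0 it is typically only around 20 steps per episode:
import gym
import numpy as np

env = gym.make('CartPole-v0')
lengths = []
for _ in range(100):
    env.reset()
    for t in range(200):
        #Step with a random action until the pole falls
        _, _, done, _ = env.step(env.action_space.sample())
        if done:
            break
    lengths.append(t + 1)
env.close()
print("average episode length:", np.mean(lengths))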
Next, let's use a reinforcement learning algorithm to select the actions. This time we will train with an algorithm called DQN (Deep Q-Network).
We use keras-rl as the reinforcement learning library. However, note that **if you are using the Keras integrated into TensorFlow 2, you need to use keras-rl2**.
The library versions used this time are as follows.
keras-rl2==1.0.4
tensorflow==2.3.0
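Assuming a standard pip environment, both can be installed with `pip install keras-rl2==1.0.4 tensorflow==2.3.0` (the exact pins are just the versions used here).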
Now let's actually train with DQN.
One episode corresponds to one run of CartPole (from reset until the pole falls or the episode ends), and moving the cart left or right once corresponds to one step. Here we train for up to 50,000 steps.
import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
#Environment generation
env = gym.make('CartPole-v0')
nb_actions = env.action_space.n
#Model definition
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
#Agent settings
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
#Learning
dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)
#Test the model
dqn.test(env, nb_episodes=5, visualize=True)
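If you want to reuse the trained agent later, keras-rl agents can save and reload their weights. A minimal sketch (the file name here is my own choice):
#Save the learned weights so the agent can be reloaded without retraining
dqn.save_weights('dqn_cartpole_weights.h5f', overwrite=True)
#Later: dqn.load_weights('dqn_cartpole_weights.h5f')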
After training for 64 episodes ...
After training for 216 episodes ...
After training for 50,000 steps, the agent can balance the pole stably, as follows.
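To check this numerically rather than visually, dqn.test returns a Keras History object, so (as far as I understand the keras-rl API) the average episode reward of the test runs can be inspected as below; in CartPole-v0 a reward close to 200 means the pole stayed up for the whole episode.
import numpy as np

#Run test episodes without rendering and look at the average reward
history = dqn.test(env, nb_episodes=10, visualize=False)
print(np.mean(history.history['episode_reward']))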
Kaggle, the machine learning competition platform, has also hosted competitions in which agents trained with reinforcement learning play against each other.
That competition uses a game called Connect Four: the submitted agents play matches against each other to determine their rating and ranking. It is quite fun to watch an agent you built compete, so please give it a try.
Kaggle also offers a course where you can learn game AI and reinforcement learning through the Connect X competition, so it is a good place to start: Learn Intro to Game AI and Reinforcement Learning Tutorials | Kaggle
――For now, I was able to run a reinforcement learning algorithm.
――I want to try it in other environments as well.
Going forward, I would like to properly understand the algorithms behind it.
-[Deep Learning Textbook Deep Learning G Test (Generalist) Official Text](https://www.amazon.co.jp/%E6%B7%B1%E5%B1%A4%E5%AD%A6%E7%BF%92%E6%95%99%E7%A7%91%E6%9B%B8-%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0-G%E6%A4%9C%E5%AE%9A-%E3%82%B8%E3%82%A7%E3%83%8D%E3%83%A9%E3%83%AA%E3%82%B9%E3%83%88-%E5%85%AC%E5%BC%8F%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88/dp/4798157554)