I only had a vague image of what reinforcement learning is, so I decided to actually try it. In this post, we run a reinforcement learning algorithm on a toolkit called OpenAI Gym.
Since the goal is just to get something running, I do not explain the reinforcement learning algorithm itself in detail.
――I want to actually run reinforcement learning
Reinforcement learning is a mechanism that **learns actions that maximize reward through trial and error in a given environment**. With the advent of deep learning it can handle more complex tasks, and the famous AlphaGo also uses reinforcement learning.
OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. You can try reinforcement learning algorithms in various environments such as pole balancing (CartPole), driving a car up a mountain (MountainCar), and Space Invaders.
Reference: List of environments that can be used with Gym
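If you want to check which environments your installed version of Gym provides, something like the following works (a minimal sketch against the classic gym API; newer versions of gym/gymnasium organize the registry differently):
import gym
from gym import envs

#List the IDs of the registered environments (classic gym API)
print(sorted(spec.id for spec in envs.registry.all())[:20])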
First, let's just run Gym. Here we use the pole-balancing environment (CartPole).
The action (whether to move the cart on which the pole stands to the left or to the right) is chosen at random.
import gym
#Environment generation
env = gym.make('CartPole-v0')
for i_episode in range(20):
    #Initialize the environment and get the observation
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        #Action decision (random)
        action = env.action_space.sample()
        #Get data after the action
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()
Since the actions are chosen at random, the pole falls over almost immediately.
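As a rough sanity check (my own addition, not in the original article), you can measure how long a purely random policy survives; in CartPole-v0 it is typically only around 20 steps per episode:
import gym
import numpy as np

env = gym.make('CartPole-v0')
lengths = []
for _ in range(100):
    env.reset()
    for t in range(200):
        #Step with a random action until the pole falls
        _, _, done, _ = env.step(env.action_space.sample())
        if done:
            break
    lengths.append(t + 1)
env.close()
print("average episode length:", np.mean(lengths))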
Next, let's use a reinforcement learning algorithm to select the actions. This time we will train with an algorithm called DQN (Deep Q-Network).
We use keras-rl as the reinforcement learning library. However, note that **if you are using the Keras integrated into TensorFlow 2, you need to use keras-rl2**.
The library versions used this time are as follows.
keras-rl2==1.0.4
tensorflow==2.3.0
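Assuming a standard pip environment, both can be installed with `pip install keras-rl2==1.0.4 tensorflow==2.3.0` (the exact pins are just the versions used here).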
Now let's actually train with DQN.
One episode corresponds to one run of CartPole (from reset until the pole falls or the episode ends), and moving the cart left or right once corresponds to one step. Here we train for up to 50,000 steps.
import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
#Environment generation
env = gym.make('CartPole-v0')
nb_actions = env.action_space.n
#Model definition
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))
#Agent settings
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
#Learning
dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)
#Test the model
dqn.test(env, nb_episodes=5, visualize=True)
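If you want to reuse the trained agent later, keras-rl agents can save and reload their weights. A minimal sketch (the file name here is my own choice):
#Save the learned weights so the agent can be reloaded without retraining
dqn.save_weights('dqn_cartpole_weights.h5f', overwrite=True)
#Later: dqn.load_weights('dqn_cartpole_weights.h5f')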
After training for 64 episodes ...
After training for 216 episodes ...
After training for 50,000 steps, the agent can balance the pole stably, as follows.
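To check this numerically rather than visually, dqn.test returns a Keras History object, so (as far as I understand the keras-rl API) the average episode reward of the test runs can be inspected as below; in CartPole-v0 a reward close to 200 means the pole stayed up for the whole episode.
import numpy as np

#Run test episodes without rendering and look at the average reward
history = dqn.test(env, nb_episodes=10, visualize=False)
print(np.mean(history.history['episode_reward']))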
Kaggle, the machine learning competition platform, has also hosted competitions in which agents trained with reinforcement learning play against each other.
That competition uses a game called Connect Four: the submitted agents play matches against each other to determine their rating and ranking. It is quite fun to watch an agent you built compete, so please give it a try.
Kaggle also offers a course where you can learn game AI and reinforcement learning through the Connect X competition, so it is a good place to start: Learn Intro to Game AI and Reinforcement Learning Tutorials | Kaggle
――For now, I was able to run a reinforcement learning algorithm.
――I want to try it in other environments as well.
Going forward, I would like to properly understand the algorithms behind it.
-[Deep Learning Textbook Deep Learning G Test (Generalist) Official Text](https://www.amazon.co.jp/%E6%B7%B1%E5%B1%A4%E5%AD%A6%E7%BF%92%E6%95%99%E7%A7%91%E6%9B%B8-%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0-G%E6%A4%9C%E5%AE%9A-%E3%82%B8%E3%82%A7%E3%83%8D%E3%83%A9%E3%83%AA%E3%82%B9%E3%83%88-%E5%85%AC%E5%BC%8F%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88/dp/4798157554)