[Introduction to Reinforcement Learning] Getting reinforcement learning running, for starters

Introduction

I only had a vague picture of reinforcement learning, so I decided to actually try running it. In this post, we run a reinforcement learning algorithm on a toolkit called OpenAI Gym.

The goal is simply to get something running, so the reinforcement learning algorithms themselves are not explained in detail.

Target audience

- People who want to actually run reinforcement learning

What is reinforcement learning?

Reinforcement learning is a mechanism that learns, through trial and error in a given environment, the actions that maximize reward. Combined with deep learning it can handle much more complex tasks, and the famous AlphaGo also uses reinforcement learning.
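To make "learning by trial and error to maximize reward" a little more concrete, here is a minimal sketch that is not part of the original experiment. It assumes a toy "slot machine" problem with three arms whose payout probabilities are made up, and an epsilon-greedy rule: mostly pick the arm that currently looks best, but sometimes pick at random. The function name `run_bandit` and all parameter values are hypothetical.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy trial and error on a toy multi-armed bandit."""
    rng = random.Random(seed)
    counts = [0] * len(true_means)       # how often each arm was tried
    estimates = [0.0] * len(true_means)  # running average reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm
            arm = rng.randrange(len(true_means))
        else:
            # Exploit: pick the arm with the best estimate so far
            arm = max(range(len(true_means)), key=lambda a: estimates[a])
        # The environment pays 1 with the arm's (hidden) probability
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the average reward for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward

# Assumed payout probabilities, chosen only for illustration
estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
print("estimated reward per arm:", estimates)
print("times each arm was chosen:", counts)
```

The agent is never told the payout probabilities. It only sees the rewards of the actions it tried, which is the same situation the CartPole agent is in later in this post.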

What is OpenAI Gym?

OpenAI Gym is a toolkit for developing and comparing reinforcement learning algorithms. You can try reinforcement learning algorithms in various environments, such as balancing a pole (CartPole), driving a car up a hill (MountainCar), and Space Invaders.

Reference: List of environments that can be used with Gym

Try running it

To start, let's get Gym running. Here we use the pole-balancing environment (CartPole).

The action (whether to push the cart that carries the pole to the left or to the right) is chosen at random.

import gym

# Create the environment
# (classic Gym API: reset() returns the observation, step() returns a 4-tuple)
env = gym.make('CartPole-v0')

for i_episode in range(20):
    # Reset the environment and get the initial observation
    observation = env.reset()
    for t in range(100):
        env.render()
        print(observation)
        # Choose an action at random
        action = env.action_space.sample()
        # Apply the action and receive the result
        observation, reward, done, info = env.step(action)
        if done:
            print("Episode finished after {} timesteps".format(t+1))
            break
env.close()

random.gif

Because the actions are chosen at random, the pole falls over almost immediately.
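It may help to see what the printed `observation` and the `action` actually mean. In CartPole the observation holds four numbers (cart position, cart velocity, pole angle, pole angular velocity), and the action is 0 (push left) or 1 (push right). The sketch below is not from the original experiment: `describe` and `lean_policy` are hypothetical helper names, and the "push toward the side the pole is leaning" rule is just a simple hand-written baseline, not a learned policy.

```python
LEFT, RIGHT = 0, 1  # CartPole action codes

def describe(observation):
    """Label the four values of a CartPole observation."""
    cart_position, cart_velocity, pole_angle, pole_angular_velocity = observation
    return {
        "cart_position": cart_position,
        "cart_velocity": cart_velocity,
        "pole_angle": pole_angle,
        "pole_angular_velocity": pole_angular_velocity,
    }

def lean_policy(observation):
    """Hand-written baseline: push the cart toward the side the pole leans."""
    pole_angle = observation[2]
    return RIGHT if pole_angle > 0 else LEFT

# Example observation with made-up values: the pole leans slightly right
sample = [0.01, -0.02, 0.05, 0.10]
print(describe(sample))
print("chosen action:", lean_policy(sample))
```

To try it in the loop above, replace `env.action_space.sample()` with `lean_policy(observation)`. Reinforcement learning aims to discover a rule like this (or a better one) from rewards alone, without anyone writing it by hand.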

Run it with reinforcement learning

Next, let's use a reinforcement learning algorithm to choose the action. This time, we train with an algorithm called DQN (Deep Q-Network).
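This post does not go into the algorithm, but the core idea behind Q-learning, which DQN builds on, fits in a few lines. The sketch below is only an illustration under assumptions: the function names `td_target` and `q_update`, the discount factor `gamma`, and the learning rate are hypothetical choices. DQN itself replaces the table-style update with a neural network trained to predict the same kind of target.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """Target value: reward now plus discounted best value of the next state."""
    if done:
        # No future reward after the episode has ended
        return reward
    return reward + gamma * max(next_q_values)

def q_update(q_value, target, learning_rate=0.1):
    """Table-style Q-learning step: move the estimate a little toward the target."""
    return q_value + learning_rate * (target - q_value)

# Made-up numbers: reward 1.0, next state has Q-values 0.5 (left) and 2.0 (right)
target = td_target(1.0, [0.5, 2.0], done=False, gamma=0.9)
print("target:", target)
print("updated Q:", q_update(0.0, target, learning_rate=0.5))
```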

Libraries used

Use keras-rl as the reinforcement learning library. Note, however, that **if you are using the Keras integrated into TensorFlow 2, you need to use keras-rl2**.

The library versions used here are as follows.

keras-rl2==1.0.4
tensorflow==2.3.0

Train with DQN

Now let's actually train with DQN.

One episode lasts until a pole-balancing attempt ends, and one step is a single action that pushes the cart to the left or to the right. Here we train for 50,000 steps.
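To make the relationship between steps and episodes concrete, here is a small sketch that counts how a fixed step budget splits into episodes. Everything in it is an assumption for illustration: `ToyBalanceEnv` is a made-up stand-in that ends an episode at random (it is not Gym and has no physics), and `count_episodes` is a hypothetical helper, not how keras-rl is implemented.

```python
import random

class ToyBalanceEnv:
    """Made-up stand-in for an environment; each step may end the episode."""
    def __init__(self, fail_prob=0.1, max_steps=200, seed=0):
        self.rng = random.Random(seed)
        self.fail_prob = fail_prob
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        # The episode ends on a random "fall" or when the step cap is reached
        done = self.rng.random() < self.fail_prob or self.t >= self.max_steps
        return 0.0, 1.0, done, {}  # observation, reward, done, info

def count_episodes(env, nb_steps):
    """Run until nb_steps steps are used up and record each episode's length."""
    episode_lengths = []
    steps_done = 0
    while steps_done < nb_steps:
        env.reset()
        length = 0
        done = False
        while not done and steps_done < nb_steps:
            _, _, done, _ = env.step(0)
            length += 1
            steps_done += 1
        episode_lengths.append(length)
    return episode_lengths

lengths = count_episodes(ToyBalanceEnv(), nb_steps=1000)
print(len(lengths), "episodes,", sum(lengths), "steps in total")
```

The number of episodes is not fixed in advance: the longer the agent keeps the pole up, the fewer episodes fit into the same step budget. The actual training code used in this post follows.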

import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten
from tensorflow.keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

# Create the environment
env = gym.make('CartPole-v0')
nb_actions = env.action_space.n

# Define the model: observation in, one Q-value per action out
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))
model.add(Activation('linear'))

# Configure the agent
# Replay memory that keeps up to 50,000 past transitions
memory = SequentialMemory(limit=50000, window_length=1)
# Boltzmann (softmax) exploration policy
policy = BoltzmannQPolicy()
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])

# Train
dqn.fit(env, nb_steps=50000, visualize=True, verbose=2)

# Test the trained model
dqn.test(env, nb_episodes=5, visualize=True)
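The code above uses `BoltzmannQPolicy` to choose actions during training. The idea of Boltzmann (softmax) exploration is to turn the Q-values into probabilities, so that better-looking actions are chosen more often but the others still get tried. The following is a plain-Python sketch of that idea under assumptions, not keras-rl's actual implementation: the function names are hypothetical, `tau` is the usual temperature parameter, and subtracting the maximum is just a numerical-stability choice made here.

```python
import math
import random

def boltzmann_probabilities(q_values, tau=1.0):
    """Softmax over Q-values; a higher tau makes the choice more random."""
    scaled = [q / tau for q in q_values]
    largest = max(scaled)
    # Subtract the maximum before exponentiating to avoid overflow
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def boltzmann_action(q_values, tau=1.0, rng=random):
    """Sample an action index according to the softmax probabilities."""
    probs = boltzmann_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]

# Made-up Q-values for the two CartPole actions (left, right)
q_values = [1.0, 2.0]
print("probabilities:", boltzmann_probabilities(q_values))
print("sampled action:", boltzmann_action(q_values, rng=random.Random(0)))
```

Early in training the Q-values are close together, so the choice is nearly random. As the estimates separate, the policy leans more and more toward the better action.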

After training for 64 episodes:

dqn_64.gif

After training for 216 episodes:

dqn_216.gif

Testing with the trained model

After training for 50,000 steps, the agent was able to keep the pole balanced stably, as shown below.

dqn_test.gif

Kaggle competition

Kaggle, a platform for machine learning competitions, has also hosted a competition in which agents trained with reinforcement learning play against each other.

Connect X | Kaggle
image.png

In this competition, trained agents play a game called Connect Four against each other, and the results determine each agent's rating and ranking. It is quite fun to pit your own agent against others, so please give it a try.

There is also a course on Kaggle where you can learn game AI and reinforcement learning through the Connect X competition, so it is a good place to start.
Learn Intro to Game AI and Reinforcement Learning Tutorials | Kaggle

Summary

- I was able to get a reinforcement learning algorithm running, at least.
- I want to try it in other environments.

Going forward, I would like to properly understand how the algorithms work internally.

References

- [Deep Learning Textbook Deep Learning G Test (Generalist) Official Text](https://www.amazon.co.jp/%E6%B7%B1%E5%B1%A4%E5%AD%A6%E7%BF%92%E6%95%99%E7%A7%91%E6%9B%B8-%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0-G%E6%A4%9C%E5%AE%9A-%E3%82%B8%E3%82%A7%E3%83%8D%E3%83%A9%E3%83%AA%E3%82%B9%E3%83%88-%E5%85%AC%E5%BC%8F%E3%83%86%E3%82%AD%E3%82%B9%E3%83%88/dp/4798157554)
