Try running CNN with ChainerRL

Introduction

I slightly modified the sample from the ChainerRL Quickstart Guide to use a CNN. The learning target is Atari's "Pong-v0".

I also referenced this article: Try using chainerRL

Due to my limited knowledge of Linux, Python, and reinforcement learning, I can't tell whether the agent is learning properly, but I have confirmed that the code runs. Please point out any mistakes or give me advice.

Environment

OS: Ubuntu 16.04
Python: 3.6.0
Chainer: 1.21.0

Package import

There are two main changes, shown below: the last two imports, which I use to convert the game screen to grayscale and resize it.

train.py


import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
import gym
import numpy as np
import datetime
from skimage.color import rgb2gray    # convert RGB frames to grayscale
from skimage.transform import resize  # shrink frames to 80x80
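
As a quick sanity check (not part of the training script), you can run these two helpers on a dummy frame. Pong-v0 returns 210x160 RGB screens, so the shapes below assume that.


frame = np.zeros((210, 160, 3), dtype=np.uint8)  # dummy frame the size of a Pong-v0 screen
gray = rgb2gray(frame)                           # -> (210, 160), floats in [0, 1]
small = resize(gray, (80, 80))                   # -> (80, 80)
print(gray.shape, small.shape)                   # (210, 160) (80, 80)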

Game selection, etc.

I haven't changed this part.

train.py


env = gym.make('Pong-v0')
obs = env.reset()
env.render()
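
For reference, you can check what the environment provides at this point; for Pong-v0 the raw screen is a 210x160 RGB array and there are 6 discrete actions.


print(obs.shape)           # (210, 160, 3): raw RGB screen
print(env.action_space.n)  # 6 discrete actions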

Agent settings, etc.

I wasn't sure how to configure the network, so I just reused an existing CNN model for the Q-function.

train.py


class QFunction(chainer.Chain):
    def __init__(self, n_history=1, n_action=6):
        # n_history is the number of input channels (1, because frames are grayscale)
        super().__init__(
            l1=L.Convolution2D(n_history, 32, ksize=8, stride=4, nobias=False, wscale=np.sqrt(2)),
            l2=L.Convolution2D(32, 64, ksize=3, stride=2, nobias=False, wscale=np.sqrt(2)),
            l3=L.Convolution2D(64, 64, ksize=3, stride=1, nobias=False, wscale=np.sqrt(2)),
            l4=L.Linear(3136, 512, wscale=np.sqrt(2)),  # 3136 = 64 channels * 7 * 7 (see below)
            out=L.Linear(512, n_action, initialW=np.zeros((n_action, 512), dtype=np.float32))
        )

    def __call__(self, x, test=False):
        s = chainer.Variable(x)
        h1 = F.relu(self.l1(s))   # (1, 80, 80) -> (32, 19, 19)
        h2 = F.relu(self.l2(h1))  # -> (64, 9, 9)
        h3 = F.relu(self.l3(h2))  # -> (64, 7, 7)
        h4 = F.relu(self.l4(h3))  # flattened to 3136, then -> 512
        h5 = self.out(h4)         # one Q-value per action
        return chainerrl.action_value.DiscreteActionValue(h5)

The same goes for this part; I haven't studied it enough. n_history means the number of input channels; since the frames are grayscale this time, the channel count is 1.

train.py


n_action = env.action_space.n
n_history = 1
q_func = QFunction(n_history, n_action)
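
The 3136 in l4 comes from the convolution arithmetic: with no padding, each layer outputs floor((input - ksize) / stride) + 1 pixels per side, so an 80x80 frame shrinks to 19x19, then 9x9, then 7x7, and 64 * 7 * 7 = 3136. A quick check, not part of the script:


def conv_out(size, ksize, stride):
    # output size of a convolution with no padding
    return (size - ksize) // stride + 1

s = conv_out(80, 8, 4)  # 19
s = conv_out(s, 3, 2)   # 9
s = conv_out(s, 3, 1)   # 7
print(64 * s * s)       # 3136, the input size of l4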

Optimizer settings, etc.

I lowered the replay buffer capacity from 10 ** 6 (see the memory notes at the end).

train.py


optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)

gamma = 0.95  # discount factor

# act randomly with probability 0.3, greedily otherwise
explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)

# capacity lowered from 10 ** 6 so it fits in 8 GB of RAM
replay_buffer = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 4)

# cast observations to float32 before they are fed to the network
phi = lambda x: x.astype(np.float32, copy=False)
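
phi is applied to each observation before it reaches the network. It matters here because skimage returns float64 arrays while Chainer's links expect float32; a quick check:


x = resize(rgb2gray(np.zeros((210, 160, 3), dtype=np.uint8)), (80, 80))
print(x.dtype, phi(x).dtype)  # float64 float32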

Game progress, etc.

train.py


agent = chainerrl.agents.DoubleDQN(
    q_func, optimizer, replay_buffer, gamma, explorer,
    minibatch_size=4, replay_start_size=500, update_frequency=1,
    target_update_frequency=100, phi=phi)

last_time = datetime.datetime.now()
n_episodes = 1000
for i in range(1, n_episodes + 1):
    # grayscale, resize to 80x80, and add a channel axis for Convolution2D
    obs = resize(rgb2gray(env.reset()), (80, 80))
    obs = obs[np.newaxis, :, :]

    reward = 0
    done = False
    R = 0  # total reward for this episode

    while not done:
        action = agent.act_and_train(obs, reward)
        obs, reward, done, _ = env.step(action)
        obs = resize(rgb2gray(obs), (80, 80))
        obs = obs[np.newaxis, :, :]

        if reward != 0:
            R += reward

    elapsed_time = datetime.datetime.now() - last_time
    print('episode:', i, '/', n_episodes,
          'reward:', R,
          'minutes:', elapsed_time.seconds / 60)
    last_time = datetime.datetime.now()

    if i % 100 == 0:
        # save a snapshot of the agent every 100 episodes
        filename = 'agent_Pong' + str(i)
        agent.save(filename)

    agent.stop_episode_and_train(obs, reward, done)
print('Finished.')
print('Finished.')

The main changes are these two lines. The first converts the frame to grayscale and resizes it; the second adds the channel axis so the array can be fed into Convolution2D.

obs = resize(rgb2gray(env.reset()), (80, 80))
obs = obs[np.newaxis, :, :]

I used a laptop with 8 GB of RAM. With the capacity at 10 ** 6 and without grayscaling, the process was killed (out of memory) around episode 300. I don't know which of the two changes mattered, but together they fixed it.

For roughly the first 200 episodes of training, the opponent scores 21 points in a row. After 1000 episodes the agent was scoring about 5 points per game. Training for 1000 episodes takes a whole day.
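
Since a snapshot is saved every 100 episodes, you can reload one later and watch the agent play without further training. A minimal sketch, assuming the directory name 'agent_Pong1000' written by agent.save() above:


agent.load('agent_Pong1000')  # directory written by agent.save()

obs = resize(rgb2gray(env.reset()), (80, 80))[np.newaxis, :, :]
done = False
R = 0
while not done:
    env.render()
    action = agent.act(obs)  # greedy action: no exploration, no training
    obs, reward, done, _ = env.step(action)
    obs = resize(rgb2gray(obs), (80, 80))[np.newaxis, :, :]
    R += reward
agent.stop_episode()
print('evaluation reward:', R)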

I'm posting this because it may be helpful for beginners. If you spot any mistakes or have suggestions for improvement, please let me know.
