Reinforcement learning 10 Try using a trained neural network.

This assumes you have completed Reinforcement Learning 9. Development uses Jupyter Notebook. VSCode is not used this time, but it is easy to switch between the two.

This follows the ChainerRL quickstart as-is. First, install matplotlib.

pip install matplotlib
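
The rendering code further down also uses pyvirtualdisplay to create a virtual display (this assumes a headless Linux environment where Xvfb is available). If it is not installed yet, it can be added the same way:

pip install pyvirtualdisplay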

The following is copied from the Jupyter Notebook.

import chainer
import chainer.functions as F
import chainer.links as L
import chainerrl
import gym
import numpy as np
env = gym.make('CartPole-v0')
print('observation space:', env.observation_space)
print('action space:', env.action_space)

obs = env.reset()
#env.render()
print('initial observation:', obs)

action = env.action_space.sample()
obs, r, done, info = env.step(action)
print('next observation:', obs)
print('reward:', r)
print('done:', done)
print('info:', info)
class QFunction(chainer.Chain):

    def __init__(self, obs_size, n_actions, n_hidden_channels=50):
        super().__init__()
        with self.init_scope():
            self.l0 = L.Linear(obs_size, n_hidden_channels)
            self.l1 = L.Linear(n_hidden_channels, n_hidden_channels)
            self.l2 = L.Linear(n_hidden_channels, n_actions)

    def __call__(self, x, test=False):
        """
        Args:
            x (ndarray or chainer.Variable): An observation
            test (bool): a flag indicating whether it is in test mode
        """
        h = F.tanh(self.l0(x))
        h = F.tanh(self.l1(h))
        return chainerrl.action_value.DiscreteActionValue(self.l2(h))

obs_size = env.observation_space.shape[0]
n_actions = env.action_space.n
q_func = QFunction(obs_size, n_actions)
# Use Adam to optimize q_func. eps=1e-2 is for stability.
optimizer = chainer.optimizers.Adam(eps=1e-2)
optimizer.setup(q_func)
# Set the discount factor that discounts future rewards.
gamma = 0.95

# Use epsilon-greedy for exploration
explorer = chainerrl.explorers.ConstantEpsilonGreedy(
    epsilon=0.3, random_action_func=env.action_space.sample)

# DQN uses Experience Replay.
# Specify a replay buffer and its capacity.
replay_buffer = chainerrl.replay_buffer.ReplayBuffer(capacity=10 ** 6)

# Since observations from CartPole-v0 are numpy.float64 while
# Chainer only accepts numpy.float32 by default, specify
# a converter as a feature extractor function phi.
phi = lambda x: x.astype(np.float32, copy=False)

# Now create an agent that will interact with the environment.
agent = chainerrl.agents.DoubleDQN(
    q_func, optimizer, replay_buffer, gamma, explorer,
    replay_start_size=500, update_interval=1,
    target_update_interval=100, phi=phi)
# Start virtual display
from pyvirtualdisplay import Display
display = Display(visible=0, size=(1024, 768))
display.start()
import os
os.environ["DISPLAY"] = ":" + str(display.display) + "." + str(display.screen)
agent.load('agent')
frames = []
for i in range(3):
    obs = env.reset()
    done = False
    R = 0
    t = 0
    while not done and t < 200:
        frames.append(env.render(mode = 'rgb_array'))
        action = agent.act(obs)
        obs, r, done, _ = env.step(action)
        R += r
        t += 1
    print('test episode:', i, 'R:', R)
    agent.stop_episode()
env.render()

import matplotlib.pyplot as plt
import matplotlib.animation
import numpy as np
from IPython.display import HTML

plt.figure(figsize=(frames[0].shape[1] / 72.0, frames[0].shape[0] / 72.0), dpi = 72)
patch = plt.imshow(frames[0])
plt.axis('off')
animate = lambda i: patch.set_data(frames[i])
ani = matplotlib.animation.FuncAnimation(plt.gcf(), animate, frames=len(frames), interval = 50)
HTML(ani.to_jshtml())
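
The agent.load('agent') call above assumes that an 'agent' directory was saved after training in Reinforcement Learning 9. If you do not have one yet, a minimal training-and-save sketch in the style of the ChainerRL quickstart looks like the following (the episode count of 200 is my assumption, and it reuses env and agent exactly as defined above):

# Minimal sketch: train the agent, then save it to the 'agent' directory.
# Assumes env, agent, etc. are already defined as above.
n_episodes = 200  # assumed value; increase for better results
for i in range(1, n_episodes + 1):
    obs = env.reset()
    reward = 0
    done = False
    t = 0
    while not done and t < 200:
        action = agent.act_and_train(obs, reward)
        obs, reward, done, _ = env.step(action)
        t += 1
    agent.stop_episode_and_train(obs, reward, done)
agent.save('agent')  # creates the directory read back by agent.load('agent')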

Since the steps on Windows are quite different, I will cover them all together in Reinforcement Learning 12.

A small summary of parts 1 through 10: the ChainerRL quickstart was generally good, though there were a few landmines here and there. ChainerRL is, I suppose, a wrapper around Chainer; it is easy to modify, and I think it is excellent. I plan to move to TensorFlow eventually, but for the time being I will stick with ChainerRL. Up to about part 30, I will work with OpenAI Gym.

The reason I chose Chainer is that I have high expectations for Preferred Networks. In the United States there are schemes where companies like Google reward researchers with large sums of money, but there are few such schemes in Japan. The MITOU ("unexplored") project, which pays research funds as a kind of incubator, offers an hourly wage of 1,600 yen. The Preferred Networks internship pays 2,500 yen, with various allowances on top of that. That shows how serious they are, and their benchmark is always high. I am looking forward to what they do in the future.
