Deep Reinforcement Learning 3 Practical Edition: Breakout

Aidemy 2020/11/22

Introduction

Hello, this is Yope! I come from a humanities background, but I was interested in the possibilities of AI, so I went to the AI-specialized school "Aidemy" to study. I would like to share the knowledge I gained there with you, so I am summarizing it on Qiita. I am very happy that many people have read my previous summary articles. Thank you! This is my third post on deep reinforcement learning. Nice to meet you.

What to learn this time

・Hands-on reinforcement learning with Breakout (block breaking)

Creating an environment

・Create the environment with the same method as in Chapter 2, `gym.make()`. For Breakout, pass `"BreakoutDeterministic-v4"` as the argument.
・The number of available actions can be checked with `env.action_space.n`.

・Code:
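The original code is shown only as a screenshot, so below is a minimal sketch of what it likely contains, assuming OpenAI Gym with the Atari environments installed:

```python
import gym

# Create the Breakout environment, with gym.make() as in Chapter 2
env = gym.make("BreakoutDeterministic-v4")

# Number of available actions (4 for Breakout: NOOP, FIRE, RIGHT, LEFT)
nb_actions = env.action_space.n
print(nb_actions)  # -> 4
```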

Model building

・Here, a multi-layer neural network is constructed. The input is 4 frames of the Breakout screen. To reduce the amount of computation, each frame is resized to an 84×84-pixel grayscale image.
・The model uses `Sequential()`. As in Chapter 2, the input is flattened with `Flatten()`, fully connected layers are added with `Dense`, and activation functions with `Activation`.
・Since the input this time is a (two-dimensional) image, we also use `Convolution2D()`, a two-dimensional convolution layer. The first argument, `filters`, specifies the number of dimensions of the output space; the second argument, `kernel_size`, specifies the width and height of the convolution window; and `strides` specifies the stride, that is, the width and height by which the window moves at each step.

・Code:
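The original code screenshot is not recoverable, so here is a sketch following the standard DQN architecture for Atari (the exact filter counts and layer sizes in the original may differ). The resize to grayscale 84×84 is assumed to be handled separately, e.g. by a keras-rl `Processor` passed to the agent (not shown), and `nb_actions` comes from the environment block above:

```python
from keras.models import Sequential
from keras.layers import Permute, Convolution2D, Activation, Flatten, Dense

WINDOW_LENGTH = 4        # 4 stacked frames of the Breakout screen
INPUT_SHAPE = (84, 84)   # grayscale, 84 x 84 pixels

model = Sequential()
# keras-rl delivers states as (frames, height, width); reorder to channels-last
model.add(Permute((2, 3, 1), input_shape=(WINDOW_LENGTH,) + INPUT_SHAPE))
model.add(Convolution2D(32, (8, 8), strides=(4, 4)))  # filters=32, kernel_size=(8, 8)
model.add(Activation('relu'))
model.add(Convolution2D(64, (4, 4), strides=(2, 2)))
model.add(Activation('relu'))
model.add(Convolution2D(64, (3, 3), strides=(1, 1)))
model.add(Activation('relu'))
model.add(Flatten())              # flatten before the fully connected layers
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dense(nb_actions))      # one Q value per action
model.add(Activation('linear'))
model.summary()
```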

History and policy settings

・As in Chapter 2, set up the history (memory) and the policy required to create an agent.
・Use `SequentialMemory()` for the history, specifying `limit` and `window_length` as arguments.
・Use `BoltzmannQPolicy()` for the Boltzmann policy, and `EpsGreedyQPolicy()` for the ε-greedy method.
・To change the parameter ε linearly, wrap the policy in `LinearAnnealedPolicy()`. With the arguments specified as in the code below, ε is annealed linearly from a maximum of 1.0 down to a minimum of 0.1 over 10 steps during training, and fixed at 0.05 during testing.

・Code:
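A sketch of the memory and policy setup following the description above (the `limit` value is an assumption, since the text does not give it):

```python
from rl.memory import SequentialMemory
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy

# History: replay memory holding up to `limit` observations, 4 frames per state
memory = SequentialMemory(limit=1000000, window_length=4)

# Anneal eps linearly from 1.0 down to 0.1 over nb_steps during training,
# and fix it at 0.05 during testing (nb_steps=10 follows the text; real
# training would typically use a far larger value)
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps',
                              value_max=1.0, value_min=0.1,
                              value_test=0.05, nb_steps=10)
```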

Agent settings

・An agent can be created by passing `model`, `memory`, `policy`, `nb_actions`, and `nb_steps_warmup` as arguments to `DQNAgent()`. After that, specify the learning setup with `dqn.compile()`: the optimization algorithm in the first argument and the evaluation metric(s) in the second.

・Code ![Screenshot 2020-11-20 16.08.36.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/22bcf494-f1c6-0c09-0a70-8f2b885c4b41.png)
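A sketch of the agent setup (the `nb_steps_warmup` value and the optimizer settings are assumptions; the values in the original screenshot may differ):

```python
from rl.agents.dqn import DQNAgent
from keras.optimizers import Adam

dqn = DQNAgent(model=model, memory=memory, policy=policy,
               nb_actions=nb_actions, nb_steps_warmup=50000)

# First argument: optimization algorithm; second: evaluation metric(s)
dqn.compile(Adam(lr=0.00025), metrics=['mae'])
```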

Implementation of learning

・After completing the settings in the previous sections, train with the DQN algorithm using `dqn.fit()`: specify the environment in the first argument and the number of steps to train, `nb_steps`, in the second.
・The learning result can be saved in HDF5 format with `dqn.save_weights()`: specify the file name in the first argument and whether to allow overwriting, `overwrite`, in the second.

・Code:
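A sketch of the training and saving steps (the step count and file name are assumptions; Atari-scale DQN typically needs on the order of millions of steps):

```python
# Train: first argument is the environment, nb_steps the number of steps
dqn.fit(env, nb_steps=1750000)

# Save the learned weights in HDF5 format, allowing overwrite
dqn.save_weights('dqn_breakout_weights.h5f', overwrite=True)
```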

Conducting the test

・Test with the trained agent using `dqn.test()`. The arguments are the same as for `fit()`, except that the number of episodes, `nb_episodes`, is specified instead of the number of steps `nb_steps`.
・Incidentally, in this Breakout setup, one episode lasts until the ball is dropped.
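A sketch of the test step (the episode count is an assumption):

```python
# Evaluate the trained agent; nb_episodes replaces nb_steps
dqn.test(env, nb_episodes=10, visualize=False)
```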

Dueling DQN

What is Dueling DQN?

・Dueling DQN (DDQN) is an advanced version of DQN that modifies the end of the DQN network.
・In DQN, the Q value is output through a fully connected layer placed after the first three convolution layers. Dueling DQN splits this fully connected layer into two streams: one outputs the state value V, and the other the action advantage A. The Q value is then obtained from a final layer that takes these two as inputs, which gives higher performance than DQN.

・Figure ![Screenshot 2020-11-21 11.32.40.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/eaf40627-b0cc-5f53-1bfe-5bf6e6330637.png)

Implementation of Dueling DQN

・The implementation of Dueling DQN is the same as DQN up to adding the layers. It can be enabled by passing `enable_dueling_network=True` to `DQNAgent()` when setting up the agent, together with `dueling_type`, which specifies how the Q value is calculated. `dueling_type` accepts `'avg'`, `'max'`, or `'naive'`.
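For reference, the three `dueling_type` options correspond to different ways of combining the two streams into a Q value (as I understand the keras-rl implementation; treat this as a sketch):

```math
\begin{aligned}
\text{'avg'}   &: \; Q(s, a) = V(s) + A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \\
\text{'max'}   &: \; Q(s, a) = V(s) + A(s, a) - \max_{a'} A(s, a') \\
\text{'naive'} &: \; Q(s, a) = V(s) + A(s, a)
\end{aligned}
```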

・Code ![Screenshot 2020-11-21 12.12.37.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/1d6b9e57-b793-07c2-684c-856785593a98.png)
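A sketch of the Dueling DQN agent setup under the same assumptions as before (model, memory, and policy as defined above):

```python
from rl.agents.dqn import DQNAgent
from keras.optimizers import Adam

dueling_dqn = DQNAgent(model=model, memory=memory, policy=policy,
                       nb_actions=nb_actions, nb_steps_warmup=50000,
                       enable_dueling_network=True,  # split into V and A streams
                       dueling_type='avg')           # one of 'avg', 'max', 'naive'
dueling_dqn.compile(Adam(lr=0.00025), metrics=['mae'])
```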

・Result ![Screenshot 2020-11-21 12.13.16.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/3445992d-30e4-50c0-defc-754ab85a2bb4.png)

Summary

・Even for Breakout (block breaking), the environment is defined in the same way as in Chapter 2.
・For model construction, since this time the input is a 2D image, we use the `Convolution2D` convolution layer.
・The policy uses the ε-greedy method, but the parameter ε should be decreased linearly; use `LinearAnnealedPolicy()` to change it linearly.
・Trained models can be saved in HDF5 format with `dqn.save_weights()`.
・Dueling DQN is a DQN that splits the final fully connected layer into two streams, computes the state value V and the advantage A separately, and obtains the Q value from the two in the last layer. To implement it, specify `enable_dueling_network` and `dueling_type` in `DQNAgent()`.
