Aidemy 2020/11/22
Hello, this is Yope! I come from a humanities background, but I was interested in the possibilities of AI, so I enrolled in the AI-specialized school "Aidemy" to study. I would like to share the knowledge I gained there, so I am summarizing it on Qiita. I am very happy that many people read my previous summary article. Thank you! This is the second post on deep reinforcement learning. Nice to meet you.
What to learn this time
・Implementation of reinforcement learning
・In the "Reinforcement Learning" chapter, I defined the environment and so on myself, but this time we will create them using a library that provides ready-made reinforcement learning environments.
・The library used is __keras-rl__, which builds on Keras and works together with __OpenAI Gym (Gym)__. This time, we will use it to train a DQN on the CartPole demo.
・First, create the environment. This is done simply with __env = gym.make()__, where the argument specifies the type of environment. The CartPole environment is specified as __"CartPole-v0"__. After that, everything is done through the env instance.
・In CartPole there are two actions, __"move the cart to the right"__ and __"move the cart to the left"__, and you can get this number of actions with __env.action_space.n__.
・Code
![Screenshot 2020-11-19 10.31.55.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/d10ad3fb-5063-56c5-aaf1-453352008e27.png)
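Since the code is only shown as a screenshot, here is a minimal sketch of this step based on the explanation above (the variable name `nb_actions` is my own choice; it is reused in the agent setup later):

```python
import gym

# Create the CartPole environment; the argument selects the environment type
env = gym.make("CartPole-v0")

# Number of available actions (2 for CartPole: push left or push right)
nb_actions = env.action_space.n
```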
・Once the environment is created, build a __multilayer neural network__ with Keras functions. The model is built with the Sequential model. __Flatten()__ transforms the multidimensional input into one dimension; for the input shape "input_shape", use __env.observation_space.shape__ to pass the shape of the CartPole state.
・Layers are added with __model.add()__. A fully connected layer is Dense, and the activation function is specified with Activation(), passing "relu", "linear", etc. as the argument.
・Code
![Screenshot 2020-11-19 10.32.22.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/a8af8cb2-4c1f-3956-62f8-26e9086e06ff.png)
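A minimal sketch of the model construction. The 16-unit layer sizes are my assumption (following the standard keras-rl CartPole example), not taken from the screenshot; the `(1,) + env.observation_space.shape` input shape matches what keras-rl feeds the model when `window_length=1`:

```python
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten

# Small fully connected network mapping observations to Q-values
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))  # flatten the state to 1D
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(16))
model.add(Activation('relu'))
model.add(Dense(nb_actions))     # one output per action
model.add(Activation('linear'))  # Q-values are unbounded, so a linear output
```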
・Here we configure the __agent__, the main body of reinforcement learning. First, set up the __memory__ and __policy__ it requires. (The memory is the record of what the agent did in the past.)
・The memory can be set with __SequentialMemory(limit, window_length)__, where limit is the number of experiences to store.
・For the policy, use __BoltzmannQPolicy()__ when taking the Boltzmann policy, and __EpsGreedyQPolicy()__ when taking the ε-greedy method.
・Code
![Screenshot 2020-11-19 10.41.15.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/b834e4f7-d691-531d-e28c-98eda7808306.png)
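A sketch of this step; the `limit=50000` value is my assumption:

```python
from rl.memory import SequentialMemory
from rl.policy import BoltzmannQPolicy

# Replay memory: keep up to 50,000 past experiences, one observation per window
memory = SequentialMemory(limit=50000, window_length=1)

# Boltzmann policy: choose actions with probabilities given by a softmax over Q-values
policy = BoltzmannQPolicy()
```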
・Set up the agent using the memory and policy from the previous section. Call __DQNAgent()__, which implements the DQN algorithm, and give it the following arguments: the model __model__, the memory __memory__, the policy __policy__, the number of actions __nb_actions__, and __nb_steps_warmup__, which specifies how many initial steps are excluded from learning.
・With the agent stored in a variable called "dqn", specify how it learns with __dqn.compile()__. The first argument is the __optimization function__, and the second argument, metrics, specifies the evaluation function.
・Code (using the model etc. from the previous section)
![Screenshot 2020-11-19 11.12.57.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/698700/ff94997e-3e16-3d04-961d-1f9fd389a025.png)
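A sketch of the agent setup; `nb_steps_warmup=10` and the Adam learning rate are my assumptions:

```python
from rl.agents.dqn import DQNAgent
from keras.optimizers import Adam

# Combine the model, memory, and policy into a DQN agent;
# the first nb_steps_warmup steps only collect experience, without learning
dqn = DQNAgent(model=model, memory=memory, policy=policy,
               nb_actions=nb_actions, nb_steps_warmup=10)

# Specify how the agent learns: Adam optimizer, mean absolute error as a metric
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
```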
・To train the dqn agent from the previous section, use __dqn.fit()__. The arguments are the environment (env in the code), the number of training steps __nb_steps__, whether to visualize __visualize__, and whether to output logs __verbose__.
・Once the agent has learned, run a test. The test runs the agent and evaluates how much reward it actually earns. This is done with __dqn.test()__. The arguments are the same as for dqn.fit(), except that the number of episodes is specified with __nb_episodes__. (See the sketch below.)
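A sketch of training and testing; the step and episode counts are my assumptions:

```python
# Train for 10,000 steps; visualize=False skips rendering, verbose=1 prints progress
dqn.fit(env, nb_steps=10000, visualize=False, verbose=1)

# Evaluate the trained agent over 5 episodes while rendering the cart pole
dqn.test(env, nb_episodes=5, visualize=True)
```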
・When performing reinforcement learning with a library, use keras-rl.
・After creating the environment with __gym.make()__, build a model and add layers. Also set up the __memory__ and __policy__, and use them to create an agent.
・Train the created dqn agent with __dqn.fit()__ and test it with __dqn.test()__.
That's all for this time. Thank you for reading this far.