Explore the maze with reinforcement learning

Introduction

This time, I would like to explore a maze using reinforcement learning, specifically Q-learning.

Q-learning

Overview

To put it simply, a value called the Q value is kept for each pair of "state" and "action", and the Q value is updated using rewards. Actions that are more likely to earn a positive reward converge to higher Q values. In the maze, each passage square corresponds to a state, and moving up, down, left, or right corresponds to an action. In other words, we need to keep in memory one Q value for each passage square times each action pattern (4: up, down, left, right). Consequently, plain Q-learning cannot easily be applied when there are many "state"-"action" pairs, that is, when the state-action space explodes.

This time we deal with a small problem: the maze has 60 passage squares and 4 possible actions (up, down, left, right), so there are only 60 × 4 = 240 state-action pairs.
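At this scale the Q table fits comfortably in memory as a plain array. A minimal sketch (the names and layout are illustrative, not taken from the original code):

```python
import numpy as np

N_CELLS = 60    # passage squares in the maze (the states)
N_ACTIONS = 4   # up, down, left, right

# One Q value per (state, action) pair, all initialized to 0.
q_table = np.zeros((N_CELLS, N_ACTIONS))
print(q_table.size)  # 240 state-action pairs
```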

Algorithm

Update Q value

Initially, all Q values are initialized to 0. The Q value is updated every time action $ a $ is taken in state $ s_t $:

Q(s_t, a) \leftarrow Q(s_t, a) + \alpha(r_{t+1} + \gamma \max_{p}{Q(s_{t+1}, p)} -Q(s_t, a))

Action selection

This time we use ε-greedy: a random action is selected with a small probability ε, and the action with the maximum Q value is selected with probability 1-ε.
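A sketch of ε-greedy selection, assuming the same array-shaped Q table as above (the value of ε is an assumption):

```python
import random
import numpy as np

EPSILON = 0.1  # exploration probability (assumed value)

def select_action(q_table, s, n_actions=4):
    """Explore with probability EPSILON, otherwise act greedily on Q."""
    if random.random() < EPSILON:
        return random.randrange(n_actions)  # random exploration
    return int(np.argmax(q_table[s]))       # greedy exploitation
```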

Source code

The code has been uploaded to GitHub. Run it with python map.py. I wrote it about two years ago, so the code is pretty rough.

Experiment

Environment

The experimental environment is shown in the photo below. The light blue square at the lower right is the goal, the square at the upper left is the start, and the blue squares are the learning agents. The agent receives a positive reward when it reaches the goal. The black areas are walls the agent cannot enter, so it must travel through the white passages. The Q value of every cell is initialized to 0; once a cell's Q value becomes larger than 0, the largest of its four Q values is shown as the shade of the cell's color, and the corresponding action is displayed with an arrow.
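A minimal sketch of such a grid environment: walls block movement and reaching the goal yields reward 1. The tiny layout and reward value are illustrative assumptions, not the actual 60-square maze:

```python
# '#' = wall, '.' = passage, 'S' = start, 'G' = goal (toy layout).
MAZE = [
    "S..#",
    "#..#",
    "#..G",
]

MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(pos, action):
    """Move the agent one square; stay put at walls, reward 1 at the goal."""
    r, c = pos
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    # Stay in place if the move leaves the grid or hits a wall.
    if not (0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0])) or MAZE[nr][nc] == "#":
        nr, nc = r, c
    reward = 1.0 if MAZE[nr][nc] == "G" else 0.0
    return (nr, nc), reward
```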

Result

The experimental results are posted on YouTube. You can see how the Q values propagate as the agent repeatedly reaches the goal.

Conclusion

Next, I would like to try Q-learning combined with a neural network.
