I tried reinforcement learning using PyBrain

PyBrain is a Python library that implements neural networks and related machine learning algorithms, including reinforcement learning.

This time I worked through the reinforcement learning example in the official tutorial, so I am writing it down here as a memo.

Reinforcement learning is a framework for learning control in which an agent adapts to its environment through trial and error.
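
Put as code, that trial-and-error loop looks roughly like this (a toy sketch for illustration only, not PyBrain code; the corridor environment here is made up):

import random

# A toy sketch of the trial-and-error loop (illustrative only, not PyBrain code).
# The agent wanders a 1-D corridor of 5 cells; reaching the right end pays off.
position = 0                                      # State: where the agent is
for step in range(20):
    action = random.choice([-1, +1])              # Action: step left or right
    position = min(max(position + action, 0), 4)  # environment updates the state
    reward = 1 if position == 4 else 0            # Reward: 1 at the goal, else 0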


For example, suppose a new salesperson is the agent and the customer is the environment. Then:

Action: the sales pitch the new salesperson makes to the customer
State: the observed reaction of the customer to the pitch
Reward: whether the customer's motivation to buy has increased

Since a novice salesperson has no sales experience, they cannot know whether the reward, that is, "whether the customer's motivation to buy has increased," is accurate.

In addition, a new salesperson cannot accurately grasp the customer's reaction to the pitch.

Reinforcement learning in situations like this, where uncertainty is high and there is no teacher data, is modeled as a POMDP (partially observable Markov decision process).

Please refer to the following for a detailed explanation (source: NTT Communication Science Laboratories, Yasuhiro Minami).

http://www.lai.kyutech.ac.jp/sig-slud/SLUD63-minami-POMDP-tutorial.pdf

The tutorial below instead uses an MDP, which assumes that the observed state is correct.

MDP (Markov decision process) http://www.orsj.or.jp/~wiki/wiki/index.php/%E3%83%9E%E3%83%AB%E3%82%B3%E3%83%95%E6%B1%BA%E5%AE%9A%E9%81%8E%E7%A8%8B

This time, I worked through a Python tutorial that uses this framework to solve a maze game.

Please refer to the following for how to start PyBrain.

https://github.com/pybrain/pybrain/blob/master/docs/documentation.pdf

Import the required libraries.

from scipy import *
import sys, time

from pybrain.rl.environments.mazes import Maze, MDPMazeTask
from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.agents import LearningAgent
from pybrain.rl.learners import Q, SARSA
from pybrain.rl.experiments import Experiment
from pybrain.rl.environments import Task

Get ready for visualization.

import pylab
pylab.gray()
pylab.ion()

Since the goal of the tutorial is to solve a maze game, we define the following maze structure, where 1 represents a wall and 0 a free cell.

structure = array([[1, 1, 1, 1, 1, 1, 1, 1, 1],
                   [1, 0, 0, 1, 0, 0, 0, 0, 1],
                   [1, 0, 0, 1, 0, 0, 1, 0, 1],
                   [1, 0, 0, 1, 0, 0, 1, 0, 1],
                   [1, 0, 0, 1, 0, 1, 1, 0, 1],
                   [1, 0, 0, 0, 0, 0, 1, 0, 1],
                   [1, 1, 1, 1, 1, 1, 1, 0, 1],
                   [1, 0, 0, 0, 0, 0, 0, 0, 1],
                   [1, 1, 1, 1, 1, 1, 1, 1, 1]])
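
To double-check the layout at this point, you can plot the raw structure directly (an optional step that is not part of the tutorial):

# Optional: visualize the raw layout (1 = wall, 0 = free cell).
# Note that pcolor draws row 0 at the bottom, so the image appears flipped.
pylab.pcolor(structure)
pylab.show()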

Define the maze structure as an environment, passing in the structure defined above together with the goal position (7, 7).

environment = Maze(structure, (7, 7))

Next, define the agent's controller. Here the controller is a table of action values with 81 states and 4 actions, and we initialize every entry to the same value.

81 states: because the maze structure is a 9x9 grid
4 actions: because the agent can move up, down, left, or right

There are two interfaces for defining action values: ActionValueTable and ActionValueNetwork.

ActionValueTable: used for discrete actions
ActionValueNetwork: used for continuous actions

controller = ActionValueTable(81, 4)
controller.initialize(1.)
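
As a quick sanity check, the table can also be queried directly; getMaxAction should return the greedy action for a given state (this assumes the ActionValueTable interface of the PyBrain version I used):

# Greedy action for state 0; since every entry was initialized to 1.0,
# ties are broken at random at this point.
print(controller.getMaxAction(0))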

Next, define how the agent learns. With Q-learning, the agent's actions are optimized to maximize reward.

learner = Q()
agent = LearningAgent(controller, learner)
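
For reference, Q-learning updates one table entry at a time with the rule Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)). A minimal hand-written sketch of this update, which the Q() learner performs internally (the function and parameter names here are illustrative, not PyBrain API):

def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.99):
    # One tabular Q-learning step on a dict-of-dicts table:
    # move Q(s, a) toward r + gamma * max_a' Q(s', a').
    best_next = max(q_table[next_state].values())
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error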

Define the task that connects the agent to the environment. The MDP maze task treats the agent's position in the maze as fully observable and rewards it for reaching the goal.

task = MDPMazeTask(environment)

The code below runs the actual reinforcement learning: each pass through the loop performs 100 interactions with the environment, lets the agent learn from them, and plots the current value estimate of each maze cell as a heatmap.


experiment = Experiment(task, agent)

while True:
    experiment.doInteractions(100)   # run 100 agent-environment interactions
    agent.learn()                    # update the value table from the stored observations
    agent.reset()                    # clear the agent's history for the next batch

    # plot the highest action value of each state as a 9x9 heatmap
    pylab.pcolor(controller.params.reshape(81, 4).max(1).reshape(9, 9))
    pylab.draw()
    pylab.show()
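
Since the loop above never terminates on its own, stop it once the heatmap stabilizes (for example with Ctrl-C, or by replacing while True with a bounded loop). The learned greedy policy can then be read straight out of the table, using the same reshape trick as the plotting line (a sketch):

# Greedy policy: the index (0-3) of the best action for each of the 9x9 cells.
policy = controller.params.reshape(81, 4).argmax(axis=1).reshape(9, 9)
print(policy)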

That's it.
