This time I worked through a reinforcement learning tutorial, so I am writing it up here as a memo.
For example, think of a new salesman learning how to sell:
Agent: the new salesman
Environment: the customer
Action: the sales pitch that the new salesman makes
State: the observation of the customer's reaction to the pitch
Reward: whether the customer's purchasing motivation has increased
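As a purely illustrative sketch of this interaction loop (everything below is made up for the analogy and is not part of the tutorial), it looks roughly like this:

import random

# Toy sketch of the agent-environment loop in the salesman analogy.
def observe_customer():
    # State: the customer's reaction, as far as the salesman can observe it.
    return random.choice(['interested', 'neutral', 'bored'])

def motivation_increased(state, action):
    # Reward: did the customer's purchasing motivation increase?
    return 1.0 if state == 'interested' and action == 'pitch' else 0.0

for step in range(3):
    state = observe_customer()                   # observation
    action = random.choice(['pitch', 'listen'])  # the salesman's action
    reward = motivation_increased(state, action)
    print(step, state, action, reward)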
Since a novice salesman has no sales experience, he cannot tell whether the reward, that is, whether the customer's purchasing motivation has actually increased, is accurate. In addition, a novice salesman cannot accurately grasp the customer's reaction to the pitch.
Reinforcement learning in highly uncertain situations like this, where there is no teacher data and the state cannot be observed accurately, is formulated as a POMDP (partially observable Markov decision process).
Please refer to the following for a detailed explanation (Source: NTT Communication Science Laboratories, Yasuhiro Minami) http://www.lai.kyutech.ac.jp/sig-slud/SLUD63-minami-POMDP-tutorial.pdf
The tutorial below instead uses an MDP, which assumes that the observed state is correct.
MDP (Markov decision process) http://www.orsj.or.jp/~wiki/wiki/index.php/%E3%83%9E%E3%83%AB%E3%82%B3%E3%83%95%E6%B1%BA%E5%AE%9A%E9%81%8E%E7%A8%8B
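As a toy illustration (my own made-up example, not from the tutorial or the link above), an MDP is nothing more than states, actions, transition probabilities, and rewards:

# Hypothetical two-state MDP.
# transitions[(state, action)] = list of (next_state, probability) pairs
transitions = {
    ('start', 'move'): [('goal', 0.9), ('start', 0.1)],
    ('start', 'stay'): [('start', 1.0)],
    ('goal', 'move'):  [('goal', 1.0)],
    ('goal', 'stay'):  [('goal', 1.0)],
}
# rewards[(state, action)] = immediate reward (everything else is 0)
rewards = {('start', 'move'): 1.0}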
Please refer to the following for how to get started with PyBrain.
https://github.com/pybrain/pybrain/blob/master/docs/documentation.pdf
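If PyBrain is not installed yet, it can usually be obtained with pip (pip install pybrain); note that it also depends on scipy and numpy.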
Import the required libraries.
from scipy import *
import sys, time
from pybrain.rl.environments.mazes import Maze, MDPMazeTask
from pybrain.rl.learners.valuebased import ActionValueTable
from pybrain.rl.agents import LearningAgent
from pybrain.rl.learners import Q, SARSA
from pybrain.rl.experiments import Experiment
from pybrain.rl.environments import Task
Get ready for visualization.
import pylab
pylab.gray()  # use a grayscale colormap
pylab.ion()   # interactive mode, so the plot updates during the training loop
Since the goal of the tutorial is to solve a maze, we define the following maze structure (1 = wall, 0 = free cell).
structure = array([[1, 1, 1, 1, 1, 1, 1, 1, 1],
                   [1, 0, 0, 1, 0, 0, 0, 0, 1],
                   [1, 0, 0, 1, 0, 0, 1, 0, 1],
                   [1, 0, 0, 1, 0, 0, 1, 0, 1],
                   [1, 0, 0, 1, 0, 1, 1, 0, 1],
                   [1, 0, 0, 0, 0, 0, 1, 0, 1],
                   [1, 1, 1, 1, 1, 1, 1, 0, 1],
                   [1, 0, 0, 0, 0, 0, 0, 0, 1],
                   [1, 1, 1, 1, 1, 1, 1, 1, 1]])
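As an optional sanity check (my own addition, not part of the tutorial), the layout can be displayed with pylab before training starts:

# Quick look at the maze layout defined above (1 = wall, 0 = free cell).
pylab.pcolor(structure)
pylab.draw()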
Define the maze as an environment, passing in the maze structure defined above and the position of the goal, (7, 7).
environment = Maze(structure, (7, 7))
Next, define the agent's controller: a table of action values with 81 states and 4 actions, and initialize all of its entries.
81 states: because the maze structure is a 9x9 grid
4 actions: because the agent can move up, down, left, or right
Two interfaces are available for defining action values: ActionValueTable and ActionValueNetwork.
ActionValueTable: used when the actions are discrete
ActionValueNetwork: used when the actions are continuous
controller = ActionValueTable(81, 4)
controller.initialize(1.)
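As a rough illustration of what this table holds (my own sketch; the state index convention row * 9 + column is my assumption about how the maze cells are numbered):

# One row per state (81 maze cells), one column per action (4 moves).
state = 5 * 9 + 3  # hypothetical cell at row 5, column 3
print(controller.params.reshape(81, 4)[state])  # four action values, all 1.0 right after initialize(1.)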
Next, define how the agent learns. Here Q-learning is used, so the agent's actions are optimized for reward.
learner = Q()
agent = LearningAgent(controller, learner)
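Q() implements standard Q-learning, which nudges Q(s, a) toward r + gamma * max over a' of Q(s', a'). Since SARSA is already imported above, the on-policy variant should work as a drop-in replacement (my own note, not part of the tutorial):

# Alternative: use SARSA instead of Q-learning.
learner = SARSA()
agent = LearningAgent(controller, learner)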
Define the task that connects the agent to the environment.
task = MDPMazeTask(environment)
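As far as I understand from the PyBrain source, MDPMazeTask is what supplies the observation (the agent's current cell) and the reward at each step, with a reward of 1 given when the goal cell is reached and 0 otherwise.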
The code below runs the actual reinforcement learning loop: each iteration performs 100 interactions with the environment, lets the agent learn from them, and then plots the learned value of each maze cell.
experiment = Experiment(task, agent)

while True:
    experiment.doInteractions(100)  # run 100 agent-environment interactions
    agent.learn()                   # update the action-value table from them
    agent.reset()                   # clear the agent's stored history

    # Plot the maximum action value of each of the 81 cells as a 9x9 heat map.
    pylab.pcolor(controller.params.reshape(81, 4).max(1).reshape(9, 9))
    pylab.draw()
    pylab.show()
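The loop above never terminates; as a small variation (my own sketch, not part of the tutorial), you can run a fixed number of iterations instead and then read the greedy action for every cell straight out of the learned table:

# Finite version of the training loop: 100 iterations of 100 interactions each.
for i in range(100):
    experiment.doInteractions(100)
    agent.learn()
    agent.reset()

# Greedy action index (0-3) for each cell, laid back out on the 9x9 maze.
policy = controller.params.reshape(81, 4).argmax(axis=1).reshape(9, 9)
print(policy)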
That's it.