Hello. My day job is about people, not programming.
With the keyword "deep learning" buzzing even on TV, I saw Robot Control with Distributed Deep Reinforcement Learning | Preferred Research and wanted to try it myself, so rather than attempting a full clone I started by making something simple. ⇒ Repository: DeepQNetworkTest
This is my first time with Python, and my first time with Chainer! I barely know how to program, and there are no software people around me, but I want a self-propelled machine to learn by reinforcement learning! ⇒ If I publish it anyway, maybe someone will give me pointers.
There seem to be very few examples (as far as I can tell) of controlling a machine that has inertia. ⇒ That goes in the next step.
- Cloned something like the ConvNetJS Deep Q Learning Reinforcement Learning with Neural Network demo
- Made the GUI in Python!
- Implemented Deep Q Learning (ε-greedy)! (It is closer to transcription than original work, and it is not deep at all.)
Red apples and poisoned apples are scattered in a garden bounded by an outer frame and an inner frame. The AI wants to eat as many red apples as possible and does not want to eat poisoned apples.
Obstacles (walls) block the AI's movement and field of vision. The AI prefers to have an open view.
Hitting a red apple gives a reward; hitting a poisoned apple gives a penalty.
- A blue dot with a 120° forward field of view reaching 300 px.
- The field of view consists of nine eyes; each eye is blocked by apples and walls, so only the nearest object along that line is seen.
- It keeps moving at a constant speed.
- The available actions are two grades of right turn, going straight, and two grades of left turn (five actions in total).
- It prefers to go straight when its view is open. (A sketch of the eye layout follows below.)
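As a rough illustration of the nine-eye layout described above, here is a minimal sketch of how the eye directions could be spread over the 120° forward field of view; the names and the even spacing are my assumptions, not code from the repository.

```python
import numpy as np

N_EYES = 9          # nine eyes, as described above
FOV_DEG = 120.0     # forward field of view in degrees
VIEW_RANGE = 300.0  # sight range in px

def eye_angles(heading_rad):
    # Spread nine rays evenly from -60° to +60° around the current heading;
    # the actual collision checks against apples and walls are not shown here.
    offsets = np.deg2rad(np.linspace(-FOV_DEG / 2.0, FOV_DEG / 2.0, N_EYES))
    return heading_rad + offsets
```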
The network uses ReLU, with 59 inputs, two hidden layers of 50 units each, and 5 outputs (same as the original).
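As a minimal sketch, the 59-50-50-5 ReLU network could be written in Chainer roughly as follows; only the layer sizes and the ReLU come from the text above, while the class and method names are made up for illustration.

```python
import chainer
import chainer.functions as F
import chainer.links as L

class QNetwork(chainer.Chain):
    def __init__(self, state_dim=59, n_actions=5):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(state_dim, 50)
            self.l2 = L.Linear(50, 50)
            self.l3 = L.Linear(50, n_actions)

    def predict(self, x):
        # Two hidden layers of 50 ReLU units each, 5 linear Q-value outputs
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)
```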
Learning is done in mini-batches drawn from a stock of 30,000 experiences, the approach you see described everywhere. I have not done anything fashionable such as Double DQN or LSTM. (A sketch of such a replay buffer follows below.)
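A minimal sketch of such a replay buffer, assuming one experience per row laid out as [state, action, reward, next state] to match the snippet later in this article; the class and method names are illustrative, not the repository's actual code.

```python
import numpy as np

MEM_SIZE = 30000   # number of stored experiences, as in the text
STATE_DIM = 59     # input dimension, as in the text

class ReplayMemory:
    def __init__(self):
        # One row per experience: state (59) + action + reward + next state (59)
        self.eMem = np.zeros((MEM_SIZE, STATE_DIM * 2 + 2), dtype=np.float32)
        self.count = 0

    def stock(self, state, action, reward, next_state):
        # Overwrite the oldest row once the buffer is full (ring buffer)
        i = self.count % MEM_SIZE
        self.eMem[i] = np.hstack([state, [action, reward], next_state])
        self.count += 1
```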
- The AI gradually learns and starts eating the red apples.
- For some reason it likes to hug the wall.
- It also seems to go after the poisoned apples quite actively; maybe the ε of ε-greedy has not decayed enough? (See the ε-greedy sketch below.)
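For reference, a minimal sketch of ε-greedy action selection with a linearly decaying ε; the decay schedule and all the numbers here are assumptions for illustration, not the settings actually used.

```python
import numpy as np

def select_action(q_values, step, eps_start=1.0, eps_end=0.05, decay_steps=10000):
    # Linearly anneal ε from eps_start down to eps_end over decay_steps steps
    eps = max(eps_end, eps_start - (eps_start - eps_end) * step / decay_steps)
    if np.random.rand() < eps:
        return np.random.randint(len(q_values))  # explore: random action
    return int(np.argmax(q_values))              # exploit: greedy action
```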
The reward attached to each action may need adjusting. And not visualizing the progress of learning is clearly a problem.
Chainer memo 11: when you can't get speed out of the GPU -- studylog / Northern clouds
For people who crunch numbers with numpy every day, the code below is probably bad enough to make them spit out their tea, but until recently I wrote code like this all the time.
The article above exists, but it is about cupy. Even restricting myself to numpy, I don't know how to do it, so: isn't there a better way to write this so that it runs faster? I would love to hear about it.
DQN001.py
```python
# Excerpt from the update step (a method body, hence `self`); it assumes
# `import numpy as np` and `from chainer import Variable` at module level.

# Sample a random mini-batch of experiences from the replay memory
memsize = self.eMem.shape[0]
batch_index = np.random.permutation(memsize)[:self.batch_num]
batch = np.array(self.eMem[batch_index], dtype=np.float32).reshape(self.batch_num, -1)

# Q-values predicted by the current network for the batch states
x = Variable(batch[:, 0:STATE_DIM])
targets = self.model.predict(x).data.copy()

for i in range(self.batch_num):
    # Each row of the memory is [state..., action, reward, next_state...]
    a = int(batch[i, STATE_DIM])
    r = batch[i, STATE_DIM + 1]
    new_seq = batch[i, (STATE_DIM + 2):(STATE_DIM * 2 + 2)]
    # Q-learning target: r + gamma * max_a' Q(next_state, a')
    targets[i, a] = r + self.gamma * np.max(self.get_action_value(new_seq))

t = Variable(np.array(targets, dtype=np.float32).reshape((self.batch_num, -1)))
```
Should I consider an implementation that turns the inside of the for loop into a vector operation?
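As a sketch of that idea, the per-row work can be replaced with fancy indexing, assuming self.model.predict() accepts the whole batch of next states at once; only the names already appearing in the snippet above (batch, targets, STATE_DIM, self.gamma, self.model) come from the original, the rest is illustrative.

```python
import numpy as np
from chainer import Variable

actions  = batch[:, STATE_DIM].astype(np.int32)            # chosen actions
rewards  = batch[:, STATE_DIM + 1]                         # observed rewards
new_seqs = batch[:, (STATE_DIM + 2):(STATE_DIM * 2 + 2)]   # next states

# One forward pass for the whole mini-batch instead of batch_num separate calls
next_q = self.model.predict(Variable(new_seqs)).data       # shape (batch_num, 5)
targets[np.arange(len(actions)), actions] = rewards + self.gamma * next_q.max(axis=1)
```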
I still don't really understand the parent-child relationship between Frame and Panel, or how to handle the device context (dc). I want to add a graph at the bottom of the screen (under construction). ⇒ wxPython: Simultaneous drawing of animation and graph drawing -- Qiita
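A minimal sketch of the Frame/Panel parent-child relationship and the device context, just to illustrate the structure I am struggling with; this is not the project's actual GUI code.

```python
import wx

class FieldPanel(wx.Panel):
    def __init__(self, parent):
        super().__init__(parent)           # the Panel is a child of the Frame
        self.Bind(wx.EVT_PAINT, self.on_paint)

    def on_paint(self, event):
        dc = wx.PaintDC(self)              # a dc is created per paint event, on the Panel
        dc.SetBrush(wx.Brush(wx.Colour(0, 0, 255)))
        dc.DrawCircle(50, 50, 10)          # e.g. the agent as a blue dot

class MainFrame(wx.Frame):
    def __init__(self):
        super().__init__(None, title="DeepQNetworkTest")
        self.panel = FieldPanel(self)      # drawing happens on the Panel, not the Frame

if __name__ == "__main__":
    app = wx.App()
    frame = MainFrame()
    frame.Show()
    app.MainLoop()
```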
It looks like fireflies are flying.
This article will be added to and revised little by little.