Hello. My day job is about people, not programming.
With the keyword "deep learning" buzzing even on TV, I saw Robot Control with Distributed Deep Reinforcement Learning | Preferred Research and wanted to try it myself, so rather than attempting a full clone I started by making something simple. ⇒ Repository: DeepQNetworkTest
This is my first time with Python, and my first time with Chainer! I barely know how to program, and there are no software people around me, but I want a self-propelled machine to learn by reinforcement learning! ⇒ If I publish it anyway, maybe someone will give me pointers.
There seem to be very few examples (as far as I can tell) of controlling a machine that has inertia. ⇒ That goes in the next step.
- Cloned something like the ConvNetJS Deep Q Learning Reinforcement Learning with Neural Network demo
- Made the GUI in Python!
- Implemented Deep Q Learning (ε-greedy)! (It is closer to transcription than original work, and it is not deep at all.)
Red apples and poisoned apples are scattered in a garden bounded by an outer frame and an inner frame. The AI wants to eat as many red apples as possible and does not want to eat poisoned apples.
Obstacles (walls) block the AI's movement and field of vision. The AI prefers to have an open view.
Hitting a red apple gives a reward; hitting a poisoned apple gives a penalty.
- A blue dot with a 120° forward field of view reaching 300 px.
- The field of view consists of nine eyes; each eye is blocked by apples and walls, so only the nearest object along that line is seen.
- It keeps moving at a constant speed.
- The available actions are two grades of right turn, going straight, and two grades of left turn (five actions in total).
- It prefers to go straight when its view is open. (A sketch of the eye layout follows below.)
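As a rough illustration of the nine-eye layout described above, here is a minimal sketch of how the eye directions could be spread over the 120° forward field of view; the names and the even spacing are my assumptions, not code from the repository.

```python
import numpy as np

N_EYES = 9          # nine eyes, as described above
FOV_DEG = 120.0     # forward field of view in degrees
VIEW_RANGE = 300.0  # sight range in px

def eye_angles(heading_rad):
    # Spread nine rays evenly from -60° to +60° around the current heading;
    # the actual collision checks against apples and walls are not shown here.
    offsets = np.deg2rad(np.linspace(-FOV_DEG / 2.0, FOV_DEG / 2.0, N_EYES))
    return heading_rad + offsets
```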
The network uses ReLU, with 59 inputs, two hidden layers of 50 units each, and 5 outputs (same as the original).
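As a minimal sketch, the 59-50-50-5 ReLU network could be written in Chainer roughly as follows; only the layer sizes and the ReLU come from the text above, while the class and method names are made up for illustration.

```python
import chainer
import chainer.functions as F
import chainer.links as L

class QNetwork(chainer.Chain):
    def __init__(self, state_dim=59, n_actions=5):
        super().__init__()
        with self.init_scope():
            self.l1 = L.Linear(state_dim, 50)
            self.l2 = L.Linear(50, 50)
            self.l3 = L.Linear(50, n_actions)

    def predict(self, x):
        # Two hidden layers of 50 ReLU units each, 5 linear Q-value outputs
        h = F.relu(self.l1(x))
        h = F.relu(self.l2(h))
        return self.l3(h)
```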
Learning is done in mini-batches drawn from a stock of 30,000 experiences, the approach you see described everywhere. I have not done anything fashionable such as Double DQN or LSTM. (A sketch of such a replay buffer follows below.)
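A minimal sketch of such a replay buffer, assuming one experience per row laid out as [state, action, reward, next state] to match the snippet later in this article; the class and method names are illustrative, not the repository's actual code.

```python
import numpy as np

MEM_SIZE = 30000   # number of stored experiences, as in the text
STATE_DIM = 59     # input dimension, as in the text

class ReplayMemory:
    def __init__(self):
        # One row per experience: state (59) + action + reward + next state (59)
        self.eMem = np.zeros((MEM_SIZE, STATE_DIM * 2 + 2), dtype=np.float32)
        self.count = 0

    def stock(self, state, action, reward, next_state):
        # Overwrite the oldest row once the buffer is full (ring buffer)
        i = self.count % MEM_SIZE
        self.eMem[i] = np.hstack([state, [action, reward], next_state])
        self.count += 1
```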
- The AI gradually learns and starts eating the red apples.
- For some reason it likes to hug the wall.
- It also seems to go after the poisoned apples quite actively; maybe the ε of ε-greedy has not decayed enough? (See the ε-greedy sketch below.)
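For reference, a minimal sketch of ε-greedy action selection with a linearly decaying ε; the decay schedule and all the numbers here are assumptions for illustration, not the settings actually used.

```python
import numpy as np

def select_action(q_values, step, eps_start=1.0, eps_end=0.05, decay_steps=10000):
    # Linearly anneal ε from eps_start down to eps_end over decay_steps steps
    eps = max(eps_end, eps_start - (eps_start - eps_end) * step / decay_steps)
    if np.random.rand() < eps:
        return np.random.randint(len(q_values))  # explore: random action
    return int(np.argmax(q_values))              # exploit: greedy action
```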
The reward attached to each action may need adjusting. And not visualizing the progress of learning is clearly a problem.
Chainer memo 11: when you can't get speed out of the GPU -- studylog / Northern clouds
For people who crunch numbers with numpy every day, the code below is probably bad enough to make them spit out their tea, but until recently I wrote code like this all the time.
The article above exists, but it is about cupy. Even restricting myself to numpy, I don't know how to do it, so: isn't there a better way to write this so that it runs faster? I would love to hear about it.
DQN001.py
```python
# Excerpt from the update step (a method body, hence `self`); it assumes
# `import numpy as np` and `from chainer import Variable` at module level.

# Sample a random mini-batch of experiences from the replay memory
memsize = self.eMem.shape[0]
batch_index = np.random.permutation(memsize)[:self.batch_num]
batch = np.array(self.eMem[batch_index], dtype=np.float32).reshape(self.batch_num, -1)

# Q-values predicted by the current network for the batch states
x = Variable(batch[:, 0:STATE_DIM])
targets = self.model.predict(x).data.copy()

for i in range(self.batch_num):
    # Each row of the memory is [state..., action, reward, next_state...]
    a = int(batch[i, STATE_DIM])
    r = batch[i, STATE_DIM + 1]
    new_seq = batch[i, (STATE_DIM + 2):(STATE_DIM * 2 + 2)]
    # Q-learning target: r + gamma * max_a' Q(next_state, a')
    targets[i, a] = r + self.gamma * np.max(self.get_action_value(new_seq))

t = Variable(np.array(targets, dtype=np.float32).reshape((self.batch_num, -1)))
```
Should I consider an implementation that turns the inside of the for loop into a vector operation?
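As a sketch of that idea, the per-row work can be replaced with fancy indexing, assuming self.model.predict() accepts the whole batch of next states at once; only the names already appearing in the snippet above (batch, targets, STATE_DIM, self.gamma, self.model) come from the original, the rest is illustrative.

```python
import numpy as np
from chainer import Variable

actions  = batch[:, STATE_DIM].astype(np.int32)            # chosen actions
rewards  = batch[:, STATE_DIM + 1]                         # observed rewards
new_seqs = batch[:, (STATE_DIM + 2):(STATE_DIM * 2 + 2)]   # next states

# One forward pass for the whole mini-batch instead of batch_num separate calls
next_q = self.model.predict(Variable(new_seqs)).data       # shape (batch_num, 5)
targets[np.arange(len(actions)), actions] = rewards + self.gamma * next_q.max(axis=1)
```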
I still don't really understand the parent-child relationship between Frame and Panel, or how to handle the device context (dc). I want to add a graph at the bottom of the screen (under construction). ⇒ wxPython: Simultaneous drawing of animation and graph drawing -- Qiita
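A minimal sketch of the Frame/Panel parent-child relationship and the device context, just to illustrate the structure I am struggling with; this is not the project's actual GUI code.

```python
import wx

class FieldPanel(wx.Panel):
    def __init__(self, parent):
        super().__init__(parent)           # the Panel is a child of the Frame
        self.Bind(wx.EVT_PAINT, self.on_paint)

    def on_paint(self, event):
        dc = wx.PaintDC(self)              # a dc is created per paint event, on the Panel
        dc.SetBrush(wx.Brush(wx.Colour(0, 0, 255)))
        dc.DrawCircle(50, 50, 10)          # e.g. the agent as a blue dot

class MainFrame(wx.Frame):
    def __init__(self):
        super().__init__(None, title="DeepQNetworkTest")
        self.panel = FieldPanel(self)      # drawing happens on the Panel, not the Frame

if __name__ == "__main__":
    app = wx.App()
    frame = MainFrame()
    frame.Show()
    app.MainLoop()
```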
It looks like fireflies are flying.
This article will be added to and revised little by little.