Last time I received many requests about machine learning for Super Nintendo games, so this time I am sharing a program that learns to play an NES game and runs on a MacBook or similar machine.
I have uploaded it to GitHub so you can try the experiment yourself. If you download it, please give it a star! https://github.com/tsunaki00/super_mario
I tested it in the following machine environment.
| Environment | |
|---|---|
| PC | MacBook Pro 2016, macOS |
| CPU | Core i7 |
| Memory | 16 GB |
| Development language | Python 3 |
The TensorFlow training code is put together rather roughly, so please feel free to improve it.
When I first ran it, I got the following error, so the library needs a small fix.
```
$ python3 start.py
Traceback (most recent call last):
  File "start.py", line 22, in <module>
    import gym_pull
  File "/usr/local/lib/python3.6/site-packages/gym_pull/__init__.py", line 41, in <module>
    import gym_pull.monitoring.monitor
  File "/usr/local/lib/python3.6/site-packages/gym_pull/monitoring/monitor.py", line 10, in <module>
    class Monitor(gym.monitoring.monitor.Monitor):
AttributeError: module 'gym.monitoring' has no attribute 'monitor'
```
Fix it as follows:

```
$ vi /usr/local/lib/python3.6/site-packages/gym_pull/monitoring/monitor.py
  :
  :
class Monitor(gym.monitoring.monitor.Monitor):
  ↓
class Monitor(gym.monitoring.monitor_manager.MonitorManager):
```
The learning method is reinforcement learning. Unlike supervised or unsupervised learning, the agent learns from an evaluation of the actions it has taken.
DQN is described in the article below: "History of DQN + Deep Q-Network written in Chainer".
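For reference, the standard DQN objective trains a network Q(s, a; θ) to minimize the squared error against a bootstrapped target that uses a separate target network θ⁻ (note that the toy script below skips this and simply regresses the readout toward the distance travelled):

```math
L(\theta) = \mathbb{E}\left[\left(r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta)\right)^{2}\right]
```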
By feeding the results of its own play back in as the evaluation, it eventually produced the fast Mario run shown below. The evaluation is based on distance, time, and score.
[[Artificial Intelligence] I tried to let AIVA play Mario! [WORLD 1-1]](https://www.youtube.com/watch?v=T4dO1GKPx4Y)
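The script below only feeds the distance back as the reward, but the `info` dict returned by the environment also carries time and score, so a combined evaluation could look like the following sketch. The weights are arbitrary placeholder assumptions, and the exact `info` keys should be checked against the ppaquette_gym_super_mario version you have installed.

```python
def evaluate(info):
    """Combine distance, remaining time, and score into one reward value.

    The weights here are placeholder assumptions; tune them to taste.
    """
    distance = float(info.get('distance', 0))   # how far Mario has travelled
    time_left = float(info.get('time', 0))      # remaining time on the clock
    score = float(info.get('score', 0))         # in-game score
    return distance + 0.1 * time_left + 0.01 * score
```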
A few notes:

- Narrow down the actions properly.
- The arrays are arranged so that they can be used for mini-batch processing; change this as appropriate.
- It works even better if you add a little randomness once in a while! (A sketch of the first and last ideas follows below.)
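For example, restricting the 64 raw button combinations to a handful of useful moves and mixing in occasional random actions (epsilon-greedy) could look like this sketch. The particular action subset and the epsilon value are my own assumptions; the exact button order in each 6-element vector is defined by the ppaquette environment.

```python
import random
import numpy as np

# A narrowed-down action set: each entry is a 6-element button vector
# (the exact button mapping is defined by the ppaquette environment).
# These particular combinations are illustrative assumptions.
ACTIONS = [
    [0, 0, 0, 0, 0, 0],  # do nothing
    [0, 0, 0, 1, 0, 0],  # move right
    [0, 0, 0, 1, 0, 1],  # run right
    [0, 0, 0, 1, 1, 0],  # jump right
    [0, 0, 0, 1, 1, 1],  # running jump right
]

def choose_action(readout_t, epsilon=0.1):
    """Epsilon-greedy selection over the narrowed action set."""
    if random.random() < epsilon:
        return random.randint(0, len(ACTIONS) - 1)   # explore occasionally
    return int(np.argmax(readout_t))                 # otherwise act greedily
```

With that said, here is the full script: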
```python
import tensorflow as tf
import gym
import gym_pull
import ppaquette_gym_super_mario
from gym.wrappers import Monitor
import random
import numpy as np


class Game:

    def __init__(self):
        self.episode_count = 10000
        ## Select the stage
        self.env = gym.make('ppaquette/SuperMarioBros-1-1-Tiles-v0')

    def weight_variable(self, shape):
        initial = tf.truncated_normal(shape, stddev=0.01)
        return tf.Variable(initial)

    def bias_variable(self, shape):
        initial = tf.constant(0.01, shape=shape)
        return tf.Variable(initial)

    def conv2d(self, x, W, stride):
        return tf.nn.conv2d(x, W, strides=[1, stride, stride, 1], padding="SAME")

    def max_pool_2x2(self, x):
        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")

    def create_network(self, action_size):
        # Three convolutional layers plus a fully connected readout layer
        W_conv1 = self.weight_variable([8, 8, 1, 16])
        b_conv1 = self.bias_variable([16])
        W_conv2 = self.weight_variable([4, 4, 16, 32])
        b_conv2 = self.bias_variable([32])
        W_conv3 = self.weight_variable([4, 4, 32, 64])
        b_conv3 = self.bias_variable([64])
        # After the three convolutions the 13x16 input becomes 4x4x64 = 1024 features
        W_fc1 = self.weight_variable([1024, action_size])
        b_fc1 = self.bias_variable([action_size])
        # Input: the 13x16 tile map of the screen
        s = tf.placeholder("float", [None, 13, 16, 1])
        # Hidden layers
        h_conv1 = tf.nn.relu(self.conv2d(s, W_conv1, 2) + b_conv1)
        h_conv2 = tf.nn.relu(self.conv2d(h_conv1, W_conv2, 2) + b_conv2)
        h_conv3 = tf.nn.relu(self.conv2d(h_conv2, W_conv3, 1) + b_conv3)
        h_conv3_flat = tf.reshape(h_conv3, [-1, 1024])
        readout = tf.matmul(h_conv3_flat, W_fc1) + b_fc1
        return s, readout

    def play_game(self):
        # Enumerate all 64 combinations of the 6 buttons as candidate actions
        action_list = []
        for i in range(64):
            command = format(i, 'b')
            command = '{0:06d}'.format(int(command))
            actions = []
            for cmd in list(command):
                actions.append(int(cmd))
            action_list.append(actions)

        sess = tf.InteractiveSession()
        s, readout = self.create_network(len(action_list))
        a = tf.placeholder("float", [None, len(action_list)])
        y = tf.placeholder("float", [None, 1])
        readout_action = tf.reduce_sum(tf.multiply(readout, a), reduction_indices=1)
        cost = tf.reduce_mean(tf.square(y - readout_action))
        train_step = tf.train.AdamOptimizer(1e-6).minimize(cost)

        saver = tf.train.Saver()
        sess.run(tf.initialize_all_variables())
        checkpoint = tf.train.get_checkpoint_state("./saved_networks/checkpoints")
        if checkpoint and checkpoint.model_checkpoint_path:
            saver.restore(sess, checkpoint.model_checkpoint_path)
            print("Successfully loaded:", checkpoint.model_checkpoint_path)
        else:
            print("Could not find old network weights")

        for episode in range(self.episode_count):
            self.env.reset()
            total_score = 0
            distance = 0
            is_finished = False
            actions, rewards, images = [], [], []
            while is_finished == False:
                # Get the 13x16 tile map of the screen
                # (if you work with the raw image instead, you can get it from self.env.screen)
                screen = np.reshape(self.env.tiles, (13, 16, 1))
                # Act randomly for the first few episodes, then follow the network
                if episode < 10:
                    action_index = random.randint(0, len(action_list) - 1)
                else:
                    readout_t = readout.eval(feed_dict={s: [screen]})[0]
                    action_index = np.argmax(readout_t)
                # (1) Send the chosen action to Mario on screen (self.env.step)
                obs, reward, is_finished, info = self.env.step(action_list[action_index])
                ## Store everything as arrays so they can be fed as a mini-batch
                action_array = np.zeros(len(action_list))
                action_array[action_index] = 1
                actions.append(action_array)
                # (2) Give the reward (here, the distance travelled)
                rewards.append([float(info['distance'])])
                images.append(screen)
                train_step.run(feed_dict={
                    a: actions, y: rewards, s: images
                })
                print('Episode : ', episode, 'Actions : ', action_list[action_index], 'Rewards', reward)
                actions, rewards, images = [], [], []
                self.env.render()
            saver.save(sess, 'saved_networks/model-dqn', global_step=episode)


if __name__ == '__main__':
    game = Game()
    game.play_game()
```
```
$ python3 start.py
```
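Incidentally, the stage is chosen by the environment name passed to gym.make. Other stages follow the same naming pattern in ppaquette_gym_super_mario, though I have only tested 1-1 here, so treat the ID below as an assumption and check it against the environments registered by your installed version:

```python
import gym
import ppaquette_gym_super_mario  # registers the ppaquette/* environments

# Hypothetical example: World 1-2 with the tile-based observation
env = gym.make('ppaquette/SuperMarioBros-1-2-Tiles-v0')
```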
I will write up a Docker + browser version separately.
Many technical books on AI and similar topics have been published recently, but I feel the hurdles are still quite high... I think practicing with something familiar like this is a good way to start!
We will hold a separate study session covering the details of the game AI program and Python. Please join us if you are interested.
I have also started a tech Twitter account, which I will update from time to time; please follow it if you like. https://twitter.com/gauss_club