I tried to create a reinforcement learning environment for Othello with OpenAI Gym

Introduction

I created an environment for reinforcement learning of Othello (Reversi) with OpenAI Gym. I hope it will be helpful for those who want to create their own reinforcement learning environment in the future. The learning algorithm itself is not implemented yet; I will work on that from now on. The code is available here: https://github.com/pigooosuke/gym_reversi

By default, gym/envs contains various learning environments. Among the board games, there are Go and Hex. This time, I created my environment with reference to those implementations.

Creation procedure

  1. Create an original learning environment under gym/envs/
  2. Register the created environment in gym/envs/__init__.py with its default values (a sketch of such an entry is shown below)

That is the overall flow.
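
The registration is an ordinary gym register() call. The following is a minimal sketch of the entry added to gym/envs/__init__.py; the id and kwargs follow the Reversi8x8-v0 naming and the default values described later in this article, and the exact entry in the repository may differ.

from gym.envs.registration import register

register(
    id='Reversi8x8-v0',
    entry_point='gym.envs.reversi:ReversiEnv',   # assumed module path
    kwargs={
        'player_color': 'black',        # the player's stone color (black moves first)
        'opponent': 'random',           # the opponent's strategy
        'observation_type': 'numpy3c',  # state encoding
        'illegal_place_mode': 'lose',   # penalty for an illegal move
        'board_size': 8,
    },
)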

The created Env can be called as follows.

import gym
env = gym.make('Reversi8x8-v0')
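
Once registered, interacting with the environment follows the usual gym pattern. The loop below is just a sketch assuming the classic step/reset API; a randomly sampled action may of course be an illegal move, which is handled according to illegal_place_mode.

observation = env.reset()
done = False
while not done:
    action = env.action_space.sample()                  # a random (possibly illegal) action
    observation, reward, done, info = env.step(action)
env.render()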

Env file

Class description

I created a class called ReversiEnv. Basically, an Env needs code centered on the following five methods.

_step: advances the game by one step (plays the player's move and the opponent's move, and checks whether the game is over)

_reset: loads the Env defaults (the initial board, who moves first, and so on)

_render: displays the state of the Env (image, RGB, and text output modes can be set; here it displays the stones on the board)

_close: discards all Env information (not used this time)

_seed: sets the random seed used when deciding actions
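
The skeleton below shows roughly how these five methods sit in the class. It assumes the older gym API in which subclasses override the underscore-prefixed methods; the method bodies are placeholders, not the actual implementation from gym_reversi.

import gym
import numpy as np
from gym import spaces
from gym.utils import seeding


class ReversiEnv(gym.Env):
    def __init__(self, player_color='black', opponent='random',
                 observation_type='numpy3c', illegal_place_mode='lose',
                 board_size=8):
        self.board_size = board_size
        # three planes: player stones, opponent stones, free squares
        self.state = np.zeros((3, board_size, board_size), dtype=np.int8)
        # 0..63 are board squares, 64 is resign, 65 is pass
        self.action_space = spaces.Discrete(board_size ** 2 + 2)

    def _reset(self):
        # rebuild the initial board and decide who moves first
        self.state[:] = 0
        self.state[2, :, :] = 1
        return self.state

    def _step(self, action):
        # play the player's move, let the opponent respond,
        # then check whether the game has finished
        reward, done = 0.0, False
        return self.state, reward, done, {}

    def _render(self, mode='human', close=False):
        # show the stones currently on the board
        print(self.state)

    def _close(self):
        # not used this time
        pass

    def _seed(self, seed=None):
        # fix the random seed used to pick moves
        self.np_random, seed = seeding.np_random(seed)
        return [seed]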

Class __init__

The following initial values are set.

player_color: the color of the player's stones (black moves first)

opponent: the opponent's strategy (random this time)

observation_type: how the state is encoded (a possibly unnecessary setting that could probably be deleted; it declares that the state is managed as a numpy3c array, and I have left it in for the time being)

illegal_place_mode: the penalty for an illegal move (losing the game, etc.)

board_size: the board size (8 this time)
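
For reference, the numpy3c encoding stores the board as a 3 x 8 x 8 array: plane 0 holds the player's stones, plane 1 the opponent's stones, and plane 2 the free squares, which is the layout the helper functions later in this article rely on. A sketch of the initial position (the assignment of the centre stones to planes here is illustrative):

import numpy as np

d = 8
board = np.zeros((3, d, d), dtype=np.int8)
board[2, :, :] = 1                       # every square starts out free
# the four centre stones of Othello
for plane, (x, y) in [(0, (3, 4)), (0, (4, 3)), (1, (3, 3)), (1, (4, 4))]:
    board[plane, x, y] = 1               # place the stone
    board[2, x, y] = 0                   # the square is no longer free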

action: decides which action to take on the Env. Since the board is 8x8, actions 0-63 are the squares where a stone is placed, 64 ends the game (resignation), and 65 is a pass. The idea is that the output of the reinforcement learning agent is fed in through this action.
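
As a concrete illustration of that mapping, a small decoding helper might look like this (a sketch; decode_action is not part of the repository):

BOARD_SIZE = 8
RESIGN = BOARD_SIZE ** 2        # action 64 ends the game
PASS = BOARD_SIZE ** 2 + 1      # action 65 is a pass

def decode_action(action):
    # turn an action index into a board coordinate or a special move
    if action == RESIGN:
        return 'resign'
    if action == PASS:
        return 'pass'
    return divmod(action, BOARD_SIZE)   # (row, column) of the placed stone

print(decode_action(19))   # (2, 3)
print(decode_action(65))   # 'pass'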

done: at the end of the step processing, you need to check whether the game has finished as a result of this step: either there is nowhere left to place a stone, or one of the players has no stones left. If either condition is met, the reward for the end of the game is returned.

reward: the evaluation is simply win or lose, +1 or -1.

Game end confirmation

import numpy as np

def game_finished(board):
    # Returns 1 if player 1 wins, -1 if player 2 wins and 0 otherwise
    d = board.shape[-1]

    # count the stones of each player (planes 0 and 1 of the board)
    player_score_x, player_score_y = np.where(board[0, :, :] == 1)
    player_score = len(player_score_x)
    opponent_score_x, opponent_score_y = np.where(board[1, :, :] == 1)
    opponent_score = len(opponent_score_x)
    if player_score == 0:
        return -1
    elif opponent_score == 0:
        return 1
    else:
        # plane 2 marks the free squares; the board is full when none remain
        free_x, free_y = np.where(board[2, :, :] == 1)
        if free_x.size == 0:
            if player_score > (d**2)/2:
                return 1
            elif player_score == (d**2)/2:
                # a draw is counted as a win for the player here
                return 1
            else:
                return -1
        else:
            return 0
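
Inside _step, the result of game_finished can be turned into the terminal reward directly. A minimal sketch, assuming board holds the current 3-plane state and the agent corresponds to plane 0:

# sketch: convert the game_finished result into the reward at the end of _step
result = game_finished(board)          # 1, -1, or 0 (still in progress)
done = result != 0
reward = float(result) if done else 0.0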

Failure story

At first, I did not set any rules at all: in every state, the action could be any of 0-63 (stones could be placed anywhere), and I tried to let the agent learn the rules themselves. However, learning converged after only the first and second moves and did not go well, so I ended up restricting the possible values of action, using the helper shown in the next section.

Check candidates for stone placement

def get_enable_to_actions(board, player_color):
    # Lists every legal square for player_color; returns the pass action if there is none
    actions = []
    d = board.shape[-1]
    opponent_color = 1 - player_color
    for pos_x in range(d):
        for pos_y in range(d):
            # only free squares (plane 2) can be candidates
            if board[2, pos_x, pos_y] == 0:
                continue
            # look in all eight directions from the candidate square
            for dx in [-1, 0, 1]:
                for dy in [-1, 0, 1]:
                    if dx == 0 and dy == 0:
                        continue
                    nx = pos_x + dx
                    ny = pos_y + dy
                    n = 0
                    if nx not in range(d) or ny not in range(d):
                        continue
                    # walk along a run of opponent stones
                    while board[opponent_color, nx, ny] == 1:
                        tmp_nx = nx + dx
                        tmp_ny = ny + dy
                        if tmp_nx not in range(d) or tmp_ny not in range(d):
                            break
                        n += 1
                        nx += dx
                        ny += dy
                    # legal if at least one opponent stone is bracketed by our own stone
                    if n > 0 and board[player_color, nx, ny] == 1:
                        actions.append(pos_x * d + pos_y)
    if len(actions) == 0:
        # no legal square: the only available action is the pass (d**2 + 1)
        actions = [d ** 2 + 1]
    return actions
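
With this helper, the action can be restricted to legal moves. For example, the random opponent (or a random exploration policy) could pick its move like this; the player_color index used here is just for illustration:

import numpy as np

# choose uniformly among the legal moves returned by the helper
possible_actions = get_enable_to_actions(board, player_color=1)
action = np.random.choice(possible_actions)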
