Solving Copy-v0 in OpenAI Gym

Task

--The string to copy is written on a tape; copy it using the move and write actions.
--Each time the copy succeeds, the string to be copied gets longer.

https://gym.openai.com/envs/Copy-v0

Clearing conditions

--The average total reward over the last 100 trials is 25 or more.
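The clearing condition can be sketched as a small check over the list of per-episode total rewards. This is only my reading of the condition: the name `is_cleared` and its parameters are made up for illustration, and requiring a full window of 100 episodes before declaring success is an assumption.

```python
def is_cleared(returns, window=100, threshold=25.0):
    """Return True when the mean of the last `window` episode returns
    reaches `threshold`. Hypothetical helper, not part of Gym."""
    recent = returns[-window:]
    # Assumption: do not declare success before a full window exists.
    if len(recent) < window:
        return False
    return sum(recent) / len(recent) >= threshold
```

For example, `is_cleared([30.0] * 100)` is true, while `is_cleared([30.0] * 99)` is false because the window is not yet full.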

Reward

--+1.0 for each character copied correctly, -0.5 for a mistake

Data structure

Looking at the source code:

https://github.com/openai/gym/blob/master/gym/envs/algorithmic/copy_.py

The action space is

Tuple(Discrete(2), Discrete(2), Discrete(5))

--1st element: 1 to move right along the tape, 0 to move left
--2nd element: 1 to write, 0 to not write
--3rd element: the character to write, encoded as a number (Discrete(5), so a number from 0 to 4)
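To make the three-element action concrete, here is a small sketch that turns an action tuple into a readable description. `describe_action` is a hypothetical helper I wrote for illustration, not a Gym function, and it assumes that the codes 0 to 4 (the values a Discrete(5) space holds) correspond to the letters A to E in order.

```python
from string import ascii_uppercase


def describe_action(action):
    """Describe a (move, write, char) action tuple in words.
    Hypothetical helper; assumes 0..4 map to 'A'..'E'."""
    move, write, char = action
    direction = "right" if move == 1 else "left"
    if write == 1:
        return "move %s, write %s" % (direction, ascii_uppercase[char])
    return "move %s, no write" % direction
```

With this mapping, `describe_action((1, 1, 2))` gives `'move right, write C'`.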

State space

--The five letters A to E (represented by the numbers 0 to 4), plus one extra value for a blank cell

Discrete(6)
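A Discrete(6) space holds the integers 0 to 5. As far as I can tell from the linked source, five of them are the letters and the remaining one stands for a blank cell, for example when the head is off the end of the string. The decoder below is a sketch under that assumption; `decode_observation` is a hypothetical name, not part of Gym.

```python
CHARS = "ABCDE"  # assumed mapping: 0 -> 'A', ..., 4 -> 'E'


def decode_observation(obs):
    """Map an observation in 0..5 to the character it stands for.
    Assumption: 5 is the blank cell, shown here as a space."""
    if not 0 <= obs <= len(CHARS):
        raise ValueError("observation out of range: %r" % (obs,))
    return CHARS[obs] if obs < len(CHARS) else " "
```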

Solution

--Simply feeding the observed state straight back as the character to write in the next action is enough to solve the task. This is probably not what the exercise intends, though, because no reinforcement learning is involved.
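Why this works can be checked without Gym on a toy tape. The simulator below is a simplified stand-in that I wrote for illustration, not the Gym implementation: it assumes the read head starts on the first character, that a wrong write ends the episode, and that 5 is the blank code. `read` and `run_echo_policy` are hypothetical names.

```python
BLANK = 5  # assumed code for "no character under the head"


def read(tape, pos):
    """Return the character code under the head, or BLANK off the tape."""
    return tape[pos] if 0 <= pos < len(tape) else BLANK


def run_echo_policy(tape):
    """Run the 'write what you just read, then move right' policy on a
    toy tape and return the total reward. Simplified stand-in for Copy-v0."""
    pos, written, total = 0, 0, 0.0
    obs = read(tape, pos)
    while written < len(tape):
        move, write, char = 1, 1, obs  # same shape as a = (1, 1, x)
        if write == 1:
            if char == tape[written]:
                total += 1.0   # correct character
                written += 1
            else:
                total -= 0.5   # mistake ends the episode
                break
        pos += 1 if move == 1 else -1
        obs = read(tape, pos)
    return total
```

On this toy model every write is correct, so the total reward equals the length of the string: `run_echo_policy([0, 3, 1, 4])` returns `4.0`.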

Code

import numpy as np
import gym
from gym import wrappers

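def run():
    env = gym.make('Copy-v0')
    env = wrappers.Monitor(env, '/tmp/copy-v0', force=True)
    Gs = []  # total reward of each finished episode
    for episode in range(1000):
        x = env.reset()  # character currently under the read head
        G = 0
        for t in range(100):
            # Move right, write, and write exactly the character just read.
            a = (1, 1, x)
            x, r, done, _ = env.step(a)
            G += r
            if done:
                Gs.append(G)
                break
        # Average total reward over the last 100 episodes.
        score = np.mean(Gs[-100:])
        print("Episode: %3d, Score: %.3f" % (episode, score))
        if score >= 25:  # clearing condition: 25 or more
            break
    env.close()  # flush the Monitor's recordings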


if __name__ == "__main__":
    run()

References

Learning Simple Algorithms from Examples, Zaremba et al., 2016.
OpenAI Gym, Brockman et al., 2016.