- The string to be copied is written on the tape; copy it using the move and write actions.
- Every time a copy succeeds, the string to be copied gets longer.
https://gym.openai.com/envs/Copy-v0
- Solved when the average reward over the last 100 trials is 25 or more.
  - Reward is 1.0 for a correct write and -0.5 for a mistake
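As a rough illustration of the reward arithmetic above, the following sketch totals the reward for one episode. The function name `episode_return` is hypothetical, and the rule that the episode stops at the first mistake is an assumption made for this example, not something stated in the text.

```python
def episode_return(target, written):
    """Sum the rewards for writing `written` against `target`.

    Assumes +1.0 per correctly written character and -0.5 for a wrong one,
    and (as an assumption of this sketch) that the episode ends at the
    first mistake.
    """
    total = 0.0
    for expected, actual in zip(target, written):
        if actual == expected:
            total += 1.0
        else:
            total -= 0.5
            break  # assumed: a mistake ends the episode
    return total


print(episode_return("ABCD", "ABCD"))  # 4.0
print(episode_return("ABCD", "ABXD"))  # 1.5
```

Under these assumptions a single episode can only reach a return of 25 once the target string is at least 25 characters long, which is why the growing string length matters for the threshold.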
According to the source code at
https://github.com/openai/gym/blob/master/gym/envs/algorithmic/copy_.py
the action space is
Tuple(Discrete(2), Discrete(2), Discrete(5))
- 1st element: 1 to move right along the tape, 0 to move left
- 2nd element: 1 to write, 0 to skip writing
- 3rd element: the character to write, encoded as a number
  - Five letters A to E (represented by the numbers 0 to 4, since Discrete(5) is zero-based)

The observation space is
Discrete(6)
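The encoding described above can be sketched in plain Python, without gym. The names `LETTERS`, `BLANK`, `encode`, `decode`, and `make_action` are hypothetical helpers for illustration. The sketch assumes zero-based indices (a `Discrete(n)` space holds 0 to n-1) and assumes the sixth observation value, 5, stands for a blank cell.

```python
LETTERS = "ABCDE"       # characters map to indices 0..4
BLANK = len(LETTERS)    # index 5; assumed here to mean "blank cell"


def encode(ch):
    """Convert a letter A..E to its index 0..4."""
    return LETTERS.index(ch)


def decode(obs):
    """Convert an observation 0..5 back to a printable character."""
    return " " if obs == BLANK else LETTERS[obs]


def make_action(move_right, write, ch):
    """Build a (direction, write flag, character index) action tuple."""
    return (int(move_right), int(write), encode(ch))


print(make_action(True, True, "C"))  # (1, 1, 2)
print(repr(decode(5)))               # ' '
```

Because observations and the third action element share the same indices for A to E, an observed character can be passed straight into an action without conversion.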
- Simply feeding the previous observation back as the character to write in the next action should be enough, but this is probably not the intended approach, since it uses no reinforcement learning.
import numpy as np
import gym
from gym import wrappers


def run():
    env = gym.make('Copy-v0')
    env = wrappers.Monitor(env, '/tmp/copy-v0', force=True)
    Gs = []  # total reward of each finished episode
    for episode in range(1000):
        x = env.reset()  # first observation: the character under the head
        G = 0
        for t in range(100):
            # Move right, write, and write the character just observed.
            a = (1, 1, x)
            x, r, done, _ = env.step(a)
            G += r
            if done:
                Gs.append(G)
                break
        # Average total reward over the last 100 episodes.
        score = np.mean(Gs[-100:])
        print("Episode: %3d, Score: %.3f" % (episode, score))
        if score > 25:
            break


if __name__ == "__main__":
    run()
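To check the idea without installing gym, the echo policy can be tried on a hand-rolled tape. This is only a minimal sketch, not the gym environment. `simulate_copy` and `echo_policy` are hypothetical names, the blank value 5 and the stop-on-first-mistake rule are assumptions, and the real environment's time limit is not modelled.

```python
BLANK = 5  # assumed observation value for a cell outside the tape


def echo_policy(obs):
    """Move right, write, and write exactly what was just observed."""
    return (1, 1, obs)


def simulate_copy(tape, policy, max_steps=100):
    """Run `policy` on `tape` and return (written characters, total reward)."""
    pos = 0
    output = []
    total = 0.0
    for _ in range(max_steps):
        if len(output) == len(tape):
            break  # everything has been copied
        obs = tape[pos] if 0 <= pos < len(tape) else BLANK
        move, write, char = policy(obs)
        if write == 1:
            if char != tape[len(output)]:
                total -= 0.5  # wrong character: penalty, then stop (assumed)
                break
            total += 1.0
            output.append(char)
        pos += 1 if move == 1 else -1
    return output, total


print(simulate_copy([0, 3, 1, 4], echo_policy))       # ([0, 3, 1, 4], 4.0)
print(simulate_copy([0, 3], lambda obs: (1, 1, 0)))   # ([0], 0.5)
```

The second call shows a policy that always writes A: it gets the first character right by luck and is penalised on the second.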
References