DeepMind's framework for reinforcement learning, Acme (an introductory article about the research framework for reinforcement learning)
(As usual, this is a Japanese reprint of the English article posted on my blog.)
Part of the code of the reinforcement learning framework that DeepMind researchers actually use on a daily basis has been published as OSS.
Acme provides a simple learning-loop API, which looks roughly like the following.
Learning loop

```python
loop = acme.EnvironmentLoop(environment, agent)
loop.run()
```
Acme supports TensorFlow (* only a specific nightly build) and JAX as neural-network backends.
It also uses Reverb for experience replay. (See my Commentary Article 1 and Commentary Article 2.)
The technical report describes it as "Acme: A Research Framework for Distributed Reinforcement Learning", but as far as I can tell from the [FAQ](https://github.com/deepmind/acme/blob/master/docs/faq.md), the distributed learning agents have unfortunately not been released yet, and it seems the release date has not been decided.
It is published on PyPI under the package name "dm-acme".
There are five installation extras that can be specified: "jax", "tf", "envs", "reverb", and "testing".
Personally, I think "reverb" (for experience replay) is required, plus either "tf" or "jax".
In particular, the TensorFlow dependency is pinned to 'tf-nightly == 2.4.0.dev20200708', which throws away compatibility with your existing environment, so I recommend installing it into a clean, dedicated environment using the Acme installation extras.
By specifying "envs", reinforcement learning environments such as "dm-control" and "gym" are installed as well. ("dm-control" requires a MuJoCo license.)
TensorFlow version

```
pip install dm-acme[reverb,tf]
```
JAX version

```
pip install dm-acme[reverb,jax]
```
You can use any environment that conforms to the DeepMind Environment API (dm_env). The big difference from `gym.Env` is that member functions such as `step(action)` return their own class, `dm_env.TimeStep`, instead of a tuple. In addition to the observation and reward, the `TimeStep` class also holds information about whether the transition is at the beginning (`StepType.FIRST`), in the middle (`StepType.MID`), or at the end (`StepType.LAST`) of the episode.
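To make the difference from `gym.Env` concrete, here is a small sketch of the `dm_env.TimeStep` container itself. The field and helper names below are taken from the dm_env package as I understand it, so treat this as an illustration rather than an authoritative reference.

```python
import dm_env
import numpy as np

# TimeStep is a namedtuple-style container with four fields:
# step_type, reward, discount, observation.
ts = dm_env.TimeStep(
    step_type=dm_env.StepType.MID,
    reward=1.0,
    discount=0.99,
    observation=np.zeros(3),
)

# Convenience predicates tell you where in the episode the transition sits.
print(ts.first(), ts.mid(), ts.last())  # False True False

# dm_env also provides helpers that build the boundary cases:
first_ts = dm_env.restart(observation=np.zeros(3))                 # StepType.FIRST
last_ts = dm_env.termination(reward=0.0, observation=np.zeros(3))  # StepType.LAST
print(first_ts.first(), last_ts.last())  # True True
```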
`acme.wrappers.GymWrapper` is provided for `gym.Env`, so you can use a Gym environment with Acme just by wrapping it.
Using `gym.Env`
```python
import acme
import gym

env = acme.wrappers.GymWrapper(gym.make("MountainCarContinuous-v0"))
```
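After wrapping, `env` behaves as a dm_env environment, and you can derive the environment spec from which the bundled agents are constructed. A minimal sketch continuing from the snippet above (`specs.make_environment_spec` is the helper name as I remember it):

```python
from acme import specs

# reset()/step() on the wrapped environment return dm_env.TimeStep objects.
timestep = env.reset()
print(timestep.step_type)  # StepType.FIRST

# The spec bundles observation/action/reward/discount specs; the agents
# shipped with Acme take it as their first constructor argument.
environment_spec = specs.make_environment_spec(env)
print(environment_spec.actions)
```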
The implemented agents can be found here; as of August 2020, 11 agents are available.
These agents are implemented by extending `acme.agents.agent.Agent`, so when implementing a custom algorithm it is (probably) possible to extend it in the same way (see the sketch below for what a minimal compatible actor might look like).
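As an illustration, here is a minimal random actor that can be dropped into `acme.EnvironmentLoop`. It only implements the actor-side methods (`select_action` / `observe_first` / `observe` / `update`, which is my reading of the `acme.core.Actor` interface); a real agent built on `acme.agents.agent.Agent` would additionally combine this with a learner. Method names and signatures may differ between versions, so treat this as a sketch.

```python
import dm_env
import gym
import numpy as np
from acme import EnvironmentLoop, core, specs, wrappers


class RandomActor(core.Actor):
    """Toy actor that samples actions uniformly from the action spec."""

    def __init__(self, action_spec):
        # action_spec is a dm_env BoundedArray-style spec.
        self._action_spec = action_spec

    def select_action(self, observation):
        return np.random.uniform(
            self._action_spec.minimum,
            self._action_spec.maximum,
            size=self._action_spec.shape,
        ).astype(self._action_spec.dtype)

    def observe_first(self, timestep: dm_env.TimeStep):
        pass  # a learning agent would store the first observation here

    def observe(self, action, next_timestep: dm_env.TimeStep):
        pass  # ... and the transition here (e.g. into Reverb)

    def update(self, wait: bool = False):
        pass  # ... and update its networks here


env = wrappers.GymWrapper(gym.make("MountainCarContinuous-v0"))
actor = RandomActor(specs.make_environment_spec(env).actions)
EnvironmentLoop(env, actor).run(num_episodes=5)
```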
The agents provided above use DeepMind's deep learning libraries Sonnet (TensorFlow) or Haiku (dm-haiku, JAX). Therefore, as far as I briefly checked, the internal network passed to an implemented agent is assumed to be a deep learning model built with Sonnet or Haiku. (Of course, this has nothing to do with the interface itself, so it does not matter when implementing your own agent.)
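For instance, with the TensorFlow flavor, a network for the bundled DQN agent is an ordinary Sonnet module. A minimal sketch; the `SinglePrecisionWrapper` and the exact constructor arguments of `acme.agents.tf.dqn.DQN` are my assumptions from the examples and may differ between versions.

```python
import gym
import sonnet as snt
from acme import EnvironmentLoop, specs, wrappers
from acme.agents.tf import dqn

# A discrete-action Gym environment, wrapped for the dm_env API and cast
# to single precision for TensorFlow.
env = wrappers.SinglePrecisionWrapper(wrappers.GymWrapper(gym.make("CartPole-v0")))
environment_spec = specs.make_environment_spec(env)

# The Q-network is an ordinary Sonnet module mapping observations to one
# value per discrete action.
network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([64, 64, environment_spec.actions.num_values]),
])

# The bundled agents are built from the environment spec plus the network.
agent = dqn.DQN(environment_spec=environment_spec, network=network)

EnvironmentLoop(env, agent).run(num_episodes=10)
```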
Acme provides a [Quickstart Notebook](https://github.com/deepmind/acme/blob/master/examples/quickstart.ipynb). If you have a MuJoCo license, you can also try the [Tutorial Notebook](https://github.com/deepmind/acme/blob/master/examples/tutorial.ipynb).
We investigated the outline of Acme, the reinforcement learning framework published by DeepMind. It is very interesting that (at least part of) the same code a first-class research lab uses on a daily basis is provided, and I am curious about its future development.
However, the TensorFlow version is pinned to a specific nightly build, which seems to make it a little hard for general users to adopt.
I would like to keep watching how it develops.
The code for displaying a Gym environment in a notebook, like the code written from scratch in the Quickstart, is packaged and provided as Gym-Notebook-Wrapper. (Explanatory article) If you are interested, please give it a try.