DeepMind's framework for reinforcement learning, Acme (an introductory article about the research framework for reinforcement learning)
(As usual, this is a Japanese reprint of the English article posted on my blog.)
Part of the code of the reinforcement learning framework that DeepMind researchers actually use on a daily basis has been published as OSS.
Acme provides a simple learning-loop API, which looks roughly like the following.
Learning loop

```python
loop = acme.EnvironmentLoop(environment, agent)
loop.run()
```
Acme supports TensorFlow (* only a specific nightly build) and JAX as neural-network backends.
It also uses Reverb for experience replay. (See my Commentary Article 1 and Commentary Article 2.)
The technical report describes it as "Acme: A Research Framework for Distributed Reinforcement Learning", but as far as I can tell from the [FAQ](https://github.com/deepmind/acme/blob/master/docs/faq.md), the distributed learning agents have unfortunately not been released yet, and it seems the release date has not been decided.
It is published on PyPI under the package name "dm-acme".
There are five installation extras that can be specified: "jax", "tf", "envs", "reverb", and "testing".
Personally, I think "reverb" (for experience replay) is required, plus either "tf" or "jax".
In particular, the TensorFlow dependency is pinned to 'tf-nightly == 2.4.0.dev20200708', which throws away compatibility with your existing environment, so I recommend installing it into a clean, dedicated environment using the Acme installation extras.
By specifying "envs", reinforcement learning environments such as "dm-control" and "gym" are installed as well. ("dm-control" requires a MuJoCo license.)
TensorFlow version

```
pip install dm-acme[reverb,tf]
```
JAX version

```
pip install dm-acme[reverb,jax]
```
You can use any environment that conforms to the DeepMind Environment API (dm_env). The big difference from `gym.Env` is that member functions such as `step(action)` return their own class, `dm_env.TimeStep`, instead of a tuple. In addition to the observation and reward, the `TimeStep` class also holds information about whether the transition is at the beginning (`StepType.FIRST`), in the middle (`StepType.MID`), or at the end (`StepType.LAST`) of the episode.
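To make the difference from `gym.Env` concrete, here is a small sketch of the `dm_env.TimeStep` container itself. The field and helper names below are taken from the dm_env package as I understand it, so treat this as an illustration rather than an authoritative reference.

```python
import dm_env
import numpy as np

# TimeStep is a namedtuple-style container with four fields:
# step_type, reward, discount, observation.
ts = dm_env.TimeStep(
    step_type=dm_env.StepType.MID,
    reward=1.0,
    discount=0.99,
    observation=np.zeros(3),
)

# Convenience predicates tell you where in the episode the transition sits.
print(ts.first(), ts.mid(), ts.last())  # False True False

# dm_env also provides helpers that build the boundary cases:
first_ts = dm_env.restart(observation=np.zeros(3))                 # StepType.FIRST
last_ts = dm_env.termination(reward=0.0, observation=np.zeros(3))  # StepType.LAST
print(first_ts.first(), last_ts.last())  # True True
```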
`acme.wrappers.GymWrapper` is provided for `gym.Env`, so you can use a Gym environment with Acme just by wrapping it.
Using `gym.Env`
```python
import acme
import gym

env = acme.wrappers.GymWrapper(gym.make("MountainCarContinuous-v0"))
```
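After wrapping, `env` behaves as a dm_env environment, and you can derive the environment spec from which the bundled agents are constructed. A minimal sketch continuing from the snippet above (`specs.make_environment_spec` is the helper name as I remember it):

```python
from acme import specs

# reset()/step() on the wrapped environment return dm_env.TimeStep objects.
timestep = env.reset()
print(timestep.step_type)  # StepType.FIRST

# The spec bundles observation/action/reward/discount specs; the agents
# shipped with Acme take it as their first constructor argument.
environment_spec = specs.make_environment_spec(env)
print(environment_spec.actions)
```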
The implemented agents can be found here; as of August 2020, 11 agents are available.
These agents are implemented by extending `acme.agents.agent.Agent`, so when implementing a custom algorithm it is (probably) possible to extend it in the same way (see the sketch below for what a minimal compatible actor might look like).
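As an illustration, here is a minimal random actor that can be dropped into `acme.EnvironmentLoop`. It only implements the actor-side methods (`select_action` / `observe_first` / `observe` / `update`, which is my reading of the `acme.core.Actor` interface); a real agent built on `acme.agents.agent.Agent` would additionally combine this with a learner. Method names and signatures may differ between versions, so treat this as a sketch.

```python
import dm_env
import gym
import numpy as np
from acme import EnvironmentLoop, core, specs, wrappers


class RandomActor(core.Actor):
    """Toy actor that samples actions uniformly from the action spec."""

    def __init__(self, action_spec):
        # action_spec is a dm_env BoundedArray-style spec.
        self._action_spec = action_spec

    def select_action(self, observation):
        return np.random.uniform(
            self._action_spec.minimum,
            self._action_spec.maximum,
            size=self._action_spec.shape,
        ).astype(self._action_spec.dtype)

    def observe_first(self, timestep: dm_env.TimeStep):
        pass  # a learning agent would store the first observation here

    def observe(self, action, next_timestep: dm_env.TimeStep):
        pass  # ... and the transition here (e.g. into Reverb)

    def update(self, wait: bool = False):
        pass  # ... and update its networks here


env = wrappers.GymWrapper(gym.make("MountainCarContinuous-v0"))
actor = RandomActor(specs.make_environment_spec(env).actions)
EnvironmentLoop(env, actor).run(num_episodes=5)
```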
The agents provided above use DeepMind's deep learning libraries Sonnet (TensorFlow) or Haiku (dm-haiku, JAX). Therefore, as far as I briefly checked, the internal network passed to an implemented agent is assumed to be a deep learning model built with Sonnet or Haiku. (Of course, this has nothing to do with the interface itself, so it does not matter when implementing your own agent.)
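For instance, with the TensorFlow flavor, a network for the bundled DQN agent is an ordinary Sonnet module. A minimal sketch; the `SinglePrecisionWrapper` and the exact constructor arguments of `acme.agents.tf.dqn.DQN` are my assumptions from the examples and may differ between versions.

```python
import gym
import sonnet as snt
from acme import EnvironmentLoop, specs, wrappers
from acme.agents.tf import dqn

# A discrete-action Gym environment, wrapped for the dm_env API and cast
# to single precision for TensorFlow.
env = wrappers.SinglePrecisionWrapper(wrappers.GymWrapper(gym.make("CartPole-v0")))
environment_spec = specs.make_environment_spec(env)

# The Q-network is an ordinary Sonnet module mapping observations to one
# value per discrete action.
network = snt.Sequential([
    snt.Flatten(),
    snt.nets.MLP([64, 64, environment_spec.actions.num_values]),
])

# The bundled agents are built from the environment spec plus the network.
agent = dqn.DQN(environment_spec=environment_spec, network=network)

EnvironmentLoop(env, agent).run(num_episodes=10)
```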
Acme provides a [Quickstart Notebook](https://github.com/deepmind/acme/blob/master/examples/quickstart.ipynb). If you have a MuJoCo license, you can also try the [Tutorial Notebook](https://github.com/deepmind/acme/blob/master/examples/tutorial.ipynb).
We investigated the outline of Acme, the reinforcement learning framework published by DeepMind. It is very interesting that (at least part of) the same code a first-class research lab uses on a daily basis is provided, and I am curious about its future development.
However, the TensorFlow version is pinned to a specific nightly build, which seems to make it a little hard for general users to adopt.
I would like to keep watching how it develops.
The code for displaying a Gym environment in a notebook, like the code written from scratch in the Quickstart, is packaged and provided as Gym-Notebook-Wrapper. (Explanatory article) If you are interested, please give it a try.