DeepMind Reinforcement Learning Framework Acme

An introductory article about Acme, DeepMind's research framework for reinforcement learning.

(As usual, this is a reprint of the English article posted on my blog.)

1. Introduction

Part of the code of the reinforcement learning framework that DeepMind researchers actually use on a daily basis has been published as open source.

Acme provides a simple learning-loop API, which can be used roughly as follows.

Learning loop


loop = acme.EnvironmentLoop(environment, agent)
loop.run()
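To give a feel for what such a loop does, here is a rough pure-Python sketch of the reset → act → step → observe cycle, using toy stand-in classes. This is illustrative only and is not Acme's actual implementation.

```python
# Toy stand-ins for the environment/agent interaction that a learning
# loop like acme.EnvironmentLoop drives (illustration, not Acme code).

class ToyEnvironment:
    """Counts down from 3; the episode ends when the counter hits 0."""

    def reset(self):
        self.steps_left = 3
        return 0  # initial observation

    def step(self, action):
        self.steps_left -= 1
        done = self.steps_left == 0
        reward = 1.0
        return reward, done


class ToyAgent:
    """Always picks the same action and tallies the reward it sees."""

    def __init__(self):
        self.total_reward = 0.0

    def select_action(self, observation):
        return 0

    def observe(self, reward):
        self.total_reward += reward


def run_episode(environment, agent):
    """One pass of the reset -> act -> step -> observe cycle."""
    observation = environment.reset()
    done = False
    while not done:
        action = agent.select_action(observation)
        reward, done = environment.step(action)
        agent.observe(reward)
    return agent.total_reward


print(run_episode(ToyEnvironment(), ToyAgent()))  # 3.0
```

The real `EnvironmentLoop` hides exactly this kind of boilerplate, which is why the two-line snippet above is all you write.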

Acme supports TensorFlow (a specific nightly build only) and JAX for building networks.

It also uses Reverb for experience replay. (See my commentary articles 1 and 2.)

The technical report is titled "Acme: A Research Framework for Distributed Reinforcement Learning", but as far as I read the [FAQ](https://github.com/deepmind/acme/blob/master/docs/faq.md), the distributed-learning agents have unfortunately not been released, and the release date seems undecided.

2. Installation

It is published on PyPI under the package name "dm-acme".

There are five installation extras: "jax", "tf", "envs", "reverb", and "testing". Personally, I think "reverb" for experience replay is essential, plus either "tf" or "jax". For TensorFlow in particular, the dependency is pinned to 'tf-nightly == 2.4.0.dev20200708', which throws compatibility with other packages out the window, so I recommend installing it in a clean, isolated environment using the Acme installation extras.

Specifying "envs" installs reinforcement learning environments such as "dm-control" and "gym". ("dm-control" requires a MuJoCo license.)

Install the TensorFlow version


pip install dm-acme[reverb,tf]

Install the JAX version


pip install dm-acme[reverb,jax]

3. Environment

You can use any environment that conforms to the DeepMind Environment API. The big difference from gym.Env is that member functions such as step(action) return their own class, dm_env.TimeStep, instead of a tuple. In addition to the observation and reward, a TimeStep also holds information about whether the step (transition) is at the beginning (StepType.FIRST), middle (StepType.MID), or end (StepType.LAST) of the episode.
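To make the difference from a tuple concrete, here is a minimal pure-Python imitation of the TimeStep structure described above. This is a sketch for illustration; the real class lives in the dm_env package and may differ in detail.

```python
import enum
from typing import Any, NamedTuple, Optional


class StepType(enum.IntEnum):
    """Marks where a transition sits inside an episode."""
    FIRST = 0  # first step of an episode (returned by reset)
    MID = 1    # any step in the middle
    LAST = 2   # final step of an episode


class TimeStep(NamedTuple):
    """Roughly mirrors the fields of dm_env.TimeStep."""
    step_type: StepType
    reward: Optional[float]
    discount: Optional[float]
    observation: Any

    def first(self) -> bool:
        return self.step_type == StepType.FIRST

    def last(self) -> bool:
        return self.step_type == StepType.LAST


# An environment's reset() would return something like this:
ts = TimeStep(StepType.FIRST, None, None, observation=[0.0, 0.0])
print(ts.first(), ts.last())  # True False
```

Because the step type travels with the data, an agent can tell episode boundaries apart without a separate `done` flag.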

`acme.wrappers.GymWrapper` is provided for `gym.Env`, and you can use it with Acme just by wrapping it.

Using gym.Env


import acme
import gym

env = acme.wrappers.GymWrapper(gym.make("MountainCarContinuous-v0"))

4. Agent

The implemented agents can be found here. As of August 2020, 11 agents are available.

These agents are implemented by extending `acme.agents.agent.Agent`, so when implementing a custom algorithm, you can (probably) extend it in the same way.
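For reference, an agent in this style needs to expose the actor-side methods that the environment loop calls. Below is a hypothetical minimal skeleton; the method names (`select_action`, `observe_first`, `observe`, `update`) reflect my reading of Acme's actor interface at the time of writing, so treat the details as an assumption rather than an official template.

```python
# Hypothetical skeleton of a custom agent exposing the actor-style
# methods an Acme-like environment loop would call. The method names
# are my reading of the interface, not an official template.
import random


class RandomAgent:
    """Picks uniformly random discrete actions and counts update calls."""

    def __init__(self, num_actions: int, seed: int = 0):
        self._num_actions = num_actions
        self._rng = random.Random(seed)
        self.num_updates = 0

    def select_action(self, observation):
        # A real agent would run its policy network here.
        return self._rng.randrange(self._num_actions)

    def observe_first(self, timestep):
        # Called with the TimeStep returned by environment.reset().
        pass

    def observe(self, action, next_timestep):
        # Called after every environment.step(); a real agent would
        # write the transition to its replay buffer here.
        pass

    def update(self):
        # A real agent would sample from replay and take a learner step.
        self.num_updates += 1


agent = RandomAgent(num_actions=4)
actions = [agent.select_action(None) for _ in range(5)]
print(all(0 <= a < 4 for a in actions))  # True
```

A skeleton like this can be dropped into the same loop structure shown in section 1, which is what makes the agent and environment interchangeable.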

The agents above use DeepMind's deep learning libraries, Sonnet (for TensorFlow) or Haiku (dm-haiku, for JAX). Therefore, to define the internal network of one of the implemented agents, it is assumed (as far as I briefly checked) that you build a deep learning model with Sonnet or Haiku and pass it in. (Of course, this has nothing to do with the interface itself, so it does not matter when implementing your own agent.)

5. Tutorial

Acme provides a [Quickstart Notebook](https://github.com/deepmind/acme/blob/master/examples/quickstart.ipynb). If you have a MuJoCo license, you can also try the [Tutorial Notebook](https://github.com/deepmind/acme/blob/master/examples/tutorial.ipynb).

6. Conclusion

We surveyed the outline of Acme, the reinforcement learning framework published by DeepMind. Having the very code (albeit only part of it) that a top-tier company uses day to day is very interesting, including how it develops from here.

However, the TensorFlow version is pinned to a specific nightly build, which seems to make it a little difficult for general users to adopt.

I would like to continue to watch the situation.

Digression

Code for displaying a Gym environment in a notebook, like the code written from scratch in the Quickstart, is packaged and provided as Gym-Notebook-Wrapper. (Explanatory article) If you are interested, please use it.
