This article is a reworking of a blog post I originally wrote in English.
On May 26th, DeepMind released Reverb as a framework for Experience Replay in reinforcement learning. (Reference)
Reverb is an efficient and easy-to-use data storage and transport system designed for machine learning research. Reverb is primarily used as an experience replay system for distributed reinforcement learning algorithms but the system also supports multiple data structure representations such as FIFO, LIFO, and priority queues.
DeepMind's reinforcement learning framework Acme (A research framework for reinforcement learning) uses this Reverb. (I will cover Acme on another occasion.)
As of June 26th, when this article was written, Reverb officially supports only Linux-based operating systems and is not yet considered production-ready.
A development (nightly) version of TensorFlow is required; both packages can be installed from PyPI with the following command:
pip install tf-nightly==2.3.0.dev20200604 dm-reverb-nightly
Reverb uses a server-client architecture, and I feel its naming conventions are closer to database terminology than those of other Replay Buffer implementations.
The sample code on the server side is as follows.
import reverb

server = reverb.Server(
    tables=[
        reverb.Table(
            name='my_table',
            sampler=reverb.selectors.Uniform(),
            remover=reverb.selectors.Fifo(),
            max_size=100,
            rate_limiter=reverb.rate_limiters.MinSize(1)),
    ],
    port=8000)
In this example, a standard Replay Buffer with a capacity of 100 items (uniform sampling, overwriting from oldest to newest) listens on port 8000. reverb.rate_limiters.MinSize(1) means that any sampling request is blocked until at least 1 item has been inserted.
sampler / remover
As you can see in the example above, Reverb allows you to specify the sampling logic and the deletion (overwriting) logic independently. The selection logic supported by Reverb is implemented in reverb.selectors and includes the following (a prioritized configuration is sketched after the list):
- Prioritized: selected at random according to priority
- Fifo: select the oldest data
- Lifo: select the newest data
- MinHeap: select the data with the lowest priority
- MaxHeap: select the data with the highest priority
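For example, combining these selectors gives a prioritized Replay Buffer along the lines of Prioritized Experience Replay. The following is a minimal sketch of my own, not taken from the Reverb documentation; the table name, sizes, and the priority-exponent value passed to Prioritized are placeholder assumptions.

import reverb

# Sketch (not from the original article): items are sampled in proportion to
# priority, and when the table is full the lowest-priority item is removed first.
prioritized_table = reverb.Table(
    name='prioritized_table',                    # placeholder name
    sampler=reverb.selectors.Prioritized(0.6),   # argument: priority exponent (example value)
    remover=reverb.selectors.MinHeap(),
    max_size=1000,
    rate_limiter=reverb.rate_limiters.MinSize(1))

Such a table would then be passed to reverb.Server via the tables argument, just like my_table above.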
rate_limiter
The rate_limiter argument sets the conditions for using the Replay Buffer (i.e., when insertion and sampling are allowed). The conditions supported by Reverb are implemented in reverb.rate_limiters and include the following:
- MinSize: sets the minimum number of items that must be present before sampling is allowed
- SampleToInsertRatio: sets the average ratio between data insertions (and updates) and data samples
- Queue: each item is sampled exactly once before being overwritten (for FIFO)
- Stack: each item is sampled exactly once before being overwritten (for LIFO)

According to the comments in the source code, reverb.rate_limiters.Queue and reverb.rate_limiters.Stack are not recommended for direct use; instead, the static methods reverb.Table.queue and reverb.Table.stack should be used. These set sampler, remover, and rate_limiter appropriately so that the resulting table is a Replay Buffer with FIFO and LIFO logic, respectively.
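For example, a pure FIFO queue table might be created as follows. This is a minimal sketch assuming reverb.Table.queue takes a name and a maximum size; the name, size, and port are placeholder values.

import reverb

# Sketch: a FIFO queue table built via the convenience method, which
# configures sampler, remover, and rate_limiter internally.
queue_table = reverb.Table.queue(name='my_queue', max_size=100)
server = reverb.Server(tables=[queue_table], port=8001)  # placeholder port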
Sample code for the client side is below.
import reverb

client = reverb.Client('localhost:8000')  # when the server and client run on the same machine

# Example of inserting the state (observation) [0, 1] into the Replay Buffer with priority 1.0
client.insert([0, 1], priorities={'my_table': 1.0})

# Sampling returns a generator
client.sample('my_table', num_samples=2)
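The returned generator can then be consumed in a loop. The sketch below is my own illustration (continuing from the client created above) and assumes each yielded element is a list of samples whose stored data and server-side metadata are accessible as attributes.

# Sketch: iterate over the sampled items and inspect them.
for samples in client.sample('my_table', num_samples=2):
    for sample in samples:
        print(sample.info)  # metadata such as the key and sampling probability
        print(sample.data)  # the stored data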
Reverb also supports saving and loading data. By executing the following code from the client, the data currently held by the server is saved to a file and the path of the saved file is returned.
checkpoint_path = client.checkpoint()
The original data can be restored by creating a server from the saved data.
Note that if the tables argument of the constructor does **not specify exactly the same tables as the original server that saved the data**, you do so at your own risk.
checkpointer = reverb.checkpointers.DefaultCheckpointer(path=checkpoint_path)
server = reverb.Server(tables=[...], checkpointer=checkpointer)
Reverb, DeepMind's new framework for Experience Replay, has not yet reached a stable version, but it looks promising for flexible, large-scale reinforcement learning.
A huge rival has suddenly appeared for my own Experience Replay library, cpprb, but I think cpprb remains more convenient and easier to use in some respects for smaller-scale reinforcement learning experiments. (See my past Qiita articles.)
(Update: 2020.6.29) I did further research and wrote about how to use the client!