This is an introductory article about TF2RL, a reinforcement learning library developed by my friend @ohtake_i. I also help out with it (responding to issues, creating PRs). (It is no exaggeration to say that my replay buffer library cpprb was created for TF2RL.)
As the name suggests, it is a reinforcement learning library written for TensorFlow 2.
The TensorFlow 1.x series had some hard-to-approach parts such as `Session` and `placeholder`, but TensorFlow 2.x is quite easy to write in, so I especially want people who think "TensorFlow is hard to write, so I'll go with [PyTorch](https://pytorch.org/)" to take a look.
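For reference, here is a rough, generic (non-TF2RL) sketch of the difference; the 1.x style is shown only in comments:

```python
import tensorflow as tf

# TensorFlow 1.x style (roughly): define placeholders, build a graph, then run it in a Session
#   x = tf.placeholder(tf.float32, shape=(None, 3))
#   y = tf.reduce_sum(x)
#   with tf.Session() as sess:
#       print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))

# TensorFlow 2.x: eager execution, just call the ops like normal Python
x = tf.constant([[1.0, 2.0, 3.0]])
print(tf.reduce_sum(x).numpy())  # 6.0
```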
The latest status can be found in the [README](https://github.com/keiohta/tf2rl#algorithms) on GitHub; the algorithms implemented as of August 26, 2020 are listed there. (More are planned to be added over time.)
Some algorithms also support Ape-X and GAE.
TensorFlow versions from 2.0 through the latest 2.3 should be supported. Because the appropriate TensorFlow version depends on your environment, TensorFlow is not installed as a dependency by default; please install it yourself with pip or conda. (Of course, the GPU version is also fine. Since 2.1, the PyPI binaries are no longer split into CPU and GPU versions, so I expect there will be fewer chances to worry about this going forward.)
For pip:

```
pip install tensorflow
```

For conda:

```
conda install -c anaconda tensorflow
```
TF2RL is published on PyPI, so you can install it with pip:

```
pip install tf2rl
```
The following is the code example using DDPG from the README.
You build an agent for the chosen algorithm and pass it to `Trainer` together with the environment (`gym.Env`), and training then proceeds according to that algorithm.
Example of Pendulum with DDPG:

```python
import gym

from tf2rl.algos.ddpg import DDPG
from tf2rl.experiments.trainer import Trainer

parser = Trainer.get_argument()
parser = DDPG.get_argument(parser)
args = parser.parse_args()

env = gym.make("Pendulum-v0")
test_env = gym.make("Pendulum-v0")
policy = DDPG(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.high.size,
    gpu=-1,  # Run on CPU. If you want to run on GPU, specify the GPU number
    memory_capacity=10000,
    max_action=env.action_space.high[0],
    batch_size=32,
    n_warmup=500)
trainer = Trainer(policy, env, args, test_env=test_env)
trainer()
```
You can check the training results on TensorBoard:

```
tensorboard --logdir results
```
Some parameters can be passed as command-line options at runtime via `argparse`.
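For example, if the script above is saved as `run_ddpg.py` (a hypothetical file name), the options registered by `Trainer.get_argument()` and `DDPG.get_argument()` can be listed with argparse's standard help flag:

```
python run_ddpg.py --help
```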
`Trainer` was originally intended to be run as a script from the command line, so it is tightly coupled to `argparse` and cannot be run properly in notebook environments such as [Google Colab](https://colab.research.google.com/). (Everything other than `Trainer` works without problems, so it is possible to write your own training loop from scratch, using only the model and its learning methods.) I would like to take a scalpel to `Trainer` at some point and do something about this.
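As a very rough illustration, a scratch training loop without `Trainer` might look like the sketch below, which combines the DDPG agent with cpprb. The `get_action`/`train` calls and attributes such as `n_warmup` and `batch_size` reflect my reading of the code at the time of writing, so please check the current source before relying on them.

```python
import gym
from cpprb import ReplayBuffer
from tf2rl.algos.ddpg import DDPG

env = gym.make("Pendulum-v0")
policy = DDPG(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.high.size,
    gpu=-1,
    max_action=env.action_space.high[0],
    batch_size=32,
    n_warmup=500)

# Replay buffer from cpprb (layout assumed here; adjust shapes to your environment)
rb = ReplayBuffer(10000,
                  env_dict={"obs": {"shape": env.observation_space.shape},
                            "act": {"shape": env.action_space.shape},
                            "next_obs": {"shape": env.observation_space.shape},
                            "rew": {},
                            "done": {}})

obs = env.reset()
for step in range(10000):
    # Random actions during warmup, then the policy's (exploration-noised) action
    if step < policy.n_warmup:
        act = env.action_space.sample()
    else:
        act = policy.get_action(obs)
    next_obs, rew, done, _ = env.step(act)
    rb.add(obs=obs, act=act, next_obs=next_obs, rew=rew, done=done)
    obs = env.reset() if done else next_obs

    # Update the agent once the warmup phase is over
    if step >= policy.n_warmup:
        sample = rb.sample(policy.batch_size)
        policy.train(sample["obs"], sample["act"], sample["next_obs"],
                     sample["rew"], sample["done"])
```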
For some reason, many of the people who actively give feedback seem to be Chinese. I hope the number of Japanese users will grow too, so I would be very happy if you would try it out and give feedback (issues, PRs).