This is an introductory article about TF2RL, a reinforcement learning library developed by my friend @ohtake_i. I also help out with it (responding to issues, creating PRs). (It is no exaggeration to say that my replay buffer library cpprb was created for TF2RL.)
As the name suggests, it is a reinforcement learning library written for TensorFlow 2.
The TensorFlow 1.x series had some hard-to-approach parts such as `Session` and `placeholder`, but TensorFlow 2.x is quite easy to write in, so I especially want people who think "TensorFlow is hard to write, so I'll go with [PyTorch](https://pytorch.org/)" to take a look.
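For reference, here is a rough, generic (non-TF2RL) sketch of the difference; the 1.x style is shown only in comments:

```python
import tensorflow as tf

# TensorFlow 1.x style (roughly): define placeholders, build a graph, then run it in a Session
#   x = tf.placeholder(tf.float32, shape=(None, 3))
#   y = tf.reduce_sum(x)
#   with tf.Session() as sess:
#       print(sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0]]}))

# TensorFlow 2.x: eager execution, just call the ops like normal Python
x = tf.constant([[1.0, 2.0, 3.0]])
print(tf.reduce_sum(x).numpy())  # 6.0
```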
The latest status can be found in the [README](https://github.com/keiohta/tf2rl#algorithms) on GitHub; the algorithms implemented as of August 26, 2020 are listed there. (More are planned to be added over time.)
Some algorithms also support Ape-X and GAE.
TensorFlow versions from 2.0 through the latest 2.3 should be supported. Because the appropriate TensorFlow version depends on your environment, TensorFlow is not installed as a dependency by default; please install it yourself with pip or conda. (Of course, the GPU version is also fine. Since 2.1, the PyPI binaries are no longer split into CPU and GPU versions, so I expect there will be fewer chances to worry about this going forward.)
For pip:

```
pip install tensorflow
```

For conda:

```
conda install -c anaconda tensorflow
```
TF2RL is published on PyPI, so you can install it with pip:

```
pip install tf2rl
```
The following is the code example using DDPG from the README.
You build an agent for the chosen algorithm and pass it to `Trainer` together with the environment (`gym.Env`), and training then proceeds according to that algorithm.
Example of Pendulum with DDPG:

```python
import gym

from tf2rl.algos.ddpg import DDPG
from tf2rl.experiments.trainer import Trainer

parser = Trainer.get_argument()
parser = DDPG.get_argument(parser)
args = parser.parse_args()

env = gym.make("Pendulum-v0")
test_env = gym.make("Pendulum-v0")
policy = DDPG(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.high.size,
    gpu=-1,  # Run on CPU. If you want to run on GPU, specify the GPU number
    memory_capacity=10000,
    max_action=env.action_space.high[0],
    batch_size=32,
    n_warmup=500)
trainer = Trainer(policy, env, args, test_env=test_env)
trainer()
```
You can check the training results on TensorBoard:

```
tensorboard --logdir results
```
Some parameters can be passed as command-line options at runtime via `argparse`.
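For example, if the script above is saved as `run_ddpg.py` (a hypothetical file name), the options registered by `Trainer.get_argument()` and `DDPG.get_argument()` can be listed with argparse's standard help flag:

```
python run_ddpg.py --help
```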
`Trainer` was originally intended to be run as a script from the command line, so it is tightly coupled to `argparse` and cannot be run properly in notebook environments such as [Google Colab](https://colab.research.google.com/). (Everything other than `Trainer` works without problems, so it is possible to write your own training loop from scratch, using only the model and its learning methods.) I would like to take a scalpel to `Trainer` at some point and do something about this.
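As a very rough illustration, a scratch training loop without `Trainer` might look like the sketch below, which combines the DDPG agent with cpprb. The `get_action`/`train` calls and attributes such as `n_warmup` and `batch_size` reflect my reading of the code at the time of writing, so please check the current source before relying on them.

```python
import gym
from cpprb import ReplayBuffer
from tf2rl.algos.ddpg import DDPG

env = gym.make("Pendulum-v0")
policy = DDPG(
    state_shape=env.observation_space.shape,
    action_dim=env.action_space.high.size,
    gpu=-1,
    max_action=env.action_space.high[0],
    batch_size=32,
    n_warmup=500)

# Replay buffer from cpprb (layout assumed here; adjust shapes to your environment)
rb = ReplayBuffer(10000,
                  env_dict={"obs": {"shape": env.observation_space.shape},
                            "act": {"shape": env.action_space.shape},
                            "next_obs": {"shape": env.observation_space.shape},
                            "rew": {},
                            "done": {}})

obs = env.reset()
for step in range(10000):
    # Random actions during warmup, then the policy's (exploration-noised) action
    if step < policy.n_warmup:
        act = env.action_space.sample()
    else:
        act = policy.get_action(obs)
    next_obs, rew, done, _ = env.step(act)
    rb.add(obs=obs, act=act, next_obs=next_obs, rew=rew, done=done)
    obs = env.reset() if done else next_obs

    # Update the agent once the warmup phase is over
    if step >= policy.n_warmup:
        sample = rb.sample(policy.batch_size)
        policy.train(sample["obs"], sample["act"], sample["next_obs"],
                     sample["rew"], sample["done"])
```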
For some reason, many of the people who actively give feedback seem to be Chinese. I hope the number of Japanese users will grow too, so I would be very happy if you would try it out and give feedback (issues, PRs).