
Releases: DLR-RM/rl-baselines3-zoo

RL-Zoo3 v1.6.2: The RL Zoo is now a package!

03 Oct 16:13
b372e9a


Highlights

You can now install the RL Zoo via pip (pip install rl-zoo3), and it comes with a basic command line interface (rl_zoo3 train|enjoy|plot_train|all_plots) that mirrors the scripts (train.py|enjoy.py|...).
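
For example, installing the package and launching a training run through the new CLI (algorithm and environment are illustrative; the arguments are the same as for train.py):

pip install rl-zoo3
# equivalent to: python train.py --algo ppo --env CartPole-v1
rl_zoo3 train --algo ppo --env CartPole-v1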

You can now use the RL Zoo from outside the package, for instance with the experimental Stable-Baselines3 Jax version (SBX).

File: train.py (you can then run python train.py --algo sbx_tqc --env Pendulum-v1)

import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train

from sbx import TQC

# Register the new algorithm so it can be selected with --algo;
# the train and exp_manager modules keep their own references to ALGOS,
# so they must be updated as well
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()

Breaking Changes

  • RL Zoo is now a Python package
  • The low-pass filter was removed

New Features

  • RL Zoo CLI: rl_zoo3 train and rl_zoo3 enjoy

SB3 v1.6.1: Progress bar and custom yaml file

30 Sep 12:32


Breaking Changes

  • Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
  • Upgraded to sb3-contrib >= 1.6.1

New Features

  • Added the --yaml-file argument to train.py to read hyperparameters from custom yaml files (@JohannesUl)
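
A minimal sketch of such a file and how it might be passed (the file name and values are illustrative; the layout follows the built-in hyperparameters/<algo>.yml files):

# my_hyperparams.yml (hypothetical custom file)
CartPole-v1:
  n_timesteps: !!float 1e5
  policy: 'MlpPolicy'
  n_envs: 8

python train.py --algo ppo --env CartPole-v1 --yaml-file my_hyperparams.yml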

Bug fixes

  • Added custom_object parameter in record_video.py (@Affonso-Gui)
  • Changed optimize_memory_usage to False for DQN/QR-DQN in record_video.py (@Affonso-Gui)
  • In ExperimentManager _maybe_normalize set training to False for eval envs,
    to prevent normalization stats from being updated in eval envs (e.g. in EvalCallback) (@pchalasani).
  • Only one env is now used to get the action space while optimizing hyperparameters, and it is correctly closed (@SammyRamone)
  • Added progress bar via the -P argument using tqdm and rich
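
For instance (algorithm and environment are illustrative):

python train.py --algo ppo --env CartPole-v1 -P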

SB3 v1.6.0: Hugging Face Hub integration, Recurrent PPO (PPO LSTM)

17 Aug 15:47
89d4e0c


Release 1.6.0 (2022-08-05)

Breaking Changes

  • Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
  • Derive the number of intermediate pruning evaluations from the number of time steps (1 evaluation per 100k time steps) (@ernestum)
  • Updated default --eval-freq from 10k to 25k steps
  • Updated default horizon to 2 for the HistoryWrapper
  • Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
  • Upgrade to sb3-contrib >= 1.6.0

New Features

  • Support setting PyTorch's device with the --device flag (@Gregwar)
  • Add --max-total-trials parameter to help with distributed optimization. (@ernestum)
  • Added vec_env_wrapper support in the config (works the same as env_wrapper)
  • Added Hugging Face Hub integration
  • Added RecurrentPPO support (aka ppo_lstm)
  • Added autodownload for "official" sb3 models from the hub (see the sketch after this list)
  • Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
  • Added MsPacman models
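
As a sketch of the autodownload workflow: download a pre-trained model from the sb3 organization, then replay it with the enjoy script. The module path (utils.load_from_hub) and the -orga flag are assumptions recalled from the README of that era and may differ between versions:

# download a pre-trained agent into logs/ (module path and flags are assumptions)
python -m utils.load_from_hub --algo a2c --env Ant-v3 -orga sb3 -f logs/
# replay the downloaded agent
python enjoy.py --algo a2c --env Ant-v3 -f logs/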

Bug fixes

  • Fix Reacher-v3 name in PPO hyperparameter file
  • Pinned ale-py==0.7.4 until new SB3 version is released
  • Fix enjoy / record videos with LSTM policy
  • Fix bug with environments that have a slash in their name (@ernestum)
  • Changed optimize_memory_usage to False for DQN/QR-DQN on Atari games,
    if you want to save RAM, you need to deactivate handle_timeout_termination
    in the replay_buffer_kwargs
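
A sketch of that RAM-saving override in a hyperparameter file (the env and file name are illustrative; the keys come from the entry above):

# e.g. in hyperparams/dqn.yml
PongNoFrameskip-v4:
  replay_buffer_kwargs:
    optimize_memory_usage: True
    handle_timeout_termination: False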

Other

  • When the pruner is set to "none", use NopPruner instead of a diverted MedianPruner (@qgallouedec)

SB3 v1.5.0: Support for Weights and Biases experiment tracking

25 Mar 14:21
2fe4418


Release 1.5.0 (2022-03-25)

Support for Weights and Biases experiment tracking

Breaking Changes

  • Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
  • Upgrade to sb3-contrib >= 1.5.0
  • Upgrade to gym 0.21

New Features

  • Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
  • Support experiment tracking with Weights and Biases via the --track flag (@vwxyzjn); see the example after this list
  • Support tracking raw episodic stats via RawStatisticsCallback (@vwxyzjn, see #216)
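
For example (illustrative; the --wandb-project-name flag is an assumption based on the project README and may not exist in all versions):

python train.py --algo ppo --env CartPole-v1 --track --wandb-project-name sb3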

Bug fixes

  • Policies saved during optimization with distributed Optuna now load on new systems (@JKTerry)
  • Fixed the script for recording video, which was not up to date with the enjoy script

SB3 v1.4.0: TRPO, ARS and multi env training for off-policy algorithms

19 Jan 14:01
41983ab


Breaking Changes

  • Dropped Python 3.6 support
  • Upgrade to Stable-Baselines3 (SB3) >= 1.4.0
  • Upgrade to sb3-contrib >= 1.4.0

New Features

  • Added MuJoCo hyperparameters
  • Added MuJoCo pre-trained agents
  • Added script to parse the best hyperparameters of an Optuna study
  • Added TRPO support
  • Added ARS support and pre-trained agents

Documentation

  • Replace front image

SB3 v1.3.0: rliable plots and bug fixes

23 Oct 16:00
8607c67


WARNING: This version is the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend that you upgrade to Python >= 3.7.

Breaking Changes

  • Upgrade to panda-gym 1.1.1
  • Upgrade to Stable-Baselines3 (SB3) >= 1.3.0
  • Upgrade to sb3-contrib >= 1.3.0

New Features

  • Added support for using rliable for performance comparison
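
A sketch of the intended workflow: aggregate results from several runs into a single file, then plot them with rliable (plot_from_file.py and all flags below are assumptions based on the repository's plotting scripts and may differ):

# aggregate evaluation results (script flags are assumptions)
python scripts/all_plots.py -a sac td3 -e HalfCheetah Ant -f logs/ -o logs/offpolicy
# plot aggregated metrics (e.g. IQM, performance profiles) with rliable
python scripts/plot_from_file.py -i logs/offpolicy.pkl --rliable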

Bug fixes

  • Fix training with Dict obs and channel last images

Other

  • Updated docker image
  • Constrained gym version: gym>=0.17,<0.20
  • Better hyperparameters for A2C/PPO on Pendulum

SB3 v1.2.0

08 Sep 11:40
7b4465b


Breaking Changes

  • Upgrade to Stable-Baselines3 (SB3) >= 1.2.0
  • Upgrade to sb3-contrib >= 1.2.0

Bug fixes

  • Fix --load-last-checkpoint (@SammyRamone)
  • Fix TypeError for gym.Env class entry points in ExperimentManager (@schuderer)
  • Fix usage of callbacks during hyperparameter optimization (@SammyRamone)

Other

  • Added Python 3.9 to the GitHub CI
  • Increased DQN replay buffer size for Atari games (@nikhilrayaprolu)

SB3 v1.1.0

02 Jul 10:08
96f1a59


Breaking Changes

  • Upgrade to Stable-Baselines3 (SB3) >= 1.1.0
  • Upgrade to sb3-contrib >= 1.1.0
  • Add timeout handling (cf. SB3 documentation)
  • HER is now a replay buffer class and no longer an algorithm
  • Removed PlotNoiseRatioCallback
  • Removed PlotActionWrapper
  • Changed 'lr' key in Optuna param dict to 'learning_rate' so the dict can be directly passed to SB3 methods (@justinkterry)

New Features

  • Add support for recording videos of best models and checkpoints (@mcres)
  • Add support for recording videos of training experiments (@mcres)
  • Add support for dictionary observations
  • Added experimental parallel training (with utils.callbacks.ParallelTrainCallback)
  • Added support for using multiple envs for evaluation
  • Added --load-last-checkpoint option for the enjoy script (see the example after this list)
  • Save Optuna study object at the end of hyperparameter optimization and plot the results (plotly package required)
  • Allow passing multiple folders to scripts/plot_train.py
  • Flag to save logs and optimal policies from each training run (@justinkterry)
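
For instance, to replay the most recent checkpoint instead of the final model (the log folder is illustrative):

python enjoy.py --algo ppo --env CartPole-v1 -f logs/ --load-last-checkpoint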

Bug fixes

  • Fixed video rendering for PyBullet envs on Linux
  • Fixed get_latest_run_id() so it also works on Windows (@NicolasHaeffner)
  • Fixed video record when using HER replay buffer

Documentation

  • Updated README (dict obs are now supported)

Other

  • Added is_bullet() to ExperimentManager
  • Simplify close() for the enjoy script
  • Updated docker image to include latest black version
  • Updated TD3 Walker2D model (thanks @modanesh)
  • Fixed typo in plot title (@scottemmons)
  • Minimum cloudpickle version added to requirements.txt (@amy12xx)
  • Fixed atari-py version (ROM missing in newest release)
  • Updated SAC and TD3 search spaces
  • Cleanup eval_freq documentation and variable name changes (@justinkterry)
  • Add clarifying print statement when printing saved hyperparameters during optimization (@justinkterry)
  • Clarify n_evaluations help text (@justinkterry)
  • Simplified hyperparameters files making use of defaults
  • Added new TQC+HER agents
  • Add panda-gym environments (@qgallouedec)

Stable-Baselines3 v1.0 - 100+ pre-trained models

17 Mar 14:28


Blog post: https://araffin.github.io/post/sb3/

Breaking Changes

  • Upgrade to SB3 >= 1.0
  • Upgrade to sb3-contrib >= 1.0

New Features

  • Added 100+ trained agents + benchmark file
  • Add support for loading saved models under Python 3.8+ (no retraining possible)
  • Added Robotics pre-trained agents (@sgillen)

Bug fixes

  • Bug fixes for HER handling action noise
  • Fixed double reset bug with HER and enjoy script

Documentation

  • Added doc about plotting scripts

Other

  • Updated HER hyperparameters

Big refactor - SB3 upgrade - Last before v1.0

27 Feb 19:33
f71d490


Breaking Changes

  • Removed LinearNormalActionNoise
  • Evaluation is now deterministic by default, except for Atari games
  • sb3_contrib is now required
  • TimeFeatureWrapper was moved to the contrib repo
  • Replaced old plot_train.py script with updated plot_training_success.py
  • Renamed n_episodes_rollout to train_freq tuple to match latest version of SB3

New Features

  • Added option to choose which VecEnv class to use for multiprocessing (see the sketch after this list)
  • Added hyperparameter optimization support for TQC
  • Added support for QR-DQN from SB3 contrib
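
A sketch of how this is exposed on the command line (the --vec-env flag name and its values are assumptions and may differ by version):

# use SubprocVecEnv (separate processes) instead of the default DummyVecEnv
python train.py --algo ppo --env CartPole-v1 --vec-env subproc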

Bug fixes

  • Improved detection of Atari games
  • Fix potential bug in the plotting script when there are not enough timesteps
  • Fixed a bug when using HER + DQN/TQC for hyperparam optimization

Other

  • Refactored the train script; it now uses an ExperimentManager class
  • Replaced make_env with SB3 built-in make_vec_env
  • Add more type hints (utils/utils.py done)
  • Use f-strings when possible
  • Changed PPO atari hyperparameters (removed vf clipping)
  • Changed A2C atari hyperparameters (eps value of the optimizer)
  • Updated benchmark script
  • Updated hyperparameter optim search space (commented out gSDE for A2C/PPO)
  • Updated DQN hyperparameters for CartPole
  • Do not wrap channel-first image env (now natively supported by SB3)
  • Removed hack to log success rate
  • Simplify plot script