Releases · DLR-RM/rl-baselines3-zoo
RL-Zoo3 v1.6.2: The RL Zoo is now a package!
Highlights
You can now install the RL Zoo via pip: `pip install rl-zoo3`, and it has a basic command line interface (`rl_zoo3 train|enjoy|plot_train|all_plots`) that has the same interface as the scripts (`train.py|enjoy.py|...`).
You can use the RL Zoo from outside the repository, for instance with the experimental Stable-Baselines3 Jax version (SBX).
File: `train.py` (you can then run `python train.py --algo sbx_tqc --env Pendulum-v1`)

```python
import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train

from sbx import TQC  # experimental Stable-Baselines3 Jax implementation

# Register the new algorithm in the zoo's registry
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()
```

Breaking Changes
- RL Zoo is now a Python package
- Low pass filter was removed
New Features
- RL Zoo CLI: `rl_zoo3 train` and `rl_zoo3 enjoy`
SB3 v1.6.1: Progress bar and custom yaml file
Breaking Changes
- Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
- Upgraded to sb3-contrib >= 1.6.1
New Features
- Added `--yaml-file` argument option for `train.py` to read hyperparameters from custom yaml files (@JohannesUl)
Bug fixes
- Added `custom_object` parameter on `record_video.py` (@Affonso-Gui)
- Changed `optimize_memory_usage` to `False` for DQN/QR-DQN on `record_video.py` (@Affonso-Gui)
- In `ExperimentManager`, `_maybe_normalize` sets `training` to `False` for eval envs, to prevent normalization stats from being updated in eval envs (e.g. in `EvalCallback`) (@pchalasani); see the sketch below
- Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
- Added progress bar via the `-P` argument, using tqdm and rich
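For context, a minimal sketch (not the zoo's exact code) of what the `_maybe_normalize` fix amounts to when wrapping an eval env with SB3's `VecNormalize`:

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# Illustrative eval env, assuming a standard Gym id
eval_env = make_vec_env("Pendulum-v1", n_envs=1)
# training=False freezes the running mean/std so evaluation does not
# update the normalization statistics; norm_reward=False keeps eval
# rewards unscaled
eval_env = VecNormalize(eval_env, training=False, norm_reward=False)
```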
SB3 v1.6.0: Huggingface hub integration, Recurrent PPO (PPO LSTM)
Release 1.6.0 (2022-08-05)
Breaking Changes
- Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
- Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
- Updated default `--eval-freq` from 10k to 25k steps
- Update default horizon to 2 for the `HistoryWrapper`
- Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
- Upgrade to sb3-contrib >= 1.6.0
New Features
- Support setting PyTorch's device with the `--device` flag (@Gregwar)
- Add `--max-total-trials` parameter to help with distributed optimization (@ernestum)
- Added `vec_env_wrapper` support in the config (works the same as `env_wrapper`)
- Added Huggingface hub integration
- Added `RecurrentPPO` support (aka `ppo_lstm`)
- Added autodownload for "official" sb3 models from the hub
- Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
- Added MsPacman models
Bug fixes
- Fix `Reacher-v3` name in PPO hyperparameter file
- Pinned ale-py==0.7.4 until new SB3 version is released
- Fix enjoy / record videos with LSTM policy
- Fix bug with environments that have a slash in their name (@ernestum)
- Changed `optimize_memory_usage` to `False` for DQN/QR-DQN on Atari games; if you want to save RAM, you need to deactivate `handle_timeout_termination` in the `replay_buffer_kwargs` (see the sketch below)
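As a rough illustration of that RAM trade-off, using the standard SB3 `DQN` API rather than the zoo's hyperparameter files (the env id is just an example and assumes the Atari extras are installed):

```python
import gym
from stable_baselines3 import DQN

env = gym.make("PongNoFrameskip-v4")  # example Atari id
# The memory-efficient buffer is incompatible with timeout handling,
# so the latter has to be turned off explicitly to save RAM
model = DQN(
    "CnnPolicy",
    env,
    buffer_size=100_000,
    optimize_memory_usage=True,
    replay_buffer_kwargs=dict(handle_timeout_termination=False),
)
```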
Documentation
Other
- When pruner is set to `"none"`, use `NopPruner` instead of diverted `MedianPruner` (@qgallouedec); see the sketch below
SB3 v1.5.0: Support for Weights and Biases experiment tracking
Release 1.5.0 (2022-03-25)
Support for Weights and Biases experiment tracking
Breaking Changes
- Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
- Upgrade to sb3-contrib >= 1.5.0
- Upgraded to gym 0.21
New Features
- Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
- Support experiment tracking via Weights and Biases via the `--track` flag (@vwxyzjn); see the sketch below
- Support tracking raw episodic stats via `RawStatisticsCallback` (@vwxyzjn, see #216)
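Roughly, W&B tracking with SB3 boils down to the following sketch (the project name is made up and this is not the zoo's exact implementation of `--track`):

```python
import wandb
from stable_baselines3 import PPO

# sync_tensorboard=True forwards SB3's tensorboard logs to W&B
run = wandb.init(project="rl-zoo-demo", sync_tensorboard=True)  # project name is illustrative
model = PPO("MlpPolicy", "CartPole-v1", tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=10_000)
run.finish()
```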
Bug fixes
- Policies saved during optimization with distributed Optuna load on new systems (@JKTerry)
- Fixed script for recording video that was not up to date with the enjoy script
SB3 v1.4.0: TRPO, ARS and multi env training for off-policy algorithms
Breaking Changes
- Dropped python 3.6 support
- Upgrade to Stable-Baselines3 (SB3) >= 1.4.0
- Upgrade to sb3-contrib >= 1.4.0
New Features
- Added mujoco hyperparameters
- Added MuJoCo pre-trained agents
- Added script to parse best hyperparameters of an optuna study
- Added TRPO support
- Added ARS support and pre-trained agents
Documentation
- Replace front image
SB3 v1.3.0: rliable plots and bug fixes
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend you upgrade to Python >= 3.7.
Breaking Changes
- Upgrade to panda-gym 1.1.1
- Upgrade to Stable-Baselines3 (SB3) >= 1.3.0
- Upgrade to sb3-contrib >= 1.3.0
New Features
- Added support for using rliable for performance comparison
Bug fixes
- Fix training with Dict obs and channel last images
Other
- Updated docker image
- Constrained gym version: `gym>=0.17,<0.20`
- Better hyperparameters for A2C/PPO on Pendulum
SB3 v1.2.0
Breaking Changes
- Upgrade to Stable-Baselines3 (SB3) >= 1.2.0
- Upgrade to sb3-contrib >= 1.2.0
Bug fixes
- Fix `--load-last-checkpoint` (@SammyRamone)
- Fix `TypeError` for `gym.Env` class entry points in `ExperimentManager` (@schuderer)
- Fix usage of callbacks during hyperparameter optimization (@SammyRamone)
Other
- Added python 3.9 to Github CI
- Increased DQN replay buffer size for Atari games (@nikhilrayaprolu)
SB3 v1.1.0
Breaking Changes
- Upgrade to Stable-Baselines3 (SB3) >= 1.1.0
- Upgrade to sb3-contrib >= 1.1.0
- Add timeout handling (cf. SB3 doc)
- `HER` is now a replay buffer class and no longer an algorithm (see the sketch below)
- Removed `PlotNoiseRatioCallback`
- Removed `PlotActionWrapper`
- Changed `'lr'` key in Optuna param dict to `'learning_rate'` so the dict can be directly passed to SB3 methods (@justinkterry)
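To illustrate the `HER` change: instead of an `HER` algorithm wrapping another agent, you now pass a replay buffer class to an off-policy algorithm. A minimal sketch with the standard SB3 API, assuming a goal-conditioned (Dict obs) environment such as one from `panda-gym` (the env id below is only illustrative):

```python
import gym
import panda_gym  # noqa: F401  (registers the Panda envs; illustrative choice)
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("PandaReach-v1")  # any goal-conditioned env works
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=1_000)
```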
New Features
- Add support for recording videos of best models and checkpoints (@mcres)
- Add support for recording videos of training experiments (@mcres)
- Add support for dictionary observations
- Added experimental parallel training (with `utils.callbacks.ParallelTrainCallback`)
- Added support for using multiple envs for evaluation
- Added `--load-last-checkpoint` option for the enjoy script
- Save Optuna study object at the end of hyperparameter optimization and plot the results (`plotly` package required)
- Allow to pass multiple folders to `scripts/plot_train.py`
- Flag to save logs and optimal policies from each training run (@justinkterry)
Bug fixes
- Fixed video rendering for PyBullet envs on Linux
- Fixed `get_latest_run_id()` so it works on Windows too (@NicolasHaeffner)
- Fixed video record when using `HER` replay buffer
Documentation
- Updated README (dict obs are now supported)
Other
- Added `is_bullet()` to `ExperimentManager`
- Simplify `close()` for the enjoy script
- Updated docker image to include latest black version
- Updated TD3 Walker2D model (thanks @modanesh)
- Fixed typo in plot title (@scottemmons)
- Minimum cloudpickle version added to `requirements.txt` (@amy12xx)
- Fixed atari-py version (ROM missing in newest release)
- Updated `SAC` and `TD3` search spaces
- Cleanup `eval_freq` documentation and variable name changes (@justinkterry)
- Add clarifying print statement when printing saved hyperparameters during optimization (@justinkterry)
- Clarify n_evaluations help text (@justinkterry)
- Simplified hyperparameters files making use of defaults
- Added new TQC+HER agents
- Add `panda-gym` environments (@qgallouedec)
Stable-Baselines3 v1.0 - 100+ pre-trained models
Blog post: https://araffin.github.io/post/sb3/
Breaking Changes
- Upgrade to SB3 >= 1.0
- Upgrade to sb3-contrib >= 1.0
New Features
- Added 100+ trained agents + benchmark file
- Add support for loading saved model under python 3.8+ (no retraining possible)
- Added Robotics pre-trained agents (@sgillen)
Bug fixes
- Bug fixes for `HER` handling action noise
- Fixed double reset bug with `HER` and enjoy script
Documentation
- Added doc about plotting scripts
Other
- Updated `HER` hyperparameters
Big refactor - SB3 upgrade - Last before v1.0
Breaking Changes
- Removed `LinearNormalActionNoise`
- Evaluation is now deterministic by default, except for Atari games
- `sb3_contrib` is now required
- `TimeFeatureWrapper` was moved to the contrib repo (see the import sketch below)
- Replaced old `plot_train.py` script with updated `plot_training_success.py`
- Renamed `n_episodes_rollout` to `train_freq` tuple to match latest version of SB3
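Since `TimeFeatureWrapper` now lives in the contrib package, imports change accordingly; a quick sketch, assuming sb3-contrib is installed:

```python
import gym
from sb3_contrib.common.wrappers import TimeFeatureWrapper

# Appends the remaining episode time (as a fraction) to the observation
env = TimeFeatureWrapper(gym.make("Pendulum-v1"))
```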
New Features
- Added option to choose which `VecEnv` class to use for multiprocessing (see the sketch below)
- Added hyperparameter optimization support for `TQC`
- Added support for `QR-DQN` from SB3 contrib
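A short sketch of choosing the `VecEnv` class with SB3's built-in helper (illustrative values; the zoo exposes this choice through its configs rather than direct code):

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

# DummyVecEnv is the default; SubprocVecEnv runs each env in its own process
env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)
```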
Bug fixes
- Improved detection of Atari games
- Fix potential bug in plotting script when there is not enough timesteps
- Fixed a bug when using HER + DQN/TQC for hyperparam optimization
Documentation
- Improved documentation (@cboettig)
Other
- Refactored train script, now uses an `ExperimentManager` class
- Replaced `make_env` with SB3 built-in `make_vec_env`
- Add more type hints (`utils/utils.py` done)
- Use f-strings when possible
- Changed `PPO` Atari hyperparameters (removed vf clipping)
- Changed `A2C` Atari hyperparameters (eps value of the optimizer)
- Updated benchmark script
- Updated hyperparameter optim search space (commented gSDE for A2C/PPO)
- Updated `DQN` hyperparameters for CartPole
- Do not wrap channel-first image env (now natively supported by SB3)
- Removed hack to log success rate
- Simplify plot script