Releases · DLR-RM/rl-baselines3-zoo
RL-Zoo3 v1.6.2: The RL Zoo is now a package!
Highlights
You can now install the RL Zoo via pip: `pip install rl-zoo3`, and it has a basic command line interface (`rl_zoo3 train|enjoy|plot_train|all_plots`) that has the same interface as the scripts (`train.py|enjoy.py|...`).
You can use the RL Zoo from outside the repository, for instance with the experimental Stable-Baselines3 Jax version (SBX).
File: `train.py` (you can then run `python train.py --algo sbx_tqc --env Pendulum-v1`)

```python
import rl_zoo3
import rl_zoo3.train
from rl_zoo3.train import train

from sbx import TQC  # experimental Stable-Baselines3 Jax implementation

# Register the new algorithm in the zoo's registry
rl_zoo3.ALGOS["sbx_tqc"] = TQC
rl_zoo3.train.ALGOS = rl_zoo3.ALGOS
rl_zoo3.exp_manager.ALGOS = rl_zoo3.ALGOS

if __name__ == "__main__":
    train()
```

Breaking Changes
- RL Zoo is now a Python package
- Low pass filter was removed
New Features
- RL Zoo CLI: `rl_zoo3 train` and `rl_zoo3 enjoy`
SB3 v1.6.1: Progress bar and custom yaml file
Breaking Changes
- Upgraded to Stable-Baselines3 (SB3) >= 1.6.1
- Upgraded to sb3-contrib >= 1.6.1
New Features
- Added `--yaml-file` argument option for `train.py` to read hyperparameters from custom yaml files (@JohannesUl)
Bug fixes
- Added `custom_object` parameter on `record_video.py` (@Affonso-Gui)
- Changed `optimize_memory_usage` to `False` for DQN/QR-DQN on `record_video.py` (@Affonso-Gui)
- In `ExperimentManager`, `_maybe_normalize` sets `training` to `False` for eval envs, to prevent normalization stats from being updated in eval envs (e.g. in `EvalCallback`) (@pchalasani); see the sketch below
- Only one env is used to get the action space while optimizing hyperparameters and it is correctly closed (@SammyRamone)
- Added progress bar via the `-P` argument, using tqdm and rich
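For context, a minimal sketch (not the zoo's exact code) of what the `_maybe_normalize` fix amounts to when wrapping an eval env with SB3's `VecNormalize`:

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecNormalize

# Illustrative eval env, assuming a standard Gym id
eval_env = make_vec_env("Pendulum-v1", n_envs=1)
# training=False freezes the running mean/std so evaluation does not
# update the normalization statistics; norm_reward=False keeps eval
# rewards unscaled
eval_env = VecNormalize(eval_env, training=False, norm_reward=False)
```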
SB3 v1.6.0: Huggingface hub integration, Recurrent PPO (PPO LSTM)
Release 1.6.0 (2022-08-05)
Breaking Changes
- Change default value for number of hyperparameter optimization trials from 10 to 500. (@ernestum)
- Derive number of intermediate pruning evaluations from number of time steps (1 evaluation per 100k time steps.) (@ernestum)
- Updated default `--eval-freq` from 10k to 25k steps
- Update default horizon to 2 for the `HistoryWrapper`
- Upgrade to Stable-Baselines3 (SB3) >= 1.6.0
- Upgrade to sb3-contrib >= 1.6.0
New Features
- Support setting PyTorch's device with the `--device` flag (@Gregwar)
- Add `--max-total-trials` parameter to help with distributed optimization (@ernestum)
- Added `vec_env_wrapper` support in the config (works the same as `env_wrapper`)
- Added Huggingface hub integration
- Added `RecurrentPPO` support (aka `ppo_lstm`)
- Added autodownload for "official" sb3 models from the hub
- Added Humanoid-v3, Ant-v3, Walker2d-v3 models for A2C (@pseudo-rnd-thoughts)
- Added MsPacman models
Bug fixes
- Fix `Reacher-v3` name in PPO hyperparameter file
- Pinned ale-py==0.7.4 until new SB3 version is released
- Fix enjoy / record videos with LSTM policy
- Fix bug with environments that have a slash in their name (@ernestum)
- Changed `optimize_memory_usage` to `False` for DQN/QR-DQN on Atari games; if you want to save RAM, you need to deactivate `handle_timeout_termination` in the `replay_buffer_kwargs` (see the sketch below)
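As a rough illustration of that RAM trade-off, using the standard SB3 `DQN` API rather than the zoo's hyperparameter files (the env id is just an example and assumes the Atari extras are installed):

```python
import gym
from stable_baselines3 import DQN

env = gym.make("PongNoFrameskip-v4")  # example Atari id
# The memory-efficient buffer is incompatible with timeout handling,
# so the latter has to be turned off explicitly to save RAM
model = DQN(
    "CnnPolicy",
    env,
    buffer_size=100_000,
    optimize_memory_usage=True,
    replay_buffer_kwargs=dict(handle_timeout_termination=False),
)
```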
Documentation
Other
- When pruner is set to `"none"`, use `NopPruner` instead of diverted `MedianPruner` (@qgallouedec); see the sketch below
SB3 v1.5.0: Support for Weights and Biases experiment tracking
Release 1.5.0 (2022-03-25)
Support for Weights and Biases experiment tracking
Breaking Changes
- Upgrade to Stable-Baselines3 (SB3) >= 1.5.0
- Upgrade to sb3-contrib >= 1.5.0
- Upgraded to gym 0.21
New Features
- Verbose mode for each trial (when doing hyperparam optimization) can now be activated using the debug mode (verbose == 2)
- Support experiment tracking via Weights and Biases via the `--track` flag (@vwxyzjn); see the sketch below
- Support tracking raw episodic stats via `RawStatisticsCallback` (@vwxyzjn, see #216)
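Roughly, W&B tracking with SB3 boils down to the following sketch (the project name is made up and this is not the zoo's exact implementation of `--track`):

```python
import wandb
from stable_baselines3 import PPO

# sync_tensorboard=True forwards SB3's tensorboard logs to W&B
run = wandb.init(project="rl-zoo-demo", sync_tensorboard=True)  # project name is illustrative
model = PPO("MlpPolicy", "CartPole-v1", tensorboard_log=f"runs/{run.id}")
model.learn(total_timesteps=10_000)
run.finish()
```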
Bug fixes
- Policies saved during optimization with distributed Optuna load on new systems (@JKTerry)
- Fixed script for recording video that was not up to date with the enjoy script
SB3 v1.4.0: TRPO, ARS and multi env training for off-policy algorithms
Breaking Changes
- Dropped python 3.6 support
- Upgrade to Stable-Baselines3 (SB3) >= 1.4.0
- Upgrade to sb3-contrib >= 1.4.0
New Features
- Added mujoco hyperparameters
- Added MuJoCo pre-trained agents
- Added script to parse best hyperparameters of an optuna study
- Added TRPO support
- Added ARS support and pre-trained agents
Documentation
- Replace front image
SB3 v1.3.0: rliable plots and bug fixes
WARNING: This version will be the last one supporting Python 3.6 (end of life in Dec 2021). We highly recommend you upgrade to Python >= 3.7.
Breaking Changes
- Upgrade to panda-gym 1.1.1
- Upgrade to Stable-Baselines3 (SB3) >= 1.3.0
- Upgrade to sb3-contrib >= 1.3.0
New Features
- Added support for using rliable for performance comparison
Bug fixes
- Fix training with Dict obs and channel last images
Other
- Updated docker image
- Constrained gym version: `gym>=0.17,<0.20`
- Better hyperparameters for A2C/PPO on Pendulum
SB3 v1.2.0
Breaking Changes
- Upgrade to Stable-Baselines3 (SB3) >= 1.2.0
- Upgrade to sb3-contrib >= 1.2.0
Bug fixes
- Fix `--load-last-checkpoint` (@SammyRamone)
- Fix `TypeError` for `gym.Env` class entry points in `ExperimentManager` (@schuderer)
- Fix usage of callbacks during hyperparameter optimization (@SammyRamone)
Other
- Added python 3.9 to Github CI
- Increased DQN replay buffer size for Atari games (@nikhilrayaprolu)
SB3 v1.1.0
Breaking Changes
- Upgrade to Stable-Baselines3 (SB3) >= 1.1.0
- Upgrade to sb3-contrib >= 1.1.0
- Add timeout handling (cf. SB3 doc)
- `HER` is now a replay buffer class and no longer an algorithm (see the sketch below)
- Removed `PlotNoiseRatioCallback`
- Removed `PlotActionWrapper`
- Changed `'lr'` key in Optuna param dict to `'learning_rate'` so the dict can be directly passed to SB3 methods (@justinkterry)
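To illustrate the `HER` change: instead of an `HER` algorithm wrapping another agent, you now pass a replay buffer class to an off-policy algorithm. A minimal sketch with the standard SB3 API, assuming a goal-conditioned (Dict obs) environment such as one from `panda-gym` (the env id below is only illustrative):

```python
import gym
import panda_gym  # noqa: F401  (registers the Panda envs; illustrative choice)
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("PandaReach-v1")  # any goal-conditioned env works
model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
)
model.learn(total_timesteps=1_000)
```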
New Features
- Add support for recording videos of best models and checkpoints (@mcres)
- Add support for recording videos of training experiments (@mcres)
- Add support for dictionary observations
- Added experimental parallel training (with `utils.callbacks.ParallelTrainCallback`)
- Added support for using multiple envs for evaluation
- Added `--load-last-checkpoint` option for the enjoy script
- Save Optuna study object at the end of hyperparameter optimization and plot the results (`plotly` package required)
- Allow to pass multiple folders to `scripts/plot_train.py`
- Flag to save logs and optimal policies from each training run (@justinkterry)
Bug fixes
- Fixed video rendering for PyBullet envs on Linux
- Fixed `get_latest_run_id()` so it works on Windows too (@NicolasHaeffner)
- Fixed video record when using `HER` replay buffer
Documentation
- Updated README (dict obs are now supported)
Other
- Added `is_bullet()` to `ExperimentManager`
- Simplify `close()` for the enjoy script
- Updated docker image to include latest black version
- Updated TD3 Walker2D model (thanks @modanesh)
- Fixed typo in plot title (@scottemmons)
- Minimum cloudpickle version added to `requirements.txt` (@amy12xx)
- Fixed atari-py version (ROM missing in newest release)
- Updated `SAC` and `TD3` search spaces
- Cleanup `eval_freq` documentation and variable name changes (@justinkterry)
- Add clarifying print statement when printing saved hyperparameters during optimization (@justinkterry)
- Clarify n_evaluations help text (@justinkterry)
- Simplified hyperparameters files making use of defaults
- Added new TQC+HER agents
- Add `panda-gym` environments (@qgallouedec)
Stable-Baselines3 v1.0 - 100+ pre-trained models
Blog post: https://araffin.github.io/post/sb3/
Breaking Changes
- Upgrade to SB3 >= 1.0
- Upgrade to sb3-contrib >= 1.0
New Features
- Added 100+ trained agents + benchmark file
- Add support for loading saved model under python 3.8+ (no retraining possible)
- Added Robotics pre-trained agents (@sgillen)
Bug fixes
- Bug fixes for `HER` handling action noise
- Fixed double reset bug with `HER` and enjoy script
Documentation
- Added doc about plotting scripts
Other
- Updated `HER` hyperparameters
Big refactor - SB3 upgrade - Last before v1.0
Breaking Changes
- Removed `LinearNormalActionNoise`
- Evaluation is now deterministic by default, except for Atari games
- `sb3_contrib` is now required
- `TimeFeatureWrapper` was moved to the contrib repo (see the import sketch below)
- Replaced old `plot_train.py` script with updated `plot_training_success.py`
- Renamed `n_episodes_rollout` to `train_freq` tuple to match latest version of SB3
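Since `TimeFeatureWrapper` now lives in the contrib package, imports change accordingly; a quick sketch, assuming sb3-contrib is installed:

```python
import gym
from sb3_contrib.common.wrappers import TimeFeatureWrapper

# Appends the remaining episode time (as a fraction) to the observation
env = TimeFeatureWrapper(gym.make("Pendulum-v1"))
```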
New Features
- Added option to choose which `VecEnv` class to use for multiprocessing (see the sketch below)
- Added hyperparameter optimization support for `TQC`
- Added support for `QR-DQN` from SB3 contrib
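A short sketch of choosing the `VecEnv` class with SB3's built-in helper (illustrative values; the zoo exposes this choice through its configs rather than direct code):

```python
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import SubprocVecEnv

# DummyVecEnv is the default; SubprocVecEnv runs each env in its own process
env = make_vec_env("CartPole-v1", n_envs=4, vec_env_cls=SubprocVecEnv)
```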
Bug fixes
- Improved detection of Atari games
- Fix potential bug in plotting script when there is not enough timesteps
- Fixed a bug when using HER + DQN/TQC for hyperparam optimization
Documentation
- Improved documentation (@cboettig)
Other
- Refactored train script, now uses an `ExperimentManager` class
- Replaced `make_env` with SB3 built-in `make_vec_env`
- Add more type hints (`utils/utils.py` done)
- Use f-strings when possible
- Changed `PPO` Atari hyperparameters (removed vf clipping)
- Changed `A2C` Atari hyperparameters (eps value of the optimizer)
- Updated benchmark script
- Updated hyperparameter optim search space (commented gSDE for A2C/PPO)
- Updated `DQN` hyperparameters for CartPole
- Do not wrap channel-first image env (now natively supported by SB3)
- Removed hack to log success rate
- Simplify plot script