Stable-Baselines3 v2.2.1: Support for options at reset, bug fixes and better error messages
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib --upgrade
or simply (RL Zoo depends on SB3 and SB3 Contrib):
pip install rl_zoo3 --upgrade
Note
Stable-Baselines3 (SB3) v2.2.0 was yanked after a breaking change was found in GH#1751.
Please use SB3 v2.2.1 and not v2.2.0.
Breaking Changes:
- Switched to ruff for sorting imports (isort is no longer needed); black and ruff now require a minimum version
- Dropped "x is False" in favor of "not x", which means that callbacks that wrongly returned None (instead of a boolean) will cause the training to stop (@iwishiwasaneagle)
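The effect of the second breaking change can be illustrated with a minimal, library-free sketch. The helper names below (training_continues_old, training_continues_new, forgetful_on_step, correct_on_step) are hypothetical and are not SB3 internals; they only mimic the two styles of check applied to a callback's return value.

```python
def training_continues_old(callback_result):
    # Older style check: training stops only on the literal False
    return not (callback_result is False)


def training_continues_new(callback_result):
    # Newer style check: any falsy value (False, None, 0) stops training
    return bool(callback_result)


def forgetful_on_step():
    # Bug: no return statement, so Python implicitly returns None
    pass


def correct_on_step():
    # Explicit boolean: True means "keep training"
    return True


for on_step in (forgetful_on_step, correct_on_step):
    result = on_step()
    print(on_step.__name__, training_continues_old(result), training_continues_new(result))
```

In practice, this suggests ending every custom callback step method with an explicit return True (or return False to stop on purpose).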
New Features:
- Improved error message of the env_checker for environments wrongly detected as GoalEnv (compute_reward() is defined)
- Improved error message when mixing Gym API with VecEnv API (see GH#1694)
- Added support for setting options at reset with VecEnv via the set_options() method. Same as seeds logic, options are reset at the end of an episode (@ReHoss)
- Added rollout_buffer_class and rollout_buffer_kwargs arguments to on-policy algorithms (A2C and PPO)
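The "options are reset" behaviour of set_options() can be sketched with a toy model. ToyEnv and ToyVecEnv below are hypothetical stand-ins written for illustration only; they are not the SB3 classes, and the real VecEnv does considerably more. The sketch assumes that stored options are passed to the next reset() and then cleared, mirroring the seeds logic mentioned above.

```python
from copy import deepcopy


class ToyEnv:
    """Hypothetical single environment that records the options it was reset with."""

    def __init__(self):
        self.last_options = None

    def reset(self, options=None):
        self.last_options = options
        return 0  # dummy observation


class ToyVecEnv:
    """Hypothetical vectorized wrapper; only models the options handling."""

    def __init__(self, envs):
        self.envs = envs
        self._options = [{} for _ in envs]

    def set_options(self, options=None):
        if options is None:
            options = {}
        if isinstance(options, dict):
            # A single dict is shared (as copies) by every sub-environment
            self._options = [deepcopy(options) for _ in self.envs]
        else:
            # Otherwise assume one dict per sub-environment
            self._options = deepcopy(list(options))

    def reset(self):
        observations = [
            env.reset(options=opts or None)
            for env, opts in zip(self.envs, self._options)
        ]
        # Options are consumed: later resets start from a clean slate
        self._options = [{} for _ in self.envs]
        return observations


vec_env = ToyVecEnv([ToyEnv(), ToyEnv()])
vec_env.set_options({"difficulty": 2})
vec_env.reset()
first_reset_options = vec_env.envs[0].last_options
vec_env.reset()
second_reset_options = vec_env.envs[0].last_options
```

Under these assumptions, options must be set again before each reset where they should apply.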
Bug Fixes:
- Prevents using squash_output and not use_sde in ActorCriticPolicy (@PatrickHelm)
- Performs unscaling of actions in collect_rollout in OnPolicyAlgorithm (@PatrickHelm)
- Moves VectorizedActionNoise into _setup_learn() in OffPolicyAlgorithm (@PatrickHelm)
- Prevents out of bound error on Windows if no seed is passed (@PatrickHelm)
- Calls callback.update_locals() before callback.on_rollout_end() in OnPolicyAlgorithm (@PatrickHelm)
- Fixed replay buffer device after loading in OffPolicyAlgorithm (@PatrickHelm)
- Fixed render_mode which was not properly loaded when using VecNormalize.load()
- Fixed success reward dtype in SimpleMultiObsEnv (@NixGD)
- Fixed check_env for Sequence observation space (@corentinlger)
- Prevents instantiating BitFlippingEnv with conflicting observation spaces (@kylesayrs)
- Fixed ResourceWarning when loading and saving models (files were not closed). Please note that only files opened from a path are closed automatically; the behavior stays the same for tempfiles (they need to be closed manually). The behavior is now consistent when loading/saving replay buffers
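The path versus file-object distinction in the last fix follows a common pattern, sketched below with the standard library only. The save_object helper is hypothetical and is not the SB3 implementation: it closes a file only when it opened that file itself from a path, and leaves caller-supplied file objects open.

```python
import io
import os
import pickle
import tempfile


def save_object(obj, path_or_file):
    """Hypothetical helper: pickle obj to a path or to an already open binary file."""
    if isinstance(path_or_file, (str, os.PathLike)):
        # We opened the file, so we are responsible for closing it
        with open(path_or_file, "wb") as file_handler:
            pickle.dump(obj, file_handler)
    else:
        # The caller owns this file object and must close it manually
        pickle.dump(obj, path_or_file)


# Saving to a path: nothing is left open afterwards
with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "data.pkl")
    save_object([1, 2, 3], target)
    with open(target, "rb") as file_handler:
        loaded_from_path = pickle.load(file_handler)

# Saving to a file object: it stays open until the caller closes it
buffer = io.BytesIO()
save_object({"a": 1}, buffer)
still_open = not buffer.closed
loaded_from_buffer = pickle.loads(buffer.getvalue())
buffer.close()
```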
SB3-Contrib
- Added set_options for AsyncEval
- Added rollout_buffer_class and rollout_buffer_kwargs arguments to TRPO
RL Zoo
- Removed gym dependency, the package is still required for some pretrained agents.
- Added --eval-env-kwargs to train.py (@Quentin18)
- Added ppo_lstm to hyperparams_opt.py (@technocrat13)
- Upgraded to pybullet_envs_gymnasium>=0.4.0
- Removed old hacks (for instance limiting offpolicy algorithms to one env at test time)
- Updated docker image, removed support for X server
- Replaced deprecated optuna.suggest_uniform(...) by optuna.suggest_float(..., low=..., high=...)
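The suggest_float replacement in the last item might look like the sketch below. In Optuna, suggest_float is a method of the trial object; to keep the example runnable without Optuna installed, MidpointTrial is a hypothetical stand-in that copies the method's signature and simply returns the midpoint of the range. The sample_params function and the hyperparameter ranges are illustrative assumptions, not the RL Zoo code.

```python
import math


class MidpointTrial:
    """Hypothetical stand-in for an Optuna trial (not part of Optuna)."""

    def suggest_float(self, name, low, high, *, step=None, log=False):
        if log:
            # Midpoint in log space, mimicking a log-uniform search range
            return math.exp((math.log(low) + math.log(high)) / 2)
        return (low + high) / 2


def sample_params(trial):
    # suggest_float(name, low=..., high=...) replaces suggest_uniform(name, low, high);
    # log=True covers what suggest_loguniform used to do
    return {
        "gamma": trial.suggest_float("gamma", low=0.9, high=0.9999),
        "learning_rate": trial.suggest_float(
            "learning_rate", low=1e-5, high=1e-2, log=True
        ),
    }


params = sample_params(MidpointTrial())
```

With a real study, the same sample_params function would receive the trial passed to the objective function.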
SBX (SB3 + Jax)
- Added DDPG and TD3 algorithms
Others:
- Fixed stable_baselines3/common/callbacks.py type hints
- Fixed stable_baselines3/common/utils.py type hints
- Fixed stable_baselines3/common/vec_env/vec_transpose.py type hints
- Fixed stable_baselines3/common/vec_env/vec_video_recorder.py type hints
- Fixed stable_baselines3/common/save_util.py type hints
- Updated docker images to Ubuntu Jammy using micromamba 1.5
- Fixed stable_baselines3/common/buffers.py type hints
- Fixed stable_baselines3/her/her_replay_buffer.py type hints
- Buffers do not call an additional .copy() when storing new transitions
- Fixed ActorCriticPolicy.extract_features() signature by adding an optional features_extractor argument
- Updated dependencies (accept newer Shimmy/Sphinx version and remove sphinx_autodoc_typehints)
- Fixed stable_baselines3/common/off_policy_algorithm.py type hints
- Fixed stable_baselines3/common/distributions.py type hints
- Fixed stable_baselines3/common/vec_env/vec_normalize.py type hints
- Fixed stable_baselines3/common/vec_env/__init__.py type hints
- Switched to PyTorch 2.1.0 in the CI (fixes type annotations)
- Fixed stable_baselines3/common/policies.py type hints
- Switched to mypy only for checking types
- Added tests to check consistency when saving/loading files
Documentation:
- Updated RL Tips and Tricks (included recommendation for evaluation, added links to DroQ, ARS and SBX)
- Fixed various typos and grammar mistakes
Full changelog: v2.1.0...v2.2.1