Stable-Baselines3 v2.0.0: Gymnasium Support
Warning
Stable-Baselines3 (SB3) v2.0 will be the last release supporting Python 3.7 (end of life in June 2023).
We highly recommend upgrading to Python >= 3.8.
SB3 Contrib (more algorithms): https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
RL Zoo3 (training framework): https://github.com/DLR-RM/rl-baselines3-zoo
Stable-Baselines Jax (SBX): https://github.com/araffin/sbx
To upgrade:
pip install stable_baselines3 sb3_contrib rl_zoo3 --upgrade
or simply (rl zoo depends on SB3 and SB3 contrib):
pip install rl_zoo3 --upgrade
Breaking Changes:
- Switched to Gymnasium as primary backend, Gym 0.21 and 0.26 are still supported via the shimmy package (@carlosluis, @arjun-kg, @tlpss)
- The deprecated online_sampling argument of HerReplayBuffer was removed
- Removed deprecated stack_observation_space method of StackedObservations
- Renamed environment output observations in evaluate_policy to prevent shadowing the input observations during callbacks (@npit)
- Upgraded wrappers and custom environment to Gymnasium
- Refined the HumanOutputFormat file check: it now verifies that the object is an instance of io.TextIOBase instead of only checking for the presence of a write method
- Because of the new Gym API (0.26+), the random seed passed to vec_env.seed(seed=seed) will only be effective after the env.reset() call
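The seeding change in the last item is easy to miss, so here is a minimal pure-Python sketch of the deferred behaviour. It assumes the vectorized env only stores the seed and forwards it to the sub-envs on the next reset. ToyEnv and ToyVecEnv are hypothetical stand-ins written for this illustration, not SB3 classes, and the per-env offset (seed + idx) is an assumption about how seeds are spread across sub-envs.

```python
import random
from typing import Dict, List, Optional, Tuple


class ToyEnv:
    """Hypothetical env following the Gymnasium convention: seeding happens in reset()."""

    def __init__(self) -> None:
        self.rng = random.Random()

    def reset(self, seed: Optional[int] = None) -> Tuple[float, Dict]:
        if seed is not None:
            # The RNG is only re-seeded when a seed is passed to reset()
            self.rng = random.Random(seed)
        return self.rng.random(), {}


class ToyVecEnv:
    """Hypothetical vectorized wrapper that defers seeding until the next reset()."""

    def __init__(self, n_envs: int) -> None:
        self.envs = [ToyEnv() for _ in range(n_envs)]
        self._seeds: List[Optional[int]] = [None] * n_envs

    def seed(self, seed: int) -> List[Optional[int]]:
        # Only store one seed per sub-env; nothing is re-seeded yet
        self._seeds = [seed + idx for idx in range(len(self.envs))]
        return self._seeds

    def reset(self) -> List[float]:
        observations = []
        for idx, env in enumerate(self.envs):
            obs, _info = env.reset(seed=self._seeds[idx])
            observations.append(obs)
        # Stored seeds are consumed once; later resets continue the same RNG stream
        self._seeds = [None] * len(self.envs)
        return observations


vec_env = ToyVecEnv(n_envs=2)
vec_env.seed(seed=42)  # no effect yet
first = vec_env.reset()  # the seed is applied here

vec_env.seed(seed=42)
second = vec_env.reset()
print(first == second)  # same seed, same first observations
```

In practice this means a call to seed() should be followed by reset() before relying on reproducible rollouts.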
New Features:
- Added Gymnasium support (Gym 0.21 and 0.26 are supported via the shimmy package)
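For readers porting custom environments, the most visible difference in the Gymnasium API (shared with Gym 0.26) is the return signature of reset() and step(). The sketch below mimics those signatures with a hypothetical CountdownEnv written in plain Python. It does not import gymnasium or subclass gymnasium.Env, so treat it as an illustration of the call pattern rather than a complete environment.

```python
from typing import Any, Dict, Optional, Tuple


class CountdownEnv:
    """Hypothetical toy env that mimics the Gymnasium reset()/step() signatures."""

    def __init__(self, start: int = 3, max_steps: int = 10) -> None:
        self.start = start
        self.max_steps = max_steps
        self.state = start
        self.n_steps = 0

    def reset(
        self, seed: Optional[int] = None, options: Optional[Dict[str, Any]] = None
    ) -> Tuple[int, Dict[str, Any]]:
        # Gymnasium-style reset: accepts seed/options, returns (observation, info)
        self.state = self.start
        self.n_steps = 0
        return self.state, {}

    def step(self, action: int) -> Tuple[int, float, bool, bool, Dict[str, Any]]:
        # Gymnasium-style step: returns (obs, reward, terminated, truncated, info)
        self.state -= action
        self.n_steps += 1
        terminated = self.state <= 0  # the task itself ended
        truncated = self.n_steps >= self.max_steps  # a time limit cut the episode
        reward = 1.0 if terminated else 0.0
        return self.state, reward, terminated, truncated, {}


env = CountdownEnv(start=3)
obs, info = env.reset(seed=0)
episode_return = 0.0
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(1)
    episode_return += reward
    # The old single "done" flag is now the OR of the two flags
    done = terminated or truncated
```

Note that this applies to single Gym/Gymnasium environments; SB3's VecEnv keeps its own API, which is why the documentation now distinguishes the two.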
SB3-Contrib
- Fixed QRDQN update interval for multi envs
RL Zoo
- Gym 0.26+ patches to continue working with pybullet and TimeLimit wrapper
- Renamed CarRacing-v1 to CarRacing-v2 in hyperparameters
- Huggingface push to hub now accepts a --n-timesteps argument to adjust the length of the video
- Fixed record_video steps (before it was stepping in a closed env)
- Dropped Gym 0.21 support
Bug Fixes:
- Fixed VecExtractDictObs not handling terminal observation (@WeberSamuel)
- Set NumPy version to >=1.20 due to use of numpy.typing (@troiganto)
- Fixed loading a DQN model changing target_update_interval (@tobirohrer)
- Fixed env checker to properly reset the env before calling step() when checking for Inf and NaN (@lutogniew)
- Fixed HER truncate_last_trajectory() (@lbergmann1)
- Fixed HER desired and achieved goal order in reward computation (@JonathanKuelz)
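To see why the goal order in the last fix matters, consider a reward that is not symmetric in its two goal arguments. The sketch below assumes the GoalEnv-style argument order compute_reward(achieved_goal, desired_goal, info). The reward function itself is hypothetical and chosen only because swapping the arguments changes its result; with a symmetric distance-based reward, the same mistake would go unnoticed.

```python
from typing import Any, Dict, Sequence


def compute_reward(
    achieved_goal: Sequence[float],
    desired_goal: Sequence[float],
    info: Dict[str, Any],
) -> float:
    """Hypothetical asymmetric reward: 0 if every achieved coordinate
    reaches or exceeds the desired one, -1 otherwise."""
    reached = all(a >= d for a, d in zip(achieved_goal, desired_goal))
    return 0.0 if reached else -1.0


achieved = [1.0, 2.0]
desired = [0.5, 1.5]

correct = compute_reward(achieved, desired, {})  # intended argument order
swapped = compute_reward(desired, achieved, {})  # goals passed in the wrong order
print(correct, swapped)
```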
Others:
- Fixed stable_baselines3/a2c/*.py type hints
- Fixed stable_baselines3/ppo/*.py type hints
- Fixed stable_baselines3/sac/*.py type hints
- Fixed stable_baselines3/td3/*.py type hints
- Fixed stable_baselines3/common/base_class.py type hints
- Fixed stable_baselines3/common/logger.py type hints
- Fixed stable_baselines3/common/envs/*.py type hints
- Fixed stable_baselines3/common/vec_env/vec_monitor|vec_extract_dict_obs|util.py type hints
- Fixed stable_baselines3/common/vec_env/base_vec_env.py type hints
- Fixed stable_baselines3/common/vec_env/vec_frame_stack.py type hints
- Fixed stable_baselines3/common/vec_env/dummy_vec_env.py type hints
- Fixed stable_baselines3/common/vec_env/subproc_vec_env.py type hints
- Upgraded docker images to use mamba/micromamba and CUDA 11.7
- Updated env checker to reflect what subset of Gymnasium is supported and improve GoalEnv checks
- Improve type annotation of wrappers
- Tests envs are now checked too
- Added render test for VecEnv and VecEnvWrapper
- Update issue templates and env info saved with the model
- Changed seed() method return type from List to Sequence
- Updated env checker doc and requirements for tuple spaces/goal envs
Documentation:
- Added Deep RL Course link to the Deep RL Resources page
- Added documentation about VecEnv API vs Gym API
- Upgraded tutorials to Gymnasium API
- Make it more explicit when using VecEnv vs Gym env
- Added UAV_Navigation_DRL_AirSim to the project page (@heleidsn)
- Added EvalCallback example (@sidney-tio)
- Update custom env documentation
- Added pink-noise-rl to projects page
- Fix custom policy example, ortho_init was ignored
- Added SBX page
Full Changelog: v1.8.0...v2.0.0