Bug fixes, better image support and last release before v1.0

Pre-release
@araffin araffin released this 27 Feb 19:31
b2c94a6

Breaking Changes:

  • evaluate_policy now returns rewards/episode lengths from a Monitor wrapper if one is present,
    which makes it possible to return the unnormalized reward, for instance in the case of Atari games.
  • Renamed common.vec_env.is_wrapped to common.vec_env.is_vecenv_wrapped to avoid confusion
    with the new is_wrapped() helper
  • Renamed _get_data() to _get_constructor_parameters() for policies (this affects independent saving/loading of policies)
  • Removed n_episodes_rollout and merged it with train_freq, which now accepts a tuple (frequency, unit):
  from stable_baselines3 import SAC
  # SB3 < 0.11.0:
  # model = SAC("MlpPolicy", env, n_episodes_rollout=1, train_freq=-1)
  # SB3 >= 0.11.0:
  model = SAC("MlpPolicy", env, train_freq=(1, "episode"))
  • replay_buffer in collect_rollout is no longer optional
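To make the (frequency, unit) form of train_freq more concrete, here is a minimal, library-independent sketch of how such a value could be normalized. The names Unit, Freq and normalize_train_freq are hypothetical and exist only for this illustration; treating a bare integer as a number of steps is an assumption of the sketch, not something stated in these notes.

```python
from enum import Enum
from typing import NamedTuple, Tuple, Union


class Unit(Enum):
    """Hypothetical enum of the two units mentioned in the notes."""
    STEP = "step"
    EPISODE = "episode"


class Freq(NamedTuple):
    """Hypothetical container for a normalized training frequency."""
    frequency: int
    unit: Unit


def normalize_train_freq(train_freq: Union[int, Tuple[int, str]]) -> Freq:
    """Turn an int or a (frequency, unit) tuple into a Freq."""
    if isinstance(train_freq, int):
        # Assumption: a bare integer counts environment steps.
        train_freq = (train_freq, "step")
    if len(train_freq) != 2:
        raise ValueError("train_freq must be an int or a (frequency, unit) tuple")
    frequency, unit = train_freq
    if not isinstance(frequency, int) or frequency < 1:
        raise ValueError("frequency must be a positive integer")
    # Unit(...) raises ValueError for anything other than "step" or "episode".
    return Freq(frequency, Unit(unit))


print(normalize_train_freq((1, "episode")))
print(normalize_train_freq(4))
```

Rejecting unknown units early keeps a typo such as "episodes" from silently changing how often training runs.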

New Features:

  • Add support for VecFrameStack to stack on first or last observation dimension, along with
    automatic check for image spaces.
  • VecFrameStack now has a channels_order argument to tell if observations should be stacked
    on the first or last observation dimension (originally always stacked on last).
  • Added common.env_util.is_wrapped and common.env_util.unwrap_wrapper functions for checking/unwrapping
    an environment for a specific wrapper.
  • Added env_is_wrapped() method for VecEnv to check if its environments are wrapped
    with given Gym wrappers.
  • Added monitor_kwargs parameter to make_vec_env and make_atari_env
  • Wrap the environments automatically with a Monitor wrapper when possible.
  • EvalCallback now logs the success rate when available (is_success must be present in the info dict)
  • Added new wrappers to log images and matplotlib figures to tensorboard. (@zampanteymedio)
  • Add support for text records to Logger. (@lorenz-h)
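The wrapper-checking helpers listed above can be pictured as a walk down the chain of nested wrappers. The following is a simplified, self-contained sketch of that idea, not the library's code: Env, Wrapper, Monitor and TimeLimit are stand-in classes defined here, and the two functions are local re-implementations that only borrow the names from the notes. It assumes each wrapper keeps the wrapped environment in an `env` attribute.

```python
from typing import Optional, Type


class Env:
    """Stand-in for a base environment (illustration only)."""


class Wrapper:
    """Stand-in for a Gym-style wrapper that stores the inner env in `.env`."""

    def __init__(self, env):
        self.env = env


class Monitor(Wrapper):
    """Stand-in for a monitoring wrapper."""


class TimeLimit(Wrapper):
    """Stand-in for some other wrapper."""


def unwrap_wrapper(env, wrapper_class: Type[Wrapper]) -> Optional[Wrapper]:
    """Return the first wrapper of the given class in the chain, or None."""
    current = env
    while isinstance(current, Wrapper):
        if isinstance(current, wrapper_class):
            return current
        current = current.env  # go one level deeper
    return None


def is_wrapped(env, wrapper_class: Type[Wrapper]) -> bool:
    """Tell whether the environment is wrapped with the given wrapper class."""
    return unwrap_wrapper(env, wrapper_class) is not None


env = TimeLimit(Monitor(Env()))
print(is_wrapped(env, Monitor))
print(is_wrapped(TimeLimit(Env()), Monitor))
```

A check of this kind is what lets evaluation code decide whether Monitor statistics are available before relying on them.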

Bug Fixes:

  • Fixed bug where code added VecTranspose on channel-first image environments (thanks @qxcv)
  • Fixed DQN predict method when using single gym.Env with deterministic=False
  • Fixed incorrect argument order of explained_variance() in ppo.py and a2c.py (@thisray)
  • Fixed bug where full HerReplayBuffer leads to an index error. (@megan-klaiber)
  • Fixed bug where replay buffer could not be saved if it was too big (> 4 GB) for Python < 3.8 (thanks @hn2)
  • Added informative PPO construction error in edge-case scenario where n_steps * n_envs = 1 (size of rollout buffer),
    which otherwise causes downstream breaking errors in training (@decodyng)
  • Fixed discrete observation space support when using multiple envs with A2C/PPO (thanks @ardabbour)
  • Fixed a bug for TD3 delayed update (the update was off-by-one and not delayed when train_freq=1)
  • Fixed numpy warning (replaced np.bool with bool)
  • Fixed a bug where VecNormalize was not normalizing the terminal observation
  • Fixed a bug where VecTranspose was not transposing the terminal observation
  • Fixed a bug where the terminal observation stored in the replay buffer was not the right one for off-policy algorithms
  • Fixed a bug where action_noise was not used when using HER (thanks @ShangqunYu)
  • Fixed a bug where train_freq was not properly converted when loading a saved model
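The explained_variance() fix above matters because the metric is not symmetric in its arguments. A small pure-Python sketch, using the usual definition 1 - Var[y_true - y_pred] / Var[y_true], shows the effect of swapping them. The (y_pred, y_true) parameter order and the sample numbers are assumptions made for this illustration.

```python
from statistics import pvariance
from typing import Sequence


def explained_variance(y_pred: Sequence[float], y_true: Sequence[float]) -> float:
    """Fraction of the variance of y_true that y_pred accounts for."""
    var_true = pvariance(y_true)
    if var_true == 0:
        return float("nan")  # undefined when the targets are constant
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / var_true


returns = [0.0, 2.0, 4.0, 6.0]  # made-up targets
values = [0.0, 1.0, 2.0, 3.0]   # made-up predictions

correct = explained_variance(values, returns)
swapped = explained_variance(returns, values)
print(correct, swapped)  # prints 0.75 0.0
```

With the arguments swapped, the denominator becomes the variance of the predictions rather than of the targets, so the logged value can look very different from the intended one.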

Others:

  • Add more issue templates
  • Add signatures to callable type annotations (@ernestum)
  • Improve error message in NatureCNN
  • Added checks for supported action spaces to improve clarity of error messages for the user
  • Renamed variables in the train() method of SAC, TD3 and DQN to match SB3-Contrib.
  • Updated docker base image to Ubuntu 18.04
  • Set tensorboard min version to 2.2.0 (earlier versions are apparently not working with PyTorch)
  • Added warning for PPO when n_steps * n_envs is not a multiple of batch_size (last mini-batch truncated) (@decodyng)
  • Removed some warnings in the tests
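The two PPO checks mentioned in these notes (the construction error when n_steps * n_envs = 1 and the warning when that product is not a multiple of batch_size) amount to simple arithmetic on the rollout buffer size. Below is a hedged sketch of such a validation step; check_rollout_config is a hypothetical helper written for this example, and the messages are illustrative rather than the library's wording.

```python
import warnings


def check_rollout_config(n_steps: int, n_envs: int, batch_size: int) -> int:
    """Validate the rollout settings and return the rollout buffer size."""
    buffer_size = n_steps * n_envs
    if buffer_size <= 1:
        # A single-sample buffer is the edge case the notes describe.
        raise ValueError(
            f"n_steps * n_envs must be greater than 1, got {n_steps} * {n_envs}"
        )
    remainder = buffer_size % batch_size
    if remainder > 0:
        # The last mini-batch would hold only `remainder` samples.
        warnings.warn(
            f"Rollout buffer size {buffer_size} is not a multiple of "
            f"batch_size {batch_size}: the last mini-batch will be truncated "
            f"to {remainder} samples."
        )
    return buffer_size


print(check_rollout_config(n_steps=2048, n_envs=1, batch_size=64))  # prints 2048
```

For instance, 10 steps with 1 environment and a batch size of 4 gives mini-batches of 4, 4 and 2 samples, which is the situation the warning is meant to flag.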

Documentation:

  • Updated algorithm table
  • Minor docstring improvements regarding rollout (@stheid)
  • Fix migration doc for A2C (epsilon parameter)
  • Fix clip_range docstring
  • Fix duplicated parameter in EvalCallback docstring (thanks @tfederico)
  • Added example of learning rate schedule
  • Added SUMO-RL as example project (@LucasAlegre)
  • Fix docstring of classes in atari_wrappers.py which were inside the constructor (@LucasAlegre)
  • Added SB3-Contrib page
  • Fix bug in the example code of DQN (@AptX395)
  • Add example on how to access the tensorboard summary writer directly. (@lorenz-h)
  • Updated migration guide
  • Updated custom policy doc (separate policy architecture recommended)
  • Added a note about OpenCV headless version
  • Corrected typo in the documentation (@mschweizer)
  • Provide the environment when loading the model in the examples (@lorepieri8)
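The learning rate schedule example mentioned in the documentation list is not reproduced in these notes. As a rough idea of the pattern, here is a minimal sketch of a linear schedule, assuming the schedule is a callable that receives the remaining training progress, going from 1 (start) to 0 (end). The name linear_schedule is chosen for this illustration.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Build a schedule that decays linearly from initial_value to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is assumed to go from 1.0 down to 0.0.
        return progress_remaining * initial_value

    return schedule


lr = linear_schedule(0.001)
print(lr(1.0))  # start of training: prints 0.001
print(lr(0.0))  # end of training: prints 0.0
```

A callable of this shape would typically be passed in place of a constant learning rate when constructing a model.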