
Hotfix for PPO/A2C + gSDE, internal refactoring and bug fixes

Pre-release
@araffin released this 10 Jun 17:01
494ebfd

Breaking Changes:

  • render() method of VecEnvs now only accepts one argument: mode

  • Created new file common/torch_layers.py, similar to SB refactoring

    • Contains all PyTorch network layer definitions and feature extractors: MlpExtractor, create_mlp, NatureCNN
  • Renamed BaseRLModel to BaseAlgorithm (along with offpolicy and onpolicy variants)

  • Moved on-policy and off-policy base algorithms to common/on_policy_algorithm.py and common/off_policy_algorithm.py, respectively.

  • Moved PPOPolicy to ActorCriticPolicy in common/policies.py

  • Moved PPO (algorithm class) into OnPolicyAlgorithm (common/on_policy_algorithm.py), to be shared with A2C

  • Moved the following functions out of BaseAlgorithm:

    • _load_from_file to load_from_zip_file (save_util.py)
    • _save_to_file_zip to save_to_zip_file (save_util.py)
    • safe_mean to safe_mean (utils.py)
    • check_env to check_for_correct_spaces (utils.py; renamed to avoid confusion with environment checker tools)
  • Moved static function _is_vectorized_observation from common/policies.py to common/utils.py under name is_vectorized_observation.

  • Removed {save,load}_running_average functions of VecNormalize in favor of load/save.

  • Removed use_gae parameter from RolloutBuffer.compute_returns_and_advantage.
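
Most of the renames listed above can be applied mechanically when porting older code. The following is a minimal sketch of such a helper, not part of the library: `RENAMES` and `migrate_source` are hypothetical names introduced here, and the table only covers identifiers named in this list. It only rewrites identifiers, so import paths and call sites for functions that moved out of BaseAlgorithm would still need manual changes.

```python
import re

# Old name -> new name, taken from the "Breaking Changes" list.
# check_env is deliberately left out: the same name is used by the
# environment checker tools, so a blind rename would be unsafe.
RENAMES = {
    "BaseRLModel": "BaseAlgorithm",
    "PPOPolicy": "ActorCriticPolicy",
    "_load_from_file": "load_from_zip_file",
    "_save_to_file_zip": "save_to_zip_file",
    "_is_vectorized_observation": "is_vectorized_observation",
}

# Match an old name only when it is a whole identifier, i.e. not
# preceded or followed by another word character.
_PATTERN = re.compile(
    r"(?<!\w)(" + "|".join(re.escape(old) for old in RENAMES) + r")(?!\w)"
)


def migrate_source(text: str) -> str:
    """Return `text` with every old identifier replaced by its new name."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], text)


if __name__ == "__main__":
    print(migrate_source("class MyAlgo(BaseRLModel):"))
```

Because the new names never match the old patterns, running the helper twice on the same text leaves it unchanged.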

Bug Fixes:

  • Fixed render() method for VecEnvs
  • Fixed seed() method for SubprocVecEnv
  • Fixed loading on GPU for testing when using gSDE and deterministic=False
  • Fixed register_policy to allow re-registering same policy for same sub-class (i.e. assign same value to same key).
  • Fixed a bug where the gradient was passed when using gSDE with PPO/A2C; this does not affect SAC

Others:

  • Re-enabled unsafe fork start method in the tests (was causing a deadlock with TensorFlow)
  • Added a test for seeding SubprocVecEnv and rendering
  • Fixed reference in NatureCNN (pointed to older version with different network architecture)
  • Fixed comments saying "CxWxH" instead of "CxHxW" (same style as in torch docs / commonly used)
  • Added a few more comments on registering/getting policies ("MlpPolicy", "CnnPolicy").
  • Renamed progress (value going from 1 at the start of training to 0 at the end) to progress_remaining.
  • Added policies.py files for A2C/PPO, which define MlpPolicy/CnnPolicy (renamed ActorCriticPolicies).
  • Added some missing tests for VecNormalize, VecCheckNan and PPO.
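
To make the meaning of progress_remaining concrete, here is a minimal sketch of a linear schedule driven by a value that goes from 1 at the start of training to 0 at the end. The helper names `linear_schedule` and `progress_remaining_at` are illustrative, and the formula used to derive the value from a step count is an assumption, not taken from the release notes.

```python
from typing import Callable


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Build a schedule that decays linearly from `initial_value` to 0."""

    def schedule(progress_remaining: float) -> float:
        # progress_remaining is 1.0 at the start of training and 0.0 at the end.
        return progress_remaining * initial_value

    return schedule


def progress_remaining_at(step: int, total_steps: int) -> float:
    """Assumed mapping from a step count to progress_remaining."""
    return 1.0 - step / total_steps


if __name__ == "__main__":
    lr = linear_schedule(3e-4)
    for step in (0, 50, 100):
        print(step, lr(progress_remaining_at(step, 100)))
```

The new name makes the direction explicit: a schedule multiplies by the fraction of training still left, so the scheduled value shrinks as training advances.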

Documentation:

  • Added a paragraph on "MlpPolicy"/"CnnPolicy" and policy naming scheme under "Developer Guide"
  • Fixed second-level listing in changelog