Hotfix for PPO/A2C + gSDE, internal refactoring and bug fixes
Pre-release

Breaking Changes:
- `render()` method of `VecEnv`s now only accepts one argument: `mode` (see the first sketch after this list)
- Created new file common/torch_layers.py, similar to SB refactoring
  - Contains all PyTorch network layer definitions and feature extractors: `MlpExtractor`, `create_mlp`, `NatureCNN`
- Renamed `BaseRLModel` to `BaseAlgorithm` (along with offpolicy and onpolicy variants)
- Moved on-policy and off-policy base algorithms to `common/on_policy_algorithm.py` and `common/off_policy_algorithm.py`, respectively
- Moved `PPOPolicy` to `ActorCriticPolicy` in common/policies.py
- Moved `PPO` (algorithm class) into `OnPolicyAlgorithm` (`common/on_policy_algorithm.py`), to be shared with A2C
- Moved the following functions from `BaseAlgorithm` (see the import sketch after this list):
  - `_load_from_file` to `load_from_zip_file` (save_util.py)
  - `_save_to_file_zip` to `save_to_zip_file` (save_util.py)
  - `safe_mean` to `safe_mean` (utils.py)
  - `check_env` to `check_for_correct_spaces` (utils.py; renamed to avoid confusion with environment checker tools)
- Moved static function `_is_vectorized_observation` from common/policies.py to common/utils.py under the name `is_vectorized_observation`
- Removed `{save,load}_running_average` functions of `VecNormalize` in favor of `load`/`save` (see the first sketch after this list)
- Removed `use_gae` parameter from `RolloutBuffer.compute_returns_and_advantage`
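As an illustration of the `render()` and `VecNormalize` changes, here is a minimal sketch. It assumes the gym version contemporary with this release; the environment id, render mode and file name are arbitrary examples:

```python
import gym

from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

# render() on a VecEnv now takes a single `mode` argument
venv = VecNormalize(DummyVecEnv([lambda: gym.make("CartPole-v1")]))
venv.reset()
venv.render(mode="rgb_array")  # returns a tiled RGB frame of the sub-envs

# Normalization statistics now go through save()/load()
# instead of the removed {save,load}_running_average functions
venv.save("vec_normalize.pkl")
venv = VecNormalize.load("vec_normalize.pkl", DummyVecEnv([lambda: gym.make("CartPole-v1")]))
venv.close()
```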
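The relocated helpers can be imported from their new modules. A sketch of the new import paths, assuming the module layout listed above:

```python
# Helpers that previously lived on BaseRLModel/BaseAlgorithm
from stable_baselines3.common.save_util import load_from_zip_file, save_to_zip_file
from stable_baselines3.common.utils import (
    check_for_correct_spaces,
    is_vectorized_observation,
    safe_mean,
)
# Renamed actor-critic policy base class (formerly PPOPolicy)
from stable_baselines3.common.policies import ActorCriticPolicy

print(safe_mean([1.0, 2.0, 3.0]))  # sanity check: prints 2.0
```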
Bug Fixes:
- Fixed `render()` method for `VecEnv`s
- Fixed `seed()` method for `SubprocVecEnv`
- Fixed loading on GPU for testing when using gSDE and `deterministic=False` (see the sketch after this list)
- Fixed `register_policy` to allow re-registering the same policy for the same sub-class (i.e. assigning the same value to the same key)
- Fixed a bug where the gradient was passed when using gSDE with `PPO`/`A2C`; this does not affect `SAC`
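A minimal sketch of the gSDE code path covered by the fixes above. The environment id, timestep budget and file name are arbitrary, and the `device` keyword on `load` is assumed to behave as in current versions (falling back to CPU when no GPU is available):

```python
from stable_baselines3 import PPO

# gSDE is enabled with use_sde=True (continuous action spaces only)
model = PPO("MlpPolicy", "Pendulum-v1", use_sde=True)  # "Pendulum-v0" on older gym
model.learn(total_timesteps=1_000)
model.save("ppo_gsde")

# Loading the model (e.g. on GPU) and sampling stochastic actions
# is the code path affected by the fixes above
model = PPO.load("ppo_gsde", device="auto")
obs = model.observation_space.sample()
action, _ = model.predict(obs, deterministic=False)
```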
Others:
- Re-enabled unsafe `fork` start method in the tests (was causing a deadlock with tensorflow)
- Added a test for seeding `SubprocVecEnv` and rendering
- Fixed reference in NatureCNN (pointed to an older version with a different network architecture)
- Fixed comments saying "CxWxH" instead of "CxHxW" (same style as in torch docs / commonly used)
- Added further comments on registering/getting policies ("MlpPolicy", "CnnPolicy")
- Renamed `progress` (value going from 1 at the start of training to 0 at the end) to `progress_remaining` (see the schedule sketch after this list)
- Added `policies.py` files for A2C/PPO, which define MlpPolicy/CnnPolicy (renamed ActorCriticPolicies); see the second sketch after this list
- Added some missing tests for `VecNormalize`, `VecCheckNan` and `PPO`
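To illustrate the `progress_remaining` renaming: a learning-rate schedule is a callable taking the remaining progress. A minimal sketch, where the 3e-4 coefficient, environment and timestep budget are arbitrary examples:

```python
from stable_baselines3 import PPO

def linear_schedule(progress_remaining: float) -> float:
    # progress_remaining goes from 1 at the start of training to 0 at the end
    return 3e-4 * progress_remaining

model = PPO("MlpPolicy", "CartPole-v1", learning_rate=linear_schedule)
model.learn(total_timesteps=2_000)
```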
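With the per-algorithm `policies.py` files, a policy can be referenced either by its registered string name or by importing the class. A sketch, assuming the `MlpPolicy` alias is defined in `a2c/policies.py` as listed above:

```python
from stable_baselines3 import A2C
from stable_baselines3.a2c.policies import MlpPolicy  # alias of ActorCriticPolicy

# The registered string name...
model = A2C("MlpPolicy", "CartPole-v1")
# ...and the imported class refer to the same policy
model = A2C(MlpPolicy, "CartPole-v1")
```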
Documentation:
- Added a paragraph on "MlpPolicy"/"CnnPolicy" and policy naming scheme under "Developer Guide"
- Fixed second-level listing in changelog