The reward is the score of the model being trained minus the score of the AI opponent.
The model is weak at the beginning of training, so it scores less than the AI and the reward is negative.
As the model gets stronger with training, the reward and Avg.reward will become positive.
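To illustrate why Avg.reward can stay negative for a while even after individual episodes improve, here is a minimal sketch of a running average. It is not the repository's code: the function name `running_average`, the window size, and the reward values are all assumptions chosen for illustration, not measured results.

```python
from collections import deque


def running_average(rewards, window=30):
    """Average of the most recent `window` episode rewards after each episode.

    Hypothetical helper: the window size actually used by main.py is not
    stated in this thread.
    """
    recent = deque(maxlen=window)  # old rewards drop out automatically
    averages = []
    for r in rewards:
        recent.append(r)
        averages.append(sum(recent) / len(recent))
    return averages


# Made-up episode rewards that improve over time (illustrative only).
rewards = [-21, -21, -15, -3, 5, 12]
avgs = running_average(rewards, window=3)
print(avgs)
```

With these made-up numbers the fifth episode already has a positive reward, but the average only turns positive one episode later, because earlier negative episodes are still inside the window.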
An episode ends when one of the players reaches 21 points in Pong.
Our reward is defined as: Reward = Nwin - Nlose, where Nwin is the number of rounds won in one episode and Nlose is the number of rounds lost in one episode.
When you run `python main.py --train_pg`, you start training a policy gradient model with "noob" (untrained) weights.
So you will get Nwin = 0 and Nlose = 21 at the beginning.
Thus, the reward is 0 - 21 = -21, which is negative.
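As a minimal sketch of the definition above (not the repository's actual code; the function name `episode_reward` and the list-of-round-outcomes input are assumptions made for illustration), the episode reward can be computed by counting rounds won and lost:

```python
def episode_reward(round_results):
    """Return Nwin - Nlose for one episode.

    `round_results` is a hypothetical list holding +1 for each round the
    training model wins and -1 for each round it loses.
    """
    n_win = sum(1 for r in round_results if r > 0)
    n_lose = sum(1 for r in round_results if r < 0)
    return n_win - n_lose


# An untrained model that loses every round of a 21-point game.
untrained = [-1] * 21
print(episode_reward(untrained))  # -21

# A stronger model that wins 21 rounds and loses 15.
trained = [+1] * 21 + [-1] * 15
print(episode_reward(trained))  # 6
```

Since an episode stops as soon as either side reaches 21 points, the reward under this definition always lies between -21 and +21.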
When I run
python main.py --train_pg
the reward or Avg.reward is negative. What's wrong with it?