This repository contains both the code and the report on the implementation and understanding of the Proximal Policy Optimization (PPO) algorithm in the game Super Mario Bros
Proximal Policy Optimization (PPO) is a type of Reinforcement Learning algorithm that has gained significant attention in recent years. It addresses some of the challenges faced by earlier policy gradient methods, providing more stable and consistent training.
In this repository, PPO is applied to train an agent to navigate the challenges and adversaries in Super Mario Bros. For a detailed understanding and hands-on examples, it's highly recommended to read the report provided.
- Python 3.x
- pip
Follow these steps to get up and running:
-
Clone the repository:
git clone <https://github.com/Alex-Hawking/PPO_Super_Mario_Bros.git> cd <PPO_Super_Mario_Bros>
-
Create a virtual environment:
python3 -m venv venv
(Would highly recommend)
-
Activate the virtual environment:
- Linux/Mac:
source venv/bin/activate
- Windows:
.\venv\Scripts\activate
- Linux/Mac:
-
Install the required packages:
pip install -r requirements.txt
-
Directory structure
Ensure your directory is structured as below:
├── PPO_Super_Mario_Bros/ │ ├── main.py │ ├── src/ │ ├── model/ │ | ├── checkpoints/
-
Device setup
Ensure you device is correctly set by following the steps at the stop of
src/agent.py
After you've installed the prerequisites and have you directory set up correctly, you can run the agent:
python main.py
This will begin training a model using the default hyperparameters. However you can make changes to the hyperparameters and functionality of the model by changing variables located at the top of main.py
.
I have included a partially trained model in the checkpoints folder, it should be able to complete level 1 :)
To understand what these do and how they work I would recommending reading the short report I wrote on PPO and its implementation in Super Mario Bros.