Connect 4 is a classic two-player board game. Players take turns dropping colored counters into a grid, with each player playing a single color only. The pieces fall straight down to the lowest available space in the column.
The goal is to form a horizontal, vertical, or diagonal line of 4 of your counters.
You can play with your teammate here: https://boardgames.io/en/connect4.
Your task is to build a Reinforcement Learning agent that plays Connect 4.

- You must build a Reinforcement Learning agent. Rules-based agents aren't allowed!
- You can only write code in `main.py`, and you can only store data in a dictionary (saved in a `.pkl` file by `save_dictionary()`*).
- In the competition, your agent will call the `choose_move()` function in `main.py` to select a move.
- Any code not in `main.py` will not be used.
- Check your submission is valid with `check_submission()`.
- Submission deadline: 4pm GMT, Sunday.
- You can update your code after submitting, but not after the deadline.

*`save_dictionary()` is a function in `game_mechanics.py`.
The competition will consist of your AI playing other teams' AIs 1-v-1 in a knockout tournament.
Each 1-v-1 matchup consists of a single game with the first player to play chosen randomly. If the game is a draw, another game will be played with the other player starting first.
The competition & discussion will be in Gather Town at 5pm GMT on Sunday (1 hour after submission deadline)!
We strongly suggest you use a feature lookup table. Read `feature_vectors.md` for a short recap of feature vectors.
Unlike typical Connect 4, which has 7 columns and 6 rows, we're using 8 columns. The board is a 6x8 numpy array.
The first axis is the row index and the 2nd axis is the column index.
The below image shows this visually. The numbers in square brackets (in the top-left) show how different elements in the array can be referenced.
The pieces are integers in this array. An empty space is `0`. Your pieces are denoted `1`. Your opponent's pieces are denoted `-1`.
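As a concrete illustration, here is how the board array might be read and indexed (this sketch assumes row 0 is the top of the board and row 5 the bottom, matching the diagram):

```python
import numpy as np

board = np.zeros((6, 8), dtype=int)  # an empty 6x8 board: all zeros

board[5, 3] = 1   # your counter at the bottom of column 3
board[5, 4] = -1  # an opponent counter at the bottom of column 4

print(board[5, 3])         # 1  -> your piece
print(board[0, 3])         # 0  -> the top of column 3 is still empty
print((board == 1).sum())  # 1  -> count of your pieces on the board
```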
Since there are 3 ** 48 ≈ 1e23 possible states, we strongly suggest you use feature vectors to reduce the state space. See `feature_vectors.md` in this repl for a refresher on feature vectors.
Your action is the index (0 -> 7) of the column to drop your counter into. In the above diagram, this is the number on the 2nd axis. The index of a column that is full (there are no spaces left) is an invalid action.
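For instance, a small helper to list the legal columns might look like this (assuming row 0 is the top row, so a non-zero entry there means the column is full):

```python
import numpy as np


def legal_actions(state: np.ndarray) -> list:
    # Columns whose top cell (row 0) is still empty have space left
    return [col for col in range(8) if state[0, col] == 0]
```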
You receive `+1` for winning, `-1` for losing, and `0` for a draw. You receive `0` for all other moves.
`train()`

Write this to train your value function dictionary from experience in the environment. Use TD learning.
Output the trained dictionary so it can be saved.
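One possible shape for `train()` is sketched below. This is not a tuned implementation: it assumes `Env.reset()` and `Env.step()` return Gym-style `(state, reward, done, info)` tuples (check `game_mechanics.py` for the real signatures), uses a purely random behaviour policy for exploration, and the hyperparameters are placeholders:

```python
from typing import Dict, Tuple

from game_mechanics import Env, choose_move_randomly


def train() -> Dict[Tuple, float]:
    value_fn: Dict[Tuple, float] = {}  # feature vector -> estimated value
    alpha = 0.1  # learning rate (placeholder: tune it)

    env = Env(choose_move_randomly)  # learn against a random opponent first
    for _ in range(10_000):  # number of episodes (placeholder)
        state, reward, done, _ = env.reset()
        features = to_feature_vector(state)  # your to_feature_vector() from main.py
        while not done:
            action = choose_move_randomly(state)  # random exploration policy
            state, reward, done, _ = env.step(action)
            next_features = to_feature_vector(state)
            # TD(0) update: V(s) <- V(s) + alpha * [r + V(s') - V(s)]
            td_target = reward + (0.0 if done else value_fn.get(next_features, 0.0))
            value_fn[features] = value_fn.get(features, 0.0) + alpha * (
                td_target - value_fn.get(features, 0.0)
            )
            features = next_features
    return value_fn
```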
`to_feature_vector()`

Write this to convert a state into a feature vector. These features are used to represent the state in the value function lookup table.
Input is the state (np array) and output is a tuple which you design! The better the features you pick out, the faster your agent will learn and the better it can be at Connect 4.
Too detailed of a feature vector and it'll take a long time to train. Not enough detail and your agent will hit a ceiling since too many varied states will look identical. E.g. if your feature was just "number of pieces played by me", there are many different states with the same number of pieces played (and thus the same value function).
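For illustration only, a deliberately coarse feature vector could look like the sketch below. These particular features (piece count, centre-column control, column heights) are toy examples, not a recommendation:

```python
from typing import Tuple

import numpy as np


def to_feature_vector(state: np.ndarray) -> Tuple:
    my_pieces = int((state == 1).sum())               # counters you've played
    centre_control = int((state[:, 3:5] == 1).sum())  # your counters in the two centre columns
    column_heights = tuple(int((state[:, c] != 0).sum()) for c in range(8))
    return (my_pieces, centre_control) + column_heights
```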
`Env` class

The environment class controls the game and runs the opponent. It should be used for training your agent.
See example usage in `play_connect_4_game()`.
The opponent's `choose_move` function is input at initialisation (when `Env(opponent_choose_move)` is called). The first player is chosen at random when `Env.reset()` is called. Every time you call `Env.step()`, 2 moves are taken: yours and then your opponent's. Your opponent sees a 'flipped' version of the board, where their pieces are shown as `1`'s and yours are shown as `-1`'s.
Both `Env.step()` and `Env.reset()` have `verbose` arguments which print debugging info to the console when set to `True`.
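A minimal usage sketch (see `play_connect_4_game()` in `game_mechanics.py` for the canonical example; as above, this assumes Gym-style `(state, reward, done, info)` return values):

```python
from game_mechanics import Env, choose_move_randomly

env = Env(choose_move_randomly)
state, reward, done, info = env.reset(verbose=True)
while not done:
    action = choose_move_randomly(state)  # swap in your own policy here
    state, reward, done, info = env.step(action, verbose=True)
print(f"Episode over. Your return: {reward}")
```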
`choose_move()`

This acts greedily given the state and value function dictionary.
In the competition, the `choose_move()` function is called to make your next move. It takes the state as input and outputs an action.
It also has a `verbose` mode which, when set to `True`, prints to the console the possible actions, their corresponding features if taken, and the values of those feature vectors. Useful for debugging.
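A sketch of the greedy logic (the exact signature may differ from the template in `main.py`; the drop simulation is hand-rolled here for illustration, assuming row index increases downwards, and `game_mechanics.py` may already provide a helper for it):

```python
from typing import Dict, Tuple

import numpy as np


def choose_move(state: np.ndarray, value_fn: Dict[Tuple, float]) -> int:
    best_action, best_value = None, -float("inf")
    for col in range(8):
        if state[0, col] != 0:
            continue  # column full -> invalid action
        # Simulate dropping your counter: it lands on the lowest empty row
        row = max(r for r in range(6) if state[r, col] == 0)
        successor = state.copy()
        successor[row, col] = 1
        value = value_fn.get(to_feature_vector(successor), 0.0)  # your to_feature_vector()
        if value > best_value:
            best_action, best_value = col, value
    return best_action  # in a live game at least one column is always open
```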
`choose_move_randomly()`

Like above, but randomly picks from non-full columns. Takes the state as input and outputs an action.
`play_connect_4_game()`

Plays 1 game of Connect 4, which can be visualised either in the console (if `verbose=True`) or rendered visually (if `render=True`). Outputs the return for your agent.
Inputs:

- `your_choose_move`: function that takes the state and outputs the action for your agent.
- `opponent_choose_move`: function that takes the state and outputs the action for the opponent.
- `game_speed_multiplier`: controls the gameplay speed. High numbers mean fast games, low numbers mean slow games.
- `render`: whether to render the game visually.
- `verbose`: whether to print each move and the corresponding board states to the console.
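For example, to watch a rendered game between your greedy agent and the random opponent (here `value_fn` stands for your trained dictionary, e.g. loaded from the `.pkl` file):

```python
from game_mechanics import play_connect_4_game, choose_move_randomly

total_return = play_connect_4_game(
    your_choose_move=lambda state: choose_move(state, value_fn),
    opponent_choose_move=choose_move_randomly,
    game_speed_multiplier=1,
    render=True,
    verbose=False,
)
print(total_return)  # +1 win, -1 loss, 0 draw
```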
There are a load of functions in `game_mechanics.py`. The useful ones are clearly indicated and explained in their docstrings. Feel free to use these, but don't change them, since the original `game_mechanics.py` file will be used in the competition.
If you want to tweak one, copy-paste it into `main.py` and rename it.
- Discuss which features should be in the feature vector - what corresponds with a good Connect 4 game state? What about a bad one?
- Write `train()`, borrowing from past exercises.
- Iterate, iterate, iterate on that `to_feature_vector()` function.