Replies: 8 comments
-
I'm working on reinforcement learning using Burn (on my own; I'm not affiliated with the Burn project). I'm not personally aware of any existing Burn implementations. I can reply here when I have something deployed, in case anyone is interested.
-
Yes, I am interested. Thanks!
-
**Basics**

I tried to implement this, and while I am completely new to Burn I think it was mostly possible. However, I hit one major issue. For now I am working around it by creating a new environment each time. This is a little slow, but given a decently large batch size it isn't a significant overhead. It could still be optimized with interior mutability and some sort of cache if needed.

**Parallelism**

One other issue is that the built-in training loop doesn't have support for parallelization. (The tutorial parallelizes the data loaders, but these are basically no-ops in my env.) I think this can be worked around by writing our own training loop and calculating gradients in parallel, then applying them all. But I need to do a bit of research and testing to ensure that doing so produces the same result as doing it serially.

Another option may be using an older version of the model in the dataloader. That way I can evaluate a "trace" of my simulation in parallel using the loaders' ability to run in parallel. I can then save the results and scores of the different steps of the simulation and return those as the batch. The training step would then re-evaluate the model and compute the loss and gradients with the latest model. The downside is that it means evaluating the model twice for each step of simulation (once in the dataloader and once to evaluate the gradients). This may be mitigated by using a simpler model to generate the training data (basically converting my problem into classification by using a different engine for prediction to generate training data + labels).

I'll also have to experiment to see if passing the same device in multiple times to the
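The interior-mutability workaround mentioned above can be sketched in plain Rust. The `Env` and `EnvDataset` types here are hypothetical stand-ins, not Burn's actual API: the point is only that a dataset whose `get` takes `&self` can still reset and reuse a cached environment through a `RefCell` instead of constructing a new one per item.

```rust
use std::cell::RefCell;

// Hypothetical toy environment standing in for a real RL env.
struct Env {
    steps: u32,
    resets: u32,
}

impl Env {
    fn new() -> Self {
        Env { steps: 0, resets: 0 }
    }
    fn reset(&mut self) {
        self.steps = 0;
        self.resets += 1;
    }
    fn step(&mut self) -> f32 {
        self.steps += 1;
        self.steps as f32
    }
}

// Dataset-like wrapper: a `get(&self, ...)` method (as dataset traits
// typically require) can still mutate a cached env via interior mutability,
// avoiding a fresh environment allocation per item.
struct EnvDataset {
    env: RefCell<Env>,
}

impl EnvDataset {
    fn new() -> Self {
        EnvDataset { env: RefCell::new(Env::new()) }
    }

    // Immutable receiver, but we reset and step the single cached env.
    fn get(&self, _index: usize) -> f32 {
        let mut env = self.env.borrow_mut();
        env.reset();
        env.step()
    }
}
```

Note that `RefCell` is not `Sync`, so a multi-threaded dataloader would need a `Mutex` or a per-worker (thread-local) environment instead.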
-
@kevincox Passing the same device multiple times is supposed to create parallelism; if it doesn't, we should fix it!
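On the earlier question of whether computing gradients in parallel and then applying them all matches serial training: for losses that are sums over examples, per-shard gradients add exactly. A toy sketch in plain Rust (std threads only, no Burn; the quadratic loss here is my own illustration):

```rust
use std::thread;

// Toy quadratic loss: L(w) = sum_i (w - x_i)^2, so dL/dw = sum_i 2*(w - x_i).
fn grad(w: f32, shard: &[f32]) -> f32 {
    shard.iter().map(|x| 2.0 * (w - x)).sum()
}

// Serial gradient over the whole batch.
fn serial_grad(w: f32, data: &[f32]) -> f32 {
    grad(w, data)
}

// Parallel gradient: per-shard gradients on worker threads, summed at the end.
fn parallel_grad(w: f32, data: &[f32], shard_size: usize) -> f32 {
    let handles: Vec<_> = data
        .chunks(shard_size)
        .map(|chunk| {
            let shard = chunk.to_vec();
            thread::spawn(move || grad(w, &shard))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Because gradient summation is associative here, the sharded result equals the serial one up to floating-point reordering; losses that are not plain sums over examples (e.g. batch-normalized statistics) would not split this cleanly.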
-
I'm taking a crack at Cartpole / DQN today. I think I'll have a prototype up today or tomorrow.
-
Hey guys, good news! Cartpole/DQN is working. I will clean things up a bit and publish something today.
-
Great! Looking forward to seeing it!
-
Code is here - https://github.com/jt70/deep_thinker Presently, it's only DQN / Cartpole. It should give reasonable results after about 3500-4000 episodes. Obviously, this is just a prototype to get things started. If you'd like to see any particular algorithms / environments, please open an issue. My initial goals are more examples, code cleanup, documentation, rendering, and more algorithms, probably various DQN improvements and PPO to start with.
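For anyone new to DQN, the core of the update is the temporal-difference target y = r + γ · max_a′ Q(s′, a′), with the bootstrap term dropped on terminal transitions. A generic sketch of that target (my own illustration, not code taken from the linked repo):

```rust
// Generic DQN TD-target (illustrative; not from deep_thinker):
// y = r + gamma * max_a' Q(s', a'), or just r on terminal transitions.
fn td_target(reward: f32, gamma: f32, next_q: &[f32], terminal: bool) -> f32 {
    if terminal {
        reward
    } else {
        let max_next = next_q
            .iter()
            .copied()
            .fold(f32::NEG_INFINITY, f32::max);
        reward + gamma * max_next
    }
}
```

The training loss is then the squared (or Huber) error between `td_target` and the online network's Q-value for the action actually taken.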