Replies: 8 comments
-
I'm working on reinforcement learning using Burn (on my own; I'm not affiliated with the Burn project). I'm not personally aware of any existing Burn implementations. I can reply here when I have something deployed, in case anyone is interested.
-
Yes, I am interested. Thanks!
-
**Basics**

I tried to implement this, and while I am completely new to Burn I think it was mostly possible. However, I hit one major issue. For now I am working around it by creating a new environment each time. This is a little slow, but given a decently large batch size it isn't a significant overhead. It could still be optimized with interior mutability and some sort of cache if needed.

**Parallelism**

One other issue is that the built-in training loop doesn't have support for parallelization. (The tutorial parallelizes the data loaders, but these are basically no-ops in my env.) I think this can be worked around by writing our own training loop and calculating gradients in parallel, then applying them all. But I need to do a bit of research and testing to ensure that doing so produces the same result as doing it serially.

Another option may be using an older version of the model in the dataloader. That way I can evaluate a "trace" of my simulation in parallel using the loaders' ability to run in parallel. I can then save the results and scores of the different steps of the simulation and return those as the batch. The training step would then re-evaluate the model and compute the loss and gradients with the latest model. The downside is that it means evaluating the model twice for each step of simulation (once in the dataloader and once to evaluate the gradients). This may be mitigated by using a simpler model to generate the training data (basically converting my problem into classification by using a different engine for prediction to generate training data + labels).

I'll also have to experiment to see if passing the same device in multiple times to the
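The interior-mutability workaround mentioned above can be sketched in plain Rust. The `Env` and `EnvDataset` types here are hypothetical stand-ins, not Burn's actual API: the point is only that a dataset whose `get` takes `&self` can still reset and reuse a cached environment through a `RefCell` instead of constructing a new one per item.

```rust
use std::cell::RefCell;

// Hypothetical toy environment standing in for a real RL env.
struct Env {
    steps: u32,
    resets: u32,
}

impl Env {
    fn new() -> Self {
        Env { steps: 0, resets: 0 }
    }
    fn reset(&mut self) {
        self.steps = 0;
        self.resets += 1;
    }
    fn step(&mut self) -> f32 {
        self.steps += 1;
        self.steps as f32
    }
}

// Dataset-like wrapper: a `get(&self, ...)` method (as dataset traits
// typically require) can still mutate a cached env via interior mutability,
// avoiding a fresh environment allocation per item.
struct EnvDataset {
    env: RefCell<Env>,
}

impl EnvDataset {
    fn new() -> Self {
        EnvDataset { env: RefCell::new(Env::new()) }
    }

    // Immutable receiver, but we reset and step the single cached env.
    fn get(&self, _index: usize) -> f32 {
        let mut env = self.env.borrow_mut();
        env.reset();
        env.step()
    }
}
```

Note that `RefCell` is not `Sync`, so a multi-threaded dataloader would need a `Mutex` or a per-worker (thread-local) environment instead.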
-
@kevincox Passing the same device multiple times is supposed to create parallelism; if it doesn't, we should fix it!
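On the earlier question of whether computing gradients in parallel and then applying them all matches serial training: for losses that are sums over examples, per-shard gradients add exactly. A toy sketch in plain Rust (std threads only, no Burn; the quadratic loss here is my own illustration):

```rust
use std::thread;

// Toy quadratic loss: L(w) = sum_i (w - x_i)^2, so dL/dw = sum_i 2*(w - x_i).
fn grad(w: f32, shard: &[f32]) -> f32 {
    shard.iter().map(|x| 2.0 * (w - x)).sum()
}

// Serial gradient over the whole batch.
fn serial_grad(w: f32, data: &[f32]) -> f32 {
    grad(w, data)
}

// Parallel gradient: per-shard gradients on worker threads, summed at the end.
fn parallel_grad(w: f32, data: &[f32], shard_size: usize) -> f32 {
    let handles: Vec<_> = data
        .chunks(shard_size)
        .map(|chunk| {
            let shard = chunk.to_vec();
            thread::spawn(move || grad(w, &shard))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Because gradient summation is associative here, the sharded result equals the serial one up to floating-point reordering; losses that are not plain sums over examples (e.g. batch-normalized statistics) would not split this cleanly.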
-
I'm taking a crack at Cartpole / DQN today. I think I'll have a prototype up today or tomorrow.
-
Hey guys, good news! Cartpole/DQN is working. I will clean things up a bit and publish something today.
-
Great! Looking forward to seeing it!
-
Code is here - https://github.com/jt70/deep_thinker Presently, it's only DQN / Cartpole. It should give reasonable results after about 3500-4000 episodes. Obviously, this is just a prototype to get things started. If you'd like to see any particular algorithms / environments, please open an issue. My initial goals are more examples, code cleanup, documentation, rendering, and more algorithms, probably various DQN improvements and PPO to start with.
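For anyone new to DQN, the core of the update is the temporal-difference target y = r + γ · max_a′ Q(s′, a′), with the bootstrap term dropped on terminal transitions. A generic sketch of that target (my own illustration, not code taken from the linked repo):

```rust
// Generic DQN TD-target (illustrative; not from deep_thinker):
// y = r + gamma * max_a' Q(s', a'), or just r on terminal transitions.
fn td_target(reward: f32, gamma: f32, next_q: &[f32], terminal: bool) -> f32 {
    if terminal {
        reward
    } else {
        let max_next = next_q
            .iter()
            .copied()
            .fold(f32::NEG_INFINITY, f32::max);
        reward + gamma * max_next
    }
}
```

The training loss is then the squared (or Huber) error between `td_target` and the online network's Q-value for the action actually taken.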