
I'm DIYing a katago-like project. Can I get some advice on backend choices/multithreading? #1014

Open
Garbage123King opened this issue Jan 17, 2025 · 1 comment

Garbage123King commented Jan 17, 2025

1. I use libtorch as the backend for forward propagation because it is easy to use, but I have run into difficulties. Due to caching or synchronization issues, forward propagation slows down by more than 50x, from 0.0001 seconds per iteration to 0.01 seconds per iteration, which makes self-play nearly impossible. Should I keep working with libtorch, or should I switch to a custom CUDA backend sooner?

2. I use a single thread to serve forward-propagation requests from all 128 search threads, via a mutex-protected queue. Each of the 128 threads runs its own game simulation and, at every simulation step, waits for the neural-network thread to return a result. The network thread runs a forward pass when the queue drains or the batch size reaches 128, then returns the results to each thread using promise.set_value. I have measured that the multithreading part of my code adds little delay; the main cost is still the slow forward propagation from point 1. Still, I would like to ask: should I change my multithreading approach?

Garbage123King changed the title from "I'm DIYing a katago-like project. Can I get some advice on backend choices/multithreading?" to "katago use pytorch to train but doesn't use a libtorch backend?" Jan 17, 2025
@lightvector (Owner)

I think the old version of your question was more useful. :)

KataGo has various custom backends partly because it followed what Leela Zero did, and partly because having a few backends of different kinds makes it possible to run on different hardware and in different modes without installing extra dependencies. There's not necessarily a big advantage to doing all that work if you already have something like libtorch working.

Garbage123King changed the title back from "katago use pytorch to train but doesn't use a libtorch backend?" to "I'm DIYing a katago-like project. Can I get some advice on backend choices/multithreading?" Jan 17, 2025