-
Notifications
You must be signed in to change notification settings - Fork 186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
neutrino: implement parallel header download via header checkpoints #71
Comments
Hello all. I would be taking this issue as part of my summer of Bitcoin project. In this section of my proposal, I explained my current knowledge of the sync process. You can go through it here:
|
We can partition and fetch headers in parallel but writing headers to file would still require an order and so maintain a synchronous process |
Correct, an internal message queue would be a nautral fit here. I think you should check out some of the recent PRs that start to batch write the fetched fitlers on disk. |
@Chinwendu20 as @Roasbeef points out there are a couple of PRs that can be useful here: #273 and #274. I would like to know your approach here but we could start with something simple:
The logic for sending/receiving messages from a peer is defined in btcd. Neutrino uses the As you can see, most of the logic is controlled by queues (channels) We can make the changes for being able to download multiple headers in parallel, with the task manager etc and then decide what to do while the tasks are being completed. Should we fetch all the headers in parallel, store them in memory and validate them when they are downloaded? For sure we can have a nice performance boost by not storing them one by one but "in batches" when they have been validated. Those look like enough changes for me in a first draft PR but I am more than happy to know how you would like to approach it |
Thanks a lot @positiveblue I have gone through the work by @aakselrod @Roasbeef @halseth on parallelizing filter header download using the work manager and workers. It is a pattern that I am following to get this done. However, in that scenario, checkpoints help spread the workload for download across different peers because the hashes within a range can be verified as double hashing the hashes within that range should give the checkpoint. I am wondering in this scenario how checkpoints would help in parallelizing the header as I do not think the same applies to block headers, |
Block headers also have known checkpoints: https://github.com/btcsuite/btcd/blob/v0.23.4/chaincfg/params.go#L304-L331 |
Thanks @Roasbeef . I know that already. I was referring to the verification procedure. The checkpoints in filter headers help in verification as double hashing the hashes within that range should give the checkpoint . I was wondering how checkpoints in block headers would also help in verification process for block headers as unlike filter headers, there is no causality between headers within a checkpoint range such that hashing headers within a checkpoint range should givevthe hash of the filter checkpoint. All this to say that for the block headers, I do not see how fetching headers within a checkpoint range would help in verification as it would be no different between doing that abd fetching within a random range because of the points metioned above. |
I think since every header should have the header hash of the previous block header perhaps we can use that for verification using the checkpoints. I think I would work with this and see how it goes. |
Never mind about this, I think I know the role checkpoints would play in parallelizing the header. Since every block header has the block hash of the previous header, checkpoints could then be used to verify a range of headers gotten from a peer. |
Yep, I think you're on the right track: all headers have the prev header hash, so given all the headers in a range within a checkpoint interval, one can hash then in series to verify that you end up with the checkpoint itself. There'a also code in |
Thank you |
Hmm would this involve creating a different method of constructing block locators? As I can see block locators are created from already existing blocks in the database. There would be no way of doing this if we are sending a request that should contain this block locator in parallel as we would be creating block locators for heights that do not exist in the DB. |
I guess I was just follow this. Lines 2328 to 2336 in d501210
|
Could you please refer me to the code in btcd that does this? Verifies headers against checkpoint as it is being received as I cannot find it. While testing this out. I just realized that the difference between a header checkpoint and the next could be more than 2000 headers which is the maximum number of headers that a message can contain. Initially, I thought that I could just partition the headers sending a pushgetheaders message to fetch a range of headers from one checkpoint to the next checkpoint but that would not be possible. I guess I would just partitition them in a range of the maximum header per message so the nextcheckpoint would not be the end hash in some cases. So I would just fetch batch of headers and verify them using this function This process makes me wonder how checkpoints were intended to help in parellizing headers in the first place. From the handleheaderMsg it is not only the header hash that is used to carry out verification. There also other components of the header needed by the checksanity function. These components are not simply present in the Checkpoint struct. |
Never mind about this, headers at a checkpoint height do not undergo verification rather their hashes only are sufficient enough to verify other headers after them. I guess that is how checkpoints can help in parallelizing headers as headers at a particular checkpoint height do not need to be in store to obtain their hash as their hash is already available in the checkpoint struct. |
Block locators just describe the next set of intervals to fetch. Assuming you know the current height, and the next checkpoint, then you can create your own block locators. Check out the |
Thanks a lot |
Yeah, I was wrong about this, the checksanity carries out verification on the header that requires components of the header such as timestamps. So we need more data about the header at a particular checkpoint height to efficiently parallelize. I think we can do this by:
|
Not sure I follow. Given the set of checkpoints, one can generate a |
Currently, we sync the chain purely serially from a single peers. However, due to the existence of header checkpoints, it's actually possible for us to partition up the header space between the checkpoints and download them all in parallel from several distinct peers. This has the potential to speed up initial sync a good bit as we can start to cut off slow peers, and re-assign ranges to responsive peers.
Note that we'll likely want to address #66 before this, as that should refactor the code significantly and also switch over to pure usage of
getheaders
.The text was updated successfully, but these errors were encountered: