-
Notifications
You must be signed in to change notification settings - Fork 180
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Access] Add support for pebbleDB to execution data tracker/pruner #6277
base: master
Are you sure you want to change the base?
[Access] Add support for pebbleDB to execution data tracker/pruner #6277
Conversation
…e execution data tracker impl, refactored badger impl
… test for badger impl
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6277 +/- ##
==========================================
- Coverage 41.42% 41.40% -0.03%
==========================================
Files 2024 2032 +8
Lines 144439 144742 +303
==========================================
+ Hits 59839 59928 +89
- Misses 78403 78624 +221
+ Partials 6197 6190 -7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
…of github.com:The-K-R-O-K/flow-go into UlyanaAndrukhiv/6017-pebble-for-tracker-updates
…updated integration tests to avoid code dublication
…hub.com:The-K-R-O-K/flow-go into UlyanaAndrukhiv/6017-pebble-for-tracker-updates
…hub.com:The-K-R-O-K/flow-go into UlyanaAndrukhiv/6017-pebble-for-tracker-updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@zhangchiqing since you've been working closely with pebble/badger lately, can you help review this one. It's refactoring the execution data pruner to support both badger and pebble |
Great work! @UlyanaAndrukhiv Since we're adding both Pebble and Badger implementations for the execution data tracker, this PR has become quite extensive. I suggest we start with the Badger solution first, focusing on the implementation for the tracker and the pruning logic. This PR also references the Pebble implementation I created in this pull request. However, that PR is still under review, and there's a high likelihood that the Pebble implementation will need refactoring. The patterns we're referring to here might become outdated, so it's better to wait and postpone the Pebble implementation for now. We could initially implement the pruner in Badger, refactoring it to use Badger batch updates instead of transactions. Since Pebble doesn't support transactions, using Badger batch updates could make it easier for us to switch to the Pebble implementation later. DesignWe might need to revisit the design of the Tracker and Pruner. The original tracker and pruner were implemented a long time ago, and they face several challenges if we switch to Badger batch updates. We should take a step back and reconsider the design first. For example: We tried to address this by keeping track of the highest indexed height for each CID (RetrieveTrackerLatestHeight / UpsertTrackerLatestHeight), but this introduces complexity, and I'm unsure if it's concurrency-safe without database transactions. Is it possible to eliminate the extra index to simplify things? If we keep the LatestHeight index for each CID, we need to be cautious about dirty writes that might corrupt data. For instance, while we're pruning a CID, we might also be concurrently indexing a new height with a certain CID referring to the deleted CID, which could corrupt the newly indexed data. We probably don't want to solve this problem by blocking indexing with a lock during pruning, because pruning might take a long time. Therefore, I think we need to address these challenges before proceeding with the implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just realized I forgot to submit my reivew, and my comments was in pending.
return builder.ExecutionDataBlobstore.DeleteBlob(context.TODO(), c) | ||
}), | ||
) | ||
if executionDataDBMode == execution_data.ExecutionDataDBModeBadger { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement a CheckExistingExecutionDataDBMode(executionDataDBMode, trackerDir)
function or something similar to check if the folder has consistent data with the DB mode?
This could prevent from accidentally using existing badger db data as pebble, which might corrupt the database.
// | ||
// No errors are expected during normal operation. | ||
func (s *ExecutionDataTracker) trackBlobs(blockHeight uint64, cids ...cid.Cid) error { | ||
cidsPerBatch := s.batchItemLimit(storage.CidsPerBatch, 2, storage.BlobRecordKeyLength+storage.LatestHeightKeyLength+8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this never change and can be calculated during initialization?
break | ||
} | ||
|
||
dInfo := &storage.DeleteInfo{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better not to create until it's actually needed.
return err | ||
} | ||
|
||
if err := s.db.View(func(txn *badger.Txn) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think View
is a read-only op, but pruning is a write op
// - c: The CID of the blob to be tracked. | ||
// | ||
// No errors are expected during normal operation. | ||
func (s *ExecutionDataTracker) trackBlob(tx *badger.Txn, blockHeight uint64, c cid.Cid) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement this without using badger transaction, instead using batch updates. This allows it easy to switch to pebble implementation which also uses batch updates.
|
||
// iterate over blob records, calling pruneCallback for any CIDs that should be pruned | ||
// and cleaning up the corresponding tracker records | ||
for it.Seek(blobRecordPrefix); it.ValidForPrefix(blobRecordPrefix); it.Next() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we reuse badger's traverse function to implement?
It's better that we abstract the lowlevel database operation, it would make it easy to switch to pebble.
return nil | ||
} | ||
|
||
err = operation.UpsertTrackerLatestHeight(c, blockHeight)(tx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to keep track of the latest height of each cid?
Closes: #6260
Context
In this pull request:
ExecutionDataTracker
DB code into commonExecutionDataTracker
interface.badger
implementation.pebble
version of the storage object.