ingest/pipeline: Create functional producer for BufferedStorageBackend #5412
Comments
I think there are some subtleties to this interface change that we should consider:
Given these potential issues, I think for the MVP we should avoid changing the LedgerBackend interface. In the future, as we see more uses of the ingestion library, we can come up with some helper functions which will reduce boilerplate.
Publishing can return a channel to propagate completion status to the caller: if an error occurs, it is sent on the channel and the channel is then closed; if publishing finishes for the requested range with no errors, the channel is simply closed.
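A minimal sketch of that channel contract, as seen from the caller. The name publishRange, its signature, and the errEmit sentinel are hypothetical and only illustrate the idea; they are not part of the ingest SDK.

```go
package main

import (
	"errors"
	"fmt"
)

// errEmit is a hypothetical sentinel used only by this example.
var errEmit = errors.New("emit failed")

// publishRange is a hypothetical stand-in for the proposed publisher. It calls
// emit for every sequence in [from, to] on a background goroutine and reports
// completion on the returned channel: at most one error is sent, and the
// channel is always closed when the goroutine exits.
func publishRange(from, to uint32, emit func(seq uint32) error) <-chan error {
	done := make(chan error, 1) // buffered so the goroutine never blocks on send
	go func() {
		defer close(done)
		// Iterate in uint64 so a range ending at the maximum uint32 cannot wrap.
		for seq := uint64(from); seq <= uint64(to); seq++ {
			if err := emit(uint32(seq)); err != nil {
				done <- fmt.Errorf("publishing ledger %d: %w", seq, err)
				return
			}
		}
	}()
	return done
}

func main() {
	// Clean run: the channel is closed without a value, so the receive yields nil.
	err := <-publishRange(1, 3, func(uint32) error { return nil })
	fmt.Println("clean run:", err)

	// Failed run: the error arrives once, then the channel is closed.
	err = <-publishRange(1, 3, func(seq uint32) error {
		if seq == 2 {
			return errEmit
		}
		return nil
	})
	fmt.Println("failed run:", err)
}
```

With this shape, a single receive is enough for the caller to distinguish "finished the range" (nil) from "stopped early" (non-nil error).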
Yes, to avoid these re-entrancy problems with the LedgerBackend instance needed to drive publishing, I think we can skip adding the notion of publishing on to
I think if we can provide this sdk mechanism up front for automating the streaming of ledger tx-meta, it will be worthwhile for demonstrating the DX during the MVP timeframe, as it lowers the resistance for app developers to adopt the CDP approach of transforming network data into derived models in a pipeline. Apps avoid investing in boilerplate (ledgerbackend, GetLedger iteration, etc.) and get a Stellar tx-meta 'source of origin' operator (publisher) to use in their pipeline out of the box.
@chowbao, @tamirms, @urvisavla: are there any known BufferedStorage settings, based on benchmarks, that we feel good about providing as a default constant in the sdk? Clients could use it as a sanity check/reference to get moving quickly, tuning later if they need:
could encapsulate functionally in sdk as
Here's a summary of the recommended configuration for buffer size and number of workers based on my analysis:
You can find the detailed numbers and results here. One thing to note is that these tests were run on my local machine, so actual times may vary depending on hardware, but the relative config recommendation should remain the same. As for retry_limit and retry_wait, these values aren't dependent on other parameters, so imo reasonable values of retry_limit=3 to 5 and retry_wait=30s should be good. Let me know if you need any additional info. Thanks!
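To make the retry suggestion concrete, here is a rough sketch of how a retry limit and retry wait could be applied around a fetch. The retryConfig type, the withRetry helper, and the reading of the limit as "retries after the first attempt" are all assumptions made for illustration; they are not the SDK's actual config type or retry semantics. Buffer size and worker count are left out because their recommended values are not stated in this comment.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryConfig is a hypothetical holder for the two retry settings discussed above.
type retryConfig struct {
	RetryLimit uint32        // assumed meaning: retries allowed after the first attempt
	RetryWait  time.Duration // pause between attempts
}

// suggestedRetry uses the low end of the suggested range (3 to 5) and a 30s wait.
var suggestedRetry = retryConfig{RetryLimit: 3, RetryWait: 30 * time.Second}

// errTransient is a hypothetical error used only by this example.
var errTransient = errors.New("transient failure")

// withRetry calls fetch once, then retries up to cfg.RetryLimit more times,
// sleeping cfg.RetryWait between attempts. It returns the last error if every
// attempt fails.
func withRetry(cfg retryConfig, fetch func() error) error {
	var err error
	for attempt := uint32(0); attempt <= cfg.RetryLimit; attempt++ {
		if err = fetch(); err == nil {
			return nil
		}
		if attempt < cfg.RetryLimit {
			time.Sleep(cfg.RetryWait)
		}
	}
	return fmt.Errorf("giving up after %d retries: %w", cfg.RetryLimit, err)
}

func main() {
	fmt.Printf("suggested settings: %+v\n", suggestedRetry)

	// Use a short wait here so the example finishes quickly.
	cfg := retryConfig{RetryLimit: suggestedRetry.RetryLimit, RetryWait: time.Millisecond}
	attempts := 0
	err := withRetry(cfg, func() error {
		attempts++
		if attempts < 3 {
			return errTransient
		}
		return nil
	})
	fmt.Println("attempts:", attempts, "err:", err)
}
```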
…ew feedback on api best practice
What problem does your feature solve?
BufferedStorageBackend provides ledger close meta (LCM) individually per GetLedger call, but there is no more efficient way to participate as a streaming producer of LCMs.
What would you like to see?
Follow design proposal on Functional Processors Lib
Provide a ‘Producer’ function for BufferedStorage Backend.
The function will be used as a ‘producer’ operator in a pipeline, emitting tx-meta LCM over a callback fn. It acts as a closure that encapsulates a private instance of BufferedStorageBackend, avoiding any unintended side effects.
The method will return immediately, creating an async worker routine in the background to continue processing.
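A rough sketch of the shape this proposal describes, under stated assumptions: ledgerSource, fakeSource, produceLedgers, and collect are hypothetical names, and the payload is a placeholder string rather than real ledger close meta. The point is only to show a producer that returns immediately, keeps its source private to a background goroutine, and emits over a callback.

```go
package main

import (
	"context"
	"fmt"
)

// ledgerSource is a hypothetical, minimal stand-in for the ledger backend the
// producer would own. The payload is a placeholder string, not real LCM.
type ledgerSource interface {
	GetLedger(ctx context.Context, seq uint32) (string, error)
}

// fakeSource fabricates placeholder payloads so the sketch runs offline.
type fakeSource struct{}

func (fakeSource) GetLedger(_ context.Context, seq uint32) (string, error) {
	return fmt.Sprintf("lcm-%d", seq), nil
}

// produceLedgers returns immediately. A background goroutine creates its own
// private source, walks [from, to], and hands each payload to callback. The
// returned channel carries at most one error and is then closed.
func produceLedgers(ctx context.Context, from, to uint32, callback func(lcm string) error) <-chan error {
	done := make(chan error, 1)
	go func() {
		defer close(done)
		var src ledgerSource = fakeSource{} // never escapes this goroutine
		for seq := uint64(from); seq <= uint64(to); seq++ {
			// Stop early if the caller cancelled the context.
			if err := ctx.Err(); err != nil {
				done <- err
				return
			}
			lcm, err := src.GetLedger(ctx, uint32(seq))
			if err != nil {
				done <- err
				return
			}
			if err := callback(lcm); err != nil {
				done <- err
				return
			}
		}
	}()
	return done
}

// collect drives the producer and gathers what the callback received.
func collect(from, to uint32) ([]string, error) {
	var got []string
	done := produceLedgers(context.Background(), from, to, func(lcm string) error {
		got = append(got, lcm)
		return nil
	})
	err := <-done // blocks until the background goroutine finishes
	return got, err
}

func main() {
	got, err := collect(2, 4)
	fmt.Println(got, err)
}
```

Because the source is created inside the goroutine, callers cannot reach it and trigger re-entrant calls; the callback and the completion channel are the only points of contact.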
Visualization of where the producer function fits in the larger CDP design for data transformation pipeline:
Relates to:
What alternatives are there?
New streaming ingestion app use cases would each have to implement the same producer logic locally.