Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async Summarization (DDS) #7615

Closed
3 tasks done
DLehenbauer opened this issue Sep 28, 2021 · 9 comments
Closed
3 tasks done

Async Summarization (DDS) #7615

DLehenbauer opened this issue Sep 28, 2021 · 9 comments
Assignees
Labels
epic An issue that is a parent to several other issues feature-request New feature or requests kr perf
Milestone

Comments

@DLehenbauer
Copy link
Contributor

DLehenbauer commented Sep 28, 2021

Background

Summarization involves capture the container's state at a specific instant in time. Normally, this process runs on a special 'read-only' summarization client, which ensures that the container's state is not modified while the snapshot is in processes.

However, when a container is initially attached to the service, summarization happens on the user's client. To avoid the possibility of the application mutating the container's state while this capture is in progress, we synchronously invoke 'snapshotCore' on the hierarchy of DataStores and DDSes without yielding the current JavaScript turn.

Motivations

Yielding the UI thread

Synchronously invoking 'snapshotCore' on the DataStore/DDS hierarchy presents some challenges to customers with moderate to large containers. For these customers, summarization is a large synchronous task that blocks the UI thread, resulting in noticeable UI glitches with the majority of the time spent in 'JSON.stringify()'.

Summary blob reuse

Another challenge is that customers use our 'uploadBlob()' API to move portions of their summary data outside of the ITree returned by 'snapshotCore'. The primary motivations for using 'uploadBlob()' are the desire to reuse chunks from previous summaries and to choose their own serialization format.

While feasible today, storing summary data via 'uploadBlob()' is complex blobs must finish uploading in advance of the summary request in order to be able to reference them via their IFluidHandle.

Data Migration

The third motivation is the ability to migrate data to a new summary format without downloading it all into the client's memory at one time. To do this, the client must be able to asynchronously download, covert, and upload chunks of the summary.

Design Options

SELECTED OPTION: Separate attachment from summarization

One solution is to decouple the attachment scenario from the summarization scenario by offering two API entry points:

  • 'attachCore(): ITree' is synchronous and invoked once on a user's client when the container is first uploaded to the service.
  • 'summarizeCore(): ITree' is asynchronous and periodically invoked on the read-only summarization client.

Both APIs would perform essentially the same work, with the only difference being that summarizeCore() would give the DDS implementor an the option of yielding. Many DDSes will likely implement summarizeCore() by invoking attachCore().

Separate capture from serialization/upload

Another way to split the API surface is to have separate methods for capturing the current snapshot vs. serializing/uploading the snapshot:

  • 'captureCore(): TState' synchronously returns a TState (the DDS specifies the TState type)
  • 'summarizeCore(TState): ISummaryTreeWithStats' asynchronously processes TState to produce an ISummaryTreeWithStats.

DDSes may choose to synchronously summarization in the first method (i.e., TState = ISummaryTreeWithStats), thereby making 'summarizeCore()' a no-op. However, DDSes built on persistent data structures may chose to return their current immutable state from 'captureCore()' and process it in 'summarizeCore()'.

The Fluid runtime must:

  • Synchronously walk the graph of DS/DDSes, collecting all the TState results.
  • Asynchronously invoke 'summarizeCore' for each TState, collecting the Promise results.
  • Await the join of all the promises and produce the final ISummaryTree.

Same as above, but in a single method

This is identical to the above option, except that instead of having the Fluid runtime collect the TSnapshot instance to pass them to the summarizeCore, we combine 'snapshotCore()' and 'summarizeCore()' into a single method:

async summarizeCore(): ITree {
    // DDS must capture it's current current state before yielding
    const snapshot: TSnapshot = this.getSnapshot();

    // Once state is captured, 'summarizeCore()' may yield.
    const blob = await this.runtime.uploadBlob(...);

    return { ... };
}

Again, the runtime must synchronously walk the graph of DS/DDSes, invoking 'summarizeCore()'. However, this time we skip the first step (passing TSnapshot from one method to another), and instead gather promises in the first pass which the runtime awaits and reassembles.

(See also #6571)

Work items

@DLehenbauer DLehenbauer self-assigned this Sep 28, 2021
@anthony-murphy anthony-murphy added this to the October 2021 milestone Oct 4, 2021
@DLehenbauer DLehenbauer changed the title Async Summarization (Placeholder) Async Summarization Oct 18, 2021
@vladsud
Copy link
Contributor

vladsud commented Oct 19, 2021

I do not like the last approach, as it's very easy to make a mistake (insert an await somewhere, maybe indirectly by refactoring code) and break timing of operations. So I think it should be explicit.

I believe second approach (snapshotCore / summarizeCore) is preferred one, but it it can't be implemented while dds/data store attachment process (end-to-end) has to be synchronous (as it is today, as it's part of something synchronous, like setting a key on a map) - making it support asynchrony is possible (via reserving a spot in pending manager and pausing all future ops from being sent), but I'm not sure about cost / if this is right approach (though I'll point out that blob upload has some of the same problems - see #7833). My hope is that with the work @andre4i is doing in the space of named component creation, we could eventually get rid of this requirement (of attachment process having same shape as it has today), but the solution is likely 2 months out.

We could try to meet half way there, i.e. separate the process of object creation and naming even in today's system (without waiting for more complex work that Andrei is doing), but given this is not back-compat change, we will need new behavior to ship dark and take couple versions to saturate. If Whiteboard is Ok waiting, that would be my preference, as it helps move this item in right directly while helping Andrei make progress on his side as well.

If this is not acceptable, then we need to go (in the meantime) with first approach (two independent APIs - one async & one sync).

And to capture requirements for 2nd solution (snapshotCore / summarizeCore), for future - we should scope it to one of the following cases:

  1. DDS captures in ITree in snapshotCore and has summarizeCore be essentially an identity function (i.e. TSnapshot is essentially ITree).
  2. DDS has more complex logic, but guarantees that it can correctly generate snapshot at the moment in time when snapshotCore() was called even in presence of remote ops and local changes in between snapshotCore and summarizeCore.

@DLehenbauer
Copy link
Contributor Author

We've decided on the approach where we separate capturing the current DDS state from the process of serializing and uploading it.

Background

A Fluid container holds the root of a graph of IChannel instances (DataStores have an IChannel, DDSes are an IChannel). During summarization, the container runtime visits the graph of IChannel instances, invoking 'IChannel.summarize()' on each, and aggregating the returned ISummaryTrees. The final composite ISummaryTree is then serialized and uploaded to the service.

Attachment works similarly, except that it begins with the 'bindFluidDataStore()' entry point, which invokes 'generateAttachMessage()' on each DataStore, eventually bottoming out in the same 'summarize()' method.

Work Required

The required work for this item is to break the graph visitation into two passes. The first pass visits the IChannels, instructing the channel to capture whatever state it will need in order to summarize itself. Because attachment occurs on the user's client (vs. the designated summarization client), this pass must remain synchronous to avoid local mutations while the snapshot is being captured.

The second pass revisits the same graph of IChannels, requesting that they produce an ISummaryTree from their previously captured state. This second pass is asynchronous, which allows allows the DDS implementer to upload or download blobs, and/or periodically yield.

What the DDS captures during the first pass is up to the DDS. DDSes using functional/persistent/immutable data structures may perform a cheap clone during this phase, while other DDSes may continue to synchronously produce an ITree in the first pass and simply return it in the second pass.

Note: Earlier in the thread, I suggested a couple of potential APIs for exposing the two phases to the DDS, but leave the final API shape to the implementer. :-)

@scarlettjlee scarlettjlee changed the title Async Summarization Async Summarization (October) Oct 21, 2021
@scarlettjlee scarlettjlee added the epic An issue that is a parent to several other issues label Oct 22, 2021
@scarlettjlee scarlettjlee changed the title Async Summarization (October) Async Summarization Oct 22, 2021
@scarlettjlee scarlettjlee removed this from the October 2021 milestone Oct 22, 2021
@vladsud
Copy link
Contributor

vladsud commented Oct 24, 2021

I think we should have an item to apply this design to ContainerRuntime, i.e. how it summarizes data stores themselves.
This will ensure that system is symmetrical, and that it does not become another road block on the path to #3469.

That said, this can be done as the very last item, as it will not block the remaining of the work and ability of clients like Whiteboard to leverage async DDS summarization flow.

@nmsimons nmsimons added this to the November 2021 milestone Oct 27, 2021
@nmsimons nmsimons added the kr label Oct 27, 2021
@curtisman curtisman added feature-request New feature or requests perf and removed feature-request New feature or requests labels Nov 1, 2021
@anthony-murphy anthony-murphy added the design-required This issue requires design thought label Nov 2, 2021
@scarlettjlee
Copy link
Contributor

@pleath @anthony-murphy @vladsud, please review if interested.

@vladsud
Copy link
Contributor

vladsud commented Nov 2, 2021

What to review? Daniel's latest post? I've replied after it.

@scarlettjlee
Copy link
Contributor

What to review? Daniel's latest post? I've replied after it.

If you're done, that's good. Thanks.

@anthony-murphy
Copy link
Contributor

Looks good to me. Might want @agarwal-navin to evaluate GC impact, which i think already does something similar to the two pass summary. wonder if there is the potential for consolidation here

@agarwal-navin
Copy link
Contributor

This same problem of serializing the data during summaries applies when getting the GC data (getGCData API) as well right? That also involves mostly similar serialization as in summaries. Although that can be changed to only serialize handles in data but that's not what happens today.
Do we need to do this 2 pass approach for getGCData too?

@scarlettjlee
Copy link
Contributor

scarlettjlee commented Nov 4, 2021

From discussion with Navin and Daniel, there are several ways to deal with GC:

  1. Keep it functioning the same way. If the old snapshotCore goes away, it can call the two new methods (although it would become async). DDSs can override getGCDataCore if they need a more performant implementation.
  2. Break the GC methods up in two passes the same way as summarize.
  3. Change default getGCDataCore to not call summarize, instead create new method that will walk tree to collect handles (similar to FluidSerializer.replaceHandles) Update getGCData in SharedObject / DDS to not use full summarize #4547
  4. Consolidate summarizing + GC since they happen together. <- I don't have a good sense for this but it seems like a huge change and maybe not a good idea.

@scarlettjlee scarlettjlee changed the title Async Summarization Async Summarization (DDS) Nov 15, 2021
@anthony-murphy anthony-murphy removed the design-required This issue requires design thought label Nov 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
epic An issue that is a parent to several other issues feature-request New feature or requests kr perf
Projects
None yet
Development

No branches or pull requests

7 participants