[resharding] Add sections for receipt handling (#578)
This is complete except for section `Handling buffered receipts that
target parent shard` which is still being discussed.
shreyan-gupta authored Nov 25, 2024
1 parent 910cf6c commit 8ab6927
Showing 1 changed file with 91 additions and 0 deletions.
91 changes: 91 additions & 0 deletions neps/nep-0568.md
@@ -207,10 +207,101 @@ The change we propose is to move the initial state download point to one in the

### Delayed Receipt Handling

The delayed receipts queue contains all incoming receipts that could not be executed as part of a block due to resource constraints such as compute cost or gas limits. Entries in the delayed receipt queue can belong to any account on the shard. During a resharding event, we would ideally split the delayed receipts across the two child shards according to the account_id associated with each receipt.

The singleton trie key `DelayedReceiptIndices` holds the start_index and end_index of the delayed receipt entries for the shard. The trie key `DelayedReceipt { index }` contains the actual delayed receipt associated with some account_id. These entries are processed in FIFO order during chunk execution.

Note that the delayed receipt trie keys do not have the `account_id` prefix. In ReshardingV2, we followed the trivial solution of iterating through all the delayed receipt queue entries and assigning each one to the appropriate child shard. However, due to constraints on the state witness size and the requirement of instant resharding, this approach is no longer feasible for ReshardingV3.

For ReshardingV3, we decided to handle resharding by duplicating the entries of the delayed receipt queue across both child shards. This works well for state witness size and instant resharding, since we only need to access the delayed receipt queue root entry in the trie. However, it breaks the assumption that all delayed receipts in a shard belong to accounts within that shard.

To resolve this, with the new protocol version, we changed the runtime to discard, without executing, any delayed receipt whose account_id does not belong to that shard.

Note that no delayed receipts are lost during resharding: each receipt is executed exactly once, on the child shard that the associated account_id belongs to.
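The duplicate-then-filter behaviour described above can be sketched as follows. This is a minimal illustration and not the nearcore implementation: `Receipt`, `ChildShard`, `split_parent` and the single string `boundary` account are hypothetical names, and "executing" a receipt is modelled as simply returning it.

```rust
use std::collections::VecDeque;

/// Hypothetical, simplified receipt holding only what the sketch needs.
#[derive(Clone, Debug, PartialEq)]
pub struct Receipt {
    pub account_id: String,
    pub payload: u64,
}

/// Hypothetical child shard owning the accounts in `[lower, upper)`
/// (lexicographic order) plus a full copy of the parent's delayed queue.
pub struct ChildShard {
    pub lower: String,
    pub upper: Option<String>, // `None` means unbounded above
    pub delayed: VecDeque<Receipt>,
}

impl ChildShard {
    /// Returns true if `account_id` falls inside this shard's account range.
    pub fn owns(&self, account_id: &str) -> bool {
        account_id >= self.lower.as_str()
            && self.upper.as_deref().map_or(true, |u| account_id < u)
    }

    /// Drains the queue in FIFO order and returns the receipts that were
    /// "executed". Receipts for accounts outside this shard are dropped,
    /// because the sibling child shard executes its own copy.
    pub fn process_delayed(&mut self) -> Vec<Receipt> {
        let mut executed = Vec::new();
        while let Some(receipt) = self.delayed.pop_front() {
            if self.owns(&receipt.account_id) {
                executed.push(receipt);
            }
        }
        executed
    }
}

/// Splits a parent at `boundary` by copying the whole delayed queue into
/// both children, without inspecting individual entries.
pub fn split_parent(delayed: VecDeque<Receipt>, boundary: &str) -> (ChildShard, ChildShard) {
    let left = ChildShard {
        lower: String::new(),
        upper: Some(boundary.to_string()),
        delayed: delayed.clone(),
    };
    let right = ChildShard { lower: boundary.to_string(), upper: None, delayed };
    (left, right)
}
```

In this sketch every receipt appears in both queues, but because the two account ranges are disjoint and together cover the parent's range, exactly one child returns it from `process_delayed`.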

### PromiseYield Receipt Handling

Promise Yield was introduced as part of NEP-519 to let a contract defer replying to the caller function while the response is being prepared. The Promise Yield implementation introduced three new trie keys: `PromiseYieldIndices`, `PromiseYieldTimeout` and `PromiseYieldReceipt`.

* `PromiseYieldIndices`: This is a singleton key that holds the start_index and end_index of the keys in `PromiseYieldTimeout`.
* `PromiseYieldTimeout { index }`: Along with the receiver_id and data_id, this stores the expires_at block height until which we need to wait to receive a response.
* `PromiseYieldReceipt { receiver_id, data_id }`: This is the receipt created by the account.

An account can call the `promise_yield_create` host function, which increments `PromiseYieldIndices` and adds a new entry to both `PromiseYieldTimeout` and `PromiseYieldReceipt`.

The `PromiseYieldTimeout` queue is sorted by time of creation, so its entries have increasing expires_at block heights. In the runtime, we iterate over all the expired entries and, for each one, create a blank receipt to resolve the corresponding entry in `PromiseYieldReceipt`.

The account can call the `promise_yield_resume` host function multiple times. If it is called before the expiry period, the call is used to resolve the promise yield receipt. Note that the implementation allows multiple resolution receipts to be created, including the expiry receipt, but only the first one is used for the actual resolution of the promise yield receipt.

We use this implementation quirk to facilitate the resharding implementation. The resharding strategy for each of the three trie keys is:

* `PromiseYieldIndices`: Duplicate across both child shards.
* `PromiseYieldTimeout { index }`: Duplicate across both child shards.
* `PromiseYieldReceipt { receiver_id, data_id }`: Since this key has the account_id prefix, we can split the entries across both child shards based on the prefix.

After duplication of `PromiseYieldIndices` and `PromiseYieldTimeout`, when the entries of `PromiseYieldTimeout` eventually get dequeued at their expiry height, one of the following happens:

* If the promise yield receipt associated with the dequeued entry IS NOT a part of the child trie, we create a timeout resolution receipt and it gets ignored.
* If the promise yield receipt associated with the dequeued entry IS part of the child trie, the promise yield implementation continues to work as expected.

This means we don't have to make any special changes in the runtime for handling resharding of promise yield receipts.
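The two cases above can be illustrated with a small sketch of timeout processing on one child shard. All types here are hypothetical simplifications: `data_id` is modelled as a `u64`, the yield receipt as a `String`, and an entry is assumed to expire once `expires_at <= block_height`. The real runtime creates a timeout resolution receipt that is later ignored, which this sketch collapses into an `Ignored` outcome.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical, simplified `PromiseYieldTimeout` entry.
#[derive(Clone, Debug)]
pub struct Timeout {
    pub receiver_id: String,
    pub data_id: u64,
    pub expires_at: u64,
}

#[derive(Debug, PartialEq)]
pub enum TimeoutOutcome {
    /// The yield receipt was in this child's trie and the timeout resolved it.
    Resolved,
    /// No matching yield receipt in this child's trie (it lives in the
    /// sibling shard or was already resumed), so the resolution is a no-op.
    Ignored,
}

pub struct ChildState {
    /// Duplicated from the parent, ordered by increasing `expires_at`.
    pub timeouts: VecDeque<Timeout>,
    /// Split by account prefix: only this child's yield receipts are present.
    pub yield_receipts: HashMap<(String, u64), String>,
}

impl ChildState {
    /// Dequeues every timeout that has expired at `block_height` and reports
    /// what happened to each one, in queue order.
    pub fn process_timeouts(&mut self, block_height: u64) -> Vec<TimeoutOutcome> {
        let mut outcomes = Vec::new();
        while self
            .timeouts
            .front()
            .map_or(false, |t| t.expires_at <= block_height)
        {
            let t = self.timeouts.pop_front().expect("front() returned Some");
            let key = (t.receiver_id, t.data_id);
            // Only the first resolution has an effect: once the receipt is
            // removed, any later resolution for the same key finds nothing.
            let outcome = match self.yield_receipts.remove(&key) {
                Some(_) => TimeoutOutcome::Resolved,
                None => TimeoutOutcome::Ignored,
            };
            outcomes.push(outcome);
        }
        outcomes
    }
}
```

The same lookup-and-remove path serves both a regular shard and a freshly split child, which is why the duplicated timeout entries need no resharding-specific branch in this model.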

### Buffered Receipt Handling

Buffered Receipts were introduced as part of NEP-539, cross-shard congestion control. The implementation introduced two new trie keys, `BufferedReceiptIndices` and `BufferedReceipt`.

* `BufferedReceiptIndices`: This is a singleton key that holds the start_index and end_index of the keys in `BufferedReceipt` for each shard_id.
* `BufferedReceipt { receiving_shard, index }`: This holds the actual buffered receipt that needs to be sent to the receiving_shard.

Note that the targets of the buffered receipts belong to external shards. During a resharding event, we therefore need to handle both the set of buffered receipts in the parent shard and the set of buffered receipts in other shards that target the parent shard.

#### Handling buffered receipts in parent shard

Since buffered receipts target external shards, it is fine to assign them to either of the child shards. For simplicity, we assign all the buffered receipts to the child shard with the lower index, i.e. we copy `BufferedReceiptIndices` and `BufferedReceipt` to the child shard with the lower index and keep `BufferedReceiptIndices` empty for the child shard with the higher index.
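As a rough illustration of this assignment, the sketch below models the outgoing buffers as one FIFO queue per receiving shard. `OutgoingBuffers` and `split_buffers` are hypothetical names, and receipts are modelled as plain strings; the actual trie layout uses the `BufferedReceiptIndices` and `BufferedReceipt` keys described above.

```rust
use std::collections::{BTreeMap, VecDeque};

pub type ShardId = u64;

/// Hypothetical in-memory view of the outgoing buffers of one shard:
/// a FIFO queue of buffered receipts per receiving shard.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct OutgoingBuffers {
    pub by_receiving_shard: BTreeMap<ShardId, VecDeque<String>>,
}

/// Returns `(lower_index_child, higher_index_child)`.
/// The lower-index child takes over every buffer unchanged, and the
/// higher-index child starts with empty buffers.
pub fn split_buffers(parent: OutgoingBuffers) -> (OutgoingBuffers, OutgoingBuffers) {
    (parent, OutgoingBuffers::default())
}
```

Because the parent's buffers are moved rather than duplicated, each buffered receipt is forwarded by exactly one child and FIFO order per receiving shard is preserved.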

#### Handling buffered receipts that target parent shard

TODO(shreyan)

### Congestion Control

Along with having buffered receipts, each chunk also publishes a `CongestionInfo` in the chunk header, which describes the congestion of the shard as of the block being processed.

```rust
pub struct CongestionInfoV1 {
    /// Sum of gas in currently delayed receipts.
    pub delayed_receipts_gas: u128,
    /// Sum of gas in currently buffered receipts.
    pub buffered_receipts_gas: u128,
    /// Size of borsh serialized receipts stored in state because they
    /// were delayed, buffered, postponed, or yielded.
    pub receipt_bytes: u64,
    /// If fully congested, only this shard can forward receipts.
    pub allowed_shard: u16,
}
```

After a resharding event, we need to properly initialize the congestion info for the child shards. Here is how we handle each of the fields:

#### delayed_receipts_gas

Since the resharding strategy for delayed receipts is to duplicate them across both the child shards, we simply copy the value of `delayed_receipts_gas` across both shards.

#### buffered_receipts_gas

Since the resharding strategy for buffered receipts is to assign all the buffered receipts to the lower index child, we copy the `buffered_receipts_gas` from parent to lower index child and set `buffered_receipts_gas` to zero for upper index child.

#### receipt_bytes

This field is harder to deal with as it contains the information from both delayed receipts and buffered receipts. To calculate this field properly, we would need the distribution of the receipt_bytes across both delayed receipts and buffered receipts. The current solution is to start storing metadata about the total `receipt_bytes` for buffered receipts in the trie. This way we have the following:

* For child with lower index, receipt_bytes is the sum of both delayed receipts bytes and buffered receipts bytes, hence `receipt_bytes = parent.receipt_bytes`
* For child with upper index, receipt_bytes is just the bytes from delayed receipts, hence `receipt_bytes = parent.receipt_bytes - buffered_receipt_bytes`
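Putting the three rules so far together, child congestion info could be derived roughly as below. This is a sketch under stated assumptions: `child_congestion_infos` is a hypothetical helper, `buffered_receipt_bytes` stands for the new metadata stored in the trie, and `allowed_shard` is copied from the parent only as a placeholder, since each child is expected to recompute it independently.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct CongestionInfoV1 {
    pub delayed_receipts_gas: u128,
    pub buffered_receipts_gas: u128,
    pub receipt_bytes: u64,
    pub allowed_shard: u16,
}

/// Returns `(lower_index_child, upper_index_child)`, or `None` if the stored
/// `buffered_receipt_bytes` exceeds the parent's total `receipt_bytes`,
/// which would indicate inconsistent metadata.
pub fn child_congestion_infos(
    parent: &CongestionInfoV1,
    buffered_receipt_bytes: u64,
) -> Option<(CongestionInfoV1, CongestionInfoV1)> {
    // Upper child keeps only the delayed-receipt share of the bytes.
    let upper_bytes = parent.receipt_bytes.checked_sub(buffered_receipt_bytes)?;

    // Lower child inherits delayed and buffered receipts, so it matches the parent.
    let lower = *parent;

    // Upper child gets the duplicated delayed receipts but no buffered ones.
    let upper = CongestionInfoV1 {
        buffered_receipts_gas: 0,
        receipt_bytes: upper_bytes,
        ..*parent
    };
    Some((lower, upper))
}
```

Using `checked_sub` rather than plain subtraction is a design choice of this sketch: it surfaces an inconsistency between the stored metadata and the parent's total instead of silently wrapping or panicking.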

#### allowed_shard

This field is calculated by a round-robin mechanism, which can be computed independently for each child shard. Since we are changing the [ShardId semantics](#shardid-semantics), we need to change the implementation to use `ShardIndex` instead of `ShardId`. A `ShardIndex` is simply an assignment of each shard_id to a contiguous index in `[0, num_shards)`.
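One way to picture the `ShardId` to `ShardIndex` assignment is a small lookup table built from the list of shard ids. The sketch below is illustrative only: `ShardIndexMap` and its methods are hypothetical, shard ids are assumed to be unique, and `next_in_round_robin` is just one plausible round-robin step over indices, not the protocol's actual rule for `allowed_shard`.

```rust
use std::collections::BTreeMap;

pub type ShardId = u64;
pub type ShardIndex = usize;

/// Hypothetical bidirectional mapping between arbitrary shard ids and the
/// contiguous indices `[0, num_shards)`.
pub struct ShardIndexMap {
    id_to_index: BTreeMap<ShardId, ShardIndex>,
    index_to_id: Vec<ShardId>,
}

impl ShardIndexMap {
    /// `shard_ids` lists the shard ids in layout order; ids must be unique.
    pub fn new(shard_ids: &[ShardId]) -> Self {
        let id_to_index = shard_ids
            .iter()
            .enumerate()
            .map(|(index, id)| (*id, index))
            .collect();
        Self { id_to_index, index_to_id: shard_ids.to_vec() }
    }

    /// Contiguous index of `id`, or `None` if the shard is not in the layout.
    pub fn index_of(&self, id: ShardId) -> Option<ShardIndex> {
        self.id_to_index.get(&id).copied()
    }

    /// Shard id stored at `index`, or `None` if the index is out of range.
    pub fn id_at(&self, index: ShardIndex) -> Option<ShardId> {
        self.index_to_id.get(index).copied()
    }

    /// Illustrative round-robin step: the shard that follows `id` when
    /// walking the indices cyclically.
    pub fn next_in_round_robin(&self, id: ShardId) -> Option<ShardId> {
        let index = self.index_of(id)?;
        // `index_of` succeeded, so the table is non-empty and `%` is safe.
        self.id_at((index + 1) % self.index_to_id.len())
    }
}
```

Walking indices instead of ids keeps a round-robin well defined even when the ids themselves are no longer contiguous after a split.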

### ShardId Semantics

Currently, shard IDs are represented as numbers within the range `[0,n)`, where n is the total number of shards. These shard IDs are sorted in the same order as the account ID ranges assigned to them. While this approach is straightforward, it complicates resharding operations, particularly when splitting a shard in the middle of the range. Such a split requires reindexing all subsequent shards with higher IDs, adding complexity to the process.
