Flip 298: Utilize Dynamic Protocol State for Version Beacon (coordinating upgrades of the Execution Stack) #296

171 changes: 171 additions & 0 deletions protocol/20241031-execution-stack-versioning.md
@@ -0,0 +1,171 @@
---
status: draft
flip: 296
authors: Alex Hentschel ([email protected])
sponsor: Jordan Schalm ([email protected]), Yurii Oleksyshyn ([email protected])
updated: 2024-10-31
---


# [FLIP 296] Utilize Dynamic Protocol State for Version Beacon (coordinating upgrades of the Execution Stack)

## Objective
![Overview](20241031-execution-stack-versioning/Execution_Stack_Versioning_goal.png)

- Versioning the **Execution Stack** used by ENs, VNs, ANs

Outlook: ANs can decide for which blocks’ execution states they are able to run scripts, also across Execution HCUs


## Terminology

See blog post [[1](https://forum.flow.com/t/protocol-version-upgrade-mechanisms-discussion/5717)] for further details:

**Software Version** - The version identifier of a binary distribution of Flow Node software.
By convention, we use a semver-ish tag in Git and Docker releases.

Software has bugs and is frequently incomplete (e.g. API returning ‘not yet implemented’).
The software version is a meaningful reference to describe what the software does in the real world.

However, we also desire a compact identifier [which we will call the ‘Component Version’] of how a Flow node *should* behave.

<img src='https://github.com/user-attachments/assets/b88f92ad-c230-417e-bf32-6c9c18e09d61' width='200'>

**Component Version:** version identifier for a component of the Flow protocol. It references one specific behaviour of a sub-system (e.g. Execution Stack or HotStuff) of Flow, as prescribed by the protocol.

In a nutshell, for every block there is one and only one correct way to process that block and to evolve the execution state. For distributed BFT systems, we need this notion of ‘correct behaviour’, which is inherently implementation agnostic. We want to explicitly express that up to a certain view $v$, we want the protocol to behave in one way and for higher views differently.

### Relationships between **Software and Component Version**

- Conceptually, for every block, each component of Flow has one and only one component version.

- A software version can implement multiple Component Versions.
E.g. AN supporting script execution across HCU boundaries

❗Don’t couple the software version to the component version! We know there will be scenarios where we want one software to implement multiple
Component Versions and at that point, any one-to-one coupling of Software and Component Version will necessarily break. Instead, for each software
version, we conceptually have a _list_ of Component Versions that this software supports (even if that list only contains a single element most of the time).
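The one-to-many relationship described above can be sketched as a lookup table. This is only an illustration: the release tags, the integer version numbers and the `Supports` helper are hypothetical and not taken from flow-go.

```go
package main

import "fmt"

// ComponentVersion identifies one protocol-prescribed behaviour of a component.
// A plain integer is used here purely for illustration.
type ComponentVersion uint64

// supportedVersions is a hypothetical table: for each software release, the
// list of Execution Stack Component Versions that this release implements.
var supportedVersions = map[string][]ComponentVersion{
	"v0.37.1": {5},
	"v0.37.2": {5},    // patch release: same prescribed behaviour
	"v0.38.0": {5, 6}, // e.g. an AN that can run scripts across an upgrade boundary
}

// Supports reports whether the given software release implements the
// Component Version that the protocol requires for some block.
func Supports(software string, required ComponentVersion) bool {
	for _, v := range supportedVersions[software] {
		if v == required {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(Supports("v0.37.1", 6))
	fmt.Println(Supports("v0.38.0", 5))
	fmt.Println(Supports("v0.38.0", 6))
}
```

Keeping a list per release, even when it usually holds a single element, means no data model has to change once a release starts supporting several Component Versions.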


### Reasons we want to move away from existing Version Beacon:

Current Version Beacon:

1. requires that nodes have a (potentially long) history, i.e. that they have seen the relevant version beacon service event, which is not guaranteed for nodes joining at epoch boundaries

Better: each block specifies which component version is to be used for processing it

2. is based on height and hence not usable for upgrading most consensus-related aspects (and many other protocol aspects).

    Better: using View instead of height for triggering behaviour changes is generally applicable and more robust (view monotonically increases over time, while height might also decrease).


## Dynamic Protocol State already implements better mechanism

💡 In a nutshell, the Protocol State tracks information about each block, including a mechanism to transfer information from the Execution state to the Protocol State in a BFT manner.

- Flow’s Protocol State tracks and provides simple access to information about each block (such as epoch number, staking phase, staked nodes allowed to participate as of this block, nodes’ public keys, etc.) 👉[code](https://github.com/onflow/flow-go/blob/3496c0f02d51602994d4fe60b32fcb00aab084f4/state/protocol/protocol_state.go#L91).
- The Protocol State now also tracks the Component Versions of the most critical consensus component (at the moment: its own version) 👉[code](https://github.com/onflow/flow-go/blob/3496c0f02d51602994d4fe60b32fcb00aab084f4/state/protocol/protocol_state.go#L100).

☑️ The Protocol State already tracks its own Component Version. You can take a look at these places in the code:
- Protocol State reports its own [version](https://github.com/onflow/flow-go/blob/3496c0f02d51602994d4fe60b32fcb00aab084f4/state/protocol/kvstore.go#L30-L43) as part of every block
- mechanism for [scheduling version upgrades (at future view)](https://github.com/onflow/flow-go/blob/a6b157ce2770be9356e1cf35d1b0fff63f5e4a76/state/protocol/protocol_state/kvstore/upgrade_statemachine.go#L78-L142) exists
- mechanism [enforcing that node supports and uses correct](https://github.com/onflow/flow-go/blob/a6b157ce2770be9356e1cf35d1b0fff63f5e4a76/state/protocol/protocol_state/state/protocol_state.go#L235-L248) version as specified by the protocol
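The two mechanisms listed above, scheduling an upgrade for a future view and enforcing that a node supports the resulting version, can be sketched as follows. This is a simplified illustration under stated assumptions, not the flow-go implementation: `PendingUpgrade`, `VersionState`, `Evolve` and `CheckSupported` are hypothetical names, and the real state machine performs further validation.

```go
package main

import (
	"errors"
	"fmt"
)

// PendingUpgrade is a hypothetical record of a scheduled version change:
// Version applies to all blocks with view >= ActivationView.
type PendingUpgrade struct {
	Version        uint64
	ActivationView uint64
}

// VersionState is an illustrative per-block snapshot of one component's version.
type VersionState struct {
	Current uint64
	Pending *PendingUpgrade // nil if no upgrade is scheduled
}

// Evolve derives the state of a child block with the given view from the
// parent state. Once the activation view is reached, the pending version
// becomes current and the pending entry is cleared.
func (s VersionState) Evolve(view uint64) VersionState {
	if s.Pending != nil && view >= s.Pending.ActivationView {
		return VersionState{Current: s.Pending.Version}
	}
	return s
}

// ErrUnsupportedVersion signals that the software cannot process the block.
var ErrUnsupportedVersion = errors.New("component version not supported by this software")

// CheckSupported returns an error if the version required by the block's
// state is not among the versions this software implements.
func CheckSupported(s VersionState, supported []uint64) error {
	for _, v := range supported {
		if v == s.Current {
			return nil
		}
	}
	return fmt.Errorf("version %d: %w", s.Current, ErrUnsupportedVersion)
}

func main() {
	parent := VersionState{Current: 1, Pending: &PendingUpgrade{Version: 2, ActivationView: 1000}}
	for _, view := range []uint64{999, 1000, 1001} {
		child := parent.Evolve(view)
		fmt.Println(view, child.Current, CheckSupported(child, []uint64{1}))
	}
}
```

Because the required version is part of every block's state, a node that joins late only needs the snapshot of the block it starts from, not the history of upgrade events.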

# Roadmap: Dynamic Protocol State for coordinating Execution Stack upgrades (including Cadence changes)

Biggest change (and possibly only significant change):

- The Dynamic Protocol State should ingest the Version Beacon Service Event and track the Execution Stack’s Component Version

![Illustration of Process](20241031-execution-stack-versioning/Execution_Stack_Versioning_(2).png)

Review comment (Member):

There is a lot going on in this diagram. Maybe add a description of what is important to this proposal.

Regarding this interface: maybe make this more abstract and explain it. I had a hard time understanding this core piece of the proposal. What is a "KVStoreReader"? What is a "ViewBasedActivator"? Maybe improve that naming and make it less based on current implementation details (Go/flow-go).

Maybe also make it clearer that the idea is that for each component, there will be two functions:

  • A function returning the current required version for this component
  • A function returning future/upcoming versioning for this component

It is not very clear that the example shows the versioning of the component named "ExecutionStack".


Example:

![Overview](20241031-execution-stack-versioning/Execution_Stack_Versioning_(5).png)



## Semantic versioning

$$
\textnormal{Software Version :}\quad \underbrace{\texttt{major}\,.\,\texttt{minor}\,}_{\textnormal{Component Version}}.\,\texttt{patch}
$$

- Software with identical $\texttt{major}.\texttt{minor}$ is cross-compatible irrespective of patch version,
because all such releases implement the same specification (represented by the Component Version). Hence, the $\texttt{patch}$ version is not
part of the Component Version, because it does not influence the conceptual behaviour. It does, however, reflect implementation details,
so it is part of the Software Version.
- In addition, we introduce a compatibility requirement from semantic versioning:

$\textnormal{Component Version :} \quad \texttt{major}\,.\,\texttt{minor}$

Review comment (Member), on lines +257 to +259:

I disagree with this mapping from semver to component versions.

In semver, only major version changes are breaking changes. All other changes must be backward-compatible.

MAJOR version when you make incompatible API changes
MINOR version when you add functionality in a backward compatible manner
PATCH version when you make backward compatible bug fixes

We need to coordinate a component version upgrade only when that component is upgraded in a backward-incompatible manner (major version increment in semver). Otherwise a rolling upgrade suffices.

We may choose to require coordination of a backward-compatible component version upgrade (minor version increment in semver). Maybe we want to coordinate the release of a feature at a specific time. But by doing so, we are turning a backward-compatible upgrade into a backward-incompatible upgrade. Which is fine, but now it is a major version upgrade in semver terms.

So, if some component is using semver to internally version itself, then only major version changes should correspond to component version increments.

Reply (Member Author, @AlexHentschel, Nov 5, 2024):

> We need to coordinate a component version upgrade only when that component is upgraded in a backward-incompatible manner (major version increment in semver). Otherwise a rolling upgrade suffices.

Maybe I misunderstand. I added a new section, Discussion of Possible Versioning Schemes, which in part discusses your point, if I understood it correctly.

In a nutshell, a pure feature addition is still something that needs to be coordinated, because we need to agree when the new feature becomes usable. Nevertheless, one can make an argument for differentiating between major breaking changes and pure feature adds, I think. That's all that I want to say here.
In the end, I agree with you that for many scenarios SemVer does not make sense to me. I tried to explain that better in the new section Discussion of Possible Versioning Schemes. Please take a look. Curious about your thoughts.


- Protocol specifications with the same $\texttt{major}$ must be fully backwards compatible (in practice, mostly additive changes)
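The mapping proposed in this section can be sketched in Go as below. It follows the scheme as written here (Component Version = $\texttt{major}.\texttt{minor}$, with backwards compatibility required within one $\texttt{major}$), which is still under discussion in the review. The names `ParseComponentVersion` and `BackwardsCompatibleWith` are hypothetical, and pre-release or build suffixes are deliberately not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ComponentVersion is the major.minor prefix of a software version.
type ComponentVersion struct {
	Major, Minor uint64
}

// ParseComponentVersion extracts major.minor from a "major.minor.patch"
// software version. An optional leading "v" is accepted. The patch number is
// validated but dropped, since it does not affect the prescribed behaviour.
func ParseComponentVersion(software string) (ComponentVersion, error) {
	parts := strings.Split(strings.TrimPrefix(software, "v"), ".")
	if len(parts) != 3 {
		return ComponentVersion{}, fmt.Errorf("expected major.minor.patch, got %q", software)
	}
	major, err := strconv.ParseUint(parts[0], 10, 64)
	if err != nil {
		return ComponentVersion{}, fmt.Errorf("invalid major: %w", err)
	}
	minor, err := strconv.ParseUint(parts[1], 10, 64)
	if err != nil {
		return ComponentVersion{}, fmt.Errorf("invalid minor: %w", err)
	}
	if _, err := strconv.ParseUint(parts[2], 10, 64); err != nil {
		return ComponentVersion{}, fmt.Errorf("invalid patch: %w", err)
	}
	return ComponentVersion{Major: major, Minor: minor}, nil
}

// BackwardsCompatibleWith reports whether the receiver (the newer
// specification) is required to be backwards compatible with the older one:
// same major and an equal or higher minor.
func (newer ComponentVersion) BackwardsCompatibleWith(older ComponentVersion) bool {
	return newer.Major == older.Major && newer.Minor >= older.Minor
}

func main() {
	a, _ := ParseComponentVersion("v1.2.3")
	b, _ := ParseComponentVersion("v1.2.9")
	c, _ := ParseComponentVersion("v1.3.0")
	fmt.Println(a == b)                       // identical Component Version despite different patch
	fmt.Println(c.BackwardsCompatibleWith(a)) // same major, higher minor
}
```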


## Limitations

For the core protocol, changes are often not backwards compatible. Furthermore, maintaining backwards compatibility can cause
subtle edge cases and drives implementation complexity.

Lastly, the benefits of backwards compatibility are strongest when inputs of mixed versions must be processed over a prolonged period of time.
In contrast, the protocol abruptly switches at a specific view from one behaviour to another.


For Flow, controlling complexity and codifying compatibility is essential.
The **compatibility expectations, associated risks and additional complexity** of semantic versioning are not beneficial in *some* areas
of the core protocol. For example the **Component Version** of the Protocol State is specified by a **single integer** (👉[code](https://github.com/onflow/flow-go/blob/c1a1cc0e05f0d323ab2f83dd5d74d8ad486d451e/state/protocol/kvstore.go#L30-L43)).


**Component Version** guidelines

- Use semantic versioning for areas where
  - the benefits of backwards compatibility outweigh the implementation and complexity cost
  - we want to maintain backwards compatibility over a longer period of time and across multiple upgrades
- In areas where we can’t easily provide backwards compatibility (e.g. for security or BFT reasons), we should make this explicit by using a single-integer for the Component Version.

## Next Steps

Challenge: missing seed for starting engineering work:

- complex topic, spanning three areas of Flow [execution, Protocol, Data Availability]
- everyone is worried they are missing something, but we have limited time / priority to flesh out a holistic vision for versioning each and every aspect of the protocol
- we keep talking about it while the bigger picture remains hazy,
so ICs keep extending our existing but insufficiently general solution (the existing Version Beacon, solely based on service events, where tracking and complying with service events is left entirely to the implementation)

***Approach*:**

**We work towards transitioning the _existing_ Version Beacon to use the Protocol State.**

- Decide now what convention we use for:
  - Component Version format of the Execution Stack (e.g. $\texttt{major}.\texttt{minor}$ or a single integer)? Currently, the Version Beacon uses semver,
    but Bastian thinks this might be unnecessarily complex for Cadence.
  - One Component Version for Cadence only?

    or one Component Version for Cadence+FVM combined?

    or two separate Component Versions (one for Cadence and one for FVM)?

- Start by including Component Version for Execution Stack into Dynamic Protocol State

Breakdown of steps:
![Execution Stack Versioning (3).png](20241031-execution-stack-versioning/Execution_Stack_Versioning_(3).png)


## Further reading

- [[1](https://forum.flow.com/t/protocol-version-upgrade-mechanisms-discussion/5717)] Jordan’s [**Protocol Version Upgrade Mechanisms Discussion**](https://forum.flow.com/t/protocol-version-upgrade-mechanisms-discussion/5717) (flow forum post)
- [Core-Protocol WG Meeting on Versioning, May 23, 2024](https://github.com/onflow/Flow-Working-Groups/blob/main/core_protocol_working_group/meetings/2024-05-23_Versioning_sub-working-group.md)
- [*[Brainstorming] HCU-style upgrades for all node roles* ](https://www.notion.so/Brainstorming-HCU-style-upgrades-for-all-node-roles-b6b0ab084075432782cd0407b73479c7?pvs=21)

# Questions and Answers

### Do we want to track a Component Version for _every_ component of Flow?

**Answer**: For most components, we do _not_ track their version explicitly. Necessary updates are infrequent and not time sensitive, so that we
can just bundle all changes across many components and ship them all together as part of a major upgrade (aka Spork).

However, for very few components, upgrades are frequent and time sensitive (e.g. security fixes in Cadence), so that we cannot
wait for the major upgrade (aka Spork). In that case, we want to deploy the upgrades into the running network and need to specify
what should happen (i.e. the component version) and when that is going to change. Only for those components we want to track their component version
via the protocol state.