supervisor fma: add FM3b #278

Open
wants to merge 1 commit into
base: main
23 changes: 20 additions & 3 deletions security/fma-supervisor.md
@@ -157,8 +157,7 @@ And all Failure Modes are some subtype. Incorrect responses are much worse than
- Once paused, unpause should be manual to start with.
- Optionally: Implement a matching Batcher pause to avoid publishing invalid blocks in batches where avoidable.

## FM3: The Supervisor Issues a Reset to the Sequencer

## FM3a: The Supervisor Issues a Reset to the Sequencer
- Description
- The Supervisor manages Nodes in Managed Mode, meaning they listen to the Supervisor for signals for what derivation activities to take next.
- One such activity is to reset the node to specific heads.
@@ -175,8 +174,26 @@ And all Failure Modes are some subtype. Incorrect responses are much worse than
- With respect to the Node, an arbitrary amount of re-sync may be required.
- Recovery to the network depends entirely on the impact of the node outage.

## FM4a: Managed Nodes are at Different Heights
## FM3b: The Supervisor Issues a *repeating* Reset to the Sequencer
- Description
- Similar to FM3a, suppose the Supervisor lacks some knowledge (input data or protocol rules), which causes it to consider the Node
inconsistent with the Supervisor Database.
- Because the Supervisor Database is the default source of truth, a reset is issued to the Node, pushing it back.
- Because the Node is behaving correctly, re-derivation produces the same result as before.
- The Supervisor again considers the Node inconsistent and resets it, and the two loop indefinitely.
- Risk Assessment
- High Impact, High Likelihood.
- If this happens to a Sequencer, the network is effectively stalled, since no new blocks can be produced.
- Situations like this have been discovered organically. For example, a Sequencing Window Lapse creates this loop scenario,
because the Supervisor is unable to understand the legitimate source of the unsafe data inconsistency.
- Mitigations
- These failures represent gaps in our modeling of derivation between the Supervisor and Sequencer. Additional validation exercises
(threat modeling, testing) should be done around edge-case behaviors of the protocol.
- Recovery
- When this issue occurs on a network, the `admin_rewind` RPC command may be useful. It allows the operator to prune the old
data from the Supervisor, which can allow the Node to advance correctly and inform the Supervisor.
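
The loop described in FM3b can be illustrated with a toy model. This is a minimal sketch, not the Supervisor's actual logic: `ToySupervisor`, `derive`, and `run` are hypothetical names invented for illustration, and the "database" is just a dictionary of expected block hashes per height.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToySupervisor:
    """Hypothetical stand-in for the Supervisor; not the real implementation."""

    expected: dict  # height -> block hash the toy database believes is correct

    def consistent(self, height: int, block_hash: str) -> bool:
        # Heights the database has no record of are accepted as new data.
        return self.expected.get(height, block_hash) == block_hash


def derive(height: int) -> str:
    """Deterministic toy derivation: the same height always yields the same block."""
    return f"block-{height}"


def run(supervisor: ToySupervisor, start: int, target: int, max_resets: int) -> tuple[int, int]:
    """Advance a toy node from start towards target; return (final_height, resets)."""
    height, resets = start, 0
    while height < target and resets < max_resets:
        candidate = height + 1
        if supervisor.consistent(candidate, derive(candidate)):
            height = candidate
        else:
            # The database is treated as the source of truth, so the node is pushed back.
            resets += 1
            height = start
    return height, resets
```

With a stale entry such as `ToySupervisor({3: "stale"})`, the toy node re-derives the same block at height 3 on every attempt and is reset each time, so `run` only stops because of the `max_resets` cap. Removing that entry from `expected`, loosely analogous to pruning old data from the Supervisor, lets the same node reach the target without any resets.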

## FM4a: Managed Nodes are at Different Heights
- Description
- The Supervisor is managing one Managed Node per chain, and using the data reported from their derivation to calculate cross-safety.
- While the Node for Chain A has derived to some height (L1 block 100), the Node for Chain B is still processing earlier L1 blocks (L1 block 90).