SP does not properly translate version in VersionedRotBootInfo message
Problem
The VersionedRotBootInfo message retrieves unattested boot and image state from the RoT using a versioned protocol for backward/forward compatibility. MGS uses one-based versions while RoT uses zero-based versions, requiring translation by the SP.
The SP has two related issues:
- No version translation: Passes MGS version requests directly to RoT without converting one-based to zero-based
- No version limiting: Doesn't clamp requests to versions the SP can handle, allowing RoT to return response variants the SP cannot deserialize
This currently causes no production issues because MGS only requests HIGHEST_KNOWN_VERSION. However, adding a new RotBootInfo variant will cause deserialization failures during mixed-version deployments unless both issues are fixed.
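The two missing steps can be sketched together as follows. This is a minimal illustration, not the code in the fix-sp-rbi-translation branch: the constant `SP_HIGHEST_KNOWN_MGS_VERSION`, the error type, and the function name are hypothetical, and the value 3 is taken from the current MGS main branch as described under Testing.

```rust
/// Highest one-based (MGS-numbered) version this SP build can deserialize.
/// Hypothetical constant for illustration only.
const SP_HIGHEST_KNOWN_MGS_VERSION: u8 = 3;

#[derive(Debug, PartialEq)]
enum TranslateError {
    /// MGS version 0 has no zero-based equivalent (it would be -1 in a u8).
    NoSuchVersion,
}

/// Clamp a one-based MGS version to what the SP understands, then
/// translate it to the zero-based version sent to the RoT.
fn mgs_to_rot_version(mgs_version: u8) -> Result<u8, TranslateError> {
    let clamped = mgs_version.min(SP_HIGHEST_KNOWN_MGS_VERSION);
    clamped.checked_sub(1).ok_or(TranslateError::NoSuchVersion)
}

fn main() {
    for v in 0..=5u8 {
        println!("MGS v{v} -> {:?}", mgs_to_rot_version(v));
    }
}
```

Clamping before translating means a request above the SP's highest known version can never elicit a variant the SP cannot decode, and `checked_sub` turns the impossible "version -1" into an explicit error instead of an underflow.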
Impact
Mixed-version deployments (master SP + new RoT) will experience failures:
- HIGHEST_KNOWN_VERSION requests return deserialization errors
- Firmware updates and health checks affected
The proposed SP fix (fix-sp-rbi-translation) addresses both issues by properly translating and limiting version requests.
Testing
Note that these tests use versions above and below the implemented versions to exercise the corner cases.
Only the responses to a request for RotBootInfo::HIGHEST_KNOWN_VERSION would affect a production environment.
In the case of the current MGS main branch, this value is '3'. When a new
variant is introduced, it will be incremented to '4'.
The tables below are from running tests between different versions.
- main is the main branch of management-gateway-service (current production deployment)
- master is the master branch of Hubris (current production SP/RoT)
- fix is a Hubris branch that implements the proposed fix in the SP image (fix-sp-rbi-translation)
- bdl is an MGS or Hubris branch that has both the SP fix and a new RotBootInfo variant implemented (boot-decision-log)
NOTE: In these tables, only the MGS versioning is used. The actual RoT version is one less than the MGS version.
Current Implementation
MGS | SP | RoT | MGS Request Version | RoT Response/Error |
---|---|---|---|---|
main | master | master | HIGHEST_KNOWN_VERSION | V3 |
main | master | master | 0 | Error response from SP: update: RoT boot info version is not supported |
main | master | master | 1 | V1 |
main | master | master | 2 | V2 |
main | master | master | 3 | V3 |
main | master | master | 4 | V3 |
main | master | master | 5 | V3 |
This is the current implementation. It causes no problems because V3 is the highest version available, but a request for V1 should return an error:
from the RoT's point of view, there is no version zero (MGS V1).
**CRITICAL ISSUE: Adding a new RoT response with the current SP implementation**
During an update, the RoT and SP will temporarily have mismatched versions.
Because the SP is not clamping the version to the SP's own highest known version, the response can be a variant that the SP cannot deserialize.
MGS | SP | RoT | MGS Request Version | RoT Response/Error |
---|---|---|---|---|
main | master | bdl | HIGHEST_KNOWN_VERSION | ❌ Error response from SP: sprot: failed to deserialize message |
main | master | bdl | 0 | Error response from SP: update: RoT boot info version is not supported |
main | master | bdl | 1 | V1 |
main | master | bdl | 2 | V2 |
main | master | bdl | 3 | ❌ Error response from SP: sprot: failed to deserialize message |
main | master | bdl | 4 | ❌ Error response from SP: sprot: failed to deserialize message |
main | master | bdl | 5 | ❌ Error response from SP: sprot: failed to deserialize message |
This is the critical case we want to avoid: the HIGHEST_KNOWN_VERSION request fails completely, which would break update mechanisms and health checks that rely on this call.
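The failure mode can be illustrated with a toy decoder. This is only a sketch: the enum, the single tag byte, and the tag values are assumptions made for the example and do not reflect the actual Hubris wire format or its serialization library.

```rust
/// Response variants known to an SP built before the new variant existed.
/// Hypothetical type; payloads are omitted for brevity.
#[derive(Debug, PartialEq)]
enum KnownRotBootInfo {
    V1,
    V2,
    V3,
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    Empty,
    UnknownVariant(u8),
}

/// Decode a response whose first byte is assumed to be a variant tag.
fn decode(bytes: &[u8]) -> Result<KnownRotBootInfo, DecodeError> {
    match bytes.first().copied() {
        None => Err(DecodeError::Empty),
        Some(0) => Ok(KnownRotBootInfo::V1),
        Some(1) => Ok(KnownRotBootInfo::V2),
        Some(2) => Ok(KnownRotBootInfo::V3),
        // A newer RoT may send a tag this build has never heard of.
        Some(tag) => Err(DecodeError::UnknownVariant(tag)),
    }
}

fn main() {
    println!("{:?}", decode(&[2]));
    println!("{:?}", decode(&[3]));
}
```

An SP that never asks for more than its own highest known version never reaches the `UnknownVariant` arm, which is why clamping on the request side is sufficient.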
Current Implementation + Proposed SP Fix
With the proposed fix and only the SP updated, a request for the deprecated RoT V1 variant is properly reported as "not supported".
For a requested V0 from MGS, the SP refuses to map the version because there is no -1 in the
RoT's version: u8.
The deserialization failures that occur without this fix are avoided.
The fixed SP does not affect any production code, but its changed
behavior can be probed with faux-mgs ... rot-boot-info -v$N
MGS | SP | RoT | MGS Request Version | RoT Response/Error |
---|---|---|---|---|
main | fix | master | HIGHEST_KNOWN_VERSION | V3 |
main | fix | master | 0 | Error response from SP: unsupported request for this SP component |
main | fix | master | 1 | Error response from SP: update: RoT boot info version is not supported |
main | fix | master | 2 | V2 |
main | fix | master | 3 | V3 |
main | fix | master | 4 | V3 |
main | fix | master | 5 | V3 |
Current MGS + Proposed SP Fix + New RoT Response
A "fixed" SP that does not know about the new message variant and an RoT that
has the new variant still work with the older MGS, because MGS requests its
highest known version (3), which results in a V3 response.
The SP also clamps the version to V3, so faux-mgs probes of V4 and V5 also
return V3.
MGS | SP | RoT | MGS Request Version | RoT Response/Error |
---|---|---|---|---|
main | fix | bdl | HIGHEST_KNOWN_VERSION | V3 |
main | fix | bdl | 0 | Error response from SP: unsupported request for this SP component |
main | fix | bdl | 1 | Error response from SP: update: RoT boot info version is not supported |
main | fix | bdl | 2 | V2 |
main | fix | bdl | 3 | V3 |
main | fix | bdl | 4 | V3 |
main | fix | bdl | 5 | V3 |
⚠️ MGS Compatibility Issue with New Response
If both the SP and RoT know about the new variant, then the production update code
still works (returning a V3 response to MGS), but one can use faux-mgs to
elicit a V4 response that MGS does not understand.
The RPC timeout occurs because MGS cannot deserialize the V4 response
and adopts a retry strategy that ultimately fails when the overall
timeout expires.
MGS | SP | RoT | MGS Request Version | RoT Response/Error |
---|---|---|---|---|
main | bdl | bdl | HIGHEST_KNOWN_VERSION | V3 |
main | bdl | bdl | 0 | Error response from SP: unsupported request for this SP component |
main | bdl | bdl | 1 | Error response from SP: update: RoT boot info version is not supported |
main | bdl | bdl | 2 | V2 |
main | bdl | bdl | 3 | V3 |
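main | bdl | bdl | 4 | ❌ RPC timeout (MGS cannot deserialize the V4 response) |
main | bdl | bdl | 5 | ❌ RPC timeout (MGS cannot deserialize the V4 response) |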
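The retry-until-timeout behavior described above can be sketched as follows. This is not MGS's actual retry code: the function, the error type, and the durations are hypothetical, and time is simulated by adding a fixed cost per attempt rather than by sleeping.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
enum RpcError {
    Timeout { attempts: u32 },
}

/// Retry `attempt` until it yields a decodable response or the overall
/// budget is spent. `per_attempt` is the simulated cost of one try.
fn call_with_retries<T>(
    mut attempt: impl FnMut() -> Option<T>,
    per_attempt: Duration,
    overall: Duration,
) -> Result<T, RpcError> {
    assert!(!per_attempt.is_zero(), "per-attempt cost must be non-zero");
    let mut elapsed = Duration::ZERO;
    let mut attempts = 0;
    while elapsed < overall {
        attempts += 1;
        if let Some(resp) = attempt() {
            return Ok(resp);
        }
        elapsed += per_attempt;
    }
    Err(RpcError::Timeout { attempts })
}

fn main() {
    // A response the caller can never decode: every attempt yields None.
    let r = call_with_retries(
        || None::<()>,
        Duration::from_millis(500),
        Duration::from_secs(2),
    );
    println!("{r:?}");
}
```

Because an undecodable response looks the same on every attempt, retrying cannot help; the call only ends when the overall budget runs out, which is consistent with the timeout seen for the V4 probes.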
New MGS + Proposed SP Fix + New RoT Response
In the eventual stable configuration where all versions have been updated, we
see that the expected responses are received from all queries.
With all components updated to support the new BootDecisionLog variant,
full V4 support is achieved. The SP's version clamping results in a V4 response
when V5 and higher requests are made.
MGS | SP | RoT | MGS Request Version | RoT Response/Error |
---|---|---|---|---|
bdl | bdl | bdl | HIGHEST_KNOWN_VERSION | V4 |
bdl | bdl | bdl | 0 | Error response from SP: unsupported request for this SP component |
bdl | bdl | bdl | 1 | Error response from SP: update: RoT boot info version is not supported |
bdl | bdl | bdl | 2 | V2 |
bdl | bdl | bdl | 3 | V3 |
bdl | bdl | bdl | 4 | V4 |
bdl | bdl | bdl | 5 | V4 |
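The rows for a fixed SP in the tables above can be reproduced with a small model. This is a sketch under stated assumptions, not the Hubris implementation: the `Outcome` type and `model` function are hypothetical, versions are passed as one-based MGS numbers, and the rule that the RoT answers with its own highest variant when asked for more is an assumption inferred from the Current Implementation table.

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The RoT answered with this one-based (MGS-numbered) variant.
    Variant(u8),
    /// SP refused: MGS version 0 has no zero-based equivalent.
    SpUnsupportedRequest,
    /// RoT refused: zero-based version 0 (MGS V1) is deprecated.
    RotVersionNotSupported,
}

/// Model of a fixed SP in front of an RoT.
/// `sp_highest` and `rot_highest` are one-based highest known versions.
fn model(sp_highest: u8, rot_highest: u8, mgs_request: u8) -> Outcome {
    // SP: clamp to what it can deserialize, then translate to zero-based.
    let Some(rot_request) = mgs_request.min(sp_highest).checked_sub(1) else {
        return Outcome::SpUnsupportedRequest;
    };
    // RoT: the deprecated variant is reported as not supported.
    if rot_request == 0 {
        return Outcome::RotVersionNotSupported;
    }
    // RoT (assumed): never answer above its own highest variant.
    let answered = rot_request.min(rot_highest.saturating_sub(1));
    Outcome::Variant(answered + 1)
}

fn main() {
    // (SP highest, RoT highest) for fix/master, fix/bdl, and bdl/bdl.
    for (sp, rot) in [(3u8, 3u8), (3, 4), (4, 4)] {
        for req in 0..=5u8 {
            println!("sp={sp} rot={rot} req={req} -> {:?}", model(sp, rot, req));
        }
    }
}
```

The model covers only what the SP hands back; whether MGS can decode that response is a separate question, as the main/bdl/bdl table shows.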