[40/n] blueprint planner logic + sled agent code to honor mupdate overrides #8456
Conversation
```rust
let mut sleds_with_override = BTreeSet::new();
for sled_id in self.input.all_sled_ids(SledFilter::InService) {
```
Thinking about this, I'm wondering what would happen here if a sled goes away in the middle of this process, disappearing from inventory. In that case the remove_mupdate_override field never gets cleared from the blueprint.
We would do:
- expunge the sled in the first blueprint
- when executed, the sled policy will be updated to Expunged in the planning input
- then next planning cycle it'll no longer be in the InService set
So I think we'll eventually converge -- it'll just take a couple cycles. (A TODO is to add a test for this.)
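A toy model of that convergence argument (all types here are hypothetical stand-ins; the real planner filters the planning input with `SledFilter::InService`):

```rust
use std::collections::{BTreeMap, BTreeSet};

#[derive(Clone, Copy, PartialEq)]
enum Policy {
    InService,
    Expunged,
}

// Once the expunge round-trips through the planning input, the sled's
// policy flips to Expunged, so its stale override entry is never
// visited again -- the filter below simply skips it.
fn sleds_with_override(
    policies: &BTreeMap<u32, Policy>,
    overrides: &BTreeSet<u32>,
) -> BTreeSet<u32> {
    overrides
        .iter()
        .copied()
        .filter(|id| policies.get(id) == Some(&Policy::InService))
        .collect()
}
```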
Tested this with the new `inventory-hidden` and `inventory-visible` subcommands in reconfigurator-cli.
```rust
let old_image_source = self.zones.set_zone_image_source(
    &zone_id,
    BlueprintZoneImageSource::InstallDataset,
)?;
```
RFD 556 says:

> Wherever the planner uses the target release, it is instead ignored if its generation number is not greater than `min_release_generation` (if set).
As discussed in Tuesday's watercooler it's a bit more complex than that -- what we want to do is to only use the install dataset on sleds that have been mupdated, since on other sleds the install dataset may be woefully out of date.
I think I want to make the claim that this code may actually be sufficient as it stands. I don't think we need to try any redirects other than this one (which is admittedly edge-triggered), as long as we prevent new zones from being set up at all while the system is recovering from the mupdate.
We decided to not proceed with adding new zones until the mupdate override has been completely cleared.
```rust
// override that was set in the above branch. We can remove the
// override from the blueprint.
self.set_remove_mupdate_override(None);
// TODO: change zone sources from InstallDataset to Artifact
```
I'm not sure we'll need to do this here; the normal upgrade path should change zone sources already, right? (It just needs to not do that while a mupdate override is in place.)
Although maybe sled-agent should do something like "if I'm changing from install dataset with hash X to artifact with hash X, don't actually bounce the zone".
> I'm not sure we'll need to do this here; the normal upgrade path should change zone sources already, right? (It just needs to not do that while a mupdate override is in place.)
Good question -- we do this one zone at a time currently, and I guess this would be an opportunity to do a bulk replace. (But why not always do a level-triggered bulk replace?)
> Although maybe sled-agent should do something like "if I'm changing from install dataset with hash X to artifact with hash X, don't actually bounce the zone".
Yeah, this is reasonable.
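A minimal sketch of that bounce check, under assumed types (in the real code `InstallDataset` carries no hash; it would have to come from the zone manifest):

```rust
#[derive(PartialEq)]
enum ZoneImage {
    // Hash of the image actually present in the install dataset,
    // as reported by the zone manifest.
    InstallDataset { manifest_hash: [u8; 32] },
    // Hash of a specific artifact from the TUF repo depot.
    Artifact { hash: [u8; 32] },
}

// A relabel from install dataset to an artifact with identical
// contents does not require restarting the zone.
fn must_bounce_zone(old: &ZoneImage, new: &ZoneImage) -> bool {
    match (old, new) {
        (
            ZoneImage::InstallDataset { manifest_hash },
            ZoneImage::Artifact { hash },
        ) => manifest_hash != hash,
        _ => old != new,
    }
}
```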
Resolved in #8486, and TODO removed.
```rust
);
}

// TODO: Do the same for RoT/SP/host OS.
```
I think this will be as simple as:

- clear any `PendingMgsUpdate`s for this sled
- change the host phase 2 in the `OmicronSledConfig` to the equivalent of `InstallDataset` for zones (this doesn't exist yet but will be coming soon)
What I'm less sure about is what happens if there are `PendingMgsUpdate`s in the current target blueprint concurrently with a mupdate happening to that sled. Maybe wicket and Nexus end up dueling? If the mupdate completes and changes the contents of any of the target slots, Nexus's prechecks should start failing, but if the mupdate happens to not change the target slots, maybe the prechecks still pass and Nexus starts trying to update it again as soon as it comes online?
I haven't done this yet -- worth discussing in the watercooler tomorrow?
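A sketch of the first bullet from the comment above (types are hypothetical stand-ins for the real blueprint structures):

```rust
use std::collections::BTreeMap;

type SledId = u64;
struct PendingMgsUpdate;

// Dropping the sled's pending update keeps Nexus from dueling with a
// concurrent mupdate; a fresh update can be planned after recovery.
fn clear_pending_mgs_update(
    pending: &mut BTreeMap<SledId, PendingMgsUpdate>,
    mupdated_sled: SledId,
) -> Option<PendingMgsUpdate> {
    pending.remove(&mupdated_sled)
}
```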
```rust
// If do_plan_mupdate_override returns Waiting, we don't plan *any*
// additional steps until the system has recovered.
self.do_plan_add()?;
self.do_plan_decommission()?;
```
I think we could still decommission things if there's a mupdate override in place? This only acts on sleds or disks that an operator has explicitly told us are gone, and is basically a followup to `do_plan_expunge()`. (Maybe this step should be ordered before `do_sled_add()` anyway? I don't think there are any dependencies between them...)
Yep -- done.
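A sketch of the resulting step ordering (method names follow the snippets in this thread; the `Waiting` status and stub bodies are assumptions):

```rust
#[derive(PartialEq)]
enum MupdateOverrideStatus {
    Proceed,
    Waiting,
}

#[derive(Debug)]
struct PlanError;

struct Planner;

impl Planner {
    fn do_plan_mupdate_override(
        &mut self,
    ) -> Result<MupdateOverrideStatus, PlanError> {
        Ok(MupdateOverrideStatus::Proceed) // stub
    }
    fn do_plan_expunge(&mut self) -> Result<(), PlanError> {
        Ok(()) // stub
    }
    fn do_plan_decommission(&mut self) -> Result<(), PlanError> {
        Ok(()) // stub
    }
    fn do_plan_add(&mut self) -> Result<(), PlanError> {
        Ok(()) // stub
    }

    fn plan(&mut self) -> Result<(), PlanError> {
        let status = self.do_plan_mupdate_override()?;
        self.do_plan_expunge()?;
        // Decommission only acts on sleds/disks an operator has
        // already declared gone, so it need not wait out a mupdate
        // override.
        self.do_plan_decommission()?;
        if status != MupdateOverrideStatus::Waiting {
            // Adding zones stays blocked until recovery.
            self.do_plan_add()?;
        }
        Ok(())
    }
}
```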
```rust
// generation table -- one of the invariants of the target
// release generation is that it only moves forward.
//
// In this case we warn but set the value.
```
This doesn't seem right; we should probably bail out of planning entirely in this case, right? This seems like an "I don't know what's going on in the world" kind of thing that in a simpler system we'd assert on?
Yeah -- done.
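A sketch of the bail-out, assuming a hypothetical error variant:

```rust
#[derive(Debug)]
enum PlanError {
    // "I don't know what's going on in the world": a value that is
    // supposed to be monotonic moved backwards.
    TargetReleaseGenerationWentBackwards { current: u64, new: u64 },
}

fn check_target_release_generation(
    current: u64,
    new: u64,
) -> Result<(), PlanError> {
    if new < current {
        // Fail planning loudly instead of warning and proceeding.
        return Err(PlanError::TargetReleaseGenerationWentBackwards {
            current,
            new,
        });
    }
    Ok(())
}
```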
In #8456 we'll block the `do_plan_add` step on whether the system is currently recovering from a mupdate override. But there's no reason to block the `do_plan_decommission` step on that. This is most easily expressed by moving decommission before add.
`HostPhase2DesiredContents` is analogous to `OmicronZoneImageSource`, but for OS images: either keep the current contents of the boot disk or set it to a specific artifact from the TUF repo depot. "Keep the current contents" should show up in three cases, just like `OmicronZoneImageSource::InstallDataset`:

1. It's the default value for deserializing, so we can load old configs that didn't have this value.
2. RSS uses it (no TUF repo depot involved at this point).
3. The planner will use this variant as part of removing a mupdate override (this work is still in the PR itself: #8456 (comment)).
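A sketch of the shape described above (variant and field names are assumptions based on this description):

```rust
enum HostPhase2DesiredContents {
    // Keep whatever is on the boot disk now: used by RSS and by the
    // planner while removing a mupdate override.
    CurrentContents,
    // Write a specific artifact from the TUF repo depot.
    Artifact { hash: String },
}

impl Default for HostPhase2DesiredContents {
    fn default() -> Self {
        // Case 1 above: old configs without this field deserialize
        // to "keep the current contents".
        Self::CurrentContents
    }
}
```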
@jgallagher this is ready for you to look at again -- I've added clearing the pending MGS update. I'm going to try to land this simultaneously with the sled-agent changes to clear the mupdate override, though, because by itself it will cause the planner to not do anything after the mupdate occurs.
```rust
Entry::Occupied(entry) => Some(Box::new(entry.remove())),
};

// TODO: Do the same for host OS.
```
You were right -- #8570 has now landed, so we can fix up the host OS phase 2 contents here.
Added host phase 2 logic.
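A sketch of that fixup, reusing the shape sketched earlier (names assumed): clearing a mupdate override resets both boot-disk slots to "keep current contents", the host-OS analogue of sending zones back to the install dataset.

```rust
#[derive(Clone, Debug, PartialEq)]
enum HostPhase2DesiredContents {
    CurrentContents,
    Artifact { hash: String },
}

#[derive(Clone, Debug, PartialEq)]
struct HostPhase2DesiredSlots {
    slot_a: HostPhase2DesiredContents,
    slot_b: HostPhase2DesiredContents,
}

// On clearing a mupdate override, stop steering either slot toward a
// TUF artifact.
fn host_phase_2_after_mupdate() -> HostPhase2DesiredSlots {
    HostPhase2DesiredSlots {
        slot_a: HostPhase2DesiredContents::CurrentContents,
        slot_b: HostPhase2DesiredContents::CurrentContents,
    }
}
```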
```rust
}

// Now we need to determine whether to also perform other actions like
// updating or adding zones. We have to be careful here:
```
This is an excellent comment; thanks!
This PR implements logic within sled-agent to clear mupdate overrides. Includes tests, database storage, and displayers. This logic by itself does not introduce behavior changes, since the code to actually set this field is in #8456.
```rust
OmicronZoneImageSource::Artifact { hash } => {
    // TODO: implement mupdate override here.
    //
    // Search both artifact datasets. This iterator starts with the
    // dataset for the boot disk (if it exists), and then is followed
    // by all other disks.
    let search_paths =
        internal_disks.all_artifact_datasets().collect();
    OmicronZoneFileSource {
        // TODO: with mupdate overrides, return InstallDataset here
        location: OmicronZoneImageLocation::Artifact {
            hash: Ok(*hash),
        },
        file_source: ZoneImageFileSource {
            file_name: hash.to_string(),
            search_paths,
        },
    match self.mupdate_override.boot_disk_override.as_ref() {
        Ok(Some(_)) => {
```
This match block implements the logic to honor mupdate overrides.
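A simplified, self-contained sketch of that redirect (all types here are stand-ins for the real resolver types):

```rust
#[derive(Debug, PartialEq)]
enum ImageSource {
    InstallDataset,
    Artifact { hash: String },
}

#[derive(Debug, PartialEq)]
enum ResolvedLocation {
    InstallDataset,
    Artifact { hash: String },
}

// While the boot disk reports a mupdate override, an Artifact source
// is redirected to the install dataset, which the mupdate just
// rewrote.
fn resolve(
    source: &ImageSource,
    boot_disk_override: &Result<Option<String>, String>,
) -> ResolvedLocation {
    match source {
        ImageSource::Artifact { hash } => match boot_disk_override {
            Ok(Some(_)) => ResolvedLocation::InstallDataset,
            // No override, or an error surfaced elsewhere: honor the
            // requested artifact.
            _ => ResolvedLocation::Artifact { hash: hash.clone() },
        },
        ImageSource::InstallDataset => ResolvedLocation::InstallDataset,
    }
}
```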
Testing notes (tl;dr: it works!). Ran this test on berlin:

- Did a mupdate on berlin sled 14 (…). Updated chicken switches:
- Ran the blueprint planner, saw the expected results:
- After enabling the blueprint, the mupdate_override.json disappeared from sled 14:
- But the mupdate override was not cleared, as expected:
- Uploaded the same TUF repo as the target release:
- A new blueprint was generated:
- And then a number of blueprints were generated afterwards. (Along the way we found that Sled Agent had a stale zone manifest cache due to the way rkadm works -- not an issue in production, but filed as https://github.com/oxidecomputer/rackletteadm/issues/70.) Disabled the planner:
- Next, mupdated sled 14 to a different commit, then when sled-agent came up, ensured that the mupdate override was honored:
- Two blueprints were generated, first:
- Then, after sled-agent cleared the mupdate override, saw another blueprint:
- Uploaded this repo, and then saw noop zone image switches:
```diff
@@ -68,6 +70,32 @@ impl ResolverStatusExt for ResolverStatus {
     ) -> OmicronZoneFileSource {
         match image_source {
             OmicronZoneImageSource::InstallDataset => {
+                match &self.mupdate_override.boot_disk_override {
```
Probably worth one more look at the new code here, particularly at this file.
This PR (#8688) prepares the sled-agent config reconciler to honor mupdate overrides. In particular, the logic that requires bouncing zones needs to be updated to consider zone image locations after mupdate overrides have been considered, not before. Doing this required some refactoring. The biggest one is that looking up zone image sources is no longer done through the image resolver and/or within sled-agent's `services.rs`, but rather by gathering and querying the resolver status within the config reconciler. This is a nice improvement overall, particularly because it means we grab the lock once per reconciler run rather than each time we need to look up a zone's image source. We do not actually honor the mupdate override yet, though all the pieces are now in place. We'll combine the PRs to honor the override and update the blueprint logic into one, within #8456.
```rust
// matching up the zone manifest with the target release to compute
// the number of versions running at a given time), but that's a
// non-trivial optimization that we should probably defer until we
// see its necessity.
```
I'm not proposing we put this in this PR, but just want to confirm: I think the thing we've been talking about, where the planner shouldn't proceed with an upgrade until all zones have been converted back to non-`InstallDataset` sources, could be a third condition here, right? In particular, step 2 of this comment uses the phrase "recovering from a MUPdate", and any zone still configured to run out of an install dataset means we're still in that state.

Do you think there's more involved (other than testing ofc) than adding a third check at this point?
Yeah, I think you're right -- the paragraph starting at "There's some potential to relax this in the future" actually hints at this third condition, and we should make this explicit in the PR.
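A sketch of that third condition (hypothetical types): any in-service zone still running out of an install dataset means the system is still recovering, so the planner holds off.

```rust
#[derive(Debug)]
enum ImageSource {
    InstallDataset,
    Artifact { hash: String },
}

struct ZoneConfig {
    image_source: ImageSource,
}

// Third "can we proceed" check: true while any zone is still
// configured to run out of an install dataset.
fn still_recovering_from_mupdate(zones: &[ZoneConfig]) -> bool {
    zones
        .iter()
        .any(|z| matches!(z.image_source, ImageSource::InstallDataset))
}
```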
```rust
let previous = BlueprintHostPhase2DesiredSlots {
    slot_a: self.slot_a.value().clone(),
    slot_b: self.slot_b.value().clone(),
};
self.slot_a.set_value(slot_a);
```
Should `ScalarEditor::set_value()` return the old value? (If it did, would that clean up this method a bit?)
Yeah, I'd considered that; I think the issue I ran into was that the value could be owned or borrowed, so we'd have to return `Cow<'a, T>`. But maybe that's okay.
Ehh, I don't know if it's worth it. Up to you.
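For reference, a sketch of what the `Cow`-returning variant could look like (`ScalarEditor` simplified to two fields here; the real type differs):

```rust
use std::borrow::Cow;

struct ScalarEditor<T: Clone> {
    original: T,
    edited: Option<T>,
}

impl<T: Clone> ScalarEditor<T> {
    // Returns the previous value: borrowed if the editor was never
    // written to, owned if a prior edit is being replaced.
    fn set_value(&mut self, value: T) -> Cow<'_, T> {
        match self.edited.replace(value) {
            Some(old) => Cow::Owned(old),
            None => Cow::Borrowed(&self.original),
        }
    }
}
```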
```rust
// Some other reason -- sled remains ineligible.
}
NoopConvertSledStatus::Eligible(eligible) => {
    // Transition to Eligible with the new override.
```
Suggested change:

```diff
-    // Transition to Eligible with the new override.
+    // Transition to Ineligible with the new override.
```
Thanks, fixed.
```rust
}
NoopConvertSledStatus::Eligible(eligible) => {
    // Transition to Eligible with the new override.
    let zones = mem::replace(
```
Thoughts on `mem::take()` instead of `mem::replace()`? I can never decide whether it feels too magical.
In this case, yeah, `mem::take` makes sense -- we're replacing it immediately anyway.
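For the record, the two are equivalent here since `Vec` implements `Default`:

```rust
use std::mem;

fn main() {
    let mut zones = vec!["zone-a", "zone-b"];

    // mem::take(x) is mem::replace(x, Default::default()); both leave
    // an empty Vec behind, which is immediately overwritten anyway.
    let taken = mem::take(&mut zones);
    assert_eq!(taken, ["zone-a", "zone-b"]);
    assert!(zones.is_empty());

    let replaced = mem::replace(&mut zones, vec!["zone-c"]);
    assert!(replaced.is_empty());
    assert_eq!(zones, ["zone-c"]);
}
```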
```rust
// However, the blueprint's remove_mupdate_override remains in
// place until all zones' image sources can be noop-converted to
// Artifact. We do this to minimize the number of different
// versions of software that exist.
```
Is this bit of the comment right? I think the `TODO` down on line 918 means it's not true yet.

I'm not sure this is where I'd implement this bit (see my earlier comment about adding a third check to whether it's okay to proceed to subsequent steps).
Removed this comment. Also agreed that a third top-level check makes more sense.
```
@@ -64,14 +85,18 @@ set sled 98e6b7c2-2efa-41ca-b20a-0a4d61102fe6 mupdate override: unset -> 6123eac
set sled 2b8f0cb3-0295-4b3c-bc58-4fe88b57112c mupdate override: unset -> error

> # Also set its SP update, which will not be cleared.
```
Should this be cleared if there's a mupdate override error? IIRC a mupdate override error means sled-agent won't launch any zones or write new host OS images; should Nexus keep trying to update its SP?
I've thought of Sled Agent and SP as relatively independent domains, so I'm not sure. Discuss in today's update watercooler?
Sure. I think in practice this doesn't really matter: a mupdate and a Nexus-driven MGS update are going to interfere with each other in some way without support involvement anyway, and after performing a mupdate all of the prechecks that a pending MGS update performs will fail, so Nexus won't try to do anything.
However, since we wouldn't add a new pending MGS update until we recovered from the sled mupdate, it seems a little cleaner to clear any existing ones?
```
> # remove_mupdate_override to be unset. But no further planning steps will
> # happen because the target release generation is not new enough.
> #
> # TODO: we want to block remove_mupdate_override unsets until the
```
Same note as above; wondering if it's fine to keep this as-is and only change the planner's "can we proceed" check.
Yeah, reworded the TODO.
Start implementing the blueprint planner logic for mupdate overrides, as well as Sled Agent logic for honoring them.
There's still a lot of work to do:
- Move zone image sources back to artifacts once we know they've been distributed to a sled. (How do we know they've been distributed to a sled?) Done in [24/n] [reconfigurator-planning] support no-op image source updates #8486.