-
Notifications
You must be signed in to change notification settings - Fork 46
Add host phase 1 flash hashes to inventory #8624
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Just some minor comments/questions.
hash: *p.hash, | ||
}, | ||
); | ||
bail_unless!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that if there is ever a duplicate we don't ever return inventory for this collection? Is this essentially an assert, in that we don't expect it to ever happen, but we also don't want to crash nexus if it does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and yes; we have a bunch of these through inventory insertion that are effectively "assert-but-don't-crash" that some internal invariant is satisfied.
|
||
for (baseboard_id, phase1) in phase1_hashes { | ||
let selection = nexus_db_schema::schema::hw_baseboard_id::table | ||
.select(( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this return fields that we pass in as part of the selection, but only filters on the hw_baseboard_table
for the given columns? I ask because, most of these columns don't exist on hw_baseboard_id
making this a touch confusing to me as a relative SQL noob. It's not making me a bigger fan of SQL ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah; FWIW this is also basically "copy and paste and change the columns" from where @davepacheco originally did this work; I think it'd be very straightforward in SQL but is pretty obscured by diesel. This turns into something like, this, where we fill in all the $column
values directly except the foreign key into hw_baseboard_id
, which we look up via SELECT
:
INSERT INTO inv_host_phase_1_flash_hash VALUES (
SELECT $collection_id, hw_baseboard_id.id, $time_collected, $source, $slot, $hash
FROM hw_baseboard_id
WHERE part_number = $part_number AND serial_number = $serial_number
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just a couple of minor nits
nexus/db-model/src/inventory.rs
Outdated
impl_enum_type!( | ||
HwHostPhase1SlotEnum: | ||
|
||
#[derive(Copy, Clone, Debug, AsExpression, FromSqlRow, PartialEq)] | ||
pub enum HwHostPhase1Slot; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on just calling it something like M2Slot
so the enum can be used everywhere we need something like this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if this is rambly - thinking out loud. I definitely want to use something like this in other places (even on this PR!). I worry slightly about conflating two strongly-related-but-slightly-different things:
- The SP exposes exactly 2 host phase 1 slots.
- gimlet and cosmo have two M.2 drives, each of which is directly associated with one of the two host phase 1 slots
Things I have vague worry might change in the future:
- Will future sleds keep using the M.2 form factor? (This is just a naming concern, but it's why in sled-agent I'd started to use "internal disk" instead of "M.2". Maybe this is overthinking it.)
- The M.2 drives have multiple partitions, and even multiple boot partitions.
sled-agent
currently treats partition 0 as the boot partition and partitions 1, 2, and 3 as reserved, but that doesn't match reality: partition 1 is also a boot partition, andpilot
knows how to write to it. (We use this as a backup in manufacturing sometimes; if something gets screwed up in partition 0, we can still boot if there's a valid OS image in partition 1.)sled-agent
eventually needs to learn what's going on here (or we need to change the OS behavior); at the moment we could theoretically boot out of partition 1 instead of 0 and be very confused.
Point 2 is a really long-winded way of saying "I'm not sure M2Slot
is sufficient to represent the sled-agent side of this", and we might want to expand that to something that holds both the M2Slot and which partition it was. But maybe even still we keep using this enum for both phase 1 and "which disk is it"?
I think I'm coming around to "just call this M2Slot
" since that's what we use everywhere else. If we need to change it in the future, we'll have a better idea then of what to rename it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to HwM2Slot
in bdbd14d
Reconfigurator needs to know the current contents of the host phase 1 hash slots for OS upgrades; this change adds those hashes to inventory. It uses the new MGS endpoints landed in #8593.
This has the side effect of making inventory collections that use
sp-sim
noticeably slower (by a handful of seconds), becausesp-sim
tries to act like a real SP and take a moment to actually compute the hashes. This slowdown identified a handful of flaky tests that appear to be assuming#[nexus_test]
would guarantee there's been an inventory collection, but it doesn't - inventory just used to be fast enough that it happened to be true. To fix these, I added a helper method to wait for an inventory collection, and call it from the tests that shook out as flaky. I'm a little worried there are others that CI hasn't caught yet, but hopefully if so it'll be an easy fix for them when they pop up.As written this will conflict with #8613; I'd prefer to let that (and #8617 and #8618) land first, then I can update this PR to show the new fields in the inventory output. But I think that change will be small, and this can be reviewed as-is.