Skip to content

Conversation

meetrajvala
Copy link
Contributor

@meetrajvala meetrajvala commented Dec 23, 2024

Currently cos_gpu_installer image reference is being fetch at runtime using cos-extensions list command. This PR makes required changes to set it as part of build steps and stores the image reference value in file. It also updates the required logger references after latest pull from main branch.

@meetrajvala
Copy link
Contributor Author

/gcbrun

@meetrajvala
Copy link
Contributor Author

/gcbrun

Comment on lines +23 to +25
set_cos_gpu_installer_image_reference() {
cos-extensions list -- --gpu-installer >> "${CS_PATH}"/"${GPU_INSTALLER_IMAGE_REF}"
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about an alternative approach:
You're basically sharing a config to the launcher process, is it possible to pass it as a kernel cmd arg instead of writing to a file? This also makes gpu installation more measurable in some sense. Awaiting for @alexmwu and @jkl73 's inputs on this.

Copy link
Contributor

@jkl73 jkl73 Dec 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible to pass it as a kernel cmd arg instead

This means we determine the version of gpu installer during the image building time. This is my original thought as well, but I remember @meetrajvala said there was a reason to do this during the runtime, because the version may change? @meetrajvala could you elaborate more here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yawangwang Command line will have length constraint of 4096 chars, but currently we are at less than 50% (~1900 chars) so it should not be a big concern. But is it ok to add config required only for userspace program (and not for the customizing the kernel behavior) to kernel command line ? I agree that it would make the gpu driver installation bit measurable in some sense, though current implementation also addresses it by putting file under oem partition which will be sealed in the later step.

@jkl73 earlier we were discussing about hardcoding the installer version, which would not be very appropriate as we need to keep updating installer version when base COS image changes (via image-family flag). But now we are deriving it using the cos-extensions command (which would always return the specific supported installer image reference corresponds to current base COS image), so we are good.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have existing examples of adding userspace program configs to kernel command line (e.g., systemd, launcher), so it should be ok to add gpu installer config as well. Also the main purpose of storing files under oem partition is to encrypt/protect it using dm-crypt from tampering by the operator. IMO it doesn't provide a very explicit way of measurement.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole oem partition is sealed and measured, and its hash is part of the cmdline. So I think technically the gpu ref file is still measured and bind to the specific CS image

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for your info. My original thought was to provide an explicit way of measurement (the presence of gpu installer ref in kernel cmd), but as you pointed out, since the hash of oem partition is also measured into kernel cmd, I'm also okay with the current approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal here is to know exactly what installer we run for a given version of CS. Explicit measurement is not required, since we won't surface what version of the installer we are using to customers. It's mostly nice to have for debugging purposes, but I don't see a huge reason to change this implementation.

Copy link
Contributor

@alexmwu alexmwu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also pin the GPU installer image digest per-image (similar to how you pin the ref in the OEM partition), since we need to verify the image hasn't changed since we don't own the installer repo. Then we should check the digest matches when we pull it down at boot time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants