diff --git a/ships/0043-multi-arch-image-builds.md b/ships/0043-multi-arch-image-builds.md
new file mode 100644
index 0000000..1dc20ba
--- /dev/null
+++ b/ships/0043-multi-arch-image-builds.md
@@ -0,0 +1,374 @@

---
title: multi-arch-image-builds
authors:
  - "@adambkaplan"
reviewers:
  - TBD
approvers:
  - TBD
creation-date: 2025-05-21
last-updated: 2025-08-19
status: implementable
see-also:
  - "ships/0039-build-scheduler-opts.md"
replaces:
  - https://github.com/shipwright-io/community/pull/275
superseded-by: []
---

# SHIP-0043: Multi-arch Image Builds

## Release Signoff Checklist

- [x] Enhancement is `implementable`
- [x] Design details are appropriately documented from clear requirements
- [x] Test plan is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [docs](/docs/)

## Open Questions [optional]

TBD

## Summary

This enhancement extends Shipwright to orchestrate multi-architecture container image builds. It
aims to solve the following challenges:

* Scheduling builds on native Kubernetes nodes for a given OS + CPU architecture.
* Providing a platform parameter to tools that support multi-arch builds through emulation or
  cross-compilation.

## Motivation

### Background

#### OCI Image Indexes

The Open Container Initiative (OCI) provides the industry standards for container image
specifications and formats. It is the successor to Docker's "v2" specification for container
images, and is designed to be backwards compatible. The specification includes an
["image index" standard](https://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions)
for containers that can run on multiple CPU and operating system architectures. This is
equivalent to the Docker v2 "manifest list," and the two terms are used interchangeably. For
consistency, "image index" is used throughout the remainder of this proposal.

#### Multi-Arch Worker Nodes

Many Kubernetes distributions - starting with v1.30, and perhaps earlier - allow clusters to have
worker nodes with different OS and CPU architectures. Clusters expose the node OS and CPU
architecture through [default node labels](https://kubernetes.io/docs/reference/node/node-labels/).
Shipwright began accommodating these scenarios with
[SHIP-0039](https://github.com/shipwright-io/community/blob/main/ships/0039-build-scheduler-opts.md),
whose features were incrementally released in Builds v0.14 and v0.15. However, these features only
let developers create a single image for a single architecture. Creating an image index for
multiple architectures requires significant orchestration effort outside of Shipwright.

#### Multi-Arch Capabilities in Build Toolchains

Many popular container build tools, such as `buildkit`, `buildah`,
[cloud native buildpacks](https://buildpacks.io/docs/for-app-developers/how-to/special-cases/build-for-arm/),
and `ko`, support multi-arch builds through cross-compilation or QEMU-style CPU emulation. These
tools often expose a `--platform` command line option to build the image for a different OS +
architecture than the underlying host. The industry appears to have standardized on the
`<os>/<arch>` naming convention for "platform" (ex: `linux/amd64`). These values are identical to
those used in the Kubernetes node labels.

Support for generating an OCI image index varies by tool. Some - like ko and buildah - do provide
support for creating image indexes. These typically require all of the per-platform builds to run
in the same process; "fan out" support to run these builds in parallel is typically not supported,
or is more challenging to set up in a containerized environment (ex: the `podman farm` command).
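For reference (and not as part of this proposal), the sketch below shows the kind of invocation
these tools expose today. It uses buildah as an example, assumes QEMU emulation is configured on
the host for non-native platforms, and uses an illustrative registry reference:

```sh
# Create an empty manifest list (image index) to collect per-platform images.
buildah manifest create registry.example.com/sample-app:v1

# Build for two platforms in one process; non-native platforms are emulated.
buildah build --platform linux/amd64,linux/arm64 \
  --manifest registry.example.com/sample-app:v1 .

# Push the image index and all referenced per-platform images.
buildah manifest push --all registry.example.com/sample-app:v1 \
  docker://registry.example.com/sample-app:v1
```

Everything above runs in a single process on a single host, which is exactly the limitation this
proposal works around by fanning builds out to native nodes.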
### Goals

* Provide a mechanism for developers to request a multi-arch image build, or a build for a specific
  OS + CPU architecture.

### Non-Goals

* Generalized matrix builds for Shipwright. This is a capability provided by Tekton.
* Adding retries for failed builds. This is out of scope to simplify the design. Such a feature can
  be considered in a follow-up enhancement.
* Scheduling builds on nodes with specialized hardware (ex: GPUs). This is already supported in
  Shipwright through the scheduler options in v0.15.
* Management of container resources (CPU, memory) at the Build/BuildRun level. At present, these
  can only be defined at the ClusterBuildStrategy/BuildStrategy level. See
  [build#1894](https://github.com/shipwright-io/build/issues/1894).

## Proposal

### User Stories

- As a developer, I want to build containers for x86 and ARM so I can share my app with my team
  using different CPU architectures (Apple Silicon vs. Windows x86).
- As a cluster admin, I want multi-arch builds to be scheduled on native nodes if my Kubernetes
  cluster has worker nodes with multiple CPU architectures.
- As a platform engineer, I want to provide a standard way for my teams to run multi-arch container
  builds.

### Implementation Notes

#### "Image Platform" Concept

An image platform is the combination of operating system ("os"), CPU architecture ("arch"), and
other container image "platform" attributes as defined in the OCI
[Image Index specification](https://github.com/opencontainers/image-spec/blob/main/image-index.md#image-index-property-descriptions).
Developers can specify the desired platform(s) for a container image build as a JSON/YAML object
with the following attributes:

- `os`: operating system. Required.
- `arch`: CPU architecture. Required.

The JSON/YAML representation is intended to future-proof the API for additional "features" defined
in the OCI image index spec, or other items that Shipwright can support at a later date (ex: CPU
architecture `variant`, `os.version`).

A shorter single-string format for "platform" is not allowed within the Kubernetes YAML. However,
it can be supported when invoked from the command line (see below).

#### Multi-arch in `Build` and `BuildRun` objects

The `Build` and `BuildRun` APIs will add a new `multiArch` JSON/YAML object to `spec.output`. This
object will contain the following field:

- `platforms`: list of platforms to build, using the "image platform" structure above. Required.

Below is an example multi-arch Linux image build for x86, ARM, Power, and Z:

```yaml
apiVersion: shipwright.io/v1beta1
kind: Build
spec:
  ...
  output:
    image:
    multiArch:
      platforms:
        - arch: amd64
          os: linux
        - arch: s390x
          os: linux
        - arch: arm64
          os: linux
        - arch: ppc64le
          os: linux
```
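A `BuildRun` could carry the same information when triggered directly. The sketch below assumes
the proposed `multiArch` object is surfaced identically under the `BuildRun`'s `spec.output`; the
build name and output image are illustrative:

```yaml
apiVersion: shipwright.io/v1beta1
kind: BuildRun
metadata:
  generateName: sample-go-multiarch-
spec:
  build:
    name: sample-go
  output:
    image: registry.example.com/sample-go:v1
    multiArch:
      platforms:
        - arch: amd64
          os: linux
        - arch: arm64
          os: linux
```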
#### `BuildRun` Controller Reconciliation

##### Validations

The following validations should be run if `spec.output.multiArch.platforms` is not empty.

For each image platform referenced in the `platforms` array, the `BuildRun` controller should
verify that at least one node with the respective `kubernetes.io/os` and `kubernetes.io/arch` label
values is present. If no such node exists, the controller should set the `BuildRun`'s status to
failed with an appropriate message and reason code.

The controller should also check that `spec.nodeSelector` does not have any values set for
`kubernetes.io/os` or `kubernetes.io/arch` in the label matcher. If any value is present, the
controller should likewise set the `BuildRun`'s status to failed with an appropriate message and
reason code.

##### Tekton `PipelineRun` Generation

If all checks above pass, the `BuildRun` controller will generate a Tekton `PipelineRun` to
execute the build. This will require significant refactoring of the existing codebase, which
currently generates a single `TaskRun` that is effectively "single-threaded."

![Multi-arch pipeline](assets/0043-multiarch-pipeline.png)

The containers in the generated `PipelineRun` will be executed in three phases:

**Phase 1: Obtain Source**

The first phase will gather the source code. The mechanism for the generated `TaskRun` will vary
depending on the values in `spec.source` for the Build/BuildRun.

Code from git will invoke the current Shipwright git clone process as a `TaskRun` with the
following containers:

- The main "git clone" container that exists today.
- A second "image push" container, leveraging the existing Shipwright container that supports
  "managed push". This container will package the source code into an OCI artifact, which is then
  pushed to the same registry as the output image. A tag suffix pattern (`-src`) will be used to
  ensure the source code artifact is persisted on most image registries. This may require
  significant enhancements to the current "image push" container.

Code from "local source" will likewise invoke a `TaskRun` as above to receive source code from a
remote machine. It will push the source code to an OCI artifact, as above.

Code from an OCI artifact will not invoke any `TaskRun` during this phase, because the source code
has already been pushed to the image registry.

**Phase 2: Fan Out Builds**

The generated `PipelineRun` will then define a set of Tekton `TaskRun`s that can run in parallel -
one for each platform in `spec.output.multiArch.platforms`. Each `TaskRun` will set the appropriate
`nodeSelector` to schedule the build on a node with the matching `kubernetes.io/os` and
`kubernetes.io/arch` labels.

The `TaskRun` containers will do the following:

- Pull the source code from the referenced OCI artifact. This will utilize existing Shipwright
  logic for pulling source code from OCI artifacts.
- Execute the build per the referenced build strategy.
- Push the output container image to the image registry, with a `-<os>-<arch>` tag suffix.
- Publish the output container image digest as a `TaskRun` result value.

All other fields used to control the build pod definition - such as resources and volumes - will be
inherited from the parent Build/BuildRun object as they are today.

**Phase 3: Assemble Index Image**

The last phase of the generated `PipelineRun` will create a `TaskRun` that assembles the OCI image
index, based on the results of the prior build `TaskRun`s.

If any platform fails to build, the `PipelineRun` should fail and subsequently mark the `BuildRun`
as failed.
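To make the three phases concrete, below is a rough sketch of the shape of a `PipelineRun` the
controller might generate for a two-platform build. The task names, the empty `taskSpec`
placeholders, and the use of `taskRunSpecs` to pin each fan-out task to a matching node are
assumptions of this sketch, not the final generated output:

```yaml
apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: sample-go-multiarch-
spec:
  pipelineSpec:
    tasks:
      # Phase 1: clone the source and push it to the registry as an OCI artifact.
      - name: obtain-source
        taskSpec: {} # steps generated by the BuildRun controller (omitted here)
      # Phase 2: one build task per platform; Tekton runs these in parallel.
      - name: build-linux-amd64
        runAfter: ["obtain-source"]
        taskSpec: {}
      - name: build-linux-arm64
        runAfter: ["obtain-source"]
        taskSpec: {}
      # Phase 3: assemble and push the OCI image index from the per-platform digests.
      - name: assemble-index
        runAfter: ["build-linux-amd64", "build-linux-arm64"]
        taskSpec: {}
  # Pin each fan-out task to a node with the matching OS and CPU architecture.
  taskRunSpecs:
    - pipelineTaskName: build-linux-amd64
      podTemplate:
        nodeSelector:
          kubernetes.io/os: linux
          kubernetes.io/arch: amd64
    - pipelineTaskName: build-linux-arm64
      podTemplate:
        nodeSelector:
          kubernetes.io/os: linux
          kubernetes.io/arch: arm64
```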
#### CLI Enhancements

The CLI will add a `--platform` flag to the Build and BuildRun oriented commands. This flag will
set the respective values in `spec.output.multiArch`. The `--platform` option can be set multiple
times, and will accept platforms in their single-line `<os>/<arch>` format.

Example experience:

```sh
shp build create sample-go --strategy=source-to-image \
  --output quay.io/adambkaplan/sample-go:v1 \
  --platform=linux/amd64 \
  --platform=linux/arm64

shp build run sample-go --platform linux/amd64 \
  --platform linux/arm64 \
  --platform linux/s390x
```

### Test Plan

Testing will primarily be at the unit level, where the configuration of the generated Tekton
`PipelineRun` can be verified. Current integration tests use KinD clusters, which can't feasibly
mimic a Kubernetes cluster with multiple architecture worker nodes.

End-to-end testing is not feasible unless the project obtains access to a real Kubernetes cluster
with multiple CPU architecture worker nodes.

### Release Criteria

#### Removing a deprecated feature [if necessary]

N/A

#### Upgrade Strategy [if necessary]

The new `multiArch` field in `Build` and `BuildRun` objects will be optional (though the fields
within it will be required). Existing builds should continue to work as expected.

### Risks and Mitigations

TBD

> What are the risks of this proposal and how do we mitigate? Think broadly. For example, consider
> both security and how this will impact the larger Shipwright ecosystem.

> How will security be reviewed and by whom? How will UX be reviewed and by whom?

## Drawbacks

### Verbose API for "Platform"

Developers are used to a single-string representation of "platform" - ex: "linux/amd64". This is
provided through the command line, but not in the YAML.

Using a verbose API in the YAML allows us to future-proof builds with more complex manifest list
definitions. The OCI image index spec already has fields that are allowed on image index entries
and are excluded from this initial API for the sake of simplicity:

- `variant` - some (though perhaps not all) build tools support this. Ex: podman, buildah.
- `os.version` - use is not widely observed in the field today.
- `features` - this is a catch-all for future extensions to the OCI image spec. This might be
  relevant for AI workloads if a container image requires specific hardware to execute.

### Limited Mechanism for Multi-arch Builds

This proposal only supports multi-arch builds on nodes with the respective OS and CPU architecture.
There are other potential mechanisms for building multi-arch container images:

- Use cross-compilation if the programming language/SDK supports it.
- Use emulation if the container build tool supports it.
- Execute builds on virtual machines with the appropriate OS + CPU architecture.

## Alternatives

### String Shorthand for "Platform" in YAML

Many developers who do multi-arch builds with Podman or Buildkit are familiar with a single-string
representation of "platform". While convenient, this adds additional complexity to the storage and
serialization of data in Kubernetes. The shorthand also locks the API to a convention that may not
be universally understood by all build tools.

Using the shorthand in the CLI provides this capability in spirit, and closer to where developers
directly interact with builds.
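For illustration only, the fragment below contrasts what the rejected shorthand could have looked
like with the structured form this proposal adopts; the shorthand variant is hypothetical and not
part of the API:

```yaml
# Hypothetical shorthand (rejected by this proposal):
output:
  multiArch:
    platforms:
      - linux/amd64
      - linux/arm64

# Structured form (what this proposal specifies):
output:
  multiArch:
    platforms:
      - os: linux
        arch: amd64
      - os: linux
        arch: arm64
```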
This +can _hypothetically_ be used to securely run builds on machines with different CPU architectures. +On Kubernetes, Kata peer pods are scheduled by specifying a [runtime class](https://kubernetes.io/docs/concepts/containers/runtime-class/) +for the pod. + +The Konflux CI project experimented with this approach for multi-arch container builds, with +[mixed results](https://groups.google.com/g/konflux/c/A0m0JWjYwnc/m/mUOSFkAsAwAJ?utm_medium=email&utm_source=footer). +The proof of concept proved that Tekton can schedule these build pods just fine, however there are +fundamental challenges scaling build workloads using dynamic virtual machines. Creating VMs and +required components for Kata containers is also challenging for a given CPU architecture. + +Incorporating runtime class features into any multi-arch build capability would add significant +complexity and may require APIs for cluster administrators (see below). + +### Control Multi-Arch Method with New APIs + +A previous version of this proposal introduced cluster-level APIs that determined how a multi-arch +build could execute. This would allow an administrator to specify that some OS/CPU architecture +builds could be run on natively on respective Kubernetes nodes, whereas others could use +alternative mechanisms for producing the image for a given OS + CPU arch. These were known as the +`ImagePlatformClass` APIs. + +This feature was primarily motivated by the Kata + Confidential containers approach above. Due to +the technical challenges identified above, it does not make sense for Shipwright to provide new +CRDs for this purpose. In addition, the many/many relationship between the `ImagePlatform` and +`ImagePlatformClass` CRDs may have been challenging to implement. This design can be revisited at a +later date, should the Kata + Confidential containers approach prove feasible and scalable. + +### Tekton Matrix Builds + +Tekton has support for matrixed tasks, and work is underway to improve this feature to support +node selectors and other pod template features. In theory the matrix can be used to run Shipwright +builds per architecture, especially if we release the Triggers project which provides a "Shipwright +build in Tekton pipeline" capability. + +Our mission is to provide a simplified, opinionated experience for building container images. +Multi-arch is a clear use case specific to container image builds. Tekton deliberately provides +general-purpose solutions that emphasize flexibility. Using the matrix tasks feature exposes +developers to significant amounts of complexity. + +## Infrastructure Needed [optional] + +None for unit tests. + +End to end testing may require a Kubernetes cluster with multiple CPU architecture worker nodes. + +## Implementation History + +- 2025-05-21: Initial Draft +- 2025-08-19: Revised draft with narrower scope diff --git a/ships/assets/0043-multiarch-pipeline.png b/ships/assets/0043-multiarch-pipeline.png new file mode 100644 index 0000000..bd84aed Binary files /dev/null and b/ships/assets/0043-multiarch-pipeline.png differ