Description
There is probably room to slim down the product images, which would reduce build time, image size and attack surface. For example, the Hive Dockerfile has a comment about Hadoop:
Line 102 in 1965d50
Now that we build from source, it might be worth digging into the build processes to:
a) Limit which components we build. It doesn't make sense to build components that are never copied to the final image.
b) Revalidate whether all the components that are copied into the final image are really needed in production. With Hive, for example, we switched the build to only build the metastore, which significantly reduced the attack surface. Some products consist of multiple components and plugins, not all of which may be needed to run the platform.
c) While we're at it, try to generate an SBOM for each component that is copied into the final image (next to the component itself). For most components that should already be the case; see #814.
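As a starting point for (a) and (b), it may help to list what each Dockerfile actually copies out of its builder stages. The following is only a rough sketch in Python, not existing tooling: the function name `copies_from_stages`, the stage name `hive-builder` and all paths in `SAMPLE` are made up for illustration, and it only handles the shell form of `COPY` (not the JSON array form).

```python
import shlex

def copies_from_stages(dockerfile_text):
    """Return (stage, sources, destination) for each `COPY --from=...` line.

    Assumption: only the shell form of COPY is handled, not the JSON array form.
    """
    # Join backslash line continuations so each instruction is one logical line.
    logical = dockerfile_text.replace("\\\n", " ")
    results = []
    for line in logical.splitlines():
        line = line.strip()
        # Only COPY instructions are of interest; skip comments and everything else.
        if not line.upper().startswith("COPY "):
            continue
        tokens = shlex.split(line)[1:]
        flags = [t for t in tokens if t.startswith("--")]
        paths = [t for t in tokens if not t.startswith("--")]
        stage = next(
            (f.split("=", 1)[1] for f in flags if f.startswith("--from=")),
            None,
        )
        # A COPY without --from takes files from the build context, not a stage.
        if stage is None or len(paths) < 2:
            continue
        results.append((stage, paths[:-1], paths[-1]))
    return results

# Hypothetical Dockerfile; stage names and paths are invented for this example.
SAMPLE = r"""
FROM maven AS hive-builder
RUN mvn package
FROM ubi AS final
COPY --chown=stackable:0 --from=hive-builder /stackable/apache-hive-metastore /stackable/hive-metastore
COPY --from=hive-builder /stackable/a.jar /stackable/b.cdx.json \
    /stackable/lib/
COPY local.sh /stackable/
"""

if __name__ == "__main__":
    for stage, sources, dest in copies_from_stages(SAMPLE):
        print(f"{stage}: {', '.join(sources)} -> {dest}")
```

Comparing this list against what the builder stage produces would show which build outputs never reach the final image, and the copied paths are the ones to revalidate for production use.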
We want to focus on the products that are most affected by vulnerabilities right now:
- Trino
- Hive
- HBase
Acceptance criteria:
- Document what could be removed and the impact of removing it
- Document what can't be removed and why