Merged
21 changes: 0 additions & 21 deletions .devcontainer/devcontainer.json

This file was deleted.

3 changes: 0 additions & 3 deletions .devcontainer/devcontainer.json.license

This file was deleted.

10 changes: 0 additions & 10 deletions .devcontainer/post_create.sh

This file was deleted.

15 changes: 0 additions & 15 deletions .dockerignore

This file was deleted.

1 change: 1 addition & 0 deletions .gitignore
@@ -5,6 +5,7 @@
.venv
__pycache__
sbom.spdx.json
sbom.used_files.txt
linux_cmd
linux
linux.*
1 change: 0 additions & 1 deletion .vscode/launch.json
@@ -11,7 +11,6 @@
"--src-tree", "../linux",
"--output-tree", "../linux/kernel-build",
"--root-output-in-tree", "vmlinux",
"--output", "sbom.spdx.json",
"--debug"
]
}
58 changes: 33 additions & 25 deletions README.md
@@ -10,26 +10,38 @@ A script to generate an SPDX-format Software Bill of Materials (SBOM) for the `v
The eventual goal is to integrate the `sbom/` directory into the `linux/scripts/` directory in the official [linux](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/) kernel source tree.

## Getting Started

To test the script install [Docker](https://docs.docker.com/engine/install/ubuntu/#installation-methods) and run:
```bash
docker compose up
```
This will:
- Build a Docker image based on the included [Dockerfile](./Dockerfile).
- Clone the Linux kernel repository during the image build.
- Compile the kernel out-of-tree into `linux/kernel_build`.
- Start a container with this repository mounted as volume.
- Run the [sbom.py](sbom/sbom.py) script inside the container:
```bash
python3 sbom/sbom.py \
--src-tree ../linux \
--output-tree ../linux/kernel_build \
--root-output-in-tree vmlinux \
--output sbom.spdx.json
```
- Starting from `vmlinux` the script builds the **cmd graph**, a directed acyclic graph (DAG) where nodes are filenames and edges represent build dependencies extracted from `.<filename>.cmd` files.
- Based on the cmd graph, the final `sbom.spdx.json` file is created and saved in this repository’s root directory.
1. Clone the repository
2. Activate the venv and install build dependencies
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install pre-commit reuse ruff
pre-commit install
```
3. Provide a Linux source tree and output tree, e.g., by downloading precompiled test data from [KernelSbom-TestData](https://fileshare.tngtech.com/library/98e7e6f8-bffe-4a55-a8d2-817d4f3e51e8/KernelSbom-TestData/)
```bash
test_archive="linux-defconfig.tar.gz"
curl -L -o "$test_archive" "https://fileshare.tngtech.com/d/e69946da808b41f88047/files/?p=%2F$test_archive&dl=1"
tar -xzf "$test_archive"
rm "$test_archive"
```
or cloning the [linux](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git) repo and building your own config
```bash
git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
make <config> O=kernel_build
make -j$(nproc) O=kernel_build
```
4. Run the [sbom.py](sbom/sbom.py) script
```bash
python3 sbom/sbom.py \
--src-tree linux \
--output-tree linux/kernel_build \
--root-output-in-tree vmlinux \
--spdx sbom.spdx.json \
--used-files sbom.used_files.txt
```
Starting from `vmlinux`, the script builds the **cmd graph**, a directed acyclic graph (DAG) where nodes are filenames and edges represent build dependencies extracted from `.<filename>.cmd` files. Based on the cmd graph, the final `sbom.spdx.json` and `sbom.used_files.txt` files are created and saved in this repository’s root directory.
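
As a quick sanity check on the generated list, the following sketch counts entries per top-level directory. It assumes only what `sbom.py` in this PR shows: one path per line, relative to the source tree. The helper name `count_by_top_level_dir` and the sample paths are hypothetical and not part of the repository.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_top_level_dir(used_files_text: str) -> Counter:
    """Count entries of a newline-separated path list by first path component."""
    counts: Counter = Counter()
    for line in used_files_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines and a missing trailing newline
        counts[PurePosixPath(line).parts[0]] += 1
    return counts


# Hypothetical sample content; a real list is written by sbom.py.
sample = "arch/x86/kernel/head_64.S\ninit/main.c\ninit/version.c"
counts = count_by_top_level_dir(sample)
print(counts.most_common())
```

For a real run, the text would come from `Path("sbom.used_files.txt").read_text(encoding="utf-8")`.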

## Directory Structure

@@ -40,14 +52,10 @@ This will:
- `sbom_analysis` - Additional scripts for analyzing the outputs produced by the main script.
- [sbom_analysis/cmd_graph_based_kernel_build](sbom_analysis/cmd_graph_based_kernel_build/README.md) - Validation of cmd graph completeness by rebuilding the linux kernel only with files referenced in the cmd graph.
- [sbom_analysis/cmd_graph_visualization](sbom_analysis/cmd_graph_visualization/README.md) - Interactive visualization of the cmd graph
- `testdata_generation` - Describes how the precompiled kernel builds in [KernelSbom-TestData](https://fileshare.tngtech.com/library/98e7e6f8-bffe-4a55-a8d2-817d4f3e51e8/KernelSbom-TestData/) were generated.

The main contribution is the content of the `sbom` directory which eventually should be moved into the `linux/scripts/` directory in the official [linux](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/) kernel source tree.

## Development & Debugging

For development and debugging, install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension for [VSCode](https://code.visualstudio.com/). Then, open the Command Palette (F1) and select `Reopen in Dev Container`. This opens your project inside a development container based on the same Dockerfile used above.
Inside the devcontainer, you can run the provided [Python Debugger: sbom](./.vscode/launch.json) launch configuration to step through the script interactively.

## Reuse

When committing, `reuse lint` is executed as a pre-commit hook to check that all files have compliant license headers. If any file is missing a license header, add it via
18 changes: 0 additions & 18 deletions docker-compose.yaml

This file was deleted.

7 changes: 7 additions & 0 deletions sbom/lib/sbom/cmd/cmd_graph.py
@@ -8,6 +8,7 @@
from pathlib import Path
from dataclasses import dataclass, field
import pickle
from typing import Iterator

from .deps_parser import parse_deps
from .savedcmd_parser import parse_commands
@@ -96,3 +97,9 @@ def save_cmd_graph(node: CmdGraphNode, path: Path) -> None:
def load_cmd_graph(path: Path) -> CmdGraphNode:
    with open(path, "rb") as f:
        return pickle.load(f)


def iter_files_in_cmd_graph(cmd_graph: CmdGraphNode) -> Iterator[Path]:
    yield cmd_graph.absolute_path
    for child_node in cmd_graph.children:
        yield from iter_files_in_cmd_graph(child_node)
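
A note on the traversal: a recursive pre-order walk like this yields a node's path once per parent that references it. Whether `build_cmd_graph` actually shares child nodes between parents is not visible in this diff, so the sketch below is only an illustration under that assumption. `Node` is a hypothetical stand-in for `CmdGraphNode`, and `iter_unique_files` is not part of the PR.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator


@dataclass
class Node:
    """Hypothetical stand-in for CmdGraphNode with only the fields used here."""

    absolute_path: Path
    children: list["Node"] = field(default_factory=list)


def iter_files(node: Node) -> Iterator[Path]:
    """Pre-order traversal with the same shape as iter_files_in_cmd_graph."""
    yield node.absolute_path
    for child in node.children:
        yield from iter_files(child)


def iter_unique_files(node: Node) -> Iterator[Path]:
    """Same traversal, but each path is yielded once, in first-seen order."""
    seen: set[Path] = set()
    for path in iter_files(node):
        if path not in seen:
            seen.add(path)
            yield path


# A header shared by two source files: the DAG case.
header = Node(Path("/src/include/a.h"))
root = Node(
    Path("/out/vmlinux"),
    [Node(Path("/src/x.c"), [header]), Node(Path("/src/y.c"), [header])],
)
all_paths = list(iter_files(root))
unique_paths = list(iter_unique_files(root))
```

If duplicates do occur in practice, deduplicating before writing `sbom.used_files.txt` would keep the list and the reported file count consistent.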
1 change: 1 addition & 0 deletions sbom/lib/sbom/cmd/savedcmd_parser.py
@@ -233,6 +233,7 @@ def _parse_sed_command(command: str) -> list[Path]:
    (re.compile(r"(.*/)?genheaders\b"), _parse_genheaders_command),
    (re.compile(r"^ld\b"), _parse_ld_command),
    (re.compile(r"^sed\b"), _parse_sed_command),
    (re.compile(r"^(.*/)?objtool\b"), _parse_noop),
]

# If Block pattern to match a simple, single-level if-then-fi block. Nested If blocks are not supported.
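
For readers unfamiliar with the parser table, the new `objtool` entry can be read as "recognize the command, but report no input files". The sketch below shows one way such a pattern table might be dispatched. The first-match-wins lookup, the `parse_command` function, and the `cat` parser are assumptions made for illustration; only the `objtool` regex is taken from this diff.

```python
import re
from pathlib import Path
from typing import Callable

Parser = Callable[[str], list[Path]]


def parse_noop(command: str) -> list[Path]:
    """Recognize the command but report no input files."""
    return []


def parse_cat(command: str) -> list[Path]:
    """Hypothetical parser: every argument after 'cat' is an input file."""
    return [Path(arg) for arg in command.split()[1:]]


# First matching pattern wins; this lookup rule is an assumption.
PARSERS: list[tuple[re.Pattern[str], Parser]] = [
    (re.compile(r"^cat\b"), parse_cat),
    (re.compile(r"^(.*/)?objtool\b"), parse_noop),
]


def parse_command(command: str) -> list[Path]:
    """Return the input files of a single shell command."""
    for pattern, parser in PARSERS:
        if pattern.search(command):
            return parser(command)
    raise ValueError(f"no parser registered for: {command}")
```

The optional `(.*/)?` prefix lets the pattern match both a bare `objtool` and an invocation via a path such as `./tools/objtool/objtool`.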
50 changes: 37 additions & 13 deletions sbom/sbom.py
@@ -14,7 +14,7 @@
import os
from pathlib import Path
import lib.sbom.spdx as spdx
from lib.sbom.cmd.cmd_graph import CmdGraphNode, build_cmd_graph
from lib.sbom.cmd.cmd_graph import CmdGraphNode, build_cmd_graph, iter_files_in_cmd_graph
import time


@@ -23,7 +23,8 @@ class Args:
    src_tree: str
    output_tree: str
    root_output_in_tree: str
    output: str
    spdx: str
    used_files: str
    debug: bool


@@ -46,7 +47,14 @@ def parse_args() -> Args:
        help="Root build output path relative to --output-tree the SBOM will be based on (default: vmlinux)",
    )
    parser.add_argument(
        "--output", default="sbom.spdx.json", help="Path where to create the SPDX document (default: sbom.spdx.json)"
        "--spdx",
        default="sbom.spdx.json",
        help="Path to create the SPDX document, or 'none' to disable (default: sbom.spdx.json)",
    )
    parser.add_argument(
        "--used-files",
        default="sbom.used_files.txt",
        help="Path to create a flat list of all source files used for the kernel build, or 'none' to disable (default: sbom.used_files.txt)",
    )
    parser.add_argument("-d", "--debug", action="store_true", default=False, help="Debug level (default: False)")

@@ -117,28 +125,44 @@ def main():
    """Main program"""
    # Parse cli arguments
    args = parse_args()
    src_tree = Path(os.path.realpath(args.src_tree))
    output_tree = Path(os.path.realpath(args.output_tree))
    root_output_in_tree = Path(args.root_output_in_tree)

    # Configure logging
    logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO, format="[%(levelname)s] %(message)s")

    # Build cmd graph
    logging.info(f"Building cmd graph for {args.root_output_in_tree}")
    start_time = time.time()
    cmd_graph = build_cmd_graph(
        root_output_in_tree=Path(args.root_output_in_tree),
        output_tree=Path(os.path.realpath(args.output_tree)),
        src_tree=Path(os.path.realpath(args.src_tree)),
    )
    cmd_graph = build_cmd_graph(root_output_in_tree, output_tree, src_tree)
    logging.info(f"Built cmd graph in {time.time() - start_time} seconds")

    # Save used files
    if args.used_files != "none":
        logging.info("Extracting source files from cmd graph")
        used_files = [
            file_path.relative_to(src_tree)
            for file_path in iter_files_in_cmd_graph(cmd_graph)
            if file_path.is_relative_to(src_tree) and not file_path.is_relative_to(output_tree)
        ]
        logging.info(f"Found {len(used_files)} source files in cmd graph.")
        with open(args.used_files, "w", encoding="utf-8") as f:
            f.write("\n".join(str(file_path) for file_path in used_files))
        logging.info(f"Saved {args.used_files} successfully")

    if args.spdx == "none":
        return

    # Fill SPDX Document
    doc = create_spdx_document(cmd_graph)
    logging.info("Generating SPDX Document based on cmd graph")
    spdx_doc = create_spdx_document(cmd_graph)

    # Save SPDX Document
    json_string = doc.to_json()
    with open(args.output, "w", encoding="utf-8") as f:
        f.write(json_string)
    logging.info(f"Saved {args.output} successfully")
    spdx_json = spdx_doc.to_json()
    with open(args.spdx, "w", encoding="utf-8") as f:
        f.write(spdx_json)
    logging.info(f"Saved {args.spdx} successfully")


# Call main method
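
The used-files filter in `main()` needs both conditions because the output tree can live inside the source tree, as with `linux/kernel_build` in the README. The sketch below isolates that filter using pure paths so it runs without touching the filesystem. The function name `select_source_files` and the sample paths are hypothetical. Note that `is_relative_to` requires Python 3.9 or newer.

```python
from pathlib import PurePosixPath


def select_source_files(
    paths: list[PurePosixPath],
    src_tree: PurePosixPath,
    output_tree: PurePosixPath,
) -> list[PurePosixPath]:
    """Keep paths inside src_tree but outside output_tree, relative to src_tree."""
    return [
        p.relative_to(src_tree)
        for p in paths
        if p.is_relative_to(src_tree) and not p.is_relative_to(output_tree)
    ]


src = PurePosixPath("/work/linux")
out = PurePosixPath("/work/linux/kernel_build")  # output tree nested in source tree
paths = [
    PurePosixPath("/work/linux/init/main.c"),  # source file: kept
    PurePosixPath("/work/linux/kernel_build/vmlinux"),  # build output: dropped
    PurePosixPath("/usr/include/stdio.h"),  # outside the source tree: dropped
]
selected = select_source_files(paths, src, out)
```

Without the second condition, generated files under `kernel_build` would be listed as if they were sources.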
@@ -55,7 +55,7 @@ def _remove_files(base_path: Path, patterns_to_remove: list[re.Pattern[str]], ig
if __name__ == "__main__":
    script_path = Path(__file__).parent
    # Paths to the original source and build directories
    cmd_graph_path = script_path / "cmd_graph.pickle"
    cmd_graph_path = script_path / "../cmd_graph.pickle"
    src_tree = (script_path / "../../linux").resolve()
    output_tree = (script_path / "../../linux/kernel_build").resolve()
    root_output_in_tree = Path("vmlinux")
@@ -72,7 +72,7 @@ def traverse(node: CmdGraphNode, depth: int = 0):

if __name__ == "__main__":
    script_path = Path(__file__).parent
    cmd_graph_path = script_path / "cmd_graph.pickle"
    cmd_graph_path = script_path / "../cmd_graph.pickle"
    src_tree = (script_path / "../../linux").resolve()
    output_tree = (script_path / "../../linux/kernel_build").resolve()
    root_output_in_tree = Path("vmlinux")
9 changes: 6 additions & 3 deletions .devcontainer/Dockerfile → testdata_generation/Dockerfile
@@ -3,11 +3,12 @@
# SPDX-License-Identifier: GPL-2.0-only

FROM ubuntu:22.04

RUN set -x \
&& apt-get -y update \
&& apt-get install -y --no-install-recommends \
build-essential linux-headers-generic bc \
flex bison python3 python3-pip python3-venv git libelf-dev libssl-dev gawk sudo \
build-essential linux-headers-generic bc zstd ca-certificates \
flex bison python3 git libelf-dev libssl-dev gawk sudo \
&& rm -rf /var/lib/apt/lists/*
WORKDIR /workspace

@@ -17,7 +18,9 @@ RUN set -x \
&& chown ubuntu:ubuntu /workspace
USER ubuntu

RUN git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
ARG GIT_TAG=v6.17
RUN git clone --depth 1 --branch "$GIT_TAG" https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ARG CONFIG=defconfig
RUN set -x \
&& cd linux \
27 changes: 27 additions & 0 deletions testdata_generation/README.md
@@ -0,0 +1,27 @@
<!--
SPDX-FileCopyrightText: 2025 TNG Technology Consulting GmbH

SPDX-License-Identifier: GPL-2.0-only
-->

# Test Data Generation

This directory describes how the precompiled kernel builds in [KernelSbom-TestData](https://fileshare.tngtech.com/library/98e7e6f8-bffe-4a55-a8d2-817d4f3e51e8/KernelSbom-TestData/) were created.

Standard preconfigured kernel builds were obtained via:
- **linux-tinyconfig.tar.gz** `./extract_testdata.sh tinyconfig`
- **linux-defconfig.tar.gz** `./extract_testdata.sh defconfig`
- **linux-allmodconfig.tar.gz** `./extract_testdata.sh allmodconfig`

Additionally, distribution-specific configs like **linux-localmodconfig.Ubuntu24.04.tar.gz** were created on different systems via:
```bash
git clone --depth 1 --branch v6.17 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux
make localmodconfig O=kernel_build
make -j$(nproc) O=kernel_build
```

After building the kernel, the entire linux directory is archived and uploaded to the FileShare:
```bash
tar -czf linux-<config>.tar.gz linux
```