diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json deleted file mode 100644 index ef0cac3..0000000 --- a/.devcontainer/devcontainer.json +++ /dev/null @@ -1,21 +0,0 @@ -{ - "name": "Devcontainer", - "workspaceMount": "source=${localWorkspaceFolder},target=/workspace/LinuxKernelSbomGenerator,type=bind", - "workspaceFolder": "/workspace/LinuxKernelSbomGenerator", - "build": { - "context": ".", - "dockerfile": "./Dockerfile" - }, - "customizations": { - "vscode": { - "extensions": [ - "ms-python.python", - "charliermarsh.ruff" - ] - } - }, - "mounts": [ - "source=${env:HOME}/.gitconfig,target=/home/ubuntu/.gitconfig,type=bind,consistency=cached" - ], - "postCreateCommand": ".devcontainer/post_create.sh" -} diff --git a/.devcontainer/devcontainer.json.license b/.devcontainer/devcontainer.json.license deleted file mode 100644 index 54a5de0..0000000 --- a/.devcontainer/devcontainer.json.license +++ /dev/null @@ -1,3 +0,0 @@ -SPDX-FileCopyrightText: 2025 TNG Technology Consulting GmbH - -SPDX-License-Identifier: GPL-2.0-only diff --git a/.devcontainer/post_create.sh b/.devcontainer/post_create.sh deleted file mode 100755 index 4c69bed..0000000 --- a/.devcontainer/post_create.sh +++ /dev/null @@ -1,10 +0,0 @@ -# SPDX-FileCopyrightText: 2025 TNG Technology Consulting GmbH -# -# SPDX-License-Identifier: GPL-2.0-only - -ln -s ../linux linux - -python3 -m venv .venv -source .venv/bin/activate -pip install pre-commit reuse ruff -./.venv/bin/pre-commit install diff --git a/.dockerignore b/.dockerignore deleted file mode 100644 index 7be4b5d..0000000 --- a/.dockerignore +++ /dev/null @@ -1,15 +0,0 @@ -# SPDX-FileCopyrightText: 2025 TNG Technology Consulting GmbH -# -# SPDX-License-Identifier: GPL-2.0-only - -Dockerfile -docker-compose.yaml -sbom.spdx.json - -# from .gitignore -.venv -__pycache__ -sbom.spdx.json -linux-cmd -linux -linux.* \ No newline at end of file diff --git a/.gitignore b/.gitignore index 9ffb107..3daeb5c 100644 --- a/.gitignore +++ 
b/.gitignore @@ -5,6 +5,7 @@ .venv __pycache__ sbom.spdx.json +sbom.used_files.txt linux_cmd linux linux.* diff --git a/.vscode/launch.json b/.vscode/launch.json index 8674886..d372d05 100644 --- a/.vscode/launch.json +++ b/.vscode/launch.json @@ -11,7 +11,6 @@ "--src-tree", "../linux", "--output-tree", "../linux/kernel-build", "--root-output-in-tree", "vmlinux", - "--output", "sbom.spdx.json", "--debug" ] } diff --git a/README.md b/README.md index 4bf1785..4c27bee 100644 --- a/README.md +++ b/README.md @@ -10,26 +10,38 @@ A script to generate an SPDX-format Software Bill of Materials (SBOM) for the `v The eventual goal is to integrate the `sbom/` directory into the `linux/scripts/` directory in the official [linux](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/) kernel source tree. ## Getting Started - -To test the script install [Docker](https://docs.docker.com/engine/install/ubuntu/#installation-methods) and run: -```bash -docker compose up -``` -This will: -- Build a Docker image based on the included [Dockerfile](./Dockerfile). -- Clone the Linux kernel repository during the image build. -- Compile the kernel out-of-tree into `linux/kernel_build`. -- Start a container with this repository mounted as volume. -- Run the [sbom.py](sbom/sbom.py) script inside the container: - ```bash - python3 sbom/sbom.py \ - --src-tree ../linux \ - --output-tree ../linux/kernel_build \ - --root-output-in-tree vmlinux \ - --output sbom.spdx.json - ``` -- Starting from `vmlinux` the script builds the **cmd graph**, a directed acyclic graph (DAG) where nodes are filenames and edges represent build dependencies extracted from `..cmd` files. -- Based on the cmd graph, the final `sbom.spdx.json` file is created and saved in this repository’s root directory. +1. Clone the repository +2. Activate the venv and install build dependencies + ```bash + python3 -m venv .venv + source .venv/bin/activate + pip install pre-commit reuse ruff + pre-commit install + ``` +3. 
Provide a linux src and output tree, e.g., by downloading precompiled test data from [KernelSbom-TestData](https://fileshare.tngtech.com/library/98e7e6f8-bffe-4a55-a8d2-817d4f3e51e8/KernelSbom-TestData/) + ```bash + test_archive="linux-defconfig.tar.gz" + curl -L -o "$test_archive" "https://fileshare.tngtech.com/d/e69946da808b41f88047/files/?p=%2F$test_archive&dl=1" + tar -xzf "$test_archive" + rm "$test_archive" + ``` + or by cloning the [linux](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git) repo and building your own config + ```bash + git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git + cd linux + make O=kernel_build + make -j$(nproc) O=kernel_build + ``` +4. Run the [sbom.py](sbom/sbom.py) script + ```bash + python3 sbom/sbom.py \ + --src-tree linux \ + --output-tree linux/kernel_build \ + --root-output-in-tree vmlinux \ + --spdx sbom.spdx.json \ + --used-files sbom.used_files.txt + ``` + Starting from `vmlinux`, the script builds the **cmd graph**, a directed acyclic graph (DAG) where nodes are filenames and edges represent build dependencies extracted from `..cmd` files. Based on the cmd graph, the final `sbom.spdx.json` and `sbom.used_files.txt` files are created and saved in this repository’s root directory. ## Directory Structure @@ -40,14 +52,10 @@ This will: - `sbom_analysis` - Additional scripts for analyzing the outputs produced by the main script. - [sbom_analysis/cmd_graph_based_kernel_build](sbom_analysis/cmd_graph_based_kernel_build/README.md) - Validation of cmd graph completeness by rebuilding the linux kernel only with files referenced in the cmd graph. - [sbom_analysis/cmd_graph_visualization](sbom_analysis/cmd_graph_visualization/README.md) - Interactive visualization of the cmd graph +- `testdata_generation` - Describes how the precompiled kernel builds in [KernelSbom-TestData](https://fileshare.tngtech.com/library/98e7e6f8-bffe-4a55-a8d2-817d4f3e51e8/KernelSbom-TestData/) were generated. 
The main contribution is the content of the `sbom` directory which eventually should be moved into the `linux/scripts/` directory in the official [linux](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/) kernel source tree. -## Development & Debugging - -For development and debugging, install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension for [VSCode](https://code.visualstudio.com/). Then, open the Command Palette (F1) and select `Reopen in Dev Container`. This opens your project inside a development container based on the same Dockerfile used above. -Inside the devcontainer, you can run the provided [Python Debugger: sbom](./.vscode/launch.json) launch configuration to step through the script interactively. - ## Reuse when commiting `reuse lint` is executed as a pre-commit hook to check if all files have compliant License headers. If any file is missing a license header add it via diff --git a/docker-compose.yaml b/docker-compose.yaml deleted file mode 100644 index 24d70ec..0000000 --- a/docker-compose.yaml +++ /dev/null @@ -1,18 +0,0 @@ -# SPDX-FileCopyrightText: 2025 TNG Technology Consulting GmbH -# -# SPDX-License-Identifier: GPL-2.0-only - -services: - sbom: - build: - context: . 
- dockerfile: .devcontainer/Dockerfile - volumes: - - .:/workspace/LinuxKernelSbomGenerator - working_dir: /workspace/LinuxKernelSbomGenerator - command: > - python3 src/sbom.py - --src-tree ../linux - --output-tree ../linux/kernel_build - --root-output-in-tree vmlinux - --output sbom.spdx.json diff --git a/sbom/lib/sbom/cmd/cmd_graph.py b/sbom/lib/sbom/cmd/cmd_graph.py index 6809d86..63a5c86 100644 --- a/sbom/lib/sbom/cmd/cmd_graph.py +++ b/sbom/lib/sbom/cmd/cmd_graph.py @@ -8,6 +8,7 @@ from pathlib import Path from dataclasses import dataclass, field import pickle +from typing import Iterator from .deps_parser import parse_deps from .savedcmd_parser import parse_commands @@ -96,3 +97,9 @@ def save_cmd_graph(node: CmdGraphNode, path: Path) -> None: def load_cmd_graph(path: Path) -> CmdGraphNode: with open(path, "rb") as f: return pickle.load(f) + + +def iter_files_in_cmd_graph(cmd_graph: CmdGraphNode) -> Iterator[Path]: + yield cmd_graph.absolute_path + for child_node in cmd_graph.children: + yield from iter_files_in_cmd_graph(child_node) diff --git a/sbom/lib/sbom/cmd/savedcmd_parser.py b/sbom/lib/sbom/cmd/savedcmd_parser.py index d3a68bd..07cfff1 100644 --- a/sbom/lib/sbom/cmd/savedcmd_parser.py +++ b/sbom/lib/sbom/cmd/savedcmd_parser.py @@ -233,6 +233,7 @@ def _parse_sed_command(command: str) -> list[Path]: (re.compile(r"(.*/)?genheaders\b"), _parse_genheaders_command), (re.compile(r"^ld\b"), _parse_ld_command), (re.compile(r"^sed\b"), _parse_sed_command), + (re.compile(r"^(.*/)?objtool\b"), _parse_noop), ] # If Block pattern to match a simple, single-level if-then-fi block. Nested If blocks are not supported. 
diff --git a/sbom/sbom.py b/sbom/sbom.py index 672a005..bb58898 100644 --- a/sbom/sbom.py +++ b/sbom/sbom.py @@ -14,7 +14,7 @@ import os from pathlib import Path import lib.sbom.spdx as spdx -from lib.sbom.cmd.cmd_graph import CmdGraphNode, build_cmd_graph +from lib.sbom.cmd.cmd_graph import CmdGraphNode, build_cmd_graph, iter_files_in_cmd_graph import time @@ -23,7 +23,8 @@ class Args: src_tree: str output_tree: str root_output_in_tree: str - output: str + spdx: str + used_files: str debug: bool @@ -46,7 +47,14 @@ def parse_args() -> Args: help="Root build output path relative to --output-tree the SBOM will be based on (default: vmlinux)", ) parser.add_argument( - "--output", default="sbom.spdx.json", help="Path where to create the SPDX document (default: sbom.spdx.json)" + "--spdx", + default="sbom.spdx.json", + help="Path to create the SPDX document, or 'none' to disable (default: sbom.spdx.json)", + ) + parser.add_argument( + "--used-files", + default="sbom.used_files.txt", + help="Path to create a flat list of all source files used for the kernel build, or 'none' to disable (default: sbom.used_files.txt)", ) parser.add_argument("-d", "--debug", action="store_true", default=False, help="Debug level (default: False)") @@ -117,6 +125,9 @@ def main(): """Main program""" # Parse cli arguments args = parse_args() + src_tree = Path(os.path.realpath(args.src_tree)) + output_tree = Path(os.path.realpath(args.output_tree)) + root_output_in_tree = Path(args.root_output_in_tree) # Configure logging logging.basicConfig(level=logging.DEBUG if args.debug else logging.INFO, format="[%(levelname)s] %(message)s") @@ -124,21 +135,34 @@ def main(): # Build cmd graph logging.info(f"Building cmd graph for {args.root_output_in_tree}") start_time = time.time() - cmd_graph = build_cmd_graph( - root_output_in_tree=Path(args.root_output_in_tree), - output_tree=Path(os.path.realpath(args.output_tree)), - src_tree=Path(os.path.realpath(args.src_tree)), - ) + cmd_graph = 
build_cmd_graph(root_output_in_tree, output_tree, src_tree) logging.info(f"Build cmd graph in {time.time() - start_time} seconds") + # Save used files + if args.used_files != "none": + logging.info("Extracting source files from cmd graph") + used_files = [ + file_path.relative_to(src_tree) + for file_path in iter_files_in_cmd_graph(cmd_graph) + if file_path.is_relative_to(src_tree) and not file_path.is_relative_to(output_tree) + ] + logging.info(f"Found {len(used_files)} source files in cmd graph.") + with open(args.used_files, "w", encoding="utf-8") as f: + f.write("\n".join(str(file_path) for file_path in used_files)) + logging.info(f"Saved {args.used_files} successfully") + + if args.spdx == "none": + return + # Fill SPDX Document - doc = create_spdx_document(cmd_graph) + logging.info("Generating SPDX Document based on cmd graph") + spdx_doc = create_spdx_document(cmd_graph) # Save SPDX Document - json_string = doc.to_json() - with open(args.output, "w", encoding="utf-8") as f: - f.write(json_string) - logging.info(f"Saved {args.output} successfully") + spdx_json = spdx_doc.to_json() + with open(args.spdx, "w", encoding="utf-8") as f: + f.write(spdx_json) + logging.info(f"Saved {args.spdx} successfully") # Call main method diff --git a/sbom_analysis/cmd_graph_based_kernel_build/cmd_graph_based_kernel_build.py b/sbom_analysis/cmd_graph_based_kernel_build/cmd_graph_based_kernel_build.py index de86903..0fa55f0 100644 --- a/sbom_analysis/cmd_graph_based_kernel_build/cmd_graph_based_kernel_build.py +++ b/sbom_analysis/cmd_graph_based_kernel_build/cmd_graph_based_kernel_build.py @@ -55,7 +55,7 @@ def _remove_files(base_path: Path, patterns_to_remove: list[re.Pattern[str]], ig if __name__ == "__main__": script_path = Path(__file__).parent # Paths to the original source and build directories - cmd_graph_path = script_path / "cmd_graph.pickle" + cmd_graph_path = script_path / "../cmd_graph.pickle" src_tree = (script_path / "../../linux").resolve() output_tree = 
(script_path / "../../linux/kernel_build").resolve() root_output_in_tree = Path("vmlinux") diff --git a/sbom_analysis/cmd_graph_visualization/cmd_graph_visualization.py b/sbom_analysis/cmd_graph_visualization/cmd_graph_visualization.py index 8bab174..be0975b 100644 --- a/sbom_analysis/cmd_graph_visualization/cmd_graph_visualization.py +++ b/sbom_analysis/cmd_graph_visualization/cmd_graph_visualization.py @@ -72,7 +72,7 @@ def traverse(node: CmdGraphNode, depth: int = 0): if __name__ == "__main__": script_path = Path(__file__).parent - cmd_graph_path = script_path / "cmd_graph.pickle" + cmd_graph_path = script_path / "../cmd_graph.pickle" src_tree = (script_path / "../../linux").resolve() output_tree = (script_path / "../../linux/kernel_build").resolve() root_output_in_tree = Path("vmlinux") diff --git a/.devcontainer/Dockerfile b/testdata_generation/Dockerfile similarity index 67% rename from .devcontainer/Dockerfile rename to testdata_generation/Dockerfile index 85ff773..62f87c6 100644 --- a/.devcontainer/Dockerfile +++ b/testdata_generation/Dockerfile @@ -3,11 +3,12 @@ # SPDX-License-Identifier: GPL-2.0-only FROM ubuntu:22.04 + RUN set -x \ && apt-get -y update \ && apt-get install -y --no-install-recommends \ - build-essential linux-headers-generic bc \ - flex bison python3 python3-pip python3-venv git libelf-dev libssl-dev gawk sudo \ + build-essential linux-headers-generic bc zstd ca-certificates \ + flex bison python3 git libelf-dev libssl-dev gawk sudo \ && rm -rf /var/lib/apt/lists/* WORKDIR /workspace @@ -17,7 +18,9 @@ RUN set -x \ && chown ubuntu:ubuntu /workspace USER ubuntu -RUN git clone --depth 1 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git +ARG GIT_TAG=v6.17 +RUN git clone --depth 1 --branch "$GIT_TAG" https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git + ARG CONFIG=defconfig RUN set -x \ && cd linux \ diff --git a/testdata_generation/README.md b/testdata_generation/README.md new file mode 100644 index 
0000000..0ed9df0 --- /dev/null +++ b/testdata_generation/README.md @@ -0,0 +1,27 @@ + + +# Test Data Generation + +This directory describes how the precompiled kernel builds in [KernelSbom-TestData](https://fileshare.tngtech.com/library/98e7e6f8-bffe-4a55-a8d2-817d4f3e51e8/KernelSbom-TestData/) were created. + +Standard preconfigured kernel builds were obtained via: +- **linux-tinyconfig.tar.gz** `./extract_testdata.sh tinyconfig` +- **linux-defconfig.tar.gz** `./extract_testdata.sh defconfig` +- **linux-allmodconfig.tar.gz** `./extract_testdata.sh allmodconfig` + +Additionally, distribution-specific configs such as **linux-localmodconfig.Ubuntu24.04.tar.gz** were created on different systems via: +```bash +git clone --depth 1 --branch v6.17 https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git +cd linux +make localmodconfig O=kernel_build +make -j$(nproc) O=kernel_build +``` + +After building the kernel, the entire linux directory is archived and uploaded to the FileShare: +```bash +tar -czf linux-.tar.gz linux +``` diff --git a/.devcontainer/extract_testdata.sh b/testdata_generation/extract_testdata.sh similarity index 100% rename from .devcontainer/extract_testdata.sh rename to testdata_generation/extract_testdata.sh
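
Review note on `iter_files_in_cmd_graph` in `sbom/lib/sbom/cmd/cmd_graph.py`: the generator recurses into every child, so if the cmd graph shares nodes between parents (the README describes it as a DAG), a file reachable through several parents would be yielded once per path and could appear repeatedly in `sbom.used_files.txt`. Whether `build_cmd_graph` actually shares nodes is not visible in this diff, so this may not apply. The following is a minimal sketch of a de-duplicating traversal, not the PR's implementation. `Node` is a hypothetical stand-in for `CmdGraphNode` that assumes only the `absolute_path` and `children` attributes used in the diff, and `iter_unique_files` is a hypothetical name.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Iterator


@dataclass
class Node:
    """Hypothetical stand-in for CmdGraphNode (only the attributes used in the diff)."""

    absolute_path: Path
    children: list["Node"] = field(default_factory=list)


def iter_unique_files(root: Node) -> Iterator[Path]:
    """Yield each path once, in depth-first pre-order, without recursion."""
    seen: set[Path] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node.absolute_path in seen:
            # Already emitted via another parent; its subtree was handled then.
            continue
        seen.add(node.absolute_path)
        yield node.absolute_path
        # Push children reversed so they are visited in their original order.
        stack.extend(reversed(node.children))


# Diamond-shaped example: the header is reachable via both object files.
header = Node(Path("/src/include/a.h"))
obj_a = Node(Path("/out/a.o"), [header])
obj_b = Node(Path("/out/b.o"), [header])
root = Node(Path("/out/vmlinux"), [obj_a, obj_b])

unique_paths = list(iter_unique_files(root))
```

Using an explicit stack also avoids Python's recursion limit on deep dependency chains. The sketch assumes that two nodes with the same `absolute_path` have the same children, which is why a repeated path is skipped together with its subtree.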