Importing a containerdisk onto a block volume loses sparseness #3614

@stefanha

Description

What happened:
Importing a containerdisk onto a block volume loses sparseness. When I imported the centos-stream:9 containerdisk, which contains only 2 GB of non-zero data, onto an empty 10 GB block volume, CDI wrote all 10 GB. Preallocation was not enabled.

What you expected to happen:
Only the non-zero data should be written to the block volume. This saves space on the underlying storage.

How to reproduce it (as minimally and precisely as possible):
Create a DataVolume from the YAML below and observe the amount of storage allocated. I used KubeSAN as the CSI driver, so the LVM lvs command can be used to see the thin-provisioned storage usage. If you don't have thin-provisioned storage, you could use I/O stats or tracing to determine how much data is being written.
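Not part of the report, but a quick way to see the difference between apparent and allocated size, using a sparse file as a stand-in for a thin-provisioned volume (the file name is arbitrary):

```shell
# A sparse file allocates no blocks until data is written, much like a
# thin-provisioned volume. (Stand-in illustration, not the actual repro.)
truncate -s 10M sparse.img   # apparent size: 10 MiB
ls -l sparse.img             # reports the full 10 MiB
du -k sparse.img             # reports (near) zero KiB actually allocated
```

Writing data into the file (or, in the bug, CDI writing every block of the volume) is what turns apparent size into real allocation.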

Additional context:
I discussed this with @aglitke and we looked at the qemu-img command that is invoked:

Running qemu-img with args: [convert -t writeback -p -O raw /scratch/disk/disk.img /dev/cdi-block-volume]

Adding the --target-is-zero option should avoid writing every block in the target block volume.

If there are concerns that some new block volumes come uninitialized (blocks not zeroed), then it should be possible to run blkdiscard --zeroout /path/to/block/device before invoking qemu-img with --target-is-zero. I have not tested this, but blkdiscard should zero the device efficiently and fall back to writing zero buffers on old hardware. On modern devices this would still be faster and preserve sparseness compared to writing all zeroes; on old devices it would be slower, depending on how many non-zero blocks the input disk image has.
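The zeroing step could be sketched like this (my own sketch, untested against CDI; blkdiscard requires a real block device, so a plain dd loop stands in on regular files, mirroring the write-zero-buffers fallback described above):

```shell
# Zero the target before conversion. On a block device, blkdiscard --zeroout
# zeroes efficiently and the kernel falls back to writing zeros on hardware
# without offload; on a regular file (used here for illustration) dd is used.
zero_volume() {
    path=$1; size_mib=$2
    if [ -b "$path" ]; then
        blkdiscard --zeroout "$path"
    else
        dd if=/dev/zero of="$path" bs=1M count="$size_mib" \
           conv=notrunc status=none
    fi
}

printf 'stale non-zero data' > vol.img
truncate -s 2M vol.img
zero_volume vol.img 2
# Count of non-NUL bytes should now be 0
tr -d '\0' < vol.img | wc -c
```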

Environment:

  • CDI version (use kubectl get deployments cdi-deployment -o yaml): 4.17.3
  • Kubernetes version (use kubectl version): v1.30.5
  • DV specification:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  annotations:
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
    cdi.kubevirt.io/storage.import.lastUseTime: "2025-01-22T20:19:34.821435785Z"
    cdi.kubevirt.io/storage.usePopulator: "true"
  creationTimestamp: "2025-01-22T20:19:34Z"
  generation: 1
  labels:
    app.kubernetes.io/component: storage
    app.kubernetes.io/managed-by: cdi-controller
    app.kubernetes.io/part-of: hyperconverged-cluster
    app.kubernetes.io/version: 4.17.3
    cdi.kubevirt.io/dataImportCron: centos-stream-9-test-import-cron-3vs0wc
    instancetype.kubevirt.io/default-instancetype: u1.medium
    instancetype.kubevirt.io/default-preference: centos.stream9
  name: centos-stream-9-test-9515270a7f3d
  namespace: default
  resourceVersion: "217802868"
  uid: d8d56487-5686-47d0-93d0-79de02a8c2c3
spec:
  source:
    registry:
      url: docker://quay.io/containerdisks/centos-stream@sha256:9515270a7f3d3fd053732c15232071cb544d847e56aa2005f27002014b5becaa
  storage:
    resources:
      requests:
        storage: 10Gi
    storageClassName: kubesan
  • Cloud provider or hardware configuration: N/A
  • OS (e.g. from /etc/os-release): N/A
  • Kernel (e.g. uname -a): N/A
  • Install tools: N/A
  • Others: N/A

Labels

good-first-issue (identifies an issue that has been specifically created or selected for first-time contributors), kind/bug
