Add overlaybd sysext #3157

Draft
wants to merge 3 commits into
base: main

t-lo
Member

@t-lo t-lo commented Jul 29, 2025

This adds overlaybd, a user-space block device optimised for containers. It implements a novel layered block-level image format designed for containers, with security and performance in mind.

Documentation is available at https://containerd.github.io/overlaybd/.

sys-fs/overlaybd ships the low-level user space block device (https://github.com/containerd/overlaybd) as well as low-level tooling for operating it.

app-containers/accelerated-container-image (https://github.com/containerd/accelerated-container-image) ships integration with containerd.

Both are part of the Containerd CNCF project.

BUILDING

  1. Check out this branch
     git checkout t-lo/containerd-overlaybd
  2. Start the SDK
     ./run_sdk_container -t
  3. Build packages and image
     ./build_packages
     ./build_image prod sysext
     ./image_to_vm.sh --from=../build/images/amd64-usr/latest --image_compression_formats=none

Rebuilding

After an initial build only the sysext needs to be rebuilt.
To limit the scope of system-dependent sysext builds it is useful to set

EXTRA_SYSEXTS=(
  "overlaybd|sys-fs/overlaybd,app-containers/accelerated-container-image"
)

in build_library/prod_image_util.sh.

To pick up modifications in overlaybd, run

emerge-amd64-usr --buildpkg overlaybd

Similarly, to include accelerated-container-image changes, use

emerge-amd64-usr --buildpkg accelerated-container-image

Then rebuild the sysext(s):

./build_image sysext

The new flatcar-overlaybd.raw sysext is now available in __build__/images/images/amd64-usr/latest.
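
For repeated iterations, the rebuild commands above could be wrapped in a small helper. This is only a sketch: the function name rebuild_overlaybd_sysext and the DRY_RUN variable are made up for illustration and are not part of this PR, and it assumes it is run inside the SDK container from the scripts checkout.

```shell
# Hypothetical helper (not part of this PR): rebuild both packages, then the sysext.
# With DRY_RUN=1 the commands are only printed, which helps to check the order.
rebuild_overlaybd_sysext() {
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  for pkg in overlaybd accelerated-container-image; do
    # Same emerge invocation as in the manual steps above.
    $run emerge-amd64-usr --buildpkg "$pkg" || return 1
  done
  $run ./build_image sysext
}
```

Calling `DRY_RUN=1 rebuild_overlaybd_sysext` prints the three commands without running them; without DRY_RUN it runs them and stops at the first failing emerge.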

TESTING

Testing is manual at this point because kola cannot (yet?) provision arbitrary system extensions to qemu test instances. Also, testing is limited to AMD64 as no overlaybd-enabled ARM64 images are available on Docker Hub.

  1. Provision the flatcar-overlaybd extension on a test instance. The sysext bakery's bakery.sh boot is particularly useful for this. In __build__/images/images/amd64-usr/latest, run
    ~/code/sysext-bakery/bakery.sh boot flatcar-overlaybd.raw
  2. Set up containerd as per https://github.com/containerd/accelerated-container-image/blob/main/docs/QUICKSTART.md#configuration:
     cp /usr/share/containerd/config.toml /etc/
     vim /etc/config.toml
    and paste
    [proxy_plugins.overlaybd]
       type = "snapshot"
       address = "/run/overlaybd-snapshotter/overlaybd.sock"
    then
    systemctl edit containerd.service
    and add
    [Service]
    ExecStart=
    ExecStart=/usr/bin/containerd --config /etc/config.toml
    finally
    systemctl restart containerd
  3. Start the redis test image as per https://github.com/containerd/accelerated-container-image/blob/main/docs/QUICKSTART.md#run-overlaybd-images:
     /opt/overlaybd/snapshotter/ctr rpull registry.hub.docker.com/overlaybd/redis:6.2.1_obd
     ctr run --net-host --snapshotter=overlaybd --rm -t registry.hub.docker.com/overlaybd/redis:6.2.1_obd demo

and on a separate console run

 journalctl --no-pager -f
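
Step 2 above uses interactive editors. A non-interactive variant might look like the sketch below. The function name and its ROOT argument are hypothetical (ROOT is a prefix so the function can be tried against a scratch directory first). It assumes the stock config does not already define proxy_plugins.overlaybd, and it writes the same override.conf drop-in that systemctl edit would save.

```shell
# Sketch of step 2 without vim / systemctl edit. Pass "" as ROOT to target the real system.
setup_overlaybd_containerd() {
  root="$1"
  dropin_dir="$root/etc/systemd/system/containerd.service.d"
  mkdir -p "$dropin_dir" || return 1

  # Start from the stock config, as in the manual steps.
  cp "$root/usr/share/containerd/config.toml" "$root/etc/config.toml" || return 1

  # Register the overlaybd snapshotter as a proxy plugin.
  cat >> "$root/etc/config.toml" <<'EOF'

[proxy_plugins.overlaybd]
   type = "snapshot"
   address = "/run/overlaybd-snapshotter/overlaybd.sock"
EOF

  # Point containerd at the modified config; the empty ExecStart= clears the original one.
  cat > "$dropin_dir/override.conf" <<'EOF'
[Service]
ExecStart=
ExecStart=/usr/bin/containerd --config /etc/config.toml
EOF
}
```

On a test instance this would be followed by `systemctl daemon-reload` and `systemctl restart containerd`, since a drop-in written by hand is not picked up automatically.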

Member Author

t-lo commented Jul 30, 2025

Testing on AMD64 ran into illegal instruction errors when using ISAL; will disable it and leave a comment in the ebuild.

Edit: Resolved by putting ISAL support behind a USE flag.


github-actions bot commented Jul 30, 2025

Build action triggered: https://github.com/flatcar/scripts/actions/runs/16813069920

Member Author

t-lo commented Jul 30, 2025

When testing I'm seeing multiple

Jul 30 10:15:13 localhost overlaybd-tcmu[1652]: *** buffer overflow detected ***: terminated
Jul 30 10:15:17 localhost systemd[1]: overlaybd-tcmu.service: Main process exited, code=dumped, status=6/ABRT
Jul 30 10:15:17 localhost systemd[1]: overlaybd-tcmu.service: Failed with result 'core-dump'.

This happens in the sudo ctr run --net-host --snapshotter=overlaybd --rm -t registry.hub.docker.com/overlaybd/redis:6.2.1_obd demo step (https://github.com/containerd/accelerated-container-image/blob/main/docs/QUICKSTART.md#run-overlaybd-images).

systemd restarts overlaybd-tcmu many times; the errors eventually stop and the test redis starts.

systemd-coredump helpfully hints at an snprintf in overlaybd-tcmu:

Stack trace of thread 2044:
#0  0x00007f35596383a0 abort (libc.so.6 + 0x253a0)
#1  0x00007f3559639349 n/a (libc.so.6 + 0x26349)
#2  0x00007f355972dba9 __fortify_fail (libc.so.6 + 0x11aba9)
#3  0x00007f355972d524 __chk_fail (libc.so.6 + 0x11a524)
#4  0x00007f355972edb5 __snprintf_chk (libc.so.6 + 0x11bdb5)
#5  0x000055d726e426ec snprintf (overlaybd-tcmu + 0x1016ec)
#6  0x000055d726dae950 _Z11cmd_handlerP11tcmu_deviceP11tcmulib_cmd (overlaybd-tcmu + 0x6d950)
#7  0x000055d726daea84 _Z6handlePv (overlaybd-tcmu + 0x6da84)
#8  0x000055d726db5a69 _ZN6photon14ThreadPoolBase4stubEPv (overlaybd-tcmu + 0x74a69)
#9  0x000055d726dba15f _photon_thread_stub (overlaybd-tcmu + 0x7915f)
#10 0x00007f35591d3340 n/a (n/a + 0x0)
#11 0x00007f354109eb80 n/a (n/a + 0x0)
#12 0x00007f35467e36c0 n/a (n/a + 0x0)
#13 0x00007f35477efec0 n/a (n/a + 0x0)
#14 0x00007f354ca6bb40 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64

objdump for overlaybd-tcmu at the relevant address _Z11cmd_handlerP11tcmu_deviceP11tcmulib_cmd (overlaybd-tcmu + 0x6d950):

000000000006d5b0 <_Z11cmd_handlerP11tcmu_deviceP11tcmulib_cmd>:
...
6d943: 48 8b 50 08           mov    0x8(%rax),%rdx                                                     
6d947: 4c 8b 40 18           mov    0x18(%rax),%r8                                                     
6d94b: e8 00 40 09 00        call   101950 <_Z20tcmu_emulate_inquiryP11tcmu_deviceP8tgt_portPhP5iovecm>
6d950: 48 8b 74 24 18        mov    0x18(%rsp),%rsi                                                    
6d955: 48 89 ef              mov    %rbp,%rdi                                                          
6d958: 89 c2                 mov    %eax,%edx                                                          

which corresponds to line 131 in overlaybd/src/main.cpp:

127                                                                                   
128     switch (cmd->cdb[0]) {                                                        
129     case INQUIRY:                                                                 
130         photon::thread_yield();                                                   
131         ret = tcmu_emulate_inquiry(dev, NULL, cmd->cdb, cmd->iovec, cmd->iov_cnt);
132         tcmulib_command_complete(dev, cmd, ret);                                  
133         break;                                                                    
134                                                                                   

@t-lo t-lo force-pushed the t-lo/containerd-overlaybd branch from 9c1c61f to d00bba7 Compare July 30, 2025 11:11
chewi added 3 commits July 30, 2025 13:24
* A custom CTR for pulling accelerated container images
* An image converter
* A snapshotter

Signed-off-by: James Le Cuirot <[email protected]>
@t-lo t-lo force-pushed the t-lo/containerd-overlaybd branch from d00bba7 to bac06fe Compare July 30, 2025 11:24
Member

jepio commented Aug 5, 2025

Member

krnowak commented Aug 7, 2025

So the failing code seems to be this one in https://github.com/data-accelerator/photon-libtcmu/blob/main/scsi.cpp#L255-L256:

		len = snprintf(&ptr[4], sizeof(data) - used - 4, "%s",
			       tcmu_dev_get_cfgstring(dev));

data is char[512], and ptr points somewhere inside it. At the time of execution, ptr is &data[41]. tcmu_dev_get_cfgstring returns a string of length 101, so it should work fine. But on the glibc side, snprintf invokes __snprintf_chk like so:

  return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
                                   __glibc_objsize (__s), __fmt,
                                   __va_arg_pack ());

The failing check is this code inside __snprintf_chk:

  if (__glibc_unlikely (slen < maxlen))
    __chk_fail ();

maxlen is 475 in our case, slen is a result of __glibc_objsize (__s), where __s is the string returned by tcmu_dev_get_cfgstring(dev). The catch here is that this function returns dev->cfgstring and cfgstring in struct tcmu_device is defined as char cfgstring[PATH_MAX] (so an array of 4096 chars).

My guess is that __glibc_objsize returns 4096, so __snprintf_chk bails out.

__glibc_objsize is defined in /usr/include/sys/cdefs.h and its definition depends on fortification level.

Possibly a workaround could be to lower the fortification configuration we use, or do a strdup of the device's cfgstring and pass the result to snprintf.
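
As a cross-check, here is the arithmetic for the figures reported above, taken at face value and not re-measured. In the glibc wrapper quoted above, __s is snprintf's first parameter, i.e. the destination &ptr[4], so this sketch assumes slen is the space left in data from that point. Whether the compiler actually derives that value depends on the fortification level and on what it can see, so treat this as arithmetic rather than a confirmed diagnosis.

```shell
# Figures as reported in the comment above.
data_size=512     # sizeof(data)
ptr_offset=41     # ptr == &data[41]
maxlen=475        # size argument: sizeof(data) - used - 4

# Assumption: slen is the number of bytes from the destination &ptr[4] to the end of data.
dest_offset=$((ptr_offset + 4))
slen=$((data_size - dest_offset))

echo "slen=${slen} maxlen=${maxlen}"
if [ "$slen" -lt "$maxlen" ]; then
  # Same condition as the quoted check: slen < maxlen leads to __chk_fail.
  echo "slen < maxlen: the check would fail"
fi
```

Under that assumption the size argument overstates the remaining space by 8 bytes, which would trip the check regardless of how long the cfgstring is.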

Member

krnowak commented Aug 7, 2025

Another possible workaround could be increasing size of data from 512 to 512 + PATH_MAX and then take care of places where sizeof(data) is used. But this is rather silly.

Member Author

t-lo commented Aug 7, 2025

Another possible workaround could be increasing size of data from 512 to 512 + PATH_MAX and then take care of places where sizeof(data) is used. But this is rather silly.

This seems cleaner to me since it would ensure that dev->cfgstring always fits? Even using plain PATH_MAX would suffice, no?

Member

jepio commented Aug 7, 2025

maxlen is 475 in our case, slen is a result of __glibc_objsize (__s), where __s is the string returned by tcmu_dev_get_cfgstring(dev). The catch here is that this function returns dev->cfgstring and cfgstring in struct tcmu_device is defined as char cfgstring[PATH_MAX] (so an array of 4096 chars).

Is it possible that the string is not null terminated?

Member

krnowak commented Aug 7, 2025

maxlen is 475 in our case, slen is a result of __glibc_objsize (__s), where __s is the string returned by tcmu_dev_get_cfgstring(dev). The catch here is that this function returns dev->cfgstring and cfgstring in struct tcmu_device is defined as char cfgstring[PATH_MAX] (so an array of 4096 chars).

Is it possible that the string is not null terminated?

That was my first guess/idea, but no. The string was null terminated and had a length of 101 characters. The string was something like overlaybd//absolute/path/to/some/config.json.

Member Author

t-lo commented Aug 7, 2025

@krnowak I patched scsi.cpp with

diff --git a/scsi.cpp b/scsi.cpp
index d8c27a9..0113eb2 100644
--- a/scsi.cpp
+++ b/scsi.cpp
@@ -181,7 +181,7 @@ int tcmu_emulate_evpd_inquiry(
 	break;
 	case 0x83: /* Device identification */
 	{
-		char data[512];
+		char data[PATH_MAX+512]; //tcmu_dev_get_cfgstring(dev) may be up to PATH_MAX length
 		char *ptr, *p, *wwn;
 		size_t len, used = 0;
 		uint16_t *tot_len = (uint16_t*) &data[2];

and I can still see the issue - same buffer overflow, same stack trace.

Member

krnowak commented Aug 7, 2025

Bah, I'll add some debug symbols for glibc and investigate more tomorrow.

Member Author

t-lo commented Aug 7, 2025

So I did a bit of printf debugging (I know...) and it seems that line 255 in scsi.cpp is indeed what causes the issue.
However, the proposed workaround of using char data[PATH_MAX+512]; doesn't work for some reason.
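
One possible reading of why the larger buffer makes no difference, sketched as arithmetic only. It assumes the fortify check compares the size argument (sizeof(data) - used - 4) against the space left in data from the destination &ptr[4]. The value used=33 is derived from the maxlen of 475 reported earlier in this thread, ptr == &data[41] is as reported, and both are assumed not to change with the buffer size. The overshoot helper is hypothetical.

```shell
# Hypothetical model: by how many bytes does the size argument exceed the space left in data?
overshoot() {
  data_size="$1"
  used=33          # derived from the reported maxlen: 512 - used - 4 = 475
  ptr_offset=41    # reported: ptr == &data[41]
  maxlen=$((data_size - used - 4))          # sizeof(data) - used - 4
  slen=$((data_size - (ptr_offset + 4)))    # bytes from &ptr[4] to the end of data
  echo $((maxlen - slen))
}

overshoot 512               # original buffer, char data[512]
overshoot $((4096 + 512))   # patched buffer, PATH_MAX taken as 4096
```

Both calls print the same difference, since growing data raises both sides of the comparison by the same amount. If the assumption holds, the size argument itself would need adjusting rather than the buffer.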
