runc 1.1.15 OOMs in Kubernetes e2e tests with containerd, cgroup v2, and cgroupfs driver #4427

Closed
@samuelkarp

Description

In containerd/containerd#10795 we're attempting to update containerd 1.6 to use runc 1.1.15. This triggers a Kubernetes e2e CI run, which is consistently failing. The same update to runc 1.1.15 succeeds with containerd 1.7 and main (which will be 2.0).

Steps to reproduce the issue

We're still narrowing down the differences between the test environments used by containerd's various branches, since it seems unlikely that the containerd version itself is the cause. So far we've found that containerd 1.6's tests use the cgroupfs driver to manage cgroups, while the 1.7 and main branch tests use the systemd driver.
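
For anyone comparing environments, the cgroup driver can usually be guessed from the shape of a container's cgroup path. The sketch below is a heuristic, not an authoritative check: it assumes the common kubelet naming conventions, where the systemd driver produces `.slice` / `.scope` path components and the cgroupfs driver produces plain directory names. The function name `guess_cgroup_driver` and the systemd-style example path are hypothetical and only for illustration; the cgroupfs-style path is the `task_memcg` value from the OOM report in this issue.

```python
def guess_cgroup_driver(cgroup_path):
    """Heuristically guess the cgroup driver from a cgroup path.

    Assumption: systemd-managed cgroups have components ending in
    ".slice" or ".scope"; cgroupfs-managed ones do not.
    """
    parts = [p for p in cgroup_path.split("/") if p]
    if not parts:
        return "unknown"
    if any(p.endswith((".slice", ".scope")) for p in parts):
        return "systemd"
    return "cgroupfs"


# task_memcg from the OOM report in this issue.
observed = (
    "/kubepods/burstable/pod6e4665e8-ae02-48b1-aee9-fedeeb899d20/"
    "0eec9f496bc31187cd00a14d2debd8b4bb89cc81bfc2301ee425e30c07572398"
)
# Hypothetical systemd-style path, shown only to illustrate the naming.
illustrative = (
    "/kubepods.slice/kubepods-burstable.slice/"
    "kubepods-burstable-podEXAMPLE.slice/cri-containerd-EXAMPLE.scope"
)

print(guess_cgroup_driver(observed))
print(guess_cgroup_driver(illustrative))
```

The path in the OOM report has no `.slice` or `.scope` components, which is consistent with the 1.6 tests using the cgroupfs driver.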

All tests are running on COS M117 and appear to be using cgroup v2 as of now.
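
To confirm the cgroup version on a node rather than infer it, one option is to look for `cgroup.controllers` at the cgroup mount point, since that file exists only at the root of a cgroup v2 unified hierarchy. This is a minimal sketch that assumes the conventional mount point `/sys/fs/cgroup`; the helper name `is_cgroup_v2` is made up for this example.

```python
import os


def is_cgroup_v2(root="/sys/fs/cgroup"):
    """Return True if `root` looks like a cgroup v2 unified hierarchy.

    On cgroup v1 (or hybrid) hosts, `root` is typically a tmpfs holding
    per-controller mounts, so `cgroup.controllers` is absent there.
    """
    return os.path.isfile(os.path.join(root, "cgroup.controllers"))


print("cgroup v2" if is_cgroup_v2() else "cgroup v1 or hybrid")
```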

Describe the results you received and expected

Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: runc:[2:INIT] invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=-998
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: CPU: 0 PID: 52432 Comm: runc:[2:INIT] Tainted: G           O       6.6.44+ #1
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: Call Trace:
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  <TASK>
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  dump_stack_lvl+0x5d/0x80
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  dump_header+0x52/0x250
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  oom_kill_process+0x10a/0x220
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  out_of_memory+0x3bc/0x5a0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  ? mem_cgroup_iter+0x1ca/0x240
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  try_charge_memcg+0x82a/0xa90
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  charge_memcg+0x3f/0x1f0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  __mem_cgroup_charge+0x2f/0x80
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  do_pte_missing+0x544/0xc40
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  handle_mm_fault+0x7a6/0xa60
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  do_user_addr_fault+0x21b/0x770
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  exc_page_fault+0x7f/0x100
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  asm_exc_page_fault+0x26/0x30
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: RIP: 0033:0x7966028a541e
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: Code: 0c 20 4c 8b 6d 98 49 39 d3 49 89 4b 60 0f 95 c2 48 83 c8 01 49 83 c0 10 0f b6 d2 48 c1 e2 02 4c 09 e2 48 83 ca 01 49 89 50 f8 <48> 89 41 08 e9 16 fb ff ff 48 8d 0d f2 ed 0f 00 ba 13 10 00 00 48
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: RSP: 002b:00007ffffc0bb770 EFLAGS: 00010202
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: RAX: 0000000000004f21 RBX: 00007966029d2b00 RCX: 00005d2a23d5a0e0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: RDX: 00000000000000f1 RSI: 0000000000000000 RDI: 0000000000000004
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: RBP: 00007ffffc0bb830 R08: 00005d2a23d5a000 R09: 00005d2a23d59ed0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: R10: 00000000000000f0 R11: 00007966029d2aa0 R12: 00000000000000f0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: R13: 00000000000000e0 R14: 000000000000000f R15: ffffffffffffffb8
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel:  </TASK>
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: memory: usage 15360kB, limit 15360kB, failcnt 49
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: swap: usage 0kB, limit 9007199254740988kB, failcnt 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: Memory cgroup stats for /kubepods/burstable/pod6e4665e8-ae02-48b1-aee9-fedeeb899d20:
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: anon 3665920
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: file 11612160
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: kernel 442368
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: kernel_stack 98304
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pagetables 147456
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: sec_pagetables 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: percpu 6944
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: sock 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: vmalloc 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: shmem 11612160
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: file_mapped 4268032
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: file_dirty 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: file_writeback 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: swapcached 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: anon_thp 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: file_thp 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: shmem_thp 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: inactive_anon 15089664
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: active_anon 159744
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: inactive_file 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: active_file 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: unevictable 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: slab_reclaimable 71080
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: slab_unreclaimable 89832
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: slab 160912
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: workingset_refault_anon 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: workingset_refault_file 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: workingset_activate_anon 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: workingset_activate_file 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: workingset_restore_anon 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: workingset_restore_file 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: workingset_nodereclaim 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgscan 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgsteal 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgscan_kswapd 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgscan_direct 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgscan_khugepaged 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgsteal_kswapd 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgsteal_direct 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgsteal_khugepaged 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgfault 1252
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgmajfault 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgrefill 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgactivate 38
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pgdeactivate 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pglazyfree 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: pglazyfreed 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: thp_fault_alloc 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: thp_collapse_alloc 0
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: Tasks state (memory values in pages):
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: [  52432] 65535 52432   401515     2634   163840        0          -998 runc:[2:INIT]
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=0eec9f496bc31187cd00a14d2debd8b4bb89cc81bfc2301ee425e30c07572398,mems_allowed=0,oom_memcg=/kubepods/burstable/pod6e4665e8-ae02-48b1-aee9-fedeeb899d20,task_memcg=/kubepods/burstable/pod6e4665e8-ae02-48b1-aee9-fedeeb899d20/0eec9f496bc31187cd00a14d2debd8b4bb89cc81bfc2301ee425e30c07572398,task=runc:[2:INIT],pid=52432,uid=65535
Oct 08 17:28:42 tmp-node-e2e-7aba7327-cos-beta-117-18613-0-76 kernel: Memory cgroup out of memory: Killed process 52432 (runc:[2:INIT]) total-vm:1606060kB, anon-rss:3496kB, file-rss:1152kB, shmem-rss:5888kB, UID:65535 pgtables:160kB oom_score_adj:-998
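
As a quick sanity check on the report above, the pod cgroup's `anon`, `file`, and `kernel` counters can be summed and compared against the 15360kB limit. This is only arithmetic on the logged values, not a diagnosis; it assumes those three counters together approximate the charged total. The names `STATS`, `LIMIT_KB`, and `summarize` are made up for this sketch.

```python
# Values copied from the kernel OOM report above.
LIMIT_KB = 15360  # "memory: usage 15360kB, limit 15360kB"
STATS = {  # bytes, from "Memory cgroup stats for /kubepods/burstable/pod..."
    "anon": 3665920,
    "file": 11612160,
    "kernel": 442368,
    "shmem": 11612160,
}


def summarize(stats, limit_kb):
    """Compare the summed counters against the cgroup memory limit."""
    limit_bytes = limit_kb * 1024
    charged = stats["anon"] + stats["file"] + stats["kernel"]
    return {
        "limit_bytes": limit_bytes,
        "charged_bytes": charged,
        "headroom_bytes": limit_bytes - charged,
        "shmem_fraction": stats["shmem"] / limit_bytes,
    }


summary = summarize(STATS, LIMIT_KB)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these numbers the three counters add up to 15720448 bytes, 8192 bytes short of the 15728640-byte limit. `shmem` equals `file`, so the file-backed charge is entirely shared memory, and it accounts for roughly 74% of the limit.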

What version of runc are you using?

1.1.15

Host OS information

No response

Host kernel information

No response
