[VPP-1341] IPv6 ping between 2 nodes crash vpp #2807

Closed
@vvalderrv

Description


I've created a topology (using docker) with two nodes. On both nodes a bridge domain is configured (loopback and memif). A ping from one node shuts down vpp (and the docker container as well).

IMPORTANT NOTE:

The loopbacks and memifs created in the example below have to be set up before they are added to the bridge domain!

BTW, using IPv4 addresses works.

BACKTRACE:

Core was generated by `/opt/vpp-agent/dev/vpp/build-root/install-vpp-native/vpp/bin/vpp -c /etc/vpp/vp'.

Program terminated with signal SIGABRT, Aborted.

#0 0x00007f2ff7e1b428 in raise () from /lib/x86_64-linux-gnu/libc.so.6

[Current thread is 1 (Thread 0x7f2ff9b29740 (LWP 11))]

(gdb) bt

#0 0x00007f2ff7e1b428 in raise () from /lib/x86_64-linux-gnu/libc.so.6

#1 0x00007f2ff7e1d02a in abort () from /lib/x86_64-linux-gnu/libc.so.6

#2 0x0000563b3d4b9943 in os_panic () at /opt/vpp-agent/dev/vpp/build-data/../src/vpp/vnet/main.c:310

#3 0x00007f2ff863176a in clib_mem_alloc_aligned_at_offset (os_out_of_memory_on_failure=1, align_offset=&lt;optimized out&gt;, align=4, size=853040880)

at /opt/vpp-agent/dev/vpp/build-data/../src/vppinfra/mem.h:105

#4 vec_resize_allocate_memory (v=&lt;optimized out&gt;, length_increment=length_increment@entry=1, data_bytes=&lt;optimized out&gt;, header_bytes=&lt;optimized out&gt;, header_bytes@entry=0,

data_align=data_align@entry=4) at /opt/vpp-agent/dev/vpp/build-data/../src/vppinfra/vec.c:84

#5 0x00007f2ff9491174 in _vec_resize_inline (data_align=&lt;optimized out&gt;, header_bytes=&lt;optimized out&gt;, data_bytes=&lt;optimized out&gt;, length_increment=&lt;optimized out&gt;, v=&lt;optimized out&gt;)

at /opt/vpp-agent/dev/vpp/build-data/../src/vppinfra/vec.h:145

#6 vlib_put_next_frame (vm=vm@entry=0x7f2ff96e7f80 <vlib_global_main>, r=r@entry=0x7f2fb79edc80, next_index=next_index@entry=3, n_vectors_left=n_vectors_left@entry=255)

at /opt/vpp-agent/dev/vpp/build-data/../src/vlib/main.c:501

#7 0x00007f2ff8b9005e in ethernet_input_inline (variant=ETHERNET_INPUT_VARIANT_ETHERNET, from_frame=0x7f2fb7601280, node=0x7f2fb79edc80, vm=&lt;optimized out&gt;)

at /opt/vpp-agent/dev/vpp/build-data/../src/vnet/ethernet/node.c:754

#8 ethernet_input (from_frame=0x7f2fb7601280, node=0x7f2fb79edc80, vm=&lt;optimized out&gt;) at /opt/vpp-agent/dev/vpp/build-data/../src/vnet/ethernet/node.c:774

#9 ethernet_input_avx2 (vm=0x7f2ff96e7f80 <vlib_global_main>, node=0x7f2fb79edc80, frame=0x7f2fb7601280) at /opt/vpp-agent/dev/vpp/build-data/../src/vnet/ethernet/node.c:1168

#10 0x00007f2ff9491334 in dispatch_node (last_time_stamp=295603116948830, frame=0x7f2fb7601280, dispatch_state=VLIB_NODE_STATE_POLLING, type=VLIB_NODE_TYPE_INTERNAL,

node=0x7f2fb79edc80, vm=0x7f2ff96e7f80 <vlib_global_main>) at /opt/vpp-agent/dev/vpp/build-data/../src/vlib/main.c:988

#11 dispatch_pending_node (vm=vm@entry=0x7f2ff96e7f80 <vlib_global_main>, pending_frame_index=pending_frame_index@entry=47391158, last_time_stamp=last_time_stamp@entry=295603116948830)

at /opt/vpp-agent/dev/vpp/build-data/../src/vlib/main.c:1138

#12 0x00007f2ff9492dbe in vlib_main_or_worker_loop (is_main=1, vm=0x7f2ff96e7f80 <vlib_global_main>) at /opt/vpp-agent/dev/vpp/build-data/../src/vlib/main.c:1616

#13 vlib_main_loop (vm=0x7f2ff96e7f80 <vlib_global_main>) at /opt/vpp-agent/dev/vpp/build-data/../src/vlib/main.c:1635

#14 vlib_main (vm=vm@entry=0x7f2ff96e7f80 <vlib_global_main>, input=input@entry=0x7f2fb8091fa0) at /opt/vpp-agent/dev/vpp/build-data/../src/vlib/main.c:1826

#15 0x00007f2ff94caac3 in thread0 (arg=139844024958848) at /opt/vpp-agent/dev/vpp/build-data/../src/vlib/unix/main.c:568

#16 0x00007f2ff85fb5d8 in clib_calljmp () at /opt/vpp-agent/dev/vpp/build-data/../src/vppinfra/longjmp.S:110

#17 0x00007ffd0d1fb3a0 in ?? ()

#18 0x00007f2ff94cb81a in vlib_unix_main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at /opt/vpp-agent/dev/vpp/build-data/../src/vlib/unix/main.c:632

#19 0x0000001100000000 in ?? ()

#20 0x0000004000000018 in ?? ()

STEPS to reproduce:

1) Create two nodes, each running vpp with the following config:

unix {
  nodaemon
  cli-listen 0.0.0.0:5002
  cli-no-pager
}

plugins {
  plugin dpdk_plugin.so { disable }
}
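For reference, a sketch of preparing one such node; the directory name is an assumption, only the startup.conf content comes from the report:

```shell
# Write the startup config described above for one node (path is hypothetical).
mkdir -p /tmp/vpp-node1
cat > /tmp/vpp-node1/startup.conf <<'EOF'
unix {
  nodaemon
  cli-listen 0.0.0.0:5002
  cli-no-pager
}
plugins {
  plugin dpdk_plugin.so { disable }
}
EOF
# Inside each container, vpp would then be started along these lines:
#   vpp -c /tmp/vpp-node1/startup.conf
grep -q 'plugin dpdk_plugin.so' /tmp/vpp-node1/startup.conf && echo config-ok
```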

2) Setup loopback & memif on node1:

binary-api create_loopback mac 8a:f1:be:90:00:00

binary-api memif_socket_filename_add_del add id 1 filename /tmp/memif_1_2.sock

binary-api memif_create id 1 socket-id 1 master

set int ip address loop0 fd30::1:b:0:0:1/64

set interface state loop0 up

set interface state memif1/1 up

3) Setup loopback & memif on node2:

binary-api create_loopback mac 8a:f1:be:90:00:01

binary-api memif_socket_filename_add_del add id 1 filename /tmp/memif_1_2.sock

binary-api memif_create id 1 socket-id 1 slave

set int ip address loop0 fd30::1:b:0:0:2/64

set interface state loop0 up

set interface state memif1/1 up

4) Setup bridge domain on both nodes:

create bridge-domain 1 arp-term 1

set interface l2 bridge loop0 1 bvi

set interface l2 bridge memif1/1 1

5) Ping from node1 to node2:

ping fd30::1:b:0:0:2 repeat 2
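Since the note above requires the interfaces to be up before they join the bridge domain, node1's CLI batch (steps 2 and 4) can be collected into a single exec file. A sketch; the file name is an assumption, the commands are those from the steps above:

```shell
# node1's batch (node2 is analogous, with its own mac/address and the
# "slave" memif role).
cat > /tmp/node1.cli <<'EOF'
binary-api create_loopback mac 8a:f1:be:90:00:00
binary-api memif_socket_filename_add_del add id 1 filename /tmp/memif_1_2.sock
binary-api memif_create id 1 socket-id 1 master
set int ip address loop0 fd30::1:b:0:0:1/64
set interface state loop0 up
set interface state memif1/1 up
create bridge-domain 1 arp-term 1
set interface l2 bridge loop0 1 bvi
set interface l2 bridge memif1/1 1
EOF
# Then paste into node1's CLI (listening on port 5002 per the config above),
# or run it from the vpp debug CLI with:  exec /tmp/node1.cli
grep -c '^' /tmp/node1.cli   # → 9
```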

Assignee

Dave Barach

Reporter

Pavel Kotucek

Comments

  • dbarach (Sun, 15 Jul 2018 14:55:38 +0000): Had nothing whatever to do with ipv6 ping. Loopback interface graph arc configuration issue; folks tried to overload one disposition index to send packets to two different places.
  • ayourtch (Fri, 13 Jul 2018 19:01:41 +0000): For the previous comment - IPv6 was disabled on the host. After enabling IPv6, I can reproduce the crash on the main linux host as well.
  • ayourtch (Fri, 13 Jul 2018 18:51:45 +0000): Curiously, I can only reproduce the crash when inside the container. Running with identical reproduction scenario with the same vpp config within the outer linux gives no crash.
  • ayourtch (Fri, 13 Jul 2018 18:29:34 +0000): The issue is reproducible without the memif - using tap interface, and with that it is possible to see the crash just on a single instance of VPP...

step1:

binary-api create_loopback mac 8a:f1:be:90:00:01

set int ip address loop0 192.168.1.2/24

set int ip address loop0 fd30::1:b:0:0:2/64

set interface state loop0 up

 step2:

tap connect taptest

set int state tapcli-0 up

create bridge-domain 1 arp-term 1

set interface l2 bridge loop0 1 bvi

set interface l2 bridge tapcli-0 1

When the last command of the second batch is executed, the assert is hit. The order of interface placement does not matter - the tap may be placed into the bridge group first, then the loopback.

Pasting both groups of commands at once succeeds - there is no crash.
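The two batches can be captured as separate exec files to exercise the timing dependence (a pause between them hits the assert; everything at once does not). The file names are assumptions, the commands come from the batches above:

```shell
# step1: loopback with IPv4 + IPv6 addresses, brought up.
cat > /tmp/step1.cli <<'EOF'
binary-api create_loopback mac 8a:f1:be:90:00:01
set int ip address loop0 192.168.1.2/24
set int ip address loop0 fd30::1:b:0:0:2/64
set interface state loop0 up
EOF
# step2: tap interface plus bridge-domain membership.
cat > /tmp/step2.cli <<'EOF'
tap connect taptest
set int state tapcli-0 up
create bridge-domain 1 arp-term 1
set interface l2 bridge loop0 1 bvi
set interface l2 bridge tapcli-0 1
EOF
# Running step1 from the vpp debug CLI ("exec /tmp/step1.cli"), waiting a
# moment, then step2 reproduces the assert on a single vpp instance.
cat /tmp/step1.cli /tmp/step2.cli | wc -l   # 9 lines total
```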

  • ayourtch (Fri, 13 Jul 2018 17:37:34 +0000): looking at the assertion and comparing the values:

0: /home/ubuntu/vpp/build-data/../src/vlib/node_funcs.h:304 (vlib_node_runtime_get_next_frame) assertion `nf->node_runtime_index == next->runtime_index' fails

(gdb) bt

#0 0x00007ffff5bc4428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54

#1 0x00007ffff5bc602a in __GI_abort () at abort.c:89

#2 0x0000000000407e82 in os_panic () at /home/ubuntu/vpp/build-data/../src/vpp/vnet/main.c:331

#3 0x00007ffff63aba58 in debugger () at /home/ubuntu/vpp/build-data/../src/vppinfra/error.c:84

#4 0x00007ffff63abe90 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff7969620 "%s:%d (%s) assertion `%s' fails")

at /home/ubuntu/vpp/build-data/../src/vppinfra/error.c:143

#5 0x00007ffff78d83ef in vlib_node_runtime_get_next_frame (vm=0x7ffff7b8aa40 <vlib_global_main>, n=0x7fffb6e88580, next_index=4)

at /home/ubuntu/vpp/build-data/../src/vlib/node_funcs.h:304

#6 0x00007ffff78da300 in vlib_put_next_frame_validate (vm=0x7ffff7b8aa40 <vlib_global_main>, rt=0x7fffb6e88580, next_index=4, n_vectors_left=254)

at /home/ubuntu/vpp/build-data/../src/vlib/main.c:429

#7 0x00007ffff78da5d1 in vlib_put_next_frame (vm=0x7ffff7b8aa40 <vlib_global_main>, r=0x7fffb6e88580, next_index=4, n_vectors_left=254)

at /home/ubuntu/vpp/build-data/../src/vlib/main.c:464

#8 0x00007fffb2e82aa2 in memif_device_input_inline (vm=0x7ffff7b8aa40 <vlib_global_main>, node=0x7fffb6e88580, frame=0x0, mif=0x7fffb6982100,

type=MEMIF_RING_M2S, qid=0, mode=MEMIF_INTERFACE_MODE_ETHERNET) at /home/ubuntu/vpp/build-data/../src/plugins/memif/node.c:496

#9 0x00007fffb2e84891 in memif_input_node_fn_avx2 (vm=0x309c, node=0x7fffb6e88580, frame=0x0) at /home/ubuntu/vpp/build-data/../src/plugins/memif/node.c:903

#10 0x00007ffff78dc3bb in dispatch_node (vm=0x7ffff7b8aa40 <vlib_global_main>, node=0x7fffb6e88580, type=VLIB_NODE_TYPE_INPUT,

dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, last_time_stamp=6706732744422024) at /home/ubuntu/vpp/build-data/../src/vlib/main.c:988

#11 0x00007ffff78de2b4 in vlib_main_or_worker_loop (vm=0x7ffff7b8aa40 <vlib_global_main>, is_main=1) at /home/ubuntu/vpp/build-data/../src/vlib/main.c:1507

#12 0x00007ffff78ded67 in vlib_main_loop (vm=0x7ffff7b8aa40 <vlib_global_main>) at /home/ubuntu/vpp/build-data/../src/vlib/main.c:1635

#13 0x00007ffff78df93d in vlib_main (vm=0x7ffff7b8aa40 <vlib_global_main>, input=0x7fffb783afb0) at /home/ubuntu/vpp/build-data/../src/vlib/main.c:1826

#14 0x00007ffff794d995 in thread0 (arg=140737349462592) at /home/ubuntu/vpp/build-data/../src/vlib/unix/main.c:607

#15 0x00007ffff63d0880 in clib_calljmp () at /home/ubuntu/vpp/build-data/../src/vppinfra/longjmp.S:110

#16 0x00007fffffffd1e0 in ?? ()

#17 0x00007ffff794de2c in vlib_unix_main (argc=17, argv=0x7fffffffe4d8) at /home/ubuntu/vpp/build-data/../src/vlib/unix/main.c:674

#18 0x00000000004078c6 in main (argc=17, argv=0x7fffffffe4d8) at /home/ubuntu/vpp/build-data/../src/vpp/vnet/main.c:270

(gdb) frame 5

#5 0x00007ffff78d83ef in vlib_node_runtime_get_next_frame (vm=0x7ffff7b8aa40 <vlib_global_main>, n=0x7fffb6e88580, next_index=4)

at /home/ubuntu/vpp/build-data/../src/vlib/node_funcs.h:304

304 ASSERT (nf->node_runtime_index == next->runtime_index);

(gdb) p nf->node_runtime_index

$20 = 410

(gdb) p next->runtime_index

$21 = 420

(gdb) p vm->node_main.nodes_by_type[0][410]

$22 = {cacheline0 = 0x7fffb6d39400 "\035ۥ\366\377\177", function = 0x7ffff6a5db1d <l2input_node_fn>, errors = 0x7fffb6ee2680, clocks_since_last_overflow = 0, max_clock = 0, max_clock_n = 0, calls_since_last_overflow = 0, vectors_since_last_overflow = 0, next_frame_index = 1499, node_index = 466, input_main_loops_per_call = 0, main_loop_count_last_dispatch = 0, main_loop_vector_stats = {0, 0}, flags = 0, state = 0, n_next_nodes = 18, cached_next_index = 0, thread_index = 0, runtime_data = 0x7fffb6d39446 ""}

(gdb) p vm->node_main.nodes_by_type[0][420]

$23 = {cacheline0 = 0x7fffb6d39900 "-\275\236\366\377\177", function = 0x7ffff69ebd2d <ethernet_input>, errors = 0x7fffb7627fd8, clocks_since_last_overflow = 79708, max_clock = 42232, max_clock_n = 1, calls_since_last_overflow = 4, vectors_since_last_overflow = 4, next_frame_index = 1643, node_index = 477, input_main_loops_per_call = 0, main_loop_count_last_dispatch = 7804, main_loop_vector_stats = {2, 0}, flags = 0, state = 0, n_next_nodes = 13, cached_next_index = 8, thread_index = 0, runtime_data = 0x7fffb6d39946 ""}

(gdb)

  • ayourtch (Fri, 13 Jul 2018 16:56:50 +0000): The previous idea was wrong, because I can recreate the crash with a different sequence of commands:

 

1) on node2:

binary-api create_loopback mac 8a:f1:be:90:00:01

set int ip address loop0 192.168.1.2/24

set int ip address loop0 fd30::1:b:0:0:2/64

set interface state loop0 up

2) on node1:

binary-api create_loopback mac 8a:f1:be:90:00:00

binary-api memif_socket_filename_add_del add id 1 filename /shared/all/memif_1_2.sock

binary-api memif_create id 1 socket-id 1 master

set int ip address memif1/1 192.168.1.1/24

set interface state memif1/1 up

3) on node2:

binary-api memif_socket_filename_add_del add id 1 filename /shared/all/memif_1_2.sock

binary-api memif_create id 1 socket-id 1 slave

set interface state memif1/1 up

create bridge-domain 1 arp-term 1

set interface l2 bridge loop0 1 bvi

set interface l2 bridge memif1/1 1

4) on node1:

set int ip address memif1/1 fd30::1:b:0:0:1/64

At this point the node2 would crash almost immediately.

This makes me think the issue might be caused by the config in step 1.

  • ayourtch (Fri, 13 Jul 2018 16:18:38 +0000): I can reproduce the crash with no ping whatsoever, by applying the following sequence of configurations:

 

1) on node2:

binary-api create_loopback mac 8a:f1:be:90:00:01

binary-api memif_socket_filename_add_del add id 1 filename /shared/all/memif_1_2.sock

binary-api memif_create id 1 socket-id 1 slave

set int ip address loop0 192.168.1.2/24

set int ip address loop0 fd30::1:b:0:0:2/64

set interface state loop0 up

set interface state memif1/1 up

 

2) on node1:

 

binary-api create_loopback mac 8a:f1:be:90:00:00

binary-api memif_socket_filename_add_del add id 1 filename /shared/all/memif_1_2.sock

binary-api memif_create id 1 socket-id 1 master

set int ip address memif1/1 192.168.1.1/24

set int ip address memif1/1 fd30::1:b:0:0:1/64

set interface state memif1/1 up

 

3) on node2 again:

create bridge-domain 1 arp-term 1

set interface l2 bridge loop0 1 bvi

set interface l2 bridge memif1/1 1

 

After a little while, node2 crashes with the same assert as before.

 

This makes me think the issue is caused by packets received by node2 while the interface is not yet in the bridge group / does not yet have an IPv6 address.

  • ayourtch (Fri, 13 Jul 2018 15:34:39 +0000): A few observations on slightly modified runs:
  1. if I paste the address + bridge group configs immediately (i.e. rather than doing steps 2,3,4, do steps 2+4 and 3+4), the crash does not happen

  2. if I add an IPv4 address config (set int ip address loop0 192.168.1.x/24) to both configurations, do steps 2,3,4, then perform IPv4 ping - the crash happens

  3. if I only have IPv4 address config and do steps 2,3,4, then perform IPv4 ping - the crash does not happen

  4. if I add IPv6 address config after the bridge groups are configured - the crash does not happen

  • ayourtch (Fri, 13 Jul 2018 15:00:26 +0000): I reproduced this issue with two lxc containers. The crash is not immediate; it takes a second or two after the non-working ping.

When running the debug build, the crash is different, with an assert:

DBGvpp# 0: /home/ubuntu/vpp/build-data/../src/vlib/node_funcs.h:304 (vlib_node_runtime_get_next_frame) assertion `nf->node_runtime_index == next->runtime_index' fails

Thread 1 "vpp_main" received signal SIGABRT, Aborted.

0x00007ffff5bc4428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54

54 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

(gdb) bt

#0 0x00007ffff5bc4428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54

#1 0x00007ffff5bc602a in __GI_abort () at abort.c:89

#2 0x0000000000407e82 in os_panic () at /home/ubuntu/vpp/build-data/../src/vpp/vnet/main.c:331

#3 0x00007ffff63aba58 in debugger () at /home/ubuntu/vpp/build-data/../src/vppinfra/error.c:84

#4 0x00007ffff63abe90 in _clib_error (how_to_die=2, function_name=0x0, line_number=0, fmt=0x7ffff7969620 "%s:%d (%s) assertion `%s' fails")

at /home/ubuntu/vpp/build-data/../src/vppinfra/error.c:143

#5 0x00007ffff78d83ef in vlib_node_runtime_get_next_frame (vm=0x7ffff7b8aa40 <vlib_global_main>, n=0x7fffb796b1c0, next_index=4)

at /home/ubuntu/vpp/build-data/../src/vlib/node_funcs.h:304

#6 0x00007ffff78da300 in vlib_put_next_frame_validate (vm=0x7ffff7b8aa40 <vlib_global_main>, rt=0x7fffb796b1c0, next_index=4, n_vectors_left=255)

at /home/ubuntu/vpp/build-data/../src/vlib/main.c:429

#7 0x00007ffff78da5d1 in vlib_put_next_frame (vm=0x7ffff7b8aa40 <vlib_global_main>, r=0x7fffb796b1c0, next_index=4, n_vectors_left=255)

at /home/ubuntu/vpp/build-data/../src/vlib/main.c:464

#8 0x00007fffb2e82aa2 in memif_device_input_inline (vm=0x7ffff7b8aa40 <vlib_global_main>, node=0x7fffb796b1c0, frame=0x0, mif=0x7fffb6aa3e40,

type=MEMIF_RING_M2S, qid=0, mode=MEMIF_INTERFACE_MODE_ETHERNET) at /home/ubuntu/vpp/build-data/../src/plugins/memif/node.c:496

#9 0x00007fffb2e84891 in memif_input_node_fn_avx2 (vm=0x6a1f, node=0x7fffb796b1c0, frame=0x0) at /home/ubuntu/vpp/build-data/../src/plugins/memif/node.c:903

#10 0x00007ffff78dc3bb in dispatch_node (vm=0x7ffff7b8aa40 <vlib_global_main>, node=0x7fffb796b1c0, type=VLIB_NODE_TYPE_INPUT,

dispatch_state=VLIB_NODE_STATE_POLLING, frame=0x0, last_time_stamp=6678623582126224) at /home/ubuntu/vpp/build-data/../src/vlib/main.c:988

#11 0x00007ffff78de2b4 in vlib_main_or_worker_loop (vm=0x7ffff7b8aa40 <vlib_global_main>, is_main=1) at /home/ubuntu/vpp/build-data/../src/vlib/main.c:1507

#12 0x00007ffff78ded67 in vlib_main_loop (vm=0x7ffff7b8aa40 <vlib_global_main>) at /home/ubuntu/vpp/build-data/../src/vlib/main.c:1635

#13 0x00007ffff78df93d in vlib_main (vm=0x7ffff7b8aa40 <vlib_global_main>, input=0x7fffb723afb0) at /home/ubuntu/vpp/build-data/../src/vlib/main.c:1826

#14 0x00007ffff794d995 in thread0 (arg=140737349462592) at /home/ubuntu/vpp/build-data/../src/vlib/unix/main.c:607

#15 0x00007ffff63d0880 in clib_calljmp () at /home/ubuntu/vpp/build-data/../src/vppinfra/longjmp.S:110

#16 0x00007fffffffd1e0 in ?? ()

#17 0x00007ffff794de2c in vlib_unix_main (argc=17, argv=0x7fffffffe4d8) at /home/ubuntu/vpp/build-data/../src/vlib/unix/main.c:674

#18 0x00000000004078c6 in main (argc=17, argv=0x7fffffffe4d8) at /home/ubuntu/vpp/build-data/../src/vpp/vnet/main.c:270

(gdb)

 

Also, I once saw this crash even before pinging, which leads me to think the issue might be isolated to the receiving side and not specific to the ping send path.

Original issue: https://jira.fd.io/browse/VPP-1341
