Address update causing infinite loop

Maria Matejka maria.matejka at nic.cz
Mon Jan 31 22:02:31 CET 2022


Hello!

The root cause may be somewhere completely different, as the linked list
item is already corrupted by the time this loop runs. Typically this happens
when an already removed node is removed again. Could you please disclose
your BIRD version and configuration, or at least whether you run BFD, which
is the only multithreaded part for now?

Side note: We should probably also at least log a warning when the ASSUME()
that checks for adding an already added item into a linked list fails.
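
To illustrate the double-add case, here is a minimal sketch. It is a
simplified stand-in with two separate sentinel nodes, not the actual BIRD
list code, and add_tail_unchecked(), add_tail_checked() and count_nodes()
are hypothetical names. If the node being added is already the tail, then
z == n, and the z->next = n assignment leaves the node pointing to itself in
both directions, which looks like the i->addrs state in the gdb output
quoted below.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Simplified stand-in for a BIRD-style list; not the real list code. */
typedef struct node { struct node *next, *prev; } node;

typedef struct list {
  node head_node;  /* sentinel; head_node.next is the first item */
  node tail_node;  /* sentinel; tail_node.prev is the last item */
} list;

static void init_list(list *l)
{
  l->head_node.next = &l->tail_node;
  l->head_node.prev = NULL;
  l->tail_node.next = NULL;
  l->tail_node.prev = &l->head_node;
}

/* Same pointer updates as the quoted add_tail(), with the checks left out. */
static void add_tail_unchecked(list *l, node *n)
{
  assert(l && n);
  node *z = l->tail_node.prev;

  n->next = &l->tail_node;
  n->prev = z;
  z->next = n;            /* if z == n, this overwrites n->next with n */
  l->tail_node.prev = n;
}

/* Hypothetical checked variant: warn and refuse instead of corrupting. */
static int add_tail_checked(list *l, node *n)
{
  assert(l && n);
  if (n->next || n->prev)
  {
    fprintf(stderr, "warning: node %p is already in a list\n", (void *) n);
    return 0;
  }
  add_tail_unchecked(l, n);
  return 1;
}

/* Walk with a step limit; returns the number of items, or -1 if the walk
   does not reach the tail sentinel within max_steps. */
static int count_nodes(list *l, int max_steps)
{
  int steps = 0;
  for (node *n = l->head_node.next; n != &l->tail_node; n = n->next)
    if (++steps > max_steps)
      return -1;
  return steps;
}
```

Calling add_tail_unchecked() twice with the same node makes count_nodes()
give up with -1, while add_tail_checked() logs the warning, returns 0 and
leaves the list intact.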

Thanks
Maria

On 1/31/22 9:20 PM, James Oakley wrote:
> Hi,
> 
> We've had an occasional issue where we found BIRD stuck using 100% CPU a few
> times over the past couple of years. Whenever it happened, all protocols
> stopped running and birdc was unable to communicate with the process.
> Unfortunately, it was usually in a situation where we could not quite see what
> was going on, so we ended up having to kill the process and restart it to get
> things working again.
> 
> However, today it finally happened again, and I was able to grab the debug
> symbols for the build so I could inspect it in gdb.
> 
> So here is the backtrace:
> 
> (gdb) bt
> #0  if_recalc_preferred (i=0x56541cd48de0) at nest/iface.c:518
> #1  0x000056541c662d55 in if_end_partial_update (i=i@entry=0x56541cd48de0) at nest/iface.c:359
> #2  0x000056541c6b19cb in nl_parse_addr4 (new=1, scan=0, i=0x56541cc20700) at sysdep/linux/netlink.c:1020
> #3  nl_parse_addr (h=h@entry=0x56541cc206f0, scan=scan@entry=0) at sysdep/linux/netlink.c:1128
> #4  0x000056541c6b3127 in nl_async_msg (h=0x56541cc206f0) at sysdep/linux/netlink.c:1932
> #5  nl_async_hook (sk=<optimized out>, size=<optimized out>) at sysdep/linux/netlink.c:1985
> #6  0x000056541c6b74a0 in sk_read (s=s@entry=0x56541cc22700, revents=<optimized out>) at sysdep/unix/io.c:1896
> #7  0x000056541c6b80ec in io_loop () at sysdep/unix/io.c:2345
> #8  0x000056541c633f61 in main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:939
> 
> If I keep hitting next, it appears to be stuck in a loop:
> 
> (gdb) n
> 524     nest/iface.c: No such file or directory.
> (gdb) n
> 514     in nest/iface.c
> (gdb) n
> 517     in nest/iface.c
> (gdb) n
> 518     in nest/iface.c
> (gdb) n
> 524     in nest/iface.c
> (gdb) n
> 514     in nest/iface.c
> (gdb) n
> 517     in nest/iface.c
> (gdb) n
> 518     in nest/iface.c
> (gdb) n
> 524     in nest/iface.c
> (gdb) n
> 514     in nest/iface.c
> 
> So here is what that code looks like:
> 
> 510  struct ifa *a;
> 511  WALK_LIST(a, i->addrs)
> 512    {
> 513      /* Secondary address is never selected */
> 514      if (a->flags & IA_SECONDARY)
> 515         continue;
> 516
> 517      if (ipa_is_ip4(a->ip)) {
> 518         if (!a4 || ipa_equal(a->ip, pref_v4))
> 519           a4 = a;
> 520      } else if (!ipa_is_link_local(a->ip)) {
> 521        if (!a6 || ipa_equal(a->ip, ic->pref_v6))
> 522          a6 = a;
> 523      } else {
> 524        if (!ll || ipa_equal(a->ip, ic->pref_ll))
> 525          ll = a;
> 526      }
> 527    }
> 
> OK, we are iterating over a linked list. Let's look at the list:
> 
> (gdb) p i->addrs
> $30 = {{head_node = {next = 0x56541cd02b30, prev = 0x0}, head_padding = 0x56541cd02b30}, {tail_padding = 0x56541cd02b30, tail_node = {next = 0x0, prev = 0x56541cd02b30}}, {head = 0x56541cd02b30, null = 0x0,
>      tail = 0x56541cd02b30}}
> (gdb) p i->addrs.head
> $31 = (struct node *) 0x56541cd02b30
> (gdb) p *i->addrs.head
> $32 = {next = 0x56541cd02b30, prev = 0x56541cd02b30}
> 
> So there is a single item, with next pointing to itself.
> 
> Looking up the stack, it seems the only place here where this list is modified
> is when nl_parse_addr4() calls ifa_update() which then calls add_tail(), which
> looks like this:
> 
>    EXPENSIVE_CHECK(check_list(l, NULL));
>    ASSUME(n->prev == NULL);
>    ASSUME(n->next == NULL);
> 
>    node *z = l->tail;
> 
>    n->next = &l->tail_node;
>    n->prev = z;
>    z->next = n;
>    l->tail = n;
> 
> So this is where I am stumped. How could it even be possible for the item's
> next pointer to be set to its own address, and not that of the static tail
> node?
> 
> Any insights?
> 

