Address update causing infinite loop
James Oakley
james at ttgi.io
Mon Jan 31 21:20:47 CET 2022
Hi,
We've had an occasional issue where we found BIRD stuck using 100% CPU a few
times over the past couple of years. Whenever it happened, all protocols
stopped running and birdc was unable to communicate with the process.
Unfortunately, it was usually in a situation where we could not see what was
going on, so we ended up having to kill the process and restart it to get
things working again.
However, today it finally happened again, and I was able to grab the debug
symbols for the build so I could inspect it in gdb.
So here is the backtrace:
(gdb) bt
#0 if_recalc_preferred (i=0x56541cd48de0) at nest/iface.c:518
#1 0x000056541c662d55 in if_end_partial_update (i=i@entry=0x56541cd48de0) at nest/iface.c:359
#2 0x000056541c6b19cb in nl_parse_addr4 (new=1, scan=0, i=0x56541cc20700) at sysdep/linux/netlink.c:1020
#3 nl_parse_addr (h=h@entry=0x56541cc206f0, scan=scan@entry=0) at sysdep/linux/netlink.c:1128
#4 0x000056541c6b3127 in nl_async_msg (h=0x56541cc206f0) at sysdep/linux/netlink.c:1932
#5 nl_async_hook (sk=<optimized out>, size=<optimized out>) at sysdep/linux/netlink.c:1985
#6 0x000056541c6b74a0 in sk_read (s=s@entry=0x56541cc22700, revents=<optimized out>) at sysdep/unix/io.c:1896
#7 0x000056541c6b80ec in io_loop () at sysdep/unix/io.c:2345
#8 0x000056541c633f61 in main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:939
If I keep hitting next, it appears to be stuck in a loop:
(gdb) n
524 nest/iface.c: No such file or directory.
(gdb) n
514 in nest/iface.c
(gdb) n
517 in nest/iface.c
(gdb) n
518 in nest/iface.c
(gdb) n
524 in nest/iface.c
(gdb) n
514 in nest/iface.c
(gdb) n
517 in nest/iface.c
(gdb) n
518 in nest/iface.c
(gdb) n
524 in nest/iface.c
(gdb) n
514 in nest/iface.c
So here is what that code looks like:
510 struct ifa *a;
511 WALK_LIST(a, i->addrs)
512 {
513 /* Secondary address is never selected */
514 if (a->flags & IA_SECONDARY)
515 continue;
516
517 if (ipa_is_ip4(a->ip)) {
518 if (!a4 || ipa_equal(a->ip, pref_v4))
519 a4 = a;
520 } else if (!ipa_is_link_local(a->ip)) {
521 if (!a6 || ipa_equal(a->ip, ic->pref_v6))
522 a6 = a;
523 } else {
524 if (!ll || ipa_equal(a->ip, ic->pref_ll))
525 ll = a;
526 }
527 }
OK, we are iterating over a linked list. Let's look at the list:
(gdb) p i->addrs
$30 = {{head_node = {next = 0x56541cd02b30, prev = 0x0}, head_padding = 0x56541cd02b30}, {tail_padding = 0x56541cd02b30, tail_node = {next = 0x0, prev = 0x56541cd02b30}}, {head = 0x56541cd02b30, null = 0x0,
tail = 0x56541cd02b30}}
(gdb) p i->addrs.head
$31 = (struct node *) 0x56541cd02b30
(gdb) p *i->addrs.head
$32 = {next = 0x56541cd02b30, prev = 0x56541cd02b30}
So there is a single item, with both next and prev pointing to itself.
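To make the consequence concrete, here is a minimal stand-alone sketch of why such a node keeps the loop spinning. It assumes the walk stops only when it reaches a node whose next is NULL, which is what the tail_node = {next = 0x0, ...} in the dump suggests. The type and the helpers bounded_walk() and self_loop_is_detected() are hypothetical stand-ins, not BIRD code.

```c
#include <stddef.h>

/* Simplified stand-in for a list node; not BIRD's actual header. */
typedef struct node { struct node *next, *prev; } node;

/*
 * Walk forward from `first` until a node with next == NULL (the assumed
 * tail sentinel) is reached. Returns the number of real nodes visited,
 * or -1 if more than `limit` nodes were seen, which would indicate a cycle.
 */
static int bounded_walk(const node *first, int limit)
{
  int steps = 0;
  for (const node *n = first; n->next; n = n->next)
    if (++steps > limit)
      return -1;
  return steps;
}

/* Returns 1 if a healthy one-item list ends and a self-linked node does not. */
static int self_loop_is_detected(void)
{
  node tail = { NULL, NULL };        /* sentinel: next == NULL ends the walk */
  node ok = { &tail, NULL };         /* healthy item: next is the sentinel */
  node stuck = { NULL, NULL };
  stuck.next = stuck.prev = &stuck;  /* the state seen in gdb */

  return bounded_walk(&ok, 8) == 1 && bounded_walk(&stuck, 8) == -1;
}
```

Without the step limit, the walk over the self-linked node never reaches a NULL next, which matches the repeating 514, 517, 518, 524 pattern in the gdb session.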
Looking up the stack, it seems the only place here where this list is modified
is when nl_parse_addr4() calls ifa_update() which then calls add_tail(), which
looks like this:
EXPENSIVE_CHECK(check_list(l, NULL));
ASSUME(n->prev == NULL);
ASSUME(n->next == NULL);
node *z = l->tail;
n->next = &l->tail_node;
n->prev = z;
z->next = n;
l->tail = n;
So this is where I am stumped. How could it even be possible for the item's
next pointer to be set to its own address, and not that of the static tail
node?
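One sequence that yields exactly the state in the dump, offered as a hypothesis rather than a diagnosis, is the same node being appended twice. The sketch below assumes the two ASSUME() lines are not enforced at run time in this build, and it uses a simplified list union whose field names are copied from the gdb output. The helpers init_list(), add_tail_unchecked() and double_add_creates_self_loop() are illustrative and not BIRD's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins mirroring the field names in the gdb dump;
 * this is not BIRD's actual header. */
typedef struct node { struct node *next, *prev; } node;

typedef union list {
  struct { node head_node; void *head_padding; };
  struct { void *tail_padding; node tail_node; };
  struct { node *head, *null, *tail; };
} list;

static void init_list(list *l)
{
  l->head = &l->tail_node;   /* empty: head points at the tail sentinel */
  l->null = NULL;
  l->tail = &l->head_node;   /* empty: tail points at the head sentinel */
}

/* Same pointer updates as the add_tail() excerpt, minus the checks. */
static void add_tail_unchecked(list *l, node *n)
{
  node *z = l->tail;
  n->next = &l->tail_node;
  n->prev = z;
  z->next = n;
  l->tail = n;
}

/* Returns 1 if appending the same node twice leaves it pointing at itself. */
static int double_add_creates_self_loop(void)
{
  list l;
  node n = { NULL, NULL };

  init_list(&l);
  add_tail_unchecked(&l, &n);   /* first append: healthy one-item list */
  assert(n.next == &l.tail_node && n.prev == &l.head_node);

  /* Second append of the same node: z == n, so "z->next = n" overwrites
   * the n->next that was just set to the tail sentinel. */
  add_tail_unchecked(&l, &n);
  return l.head == &n && l.tail == &n && n.next == &n && n.prev == &n;
}
```

After the second call, head, tail, next and prev all hold the node's own address while null stays 0x0, which is the same shape as $30 and $32. Whether ifa_update() can actually reach add_tail() twice for one struct ifa is not something this sketch establishes.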
Any insights?
--
James Oakley
james at ttgi.io