[PATCH] more list node initialisation

andreas at rammhold.de
Wed Nov 25 21:27:14 CET 2020


On 15:59 25.11.20, Ondrej Zajicek wrote:
> Thanks, merged (with some minor changes). The first issue was also
> reported earlier by Mikael Magnusson.

Thanks!

> > I am still hunting another variant of this that crashes BIRD (with
> > --enable-debug) whenever a Babel peer is shutting down..
> 
> Is it somehow complicated? IMHO this class of issues should die() in
> add_tail(), and exact location should be clear from stack trace.
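
For readers following along, the kind of check meant here can be sketched
roughly as below. This is only an illustration under assumptions: the
demo_* names and types are hypothetical and are not BIRD's actual list
code, and the sketch returns an error where a debug build would more
likely abort.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked list types, not BIRD's actual definitions. */
struct demo_node {
  struct demo_node *next, *prev;
};

struct demo_list {
  struct demo_node head;  /* sentinel: head.next is first, head.prev is last */
};

static void demo_init_list(struct demo_list *l)
{
  l->head.next = l->head.prev = &l->head;
}

/* A node counts as unlinked only if both pointers are NULL, which is
 * why every node has to be zero-initialised before its first insertion. */
static int demo_node_unlinked(const struct demo_node *n)
{
  return n->next == NULL && n->prev == NULL;
}

/* Returns 0 on success, -1 if the node looks linked or uninitialised.
 * A debug build could abort here instead of returning an error. */
static int demo_add_tail(struct demo_list *l, struct demo_node *n)
{
  if (!demo_node_unlinked(n))
    return -1;

  n->prev = l->head.prev;
  n->next = &l->head;
  l->head.prev->next = n;
  l->head.prev = n;
  return 0;
}

/* Unlinks a node and resets its pointers so it can be inserted again. */
static void demo_rem_node(struct demo_node *n)
{
  n->prev->next = n->next;
  n->next->prev = n->prev;
  n->next = n->prev = NULL;
}
```

With a check like this, inserting a node twice, or inserting a node whose
memory was never cleared, is caught at the insertion site rather than
showing up later as list corruption.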

It didn't die in add_tail(), at least not visibly in the stack trace.
Here is what I saw instead:

(gdb) bt
#0  babel_announce_rte (p=p@entry=0x1052790, e=e@entry=0x10bbe80) at proto/babel/babel.c:674
#1  0x0000000000453547 in babel_select_route (p=p@entry=0x1052790, e=0x10bbe80, mod=mod@entry=0x10ff3b8) at proto/babel/babel.c:780
#2  0x0000000000454f4b in babel_retract_route (r=0x10ff3b8, p=0x1052790) at proto/babel/babel.c:178
#3  babel_handle_update (m=<optimized out>, ifa=<optimized out>) at proto/babel/babel.c:1240
#4  0x00000000004575b6 in babel_process_packet (pkt=0x10c4cf0, len=<optimized out>, saddr=..., ifa=0x10bf940) at proto/babel/packets.c:1461
#5  0x0000000000457684 in babel_rx_hook (sk=<optimized out>, len=<optimized out>) at proto/babel/packets.c:1520
#6  0x0000000000490ef1 in sk_read (s=s@entry=0x10c4ba0, revents=<optimized out>) at sysdep/unix/io.c:1910
#7  0x000000000049179c in io_loop () at sysdep/unix/io.c:2342
#8  0x000000000049574e in main (argc=<optimized out>, argv=<optimized out>) at sysdep/unix/main.c:923

Keep in mind that the babel.c line numbers might be off by a few, as
I've started adding my v4-via-v6 changes to it. The setup didn't have
any of those in the routing table though, so I am pretty sure they
aren't the source of the issue. With the debug flag disabled, the
crashes were gone again.

(gdb) info locals
a0 = {next = 0x0, pprev = 0x0, uc = 0, hash_key = 0, eattrs = 0x0, src = 0x105a498, hostentry = 0x0, from = {addr = {0, 0, 0, 0}},
      igp_metric = 0, source = 13 '\r', scope = 4 '\004', dest = 3 '\003', aflags = 0 '\000',
      nh = {gw = {addr = {0, 0, 0, 0}}, iface = 0x0, next = 0x0, flags = 0 '\000', weight = 0 '\000',
            labels_orig = 0 '\000', labels = 0 '\000', label = 0x7ffffbd7391c}
     }
a = <optimized out>
rte = <optimized out>
r = 0x0
c = 0x10529f0

Going by the line numbers the crash should occur roughly here (+/- a few
because memset and those assignments above might have been reordered):

babel_announce_rte():
 …
   660 │   else if (e->valid && (e->router_id != p->router_id))
   661 │   {
   662 │     /* Unreachable */
   663 │     rta a0 = {
   664 │       .src = p->p.main_source,
   665 │       .source = RTS_BABEL,
   666 │       .scope = SCOPE_UNIVERSE,
   667 │       .dest = RTD_UNREACHABLE,
   668 │     };
   669 │
   670 │     rta *a = rta_lookup(&a0);
   671 │     rte *rte = rte_get_temp(a);
   672 │     memset(&rte->u.babel, 0, sizeof(rte->u.babel));
   673 │     rte->pflags = 0;
-> 674 │     rte->pref = 1;
   675 │
   676 │     e->unreachable = 1;
   677 │     rte_update2(c, e->n.addr, rte, p->p.main_source);

The assembly at $rip looks like this:

   0x0000000000451f9e <+367>:	rep stos QWORD PTR es:[rdi],rax
   0x0000000000451fa1 <+370>:	mov    rax,QWORD PTR [r12+0x58]
   0x0000000000451fa6 <+375>:	mov    QWORD PTR [rsp+0x20],rax
   0x0000000000451fab <+380>:	mov    BYTE PTR [rsp+0x44],0xd
   0x0000000000451fb0 <+385>:	mov    BYTE PTR [rsp+0x45],0x4
   0x0000000000451fb5 <+390>:	mov    BYTE PTR [rsp+0x46],0x3
=> 0x0000000000451fba <+395>:	mov    rax,QWORD PTR [rbp+0x28]
   0x0000000000451fbe <+399>:	mov    rax,QWORD PTR [rax+0x10]
   0x0000000000451fc2 <+403>:	mov    rcx,QWORD PTR [rax+0x18]
   0x0000000000451fc6 <+407>:	mov    rsi,QWORD PTR [rbp+0x40]
   0x0000000000451fca <+411>:	mov    rdx,QWORD PTR [rbp+0x48]

(gdb) print $rbp
$1 = (void *) 0x0

rbp being 0 is likely what causes the crash here? The instruction at
$rip loads from [rbp+0x28], so it would read from address 0x28. I
haven't invested much more time into this.
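
To illustrate why a zero base register ends up touching such a small
address, here is a minimal C sketch. It assumes a hypothetical structure
(demo_route is made up for this example and is not a BIRD type; the trace
does not tell which structure rbp was supposed to point to) with a member
at offset 0x28:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real structure behind rbp is not identified
 * in the trace above. */
struct demo_route {
  uint64_t pad[5];   /* 5 * 8 = 40 = 0x28 bytes before the member */
  void *member;
};

/* Address that a load of `member` would touch for a given base pointer
 * value, i.e. what "mov rax, [rbp+0x28]" computes with rbp = base. */
static uintptr_t demo_member_address(uintptr_t base)
{
  return base + offsetof(struct demo_route, member);
}
```

With a base of 0 the computed address is just the member offset, which
lies in the unmapped first page on typical systems, so the load faults.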


andi-

