BIRD crashing when --enable-debug is set

Toke Høiland-Jørgensen toke at toke.dk
Sun Nov 22 19:01:52 CET 2020


andreas at rammhold.de writes:

> On 18:21 22.11.20, Toke Høiland-Jørgensen wrote:
>> andreas at rammhold.de writes:
>> > I've been playing with the BIRD source code and implementing a new
>> > extension to Babel but that is not what this mail is about.
>> 
>> Still curious - which extension? :)
>
> The V4 over/via V6 draft. I would link to it but it seems the IETF
> website is currently unreachable (or I am being cloudflared…).

Ah, cool! That was on my list as well, but good to know you're working
on it! I'll let you have at it, then :)

>> > Breakpoint 1, add_tail (n=0x234ac30, l=0x241f620) at ./lib/lists.c:82
>> > #0  add_tail (n=0x234ac30, l=0x241f620) at ./lib/lists.c:82
>> > #1  sl_alloc (s=0x21bd7d0) at lib/slab.c:264
>> 
>> This is that call to add_tail in slab.c, with a few lines of context:
>> 
>> full_partial:
>>   rem_node(&h->n);
>>   add_tail(&s->full_heads, &h->n);
>>   goto redo;
>> 
>> Since add_tail is being called immediately after rem_node(), and they I
>> both inlines, my guess would be that this comes from the compiler
>> optimising out the clearing of n->prev and n->next in rem_node() since
>> they are immediately being reset.
>
> Interesting. The compiler removes the first assignment becuase it is
> rewritten (across function boundaries/after inlining) but doesn't
> realize that it is being read?

Hmm, yeah, good point, if you have a read in-between that optimisation
really shouldn't happen - but maybe the compiler is rearranging things
in other interesting ways? I guess you'd have to inspect the generated
code to know for sure...

>> > Are these all false positives or should there be more cleanup code on
>> > these list entries?
>> 
>> Speaking for the backtraces you saw in the Babel code, these all come
>> from variants of the following pattern:
>> 
>> struct babel_msg_node *msgn = sl_alloc(p->msg_slab);
>> /* initialise msgn->msg */
>> add_tail(&queue, NODE msgn);
>> 
>> where babel_msg_node is just:
>> 
>> struct babel_msg_node {
>>   node n;
>>   union babel_msg msg;
>> };
>> 
>> Since add_tail sets n->next and n->prev right after those asserts, no
>> harm is actually done and the memory ends up fully initialised, so I'm
>> not sure if there's any reason to do anything about it: It's really
>> only a problem because of the debug asserts.
>
> That was my hope as well. Maybe there are some attributes / qualifiers
> that could be added to the variables (when the debug flag is set) to
> prohibit this optimisation? Having these checks is probably useful (e.g.
> for CI, tests & development) but only if it doesn't break the entire
> software.

Yeah, it is a bit annoying that it breaks debug builds. My preference
would be to just have the slab code zero out objects, but let's see what
the Bird devs think :)

-Toke



More information about the Bird-users mailing list