IPv6 BGP debugging

Alexander V. Chernikov melifaro at yandex-team.ru
Fri Apr 11 17:03:21 CEST 2014

On 11.04.2014 13:16, Ondrej Zajicek wrote:
> On Thu, Apr 10, 2014 at 09:44:57PM +0400, Alexander V. Chernikov wrote:
>> On 10.04.2014 12:41, Ondrej Zajicek wrote:
>>> On Thu, Apr 10, 2014 at 10:14:03AM +0400, Peter Andreev wrote:
>>>> Hi Alexander,
>>>> I tried "debug MLPA1 all" from birdc console, but nothing new appeared in
>>>> log file.
>>>> Currently I rolled back to 1.3.10 version from ports because 1.4.2 started
>>>> to crash with the following backtrace:
>>> Do you have the same socket problems in 1.3.10 or not?
>> We're currently investigating this further, it really looks like
>> OS-dependent problem.
>> However, it seems that core was triggered by uninitialized rta *a in
>> IPv6 bgp_do_rx_update().
> Hmm, that seems like some kind of strange coincidence, as the rta *a variable
> is not used before it is initialized (see attached patch).
Well, not exactly. DECODE_PREFIX() can perform 'goto done' when an invalid
prefix is being withdrawn. In that case we'll probably get garbage in a.

Anyway, initially I was talking about the IPv6 case (which has a different
version of bgp_do_rx_update()).
The same applies there: DO_NLRI() can jump to the 'done' label before a is
initialized, and that is what actually happened.
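
To make the failure mode concrete, here is a minimal sketch of the pattern, not the actual BIRD code. The names fake_rta, fake_rta_free, frees_done and process_update are hypothetical stand-ins. It assumes only what is described above: a shared 'done' label that frees a, and paths that reach the label without ever assigning a. Initializing the pointer to NULL and checking it at the label makes every path safe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for BIRD's struct rta: only a use count. */
struct fake_rta {
  int uc;
};

/* Counts releases so the effect of each call can be observed. */
int frees_done = 0;

/* Drop one reference and release the object when the count reaches zero. */
void fake_rta_free(struct fake_rta *r)
{
  assert(r != NULL && r->uc > 0);
  if (--r->uc == 0)
    free(r);
  frees_done++;
}

/*
 * Simplified shape of an update handler with a shared cleanup label.
 * prefix_ok == 0 models a decoding macro bailing out with "goto done";
 * reach_len == 0 models an update that carries withdrawals only.
 * Returns 0 on success, -1 on an early exit.
 */
int process_update(int reach_len, int prefix_ok)
{
  struct fake_rta *a = NULL;   /* without "= NULL", "done:" may see garbage */
  int rc = 0;

  if (!prefix_ok) {
    rc = -1;
    goto done;                 /* jumps over the only assignment to a */
  }

  if (reach_len > 0) {         /* the only place where a gets a value */
    a = malloc(sizeof(*a));
    if (a == NULL) {
      rc = -1;
      goto done;
    }
    a->uc = 1;
  }

done:
  if (a != NULL)               /* safe on every path because of the NULL init */
    fake_rta_free(a);
  return rc;
}
```

Calling process_update(0, 1) models a withdraw-only update: nothing is allocated and nothing is freed. With the "= NULL" removed, the same call would hand an indeterminate pointer to the free routine.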

I suddenly realized that we're speaking about different crash cases: the 
one I'm talking about is much clearer:

#0 0x000000000042fa37 in rta_free (r=0x40118adb8) at route.h:476
#1 0x000000000042f4c2 in bgp_do_rx_update (conn=0x801007dc8, 
withdrawn=0x801f3b015 "", withdrawn_len=0, nlri=0x801f3b055 "^\253", 
nlri_len=0, attrs=0x801f3b017 "\220\017", attr_len=62)
at ../../../proto/bgp/packets.c:1255
#2 0x000000000042fc04 in bgp_rx_update (conn=0x801007dc8, 
pkt=0x801f3b000 '\377' <repeats 16 times>, len=85) at 
#3 0x000000000043011e in bgp_rx_packet (conn=0x801007dc8, 
pkt=0x801f3b000 '\377' <repeats 16 times>, len=85) at 
#4 0x0000000000430283 in bgp_rx (sk=0x80101b140, size=85) at 
#5 0x000000000045eb5a in sk_read (s=0x80101b140) at io.c:1602
#6 0x000000000045f5d4 in io_loop () at io.c:1843
#7 0x00000000004669b3 in main (argc=3, argv=0x7fffffffdb20) at main.c:820

(gdb) x/96xb pkt
0x801f3b000: 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0x801f3b008: 0xff 0xff 0xff 0xff 0xff 0xff 0xff 0xff
0x801f3b010: 0x00 0x55 0x02 0x00 0x00 0x00 0x3e 0x90
0x801f3b018: 0x0f 0x00 0x3a 0x00 0x02 0x01 0x20 0x20
0x801f3b020: 0x01 0x1b 0xf8 0x30 0x2a 0x02 0x16 0xd8
0x801f3b028: 0x01 0x01 0x20 0x2a 0x00 0x17 0x80 0x30
0x801f3b030: 0x20 0x01 0x06 0x7c 0x23 0x2c 0x30 0x2a
0x801f3b038: 0x02 0x16 0xd8 0x01 0x03 0x20 0x2a 0x02
0x801f3b040: 0x16 0xd8 0x30 0x2a 0x02 0x16 0xd8 0x01
0x801f3b048: 0x02 0x20 0x28 0x00 0x06 0x70 0x30 0x2a
0x801f3b050: 0x02 0x16 0xd8 0x01 0x04 0x5e 0xab 0x00
0x801f3b058: 0x05 0xda 0xb7 0xc1 0x00 0x30 0x20 0x01

(gdb) p p->mp_reach_len
$7 = 0
(gdb) p p->mp_unreach_len
$8 = 58

Since mp_reach_len is 0, the DO_NLRI(mp_reach) block is not entered, so "a" is
never assigned.
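
The dump can be cross-checked against the values gdb prints. Below is a small sketch, not BIRD code, that walks the first 85 bytes of the dump (the length field is 0x0055; the rest of the 96 dumped bytes lie past the end of the message). The helper names dump_pkt, update_summary, get_u16_be and summarize_update are made up for this illustration, and the parser deliberately looks only at the first path attribute and expects it to be MP_UNREACH_NLRI (type 15), which is all this packet contains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The first 85 bytes of the gdb dump, copied byte for byte. */
const uint8_t dump_pkt[85] = {
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
  0x00, 0x55, 0x02, 0x00, 0x00, 0x00, 0x3e, 0x90,
  0x0f, 0x00, 0x3a, 0x00, 0x02, 0x01, 0x20, 0x20,
  0x01, 0x1b, 0xf8, 0x30, 0x2a, 0x02, 0x16, 0xd8,
  0x01, 0x01, 0x20, 0x2a, 0x00, 0x17, 0x80, 0x30,
  0x20, 0x01, 0x06, 0x7c, 0x23, 0x2c, 0x30, 0x2a,
  0x02, 0x16, 0xd8, 0x01, 0x03, 0x20, 0x2a, 0x02,
  0x16, 0xd8, 0x30, 0x2a, 0x02, 0x16, 0xd8, 0x01,
  0x02, 0x20, 0x28, 0x00, 0x06, 0x70, 0x30, 0x2a,
  0x02, 0x16, 0xd8, 0x01, 0x04
};

/* Fields recovered from the message (names are illustrative only). */
struct update_summary {
  unsigned total_len, msg_type, withdrawn_len, attr_len, nlri_len;
  unsigned attr_type, attr_value_len, afi, safi, prefixes;
};

/* Read a 16-bit big-endian value. */
static unsigned get_u16_be(const uint8_t *p)
{
  return ((unsigned) p[0] << 8) | p[1];
}

/* Returns 0 on success, -1 if the message is truncated or not what we expect. */
int summarize_update(const uint8_t *p, size_t len, struct update_summary *s)
{
  size_t pos, end;
  unsigned flags;

  assert(p != NULL && s != NULL);
  if (len < 23)                          /* 19-byte header + two length fields */
    return -1;
  s->total_len = get_u16_be(p + 16);     /* follows the 16-byte marker */
  s->msg_type = p[18];                   /* 2 = UPDATE */
  if (s->total_len != len || s->msg_type != 2)
    return -1;

  s->withdrawn_len = get_u16_be(p + 19);
  pos = 21 + s->withdrawn_len;
  if (pos + 2 > len)
    return -1;
  s->attr_len = get_u16_be(p + pos);
  pos += 2;
  if (pos + s->attr_len > len || s->attr_len < 4)
    return -1;
  s->nlri_len = (unsigned) (len - pos - s->attr_len);

  /* First path attribute: flags, type, then a 1- or 2-byte length. */
  flags = p[pos];
  s->attr_type = p[pos + 1];
  if (flags & 0x10) {                    /* extended-length bit */
    s->attr_value_len = get_u16_be(p + pos + 2);
    pos += 4;
  } else {
    s->attr_value_len = p[pos + 2];
    pos += 3;
  }
  end = pos + s->attr_value_len;
  if (end > len || s->attr_type != 15 || s->attr_value_len < 3)
    return -1;

  /* MP_UNREACH_NLRI value: AFI (2 bytes), SAFI (1 byte), withdrawn prefixes. */
  s->afi = get_u16_be(p + pos);
  s->safi = p[pos + 2];
  pos += 3;
  s->prefixes = 0;
  while (pos < end) {
    size_t bytes = ((size_t) p[pos] + 7) / 8;   /* prefix length is in bits */
    if (pos + 1 + bytes > end)
      return -1;
    pos += 1 + bytes;
    s->prefixes++;
  }
  return 0;
}
```

Run over dump_pkt, this gives withdrawn_len=0, attr_len=62 and nlri_len=0, matching the backtrace. It also gives a single MP_UNREACH_NLRI attribute of 58 bytes (AFI 2, SAFI 1) carrying nine withdrawn IPv6 prefixes, matching mp_unreach_len. There is no MP_REACH_NLRI, so no attribute set is ever built for this update.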

Btw, clang yells for both cases.

> Perhaps it is related to the bug fixed in efd6d12b975441c7e1875a59dd9e0f3db7e958cb ?
> Are these compiler options used in FreeBSD build?
