bird6 1.6.2 hangs doing recvmsg on netlink socket
Israel G. Lugo
israel.lugo at lugosys.com
Fri Dec 2 19:46:36 CET 2016
Hello,
I am getting some random hangs in bird6, running on Debian, version
1.6.2-1~bpo8+1 from your http://bird.network.cz/debian/ repository.
I've got a single OSPF instance with 74 routes, one eBGP session
receiving a default route, and one iBGP session with another Bird
router, which sends me its own default.
What happens is that, from time to time, bird6 becomes stuck in an
infinite loop doing recvmsg() on a netlink socket, and IPv6 routes are
lost. The interval between hangs seems random: one time it was 3 days,
another time 2 weeks.
gk1 # strace -p 11465
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
[...]
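For reference, the strace pattern above is what a receive loop looks like when EAGAIN on a non-blocking socket is answered with an immediate retry instead of a return to the event loop. The following is only a minimal sketch of the non-spinning pattern, not BIRD's code: the function name `drain` is made up for illustration, and a Unix datagram socketpair stands in for the netlink socket so that it runs without any routing setup.

```python
import socket


def drain(sock, bufsize=8192):
    """Read every datagram currently queued on a non-blocking socket."""
    messages = []
    while True:
        try:
            data, _ancdata, _flags, _addr = sock.recvmsg(bufsize)
        except BlockingIOError:
            # EAGAIN / EWOULDBLOCK: the receive queue is empty.
            # Stop here and go back to poll()/select(); calling
            # recvmsg() again right away is what produces a busy loop.
            break
        if not data:
            # Treat an empty read as "nothing more to do" in this sketch.
            break
        messages.append(data)
    return messages


if __name__ == "__main__":
    # Stand-in for the netlink socket (assumption for illustration only).
    reader, writer = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    reader.setblocking(False)
    writer.send(b"one")
    writer.send(b"two")
    print(drain(reader))  # the two queued datagrams
    print(drain(reader))  # nothing queued: returns at once, no spinning
```

Whether bird6 is actually retrying like this cannot be told from the trace alone; the sketch only shows why a repeated EAGAIN on the same descriptor points at a loop that never goes back to waiting.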
File descriptor 7 is a netlink socket:
gk1 # lsof -p 11465
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
bird6 11465 bird cwd DIR 253,0 4096 2 /
bird6 11465 bird rtd DIR 253,0 4096 2 /
bird6 11465 bird txt REG 253,0 540648 787381 /usr/sbin/bird6
bird6 11465 bird mem REG 253,0 47712 659204 /lib/x86_64-linux-gnu/libnss_files-2.19.so
bird6 11465 bird mem REG 253,0 43592 659208 /lib/x86_64-linux-gnu/libnss_nis-2.19.so
bird6 11465 bird mem REG 253,0 89104 659199 /lib/x86_64-linux-gnu/libnsl-2.19.so
bird6 11465 bird mem REG 253,0 31632 659200 /lib/x86_64-linux-gnu/libnss_compat-2.19.so
bird6 11465 bird mem REG 253,0 1738176 659160 /lib/x86_64-linux-gnu/libc-2.19.so
bird6 11465 bird mem REG 253,0 137440 655379 /lib/x86_64-linux-gnu/libpthread-2.19.so
bird6 11465 bird mem REG 253,0 140928 655799 /lib/x86_64-linux-gnu/ld-2.19.so
bird6 11465 bird 0u CHR 1,3 0t0 1028 /dev/null
bird6 11465 bird 1u CHR 1,3 0t0 1028 /dev/null
bird6 11465 bird 2u CHR 1,3 0t0 1028 /dev/null
bird6 11465 bird 3u unix 0xffff8803269f7c00 0t0 127941139 socket
bird6 11465 bird 4u unix 0xffff8803269f7480 0t0 127941145 /run/bird/bird6.ctl
bird6 11465 bird 5u netlink 0t0 127906248 ROUTE
bird6 11465 bird 6u netlink 0t0 127906249 ROUTE
bird6 11465 bird 7u netlink 0t0 127906250 ROUTE
bird6 11465 bird 8u IPv6 127906251 0t0 TCP *:bgp (LISTEN)
bird6 11465 bird 9u raw6 0t0 127906252 00000000000000000000000000000000:0059->00000000000000000000000000000000:0000 st=07
bird6 11465 bird 10u IPv6 127994711 0t0 TCP e0.gk1:bgp->e0.gk2:39074 (CLOSE_WAIT)
bird6 11465 bird 11u IPv6 127965176 0t0 TCP [2001:w:y:x::133]:58268->[2001:w:y:x::1]:bgp (CLOSE_WAIT)
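The same descriptor can be cross-checked without lsof: on Linux, /proc/<pid>/fd/<fd> is a symlink whose target has the form "socket:[inode]" for sockets, and that inode is the number lsof prints in the NODE column (127906250 for fd 7 here). A small sketch follows; the helper names are made up for illustration, and reading another user's process normally needs root.

```python
import os
import re


def parse_fd_target(target):
    """Return the socket inode from a /proc fd link target, or None."""
    match = re.fullmatch(r"socket:\[(\d+)\]", target)
    return int(match.group(1)) if match else None


def socket_inode(pid, fd):
    """Look up the socket inode behind a descriptor of a running process."""
    # Linux only: the link target is e.g. "socket:[127906250]" for a socket
    # and a plain path such as "/dev/null" for a regular descriptor.
    return parse_fd_target(os.readlink(f"/proc/{pid}/fd/{fd}"))


if __name__ == "__main__":
    # PID and fd taken from the lsof output above.
    print(socket_inode(11465, 7))
```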
Unfortunately I didn't find any debug symbols for this package, so all I
could get from gdb was the following:
(gdb) bt
#0 0x00007f5ad1705e80 in __recvmsg_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1 0x00007f5ad1b90428 in ?? ()
#2 0x00007f5ad1b8956b in ?? ()
#3 0x00007f5ad1b8a06b in ?? ()
#4 0x00007f5ad1b3f0c7 in ?? ()
#5 0x00007f5ad136db45 in __libc_start_main (main=0x7f5ad1b3eb10, argc=5, argv=0x7ffe8cfece28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe8cfece18) at libc-start.c:287
#6 0x00007f5ad1b3f3ec in ?? ()
(gdb) info r
rax 0xfffffffffffffff5 -11
rbx 0x7f5ad32aefe0 140028066590688
rcx 0xffffffffffffffff -1
rdx 0x0 0
rsi 0x7ffe8cfecb70 140731263929200
rdi 0x7 7
rbp 0x7f5ad1dba270 0x7f5ad1dba270
rsp 0x7ffe8cfecb18 0x7ffe8cfecb18
r8 0x7f5ad32aefe0 140028066590688
r9 0x0 0
r10 0x1 1
r11 0x246 582
r12 0x0 0
r13 0x7f5ad32c7f60 140028066692960
r14 0x100 256
r15 0x0 0
rip 0x7f5ad1705e80 0x7f5ad1705e80 <__recvmsg_nocancel+7>
eflags 0x246 [ PF ZF IF ]
cs 0x33 51
ss 0x2b 43
ds 0x0 0
es 0x0 0
fs 0x0 0
gs 0x0 0
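The register dump lines up with the strace output. Under the x86-64 Linux syscall convention, rdi, rsi and rdx carry the first three arguments (fd 7, the msghdr pointer 0x7ffe8cfecb70, flags 0), and rax holds the raw return value, which is the negated errno on failure. A short sketch of that decoding, with a made-up helper name:

```python
import errno


def to_signed64(value):
    """Reinterpret an unsigned 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


if __name__ == "__main__":
    rax = 0xFFFFFFFFFFFFFFF5            # from "info r" above
    ret = to_signed64(rax)              # raw syscall return value
    # A negative return is -errno; on Linux, errno 11 is EAGAIN.
    print(ret, errno.errorcode.get(-ret))
```

So the process was stopped just after a recvmsg() call that had failed with EAGAIN, the same result strace shows on every iteration.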
Unfortunately, I did not have debug logging enabled when this happened.
I had it on for several days, but either I was "lucky" or the debug
logging somehow prevented the hang. It was producing several MB of
debug logs every day, so I ended up disabling it.
I'm not 100% sure that this was installed from your CZ repository; it
may have been from Debian backports. But I'm 95% sure it came from CZ.
In any case the MD5 is as follows:
56e48e8e5a1380b384f1758df2077e53 bird_1.6.2-1~bpo8+1_amd64.deb
I have now upgraded to 1.6.2-3~bpo8+1, from your CZ repository.
I can provide the configuration file off-list, if that helps.
Regards,
Israel