bird6 1.6.2 hangs doing recvmsg on netlink socket
Israel G. Lugo
israel.lugo at lugosys.com
Fri Dec 9 13:26:33 CET 2016
Just had another hang, 7 days after my previous email. Exact same
symptoms, this time with the latest version from the CZ repository:
1.6.2-3~bpo8+1.
bird6 is stuck in recvmsg() at 100% CPU, getting EAGAIN in an infinite loop:
# strace -p 23020
recvmsg(7, 0x7ffc45ae0ab0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffc45ae0ab0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffc45ae0ab0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffc45ae0ab0, 0) = -1 EAGAIN (Resource temporarily unavailable)
recvmsg(7, 0x7ffc45ae0ab0, 0) = -1 EAGAIN (Resource temporarily unavailable)
[...]
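For anyone reading along: the trace above is what a non-blocking socket looks like when the caller retries recvmsg() immediately after EAGAIN instead of going back to wait for readiness. The sketch below is not BIRD's code and says nothing about the actual cause of the bug. It uses a Unix datagram socketpair as a stand-in for the netlink socket, and the helper name drain() is made up for illustration.

```python
import socket


def drain(sock, bufsize=4096):
    """Read everything currently queued on a non-blocking socket.

    Hypothetical helper: returns the list of messages read and stops
    as soon as the kernel reports that nothing more is queued.
    """
    messages = []
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            # EAGAIN/EWOULDBLOCK: nothing is queued right now.
            # Leave the loop and return to the poll/select loop;
            # calling recv() again here would spin at 100% CPU,
            # which is the pattern strace shows.
            break
        messages.append(data)
    return messages


# Stand-in for the netlink socket (assumption: any non-blocking
# datagram socket behaves the same way with respect to EAGAIN).
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
a.setblocking(False)

b.send(b"msg1")
b.send(b"msg2")

first = drain(a)   # reads both queued datagrams, then hits EAGAIN
second = drain(a)  # queue is empty: EAGAIN on the first recv()
```

The point is only that EAGAIN on a non-blocking socket means "come back later", so a loop that treats it as "try again now" never exits on its own.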
None of this happened in 1.5.0.
What can I do to help troubleshoot this? This is a major regression, and
I'm seriously concerned about both of my edge routers, which run the
same version of BIRD.
On 12/02/2016 06:46 PM, Israel G. Lugo wrote:
> Hello,
>
> I am getting some random crashes in bird6, running on Debian, version
> 1.6.2-1~bpo8+1 from your http://bird.network.cz/debian/ repository.
>
> I've got a single OSPF instance with 74 routes, one eBGP session
> receiving a default route, and one iBGP session with another Bird
> router, which sends me its own default.
>
> What happens is that, from time to time, bird6 becomes stuck in an
> infinite loop doing recvmsg() on a netlink socket, and IPv6 routes are
> lost. The interval seems random; it's been 3 days, and it's also been 2
> weeks.
>
>
> gk1 # strace -p 11465
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> recvmsg(7, 0x7ffe8cfecb70, 0) = -1 EAGAIN (Resource temporarily unavailable)
> [...]
>
> File descriptor 7 is a netlink socket:
>
> gk1 # lsof -p 11465
> COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
> bird6 11465 bird cwd DIR 253,0 4096 2 /
> bird6 11465 bird rtd DIR 253,0 4096 2 /
> bird6 11465 bird txt REG 253,0 540648 787381 /usr/sbin/bird6
> bird6 11465 bird mem REG 253,0 47712 659204 /lib/x86_64-linux-gnu/libnss_files-2.19.so
> bird6 11465 bird mem REG 253,0 43592 659208 /lib/x86_64-linux-gnu/libnss_nis-2.19.so
> bird6 11465 bird mem REG 253,0 89104 659199 /lib/x86_64-linux-gnu/libnsl-2.19.so
> bird6 11465 bird mem REG 253,0 31632 659200 /lib/x86_64-linux-gnu/libnss_compat-2.19.so
> bird6 11465 bird mem REG 253,0 1738176 659160 /lib/x86_64-linux-gnu/libc-2.19.so
> bird6 11465 bird mem REG 253,0 137440 655379 /lib/x86_64-linux-gnu/libpthread-2.19.so
> bird6 11465 bird mem REG 253,0 140928 655799 /lib/x86_64-linux-gnu/ld-2.19.so
> bird6 11465 bird 0u CHR 1,3 0t0 1028 /dev/null
> bird6 11465 bird 1u CHR 1,3 0t0 1028 /dev/null
> bird6 11465 bird 2u CHR 1,3 0t0 1028 /dev/null
> bird6 11465 bird 3u unix 0xffff8803269f7c00 0t0 127941139 socket
> bird6 11465 bird 4u unix 0xffff8803269f7480 0t0 127941145 /run/bird/bird6.ctl
> bird6 11465 bird 5u netlink 0t0 127906248 ROUTE
> bird6 11465 bird 6u netlink 0t0 127906249 ROUTE
> bird6 11465 bird 7u netlink 0t0 127906250 ROUTE
> bird6 11465 bird 8u IPv6 127906251 0t0 TCP *:bgp (LISTEN)
> bird6 11465 bird 9u raw6 0t0 127906252 00000000000000000000000000000000:0059->00000000000000000000000000000000:0000 st=07
> bird6 11465 bird 10u IPv6 127994711 0t0 TCP e0.gk1:bgp->e0.gk2:39074 (CLOSE_WAIT)
> bird6 11465 bird 11u IPv6 127965176 0t0 TCP [2001:w:y:x::133]:58268->[2001:w:y:x::1]:bgp (CLOSE_WAIT)
>
> Unfortunately I didn't find any debug symbols for this package, so all I
> could get from gdb was the following:
>
> (gdb) bt
> #0 0x00007f5ad1705e80 in __recvmsg_nocancel () at ../sysdeps/unix/syscall-template.S:81
> #1 0x00007f5ad1b90428 in ?? ()
> #2 0x00007f5ad1b8956b in ?? ()
> #3 0x00007f5ad1b8a06b in ?? ()
> #4 0x00007f5ad1b3f0c7 in ?? ()
> #5 0x00007f5ad136db45 in __libc_start_main (main=0x7f5ad1b3eb10, argc=5, argv=0x7ffe8cfece28, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe8cfece18) at libc-start.c:287
> #6 0x00007f5ad1b3f3ec in ?? ()
> (gdb) info r
> rax 0xfffffffffffffff5 -11
> rbx 0x7f5ad32aefe0 140028066590688
> rcx 0xffffffffffffffff -1
> rdx 0x0 0
> rsi 0x7ffe8cfecb70 140731263929200
> rdi 0x7 7
> rbp 0x7f5ad1dba270 0x7f5ad1dba270
> rsp 0x7ffe8cfecb18 0x7ffe8cfecb18
> r8 0x7f5ad32aefe0 140028066590688
> r9 0x0 0
> r10 0x1 1
> r11 0x246 582
> r12 0x0 0
> r13 0x7f5ad32c7f60 140028066692960
> r14 0x100 256
> r15 0x0 0
> rip 0x7f5ad1705e80 0x7f5ad1705e80 <__recvmsg_nocancel+7>
> eflags 0x246 [ PF ZF IF ]
> cs 0x33 51
> ss 0x2b 43
> ds 0x0 0
> es 0x0 0
> fs 0x0 0
> gs 0x0 0
>
>
> Unfortunately, I did not have debug on when this crashed. I had it on
> for several days, but either I was "lucky" or the debug prevented the
> crash somehow. I was having several MB worth of debug logs every day, so
> I ended up disabling debug.
>
> I'm not 100% sure that this was installed from your CZ repository; it
> may have been from Debian backports. But I'm 95% sure it came from CZ.
> In any case the MD5 is as follows:
>
> 56e48e8e5a1380b384f1758df2077e53 bird_1.6.2-1~bpo8+1_amd64.deb
>
> I have now upgraded to 1.6.2-3~bpo8+1, from your CZ repository.
>
> I can provide the configuration file off-list, if that helps.
>
> Regards,
>
> Israel
>
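A note on the register dump in the quoted message: on Linux x86-64 a raw syscall returns its result in rax, with errors encoded as the negated errno, so rax = 0xfffffffffffffff5 (-11) matches the EAGAIN that strace reports (EAGAIN is 11 on Linux). A small sketch of that decoding, with variable names chosen just for illustration:

```python
import errno
import os

# Value of rax copied from the "info r" output in the quoted message.
rax = 0xFFFFFFFFFFFFFFF5

# Reinterpret the 64-bit register value as a signed integer.
signed = rax - (1 << 64) if rax >= (1 << 63) else rax

# The kernel returns -errno on failure, so negate to get the errno.
err = -signed

# Look up the symbolic name and message. The mapping from number to
# name is platform-specific; the gdb output comes from a Linux host.
name = errno.errorcode.get(err, "unknown")
message = os.strerror(err)

print(signed, err, name, message)
```

So the process was stopped right after a recvmsg() that had just failed with EAGAIN, consistent with the strace output.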