BIRD Crashes
Ian Chilton
ichilton at fastmail.co.uk
Thu Aug 18 17:57:44 CEST 2022
Hi,
We are an IXP running 2x route servers with BIRD, each running separate daemons for IPv4 and IPv6.
We are running BIRD 2.0.8-1 on Debian 10 and have around 250 peers, ~150k routes on v4 and ~50k routes on v6.
After we upgraded to BIRD 2 nearly 3 years ago, it was really stable until May this year. Since then, the v4 daemon on one of the servers has crashed 3 times. The v6 daemon on that server has been fine, as has the second route server, which runs the same setup with the same peers and therefore, in theory, the same routes.
The first two of these crashes happened a week apart, after which I rebooted the VM to ensure everything was clean. It was then fine for 90 days, but crashed again yesterday.
Our BIRD configuration is generated by IXP Manager and updated hourly.
We then run a "bird re-validate" cron job every hour (at twenty past the hour):
/usr/sbin/birdc -s /run/bird-ipv6.ctl reload in all > /dev/null ; /usr/sbin/birdc -s /run/bird-ipv4.ctl reload in all > /dev/null
Interestingly, all 3 crashes have happened just after twenty past the hour, i.e. soon after this cron job has run.
It looks like the following in the logs:
Aug 17 17:20:01 rs1 CRON[29229]: (root) CMD (/usr/sbin/birdc -s /run/bird-ipv6.ctl reload in all > /dev/null ; /usr/sbin/birdc -s /run/bird-ipv4.ctl reload in all > /dev/null)
Aug 17 17:20:01 rs1 bird: Reloading protocol device1
Aug 17 17:20:01 rs1 bird: Reloading protocol pp_0121_asxx
..etc..
Aug 17 17:20:01 rs1 bird: Reloading protocol pp_1082_asxxxxxx
Aug 17 17:20:01 rs1 bird: Reloading protocol pb_1082_asxxxxxx
Aug 17 17:20:01 rs1 bird: Tagging invalid ROA 2001:xxxx:xxxx::/48 for ASN xxxxx
..etc..
Aug 17 17:21:17 rs1 bird: Tagging invalid ROA x.x.x.x/23 for ASN xxxx
Aug 17 17:21:19 rs1 kernel: [7811815.959943] bird[586]: segfault at f30021 ip 000055a1bf450fc3 sp 00007ffe64f3da98 error 4 in bird[55a1bf42a000+d8000]
Aug 17 17:21:19 rs1 kernel: [7811815.966760] Code: 95 78 01 00 00 5b 5d 41 5c c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 85 ff b8 01 00 00 00 74 15 48 85 f6 0f 84 a6 00 00 00 <0f> b6 46 21 0f b6 57 21 29 d0 74 11 f3 c3 0f 1f 44 00 00 66 2e 0f
Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Main process exited, code=killed, status=11/SEGV
Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Failed with result 'signal'.
Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Service RestartSec=100ms expired, scheduling restart.
Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Scheduled restart job, restart counter is at 1.
Aug 17 17:21:19 rs1 systemd[1]: Stopped BIRD - ipv4.
Aug 17 17:21:19 rs1 systemd[1]: Starting BIRD - ipv4...
Aug 17 17:21:22 rs1 systemd[1]: Started BIRD - ipv4.
Aug 17 17:21:22 rs1 bird: Started
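As a side note on the kernel segfault line above: the `ip` value and the mapping base in `bird[55a1bf42a000+d8000]` give the offset of the faulting instruction within that mapping, which should be comparable across crashes even if the load address differs. A minimal sketch of the arithmetic in shell; `fault_offset` is a hypothetical helper name used only for illustration, not part of BIRD or any tool:

```shell
# Hypothetical helper: offset of the faulting instruction inside the mapping.
# $1 = instruction pointer, $2 = mapping base, both hex without the 0x prefix,
# copied from the kernel "segfault at ..." line.
fault_offset() {
    printf '0x%x\n' $(( 0x$1 - 0x$2 ))
}

# Values taken from the Aug 17 log line above.
fault_offset 000055a1bf450fc3 55a1bf42a000   # prints 0x26fc3
```

Turning that offset into a function name would additionally need the exact binary that crashed plus its debug symbols. Whether the offset can be passed to a symbolizer directly depends on how the binary is laid out, which is not checked here.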
When the second crash happened, we were at RIPE84, so we chatted to Maria in person. She said that it was possible to debug it, but that she would need a core dump.
After looking into this, I ran:
ulimit -S -c unlimited
and installed the systemd-coredump package, which is supposed to dump a core file if a process crashes. I tested this by killing a sleep command from the shell with kill -s 6 (SIGABRT), and it worked.
When the crash happened again yesterday, I hoped to have a core file to send, but there is no sign of it having generated one :(
On a test server, killing sleep generates a core file, but killing bird does not.
So, two questions: has anyone experienced similar crashes, or does anyone have any ideas why we might be seeing this?
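Two things may be worth checking here, offered as a sketch rather than a diagnosis. First, `ulimit` set in a shell applies only to that shell and the processes started from it, so a daemon started by systemd takes its core limit from the unit (for example `LimitCORE=`) and not from the shell. Second, if the daemon changes its user ID at startup (an assumption about how these units run bird), the kernel decides whether it may dump core based on `fs.suid_dumpable`. The snippet below reads the effective soft core limit of running bird processes; `core_soft_limit` is a hypothetical helper name:

```shell
# Hypothetical helper: print the soft "Max core file size" value
# from a /proc/<pid>/limits file given as $1.
core_soft_limit() {
    awk '/^Max core file size/ { print $5 }' "$1"
}

# Report the limit for every running bird process (prints nothing if none run).
for pid in $(pidof bird 2>/dev/null); do
    echo "pid $pid: core soft limit = $(core_soft_limit "/proc/$pid/limits")"
done
```

Alongside this, `sysctl kernel.core_pattern fs.suid_dumpable` shows where the kernel sends cores and how it treats processes that changed user, and `coredumpctl list` shows whether systemd-coredump recorded anything for the crash.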
Can anyone advise how to reliably get a core dump if bird crashes?
Thanks!
Ian