BIRD Crashes

Barry O'Donovan (INEX) barry.odonovan at inex.ie
Thu Aug 18 19:08:59 CEST 2022


Hi Ian, all,

Ian Chilton wrote on 18/08/2022 16:57:
> We then run a "bird re-validate" cron job every hour (at twenty past the 
> hour):
> /usr/sbin/birdc -s /run/bird-ipv6.ctl reload in all > /dev/null ; 
> /usr/sbin/birdc -s /run/bird-ipv4.ctl reload in all
> 
> Interestingly, all 3 crashes have happened just after twenty past the 
> hour, i.e. soon after this cron job has run.

As you're running BIRD 2.0.8, this should no longer be necessary. Per 
the 2.0.8 release notes:

 > Version 2.0.8 (2021-03-18)
 >  o Automatic channel reloads based on RPKI changes

So given all three crashes appear linked to this, stopping those manual 
reloads should, hopefully, return you to stability.
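
Before removing the cron job, it may be worth confirming that the running 
daemon really is 2.0.8 or later. A minimal sketch, assuming the birdc path 
and socket from the cron line above, and assuming the birdc output 
contains a line of the form "BIRD <version>"; the function name 
version_at_least is made up for illustration:

```shell
#!/bin/sh
# Sketch: confirm the running BIRD is new enough (>= 2.0.8) for automatic
# RPKI-triggered reloads before removing the hourly "reload in all" job.
BIRDC="${BIRDC:-/usr/sbin/birdc}"        # path taken from the cron line
SOCKET="${SOCKET:-/run/bird-ipv4.ctl}"   # control socket from the cron line

# version_at_least CURRENT MINIMUM: succeeds if CURRENT >= MINIMUM.
# Relies on "sort -V" (version sort in GNU coreutils and busybox).
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if [ -x "$BIRDC" ] && [ -S "$SOCKET" ]; then
    # Assumption: some output line starts with "BIRD <version>".
    running=$("$BIRDC" -s "$SOCKET" show status |
        awk '/^BIRD [0-9]/ {print $2; exit}')
    if version_at_least "$running" 2.0.8; then
        echo "BIRD $running: automatic RPKI reloads should be available"
        # Locate the manual reload job so it can be commented out by hand.
        grep -rn 'reload in all' /etc/cron* /var/spool/cron 2>/dev/null || true
    else
        echo "BIRD $running: keep the manual reload for now"
    fi
fi
```

As far as I recall from the BIRD documentation, the automatic reload is 
governed by the channel option "rpki reload" (on by default) and, for BGP 
channels, needs an import table, so that is worth verifying against your 
own config before relying on it.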

You're also two bugfix releases behind. At INEX we've been running 2.0.9 
for five or six months now without issue.

There appear to be a lot of bug fixes between 2.0.8 and 2.0.10, so it 
might be worthwhile updating, or checking the git commit logs to see if 
there's anything relevant to RPKI in there.
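
One way to do that check, as a rough sketch: it assumes a local clone of 
the BIRD git repository and that the releases are tagged v2.0.8 and 
v2.0.10; the clone path and the function name rpki_commits are 
placeholders, not anything from your setup.

```shell
#!/bin/sh
# rpki_commits REPO FROM TO: one-line log of the commits in FROM..TO whose
# message mentions RPKI or ROA (multiple --grep patterns are OR'd by git).
rpki_commits() {
    git -C "$1" log --oneline \
        --grep='RPKI' --grep='rpki' --grep='ROA' "$2..$3"
}

# Assumed location of a local clone; adjust to wherever yours lives.
REPO="${REPO:-$HOME/src/bird}"
if [ -d "$REPO/.git" ]; then
    # Tag names are an assumption; check "git tag" in the clone first.
    rpki_commits "$REPO" v2.0.8 v2.0.10
fi
```

Matching on commit messages will miss fixes whose message does not 
mention RPKI, so the NEWS file in the release is a useful cross-check.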

hth,
  - Barry


> It looks like the following in the logs:
> 
> Aug 17 17:20:01 rs1 CRON[29229]: (root) CMD (/usr/sbin/birdc -s 
> /run/bird-ipv6.ctl reload in all > /dev/null ; /usr/sbin/birdc -s 
> /run/bird-ipv4.ctl reload in all > /dev/null)
> Aug 17 17:20:01 rs1 bird: Reloading protocol device1
> Aug 17 17:20:01 rs1 bird: Reloading protocol pp_0121_asxx
> ..etc..
> Aug 17 17:20:01 rs1 bird: Reloading protocol pp_1082_asxxxxxx
> Aug 17 17:20:01 rs1 bird: Reloading protocol pb_1082_asxxxxxx
> Aug 17 17:20:01 rs1 bird: Tagging invalid ROA 2001:xxxx:xxxx::/48 for 
> ASN xxxxx
> ..etc..
> Aug 17 17:21:17 rs1 bird: Tagging invalid ROA x.x.x.x/23 for ASN xxxx
> Aug 17 17:21:19 rs1 kernel: [7811815.959943] bird[586]: segfault at 
> f30021 ip 000055a1bf450fc3 sp 00007ffe64f3da98 error 4 in 
> bird[55a1bf42a000+d8000]
> Aug 17 17:21:19 rs1 kernel: [7811815.966760] Code: 95 78 01 00 00 5b 5d 
> 41 5c c3 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 48 85 ff b8 01 00 00 00 
> 74 15 48 85 f6 0f 84 a6 00 00 00 <0f> b6 46 21 0f b6 57 21 29 d0 74 11 
> f3 c3 0f 1f 44 00 00 66 2e 0f
> Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Main process exited, 
> code=killed, status=11/SEGV
> Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Failed with result 
> 'signal'.
> Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Service 
> RestartSec=100ms expired, scheduling restart.
> Aug 17 17:21:19 rs1 systemd[1]: bird-ipv4.service: Scheduled restart 
> job, restart counter is at 1.
> Aug 17 17:21:19 rs1 systemd[1]: Stopped BIRD - ipv4.
> Aug 17 17:21:19 rs1 systemd[1]: Starting BIRD - ipv4...
> Aug 17 17:21:22 rs1 systemd[1]: Started BIRD - ipv4.
> Aug 17 17:21:22 rs1 bird: Started
> 
> When the second crash happened, we happened to be at RIPE84 so we 
> chatted to Maria in person. She said that it was possible to debug it, 
> but would need a core dump.
> 
> After looking into this, I did:
> 
> ulimit -S -c unlimited
> and installed the systemd-coredump package.
> 
> ...which was supposed to dump a core file if a process crashed. I tested 
> this by killing a sleep command from the shell with kill -s 6 and it worked.
> 
> When the crash happened again yesterday, I hoped to have a core file to 
> send, but there is no sign of it having generated one :(
> 
> Testing on a test server, killing sleep generates a core file, but not 
> killing bird.
> 
> So two things - has anyone experienced similar crashes or have any ideas 
> why we might be seeing this?
> 
> Can anyone advise how to reliably get a core dump if bird crashes?
> 
> Thanks!
> 
> Ian
> 
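
On the core dump question: a ulimit set in an interactive shell only 
applies to that shell and its children, not to a service started by 
systemd, whose limit comes from the unit's LimitCORE= setting. Separately, 
the kernel clears the "dumpable" flag of a process that changes its UID or 
GID unless fs.suid_dumpable allows otherwise, which may matter if bird 
drops privileges on startup (an assumption about this setup; nothing in 
the logs confirms it). A rough sketch that reports the relevant settings; 
the unit name is taken from the logs above and the helper function name is 
made up for illustration:

```shell
#!/bin/sh
# Sketch: report the settings that usually decide whether a crashing
# daemon leaves a core dump behind.
UNIT="${UNIT:-bird-ipv4.service}"   # unit name as seen in the logs

# Explain a fs.suid_dumpable value (meanings as described in proc(5)).
describe_suid_dumpable() {
    case "$1" in
        0) echo "0: processes that changed UID/GID are not dumped" ;;
        1) echo "1: all processes are dumped (debug mode)" ;;
        2) echo "2: dumped only via a piped or absolute core_pattern" ;;
        *) echo "$1: unknown value" ;;
    esac
}

if [ -r /proc/sys/fs/suid_dumpable ]; then
    describe_suid_dumpable "$(cat /proc/sys/fs/suid_dumpable)"
fi
if [ -r /proc/sys/kernel/core_pattern ]; then
    # A leading "|" means cores are piped to a helper such as systemd-coredump.
    echo "core_pattern: $(cat /proc/sys/kernel/core_pattern)"
fi
if command -v systemctl >/dev/null 2>&1; then
    # Core size limit that systemd applies to the service itself.
    systemctl show -p LimitCORE "$UNIT" 2>/dev/null || true
fi
if command -v coredumpctl >/dev/null 2>&1; then
    # Any dumps that systemd-coredump has already captured for bird.
    coredumpctl --no-pager list bird 2>/dev/null || true
fi
```

If LimitCORE turns out to be 0, a drop-in created with "systemctl edit 
bird-ipv4.service" that sets LimitCORE=infinity under [Service] would be 
the usual fix; whether fs.suid_dumpable also needs changing depends on how 
bird is started on your system.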


-- 

Kind regards,
Barry O'Donovan
Consultant

For and on behalf of INEX

https://www.inex.ie/support/
+353 1 531 3339



