Bird6 freeze under high load

Chris Caputo ccaputo at alt.net
Sat Jan 31 01:27:56 CET 2015


If bird6 was built with debugging symbols, wait until it is in the CPU 
busy-loop, then attach gdb to it, like this:

  gdb <path to bird6> <process id of bird6>

Ex:

  gdb /usr/local/sbin/bird6 `ps -C bird6 -o pid=`

then "bt" for a stack trace, which may show where it is stuck.

"cont" to continue, then press control-c again and run "bt" to check again.

Do this a few times.  Hopefully there will be a pattern.

Copy & paste the results to this list.

"quit" to exit gdb, allowing bird6 to continue.
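
If doing this by hand is awkward while the router is misbehaving, the same 
sampling can be scripted.  What follows is only a rough sketch, not something 
tested against bird6: the function name, the GDB variable, the sample count 
and the interval are all made up for illustration.  It relies on gdb's 
-p, -batch and -ex options to attach, print one backtrace and detach again.

```shell
#!/bin/sh
# Sketch: take several stack traces of a running process, a few seconds apart.
# GDB can be overridden from the environment; "gdb" is the assumed default.
GDB="${GDB:-gdb}"

# sample_backtraces PID [SAMPLES] [INTERVAL_SECONDS]
sample_backtraces() {
    pid="$1"
    samples="${2:-5}"     # illustrative default: 5 traces
    interval="${3:-2}"    # illustrative default: 2 seconds between traces
    i=1
    while [ "$i" -le "$samples" ]; do
        echo "=== sample $i, pid $pid ==="
        # -batch makes gdb exit after running the -ex commands; on exit it
        # detaches from the attached process, which then keeps running.
        "$GDB" -p "$pid" -batch -ex "bt"
        i=$((i + 1))
        if [ "$i" -le "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

With that function loaded into the shell, something like

  sample_backtraces `ps -C bird6 -o pid=` 5 2 > bird6-bt.txt

during a freeze should leave five traces in bird6-bt.txt to paste here.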

Chris

On Sat, 31 Jan 2015, Baptiste Jonglez wrote:
> I just tried downgrading from 1.4.5 to 1.4.4, using the 1.4.4-1~bpo70+1 
> Debian package from http://bird.network.cz/?download&tdir=debian/
> 
> The result is the same, bird6 also freezes periodically with version 1.4.4.
> 
> By the way, I think I ruled out the possibility that a particular BGP peer
> is sending garbage: the issue still arises when leaving only one BGP
> session active, whichever it is.
> 
> Is there anything else I can do to help troubleshoot the root cause of
> this issue?
> 
> On Thu, Jan 29, 2015 at 08:03:07PM +0100, Baptiste Jonglez wrote:
> > Hi,
> > 
> > We are experiencing regular "freezes" of bird6 on a BGP router.  When this
> > happens, bird6 maxes out a CPU for several minutes.  If a command is run
> > in birdc6 during such a freeze, the command hangs, and the result is only
> > returned when bird6 has stopped using the CPU.  Note that this also
> > applies to "cheap" commands like "show protocols", which usually complete
> > instantly (both with bird, and with bird6 in non-freeze conditions).
> > 
> > Sometimes (but not always), the non-responsiveness of bird6 causes all BGP
> > sessions to drop, which is really annoying on a full-view BGP router.
> > 
> > The freezes happen at random, but seem to happen more frequently when the
> > router is under load (typically, at peak time, each CPU spends ~20%
> > forwarding packets, on a 4-core box).
> > 
> > The BGP setup is made of multiple transit and peerings, on multiple VLANs
> > (some BGP neighbours share the same VLAN).  The setup is pretty similar on
> > bird and bird6, but only bird6 exhibits these freezes, bird works just fine.
> > 
> > The box is running Debian wheezy on amd64, with bird from backports: 1.4.5-1~bpo70+1
> > 
> > Attached is the configuration, and two extracts of the logs when all BGP
> > sessions dropped (with debug { states, interfaces, events }).  All files
> > are anonymised, but should be consistent.
> > 
> > What do you think?  It looks like bird6 gets stuck on some very expensive
> > operation, which prevents it from doing anything else (including keeping
> > BGP sessions alive).
> > 
> > Thanks,
> > Baptiste

