Bird6 freeze under high load
Chris Caputo
ccaputo at alt.net
Sat Jan 31 01:27:56 CET 2015
If bird6 was built with symbols, then once it has gotten into the CPU
busy-loop, use gdb to attach to it, a la:
gdb <path to bird6> <process id of bird6>
Ex:
gdb /usr/local/sbin/bird6 `ps -C bird6 -o pid=`
then "bt" for a stack trace, possibly showing where it is stuck.
"cont" to continue, then another control-c and "bt" to check again.
Do this a few times. Hopefully there will be a pattern.
Copy & paste the results to this list.
"quit" to exit gdb, allowing bird6 to continue.
Chris
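
If doing the gdb steps above by hand gets tedious, a rough sketch like
the following might automate the sampling. It assumes gdb's batch mode
(-batch with -ex) is available; the GDB, SAMPLES and INTERVAL variables
and the sample_backtraces function name are made up for illustration and
are not part of bird or gdb.

```shell
#!/bin/sh
# Sketch: take several stack samples from a busy-looping process.
# GDB, SAMPLES and INTERVAL are hypothetical knobs for this example only.
GDB="${GDB:-gdb}"
SAMPLES="${SAMPLES:-5}"
INTERVAL="${INTERVAL:-2}"

sample_backtraces() {
    # $1 = path to the binary (for symbols), $2 = process id to attach to
    i=1
    while [ "$i" -le "$SAMPLES" ]; do
        echo "=== sample $i ==="
        # -batch runs the -ex commands and then exits; on exit gdb detaches
        # from the attached process, so it keeps running between samples.
        "$GDB" -batch -ex "bt" "$1" "$2" 2>&1
        i=$((i + 1))
        sleep "$INTERVAL"
    done
}

# Example invocation (left commented out so the file can be sourced safely):
# sample_backtraces /usr/local/sbin/bird6 $(ps -C bird6 -o pid=) > bird6-bt.txt
```

Each sample briefly stops the process while gdb is attached, so a small
sample count is probably wise on a production router. The resulting file
can be pasted to the list in place of the manual output.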
On Sat, 31 Jan 2015, Baptiste Jonglez wrote:
> I just tried downgrading from 1.4.5 to 1.4.4, using the 1.4.4-1~bpo70+1
> Debian package from http://bird.network.cz/?download&tdir=debian/
>
> The result is the same, bird6 also freezes periodically with version 1.4.4.
>
> By the way, I think I ruled out the possibility that a particular BGP peer
> is sending garbage: the issue still arises when leaving only one BGP
> session active, whichever it is.
>
> Is there anything else I can do to help troubleshoot the root cause of
> this issue?
>
> On Thu, Jan 29, 2015 at 08:03:07PM +0100, Baptiste Jonglez wrote:
> > Hi,
> >
> > We are experiencing regular "freezes" of bird6 on a BGP router. When this
> > happens, bird6 maxes out a CPU for several minutes. If a command is run
> > in birdc6 during such a freeze, the command hangs, and the result is only
> > returned when bird6 has stopped using the CPU. Note that this also
> > applies to "cheap" commands like "show protocols", which usually complete
> > instantly (both with bird, and with bird6 in non-freeze conditions).
> >
> > Sometimes (but not always), the non-responsiveness of bird6 causes all BGP
> > sessions to drop, which is really annoying on a full-view BGP router.
> >
> > The freezes happen at random, but seem to happen more frequently when the
> > router is under load (typically, at peak time, each CPU spends ~20%
> > forwarding packets, on a 4-core box).
> >
> > The BGP setup is made of multiple transit and peerings, on multiple VLANs
> > (some BGP neighbours share the same VLAN). The setup is pretty similar on
> > bird and bird6, but only bird6 exhibits these freezes, bird works just fine.
> >
> > The box is running Debian wheezy on amd64, with bird from backports: 1.4.5-1~bpo70+1
> >
> > Attached is the configuration, and two extracts of the logs when all BGP
> > sessions dropped (with debug { states, interfaces, events }). All files
> > are anonymised, but should be consistent.
> >
> > What do you think? It looks like bird6 gets stuck on some very expensive
> > operation, which prevents it from doing anything else (including keeping
> > BGP sessions alive).
> >
> > Thanks,
> > Baptiste