Bird6 freeze under high load

Pendzik, Edward ependzik at harris.com
Sat Jan 31 01:37:09 CET 2015


If the PID of bird6 is 123, run (as root):

strace -p 123 -o /tmp/bird6.strace.out

and then send in /tmp/bird6.strace.out.

strace uses the same hooks gdb uses (ptrace) but doesn't stop the process.
The output file will contain all the system calls made by the process, with their arguments, recorded in real time as it runs.
Control-C will kill strace, which should cause it to detach and let bird6 keep going unfettered.
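
If you would rather not sit at the terminal waiting to hit control-C, something like the following might do. It is only a sketch: the function name bird6_strace, the 30 second default and the extra -tt/-T options are my own choices, not anything BIRD ships, and it assumes the coreutils timeout command is available.

```shell
# Sketch only: bird6_strace and its defaults are hypothetical helpers.
# Usage (as root): bird6_strace "$(ps -C bird6 -o pid=)" 30
bird6_strace() {
    pid=$1                               # PID of the running bird6
    duration=${2:-30}                    # seconds to trace before stopping
    outfile=${3:-/tmp/bird6.strace.out}  # where strace writes its log
    # -tt: wall-clock timestamps with microseconds
    # -T:  time spent inside each system call
    # timeout sends SIGTERM to strace after $duration seconds;
    # strace then detaches and bird6 keeps running.
    timeout "$duration" strace -tt -T -p "$pid" -o "$outfile"
}
```

If the output file stays almost empty while bird6 is pegging a CPU, that in itself is useful: it suggests the loop is in user space rather than in system calls, and a gdb backtrace is more likely to show where.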

Ed

-----Original Message-----
From: bird-users-bounces at network.cz [mailto:bird-users-bounces at network.cz] On Behalf Of Chris Caputo
Sent: Friday, January 30, 2015 7:28 PM
To: Baptiste Jonglez
Cc: bird-users at network.cz
Subject: Re: Bird6 freeze under high load

If bird6 was built with symbols, wait until it has gotten into the CPU
busy-loop, then use gdb to attach to it, à la:

  gdb <path to bird6> <process id of bird6>

Ex:

  gdb /usr/local/sbin/bird6 `ps -C bird6 -o pid=`

then "bt" for a stack trace, possibly showing where it is stuck.

"cont" to continue and then another control-c to check again.

Do this a few times.  Hopefully there will be a pattern.

Copy & paste the results to this list.

"quit" to exit gdb, allowing bird6 to continue.
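
The same sampling can be roughly scripted with gdb's batch mode, which attaches, runs the given command, and exits (detaching) on its own. This is a sketch under assumptions: the function name bird6_backtraces and the defaults of 5 samples, 2 seconds apart, are made up for illustration.

```shell
# Sketch only: bird6_backtraces and its defaults are hypothetical helpers.
# Usage (as root): bird6_backtraces "$(ps -C bird6 -o pid=)" 5 2 > /tmp/bird6.bt.out
bird6_backtraces() {
    pid=$1            # PID of the running bird6
    count=${2:-5}     # number of backtraces to collect
    pause=${3:-2}     # seconds to wait between samples
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i of $count ==="
        # -batch: run the -ex commands, then exit, which detaches from bird6
        gdb -p "$pid" -batch -ex "bt"
        sleep "$pause"
        i=$((i + 1))
    done
}
```

Each attach pauses bird6 for a moment, so keep the number of samples small on a production router.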

Chris

On Sat, 31 Jan 2015, Baptiste Jonglez wrote:
> I just tried downgrading from 1.4.5 to 1.4.4, using the 1.4.4-1~bpo70+1 
> Debian package from http://bird.network.cz/?download&tdir=debian/
> 
> The result is the same, bird6 also freezes periodically with version 1.4.4.
> 
> By the way, I think I ruled out the possibility that a particular BGP peer
> is sending garbage: the issue still arises when leaving only one BGP
> session active, whichever it is.
> 
> Is there anything else I can do to help troubleshoot the root cause of
> this issue?
> 
> On Thu, Jan 29, 2015 at 08:03:07PM +0100, Baptiste Jonglez wrote:
> > Hi,
> > 
> > We are experiencing regular "freezes" of bird6 on a BGP router.  When this
> > happens, bird6 maxes out a CPU for several minutes.  If a command is run
> > in birdc6 during such a freeze, the command hangs, and the result is only
> > returned when bird6 has stopped using the CPU.  Note that this also
> > applies to "cheap" commands like "show protocols", which usually complete
> > instantly (both with bird, and with bird6 in non-freeze conditions).
> > 
> > Sometimes (but not always), the non-responsiveness of bird6 causes all BGP
> > sessions to drop, which is really annoying on a full-view BGP router.
> > 
> > The freezes happen at random, but seem to happen more frequently when the
> > router is under load (typically, at peak time, each CPU spends ~20%
> > forwarding packets, on a 4-core box).
> > 
> > The BGP setup is made of multiple transit and peerings, on multiple VLANs
> > (some BGP neighbours share the same VLAN).  The setup is pretty similar on
> > bird and bird6, but only bird6 exhibits these freezes, bird works just fine.
> > 
> > The box is running Debian wheezy on amd64, with bird from backports: 1.4.5-1~bpo70+1
> > 
> > Attached is the configuration, and two extracts of the logs when all BGP
> > sessions dropped (with debug { states, interfaces, events }).  All files
> > are anonymised, but should be consistent.
> > 
> > What do you think?  It looks like bird6 gets stuck on some very expensive
> > operation, which prevents it from doing anything else (including maintaining
> > BGP sessions alive).
> > 
> > Thanks,
> > Baptiste


