Bird6 freeze under high load

Baptiste Jonglez baptiste at bitsofnetworks.org
Thu Feb 5 15:07:07 CET 2015


By the way, I think I found a kernel commit that should fix this issue:

commit 1c2658545816088477e91860c3a645053719cb54
Author: Kumar Sundararajan <kumar at fb.com>
Date:   Thu Apr 24 09:48:53 2014 -0400

    ipv6: fib: fix fib dump restart

    When the ipv6 fib changes during a table dump, the walk is
    restarted and the number of nodes dumped are skipped. But the existing
    code doesn't advance to the next node after a node is skipped. This can
    cause the dump to loop or produce lots of duplicates when the fib
    is modified during the dump.

    This change advances the walk to the next node if the current node is
    skipped after a restart.

    Signed-off-by: Kumar Sundararajan <kumar at fb.com>
    Signed-off-by: Chris Mason <clm at fb.com>
    Signed-off-by: David S. Miller <davem at davemloft.net>

 net/ipv6/ip6_fib.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


The patch is very simple... See also
http://kernel.opensuse.org/cgit/kernel/commit/?id=1c2658545816088477e91860c3a645053719cb54


Does Bird keep inserting routes in the kernel while it is scanning the
routing table?  If so, then we probably hit this bug.

Thanks,
Baptiste

On Thu, Feb 05, 2015 at 02:50:08PM +0100, Baptiste Jonglez wrote:
> On Sat, Jan 31, 2015 at 03:06:37PM +0100, Ondrej Zajicek wrote:
> > On Sat, Jan 31, 2015 at 02:47:51PM +0100, Baptiste Jonglez wrote:
> > > This took several minutes to complete, and there certainly aren't that many
> > > IPv6 routes in the kernel: routes appear several times in the output of
> > > "ip -6 r".  Running this command multiple times yields very different
> > > results each time.
> > > 
> > > Thus, I don't think the bug is in Bird.  Could it be some kind of race
> > > condition with netlink?  I haven't been able to find any reference to this
> > > bug, either in the kernel or in iproute2.  For reference, this is on a
> > > Debian wheezy system, but I can reproduce the duplicate routes in "ip -6 r"
> > > on Debian jessie as well.
> > 
> > Interesting, what are the kernel versions in these Wheezy and Jessie systems?
> > Does the problem (with 'ip -6 r') also appear when BIRD is not running?
> 
> So far, I saw this behaviour in Wheezy (3.2.0-4-amd64 + iproute
> 20120521-3+b3), and Jessie (3.14.4-1 + iproute2 3.16.0-2).  In both cases,
> Bird was running.
> 
> I tried shutting down bird6 (with "persist" in the kernel protocol, so
> that routes stay in the kernel), and the problem seemed to persist:
> 
> $ ip -6 r | wc -l
> 26720
> $ ip -6 r | wc -l
> 21794
> $ ip -6 r | wc -l
> 50321
> $ ip -6 r | wc -l
> 37602
> $ ip -6 r | wc -l
> 59011
> 
> However, the amplitude is much smaller (when Bird is running and talking
> to the kernel, doing the same thing yields millions of routes).
> 
> > I wonder what factors are specific to this problem. I remember there was
> > a similar report or two a few years ago, but these reports are too uncommon
> > for this to be a universal problem in IPv6 Linux forwarding.
> 
> I actually found a way to reproduce this without Bird, just by inserting
> static routes.  It seems that dumping the routing table while some routes
> are being inserted is enough to trigger the bug.
> 
> Simple way to see the issue: run this on one shell
> 
>   watch -n 0.2 "ip -6 r | wc -l"
> 
> and run this in another, root, shell:
> 
>   for i in {0000..3999};  do ip -6 r add unreachable 2001:db8:$i::/48 proto 57;  sleep 0.005;  done;  sleep 30;  ip -6 r flush proto 57
> 
> Then look at the output of watch: at first, the number of routes grows
> regularly, and then, at some point, it will start jumping up and down.
> 
> It should be possible to write a small program that automatically tests
> for the presence of the bug (for instance by checking that the number of
> routes reported by "ip -6 r" never decreases while routes are being added).
> 
> 
> So far, using the little shell script above, I saw the issue with:
> 
> - Wheezy (3.2.0-4-amd64 + iproute 20120521-3+b3)
> 
> The bug didn't show up with:
> 
> - Wheezy with a multipath TCP kernel (3.14.27 + iproute 201407242100)
> - Archlinux (3.18.2 + iproute2 3.17.0)
> - Archlinux (3.14.27-1-mptcp + iproute2 3.18.0-1)
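
The automatic test suggested in the quoted message could look roughly like the sketch below. It only illustrates the idea: the function names check_monotonic and sample_counts are made up here, the 0.2 s interval is copied from the watch command above, and it assumes a sleep that accepts fractional seconds (as the one-liner above already does). The check is only meaningful while routes are being added, not during the final flush.

```shell
#!/bin/sh

# Hypothetical helper: reads one route count per line on stdin and
# reports the first sample that is smaller than the previous one.
# Returns 0 if the counts never decrease, 1 otherwise.
check_monotonic() {
    prev=-1
    n=0
    while read -r count; do
        n=$((n + 1))
        if [ "$count" -lt "$prev" ]; then
            echo "sample $n: count dropped from $prev to $count"
            return 1
        fi
        prev=$count
    done
    echo "no decrease seen in $n samples"
    return 0
}

# Hypothetical sampler: prints the number of lines reported by
# "ip -6 route show", once every 0.2 s, $1 times in total.
sample_counts() {
    samples=$1
    i=0
    while [ "$i" -lt "$samples" ]; do
        ip -6 route show | wc -l
        sleep 0.2
        i=$((i + 1))
    done
}
```

With the route-insertion loop running in another root shell, something like "sample_counts 100 | check_monotonic" would then flag a kernel on which the count jumps up and down. A decrease is only a hint, not proof: it assumes nothing else is removing IPv6 routes at the same time.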

