Inserting fulltable into kernel FIB makes bird crazy

Alarig Le Lay alarig at swordarmor.fr
Sun Feb 20 14:32:51 CET 2022


Hello,

Thanks for giving me the original patch! I backported it along with some
of the following commits to 2.0.8 and it seems to work too.
The whole diff is available at https://git.grifon.fr/alarig/SwordArMor-gentoo-overlay/src/branch/master/net-misc/bird/files/bird-2.0.8-linux-netlink-filters.patch

I will test it on older kernels too and, if it works, I plan to
include it in the Gentoo package.
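
For anyone who wants to check their own logs for the same symptoms, here
is a minimal Python sketch (not part of bird or of the patch). It counts
the "Netlink: File exists" errors and the "Kernel dropped some netlink
messages" warnings, and collects the "I/O loop cycle took N ms" durations.
The matched strings are copied from the log excerpts quoted further down
in this thread; the function name summarize_bird_log and the layout of
the returned dict are made up for illustration.

```python
import re
from typing import Dict, Iterable

# Strings copied from the bird log lines quoted in this thread.
IO_LOOP_RE = re.compile(r"I/O loop cycle took (\d+) ms for (\d+) events")
EEXIST_MARKER = "Netlink: File exists"
DROPPED_MARKER = "Kernel dropped some netlink messages"


def summarize_bird_log(lines: Iterable[str]) -> Dict[str, object]:
    """Summarize the symptoms described in this thread (hypothetical helper).

    Returns a dict with:
      file_exists -- number of "Netlink: File exists" lines
      dropped     -- number of "Kernel dropped some netlink messages" lines
      io_loop_ms  -- list of reported I/O loop cycle durations, in ms
    """
    file_exists = 0
    dropped = 0
    io_loop_ms = []
    for line in lines:
        if EEXIST_MARKER in line:
            file_exists += 1
        elif DROPPED_MARKER in line:
            dropped += 1
        else:
            match = IO_LOOP_RE.search(line)
            if match:
                io_loop_ms.append(int(match.group(1)))
    return {
        "file_exists": file_exists,
        "dropped": dropped,
        "io_loop_ms": io_loop_ms,
    }
```

To use it, pass any iterable of log lines, for example an open file
object for wherever your syslog writes bird messages (that path depends
on your setup). Durations of tens of seconds in io_loop_ms together with
a large file_exists count would match what I was seeing before the patch.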

On Sat 19 Feb 2022 01:44:43 GMT, Ondrej Zajicek wrote:
> On Sat, Feb 19, 2022 at 12:44:11AM +0100, Alarig Le Lay wrote:
> > Hello,
> > 
> > I gave the 2.0.9 git snapshot (71c9484b00b4428ae6c7d7c8eea6d96073683a54)
> > a try tonight, and it seems to fix the issue for me. I’ve not tested on
> > 5.10 though, as the LTS is now 5.15.
> > However, I did test 5.15 with 2.0.8 and I had the same behaviour.
> > 
> > The VM has been up for 6h now and everything is stable. Before, the logs
> > were flooded within an hour.
> 
> Hello
> 
> Thanks for confirming it. That is likely an issue investigated and fixed
> by Tomas Hlavacek:
> 
> http://trubka.network.cz/pipermail/bird-users/2022-January/015909.html
> 
> 
> > On Fri 24 Sep 2021 23:29:25 GMT, Alarig Le Lay wrote:
> > > Hello,
> > > 
> > > Now that the IPv6 bug is supposed to have been resolved since 5.8, I
> > > tried to upgrade a router from 4.14 to 5.10.
> > > 
> > > Bird starts; however, while inserting routes into the FIB, I get long
> > > I/O loop cycles and at some point bird is unable to keep up.
> > > I already tried recompiling bird, in case of a header change or
> > > something like that, and switching to a pre-compiled kernel; neither
> > > had any effect.
> > > 
> > > When bird begins to lose track of itself, I get this kind of message:
> > > Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:44:43 edge04-hostzealot bird: ...
> > > Sep 24 08:44:43 edge04-hostzealot bird: I/O loop cycle took 28703 ms for 1 events
> > > Sep 24 08:44:43 edge04-hostzealot bird: Kernel dropped some netlink messages, will resync on next scan.
> > > Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
> > > Sep 24 08:45:50 edge04-hostzealot bird: ...
> > > Sep 24 08:45:51 edge04-hostzealot bird: I/O loop cycle took 36201 ms for 1 events
> > > 
> > > And then OSPF begins to flap and routes are re-calculated based on the
> > > remaining BGP ones:
> > > Sep 24 08:46:54 edge04-hostzealot bird: Next hop address 185.107.95.180 resolvable through recursive route for 185.107.92.0/22
> > > (I have a much more specific route in OSPF.)
> > > 
> > > I enabled debugging, and I can see that bird is re-scanning the entire
> > > kernel table when the “I/O loop” message appears:
> > > Sep 24 09:07:30 edge04-hostzealot bird: kernel_grt_ipv4: 1.0.0.0/24: seen
> > > Sep 24 09:07:30 edge04-hostzealot bird: kernel_grt_ipv4: 1.0.4.0/24: seen
> > > 
> > > And it tries to insert routes that are already installed:
> > > Sep 24 09:08:04 edge04-hostzealot bird: kernel_grt_ipv4: 122.76.248.0/23: installing
> > > Sep 24 09:08:04 edge04-hostzealot bird: Netlink: File exists
> > > 
> > > And then OSPF clearly goes down:
> > > Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Inactivity timer expired for nbr 45.91.126.248 on gre4
> > > Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Neighbor 45.91.126.248 on gre4 changed state from Full to Down
> > > Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Neighbor 45.91.126.248 on gre4 removed
> > > 
> > > Here are some more detailed logs: https://paste.swordarmor.fr/raw/HX45
> > > https://paste.swordarmor.fr/raw/oM9s
> > > 
> > > This server isn’t the fastest one on the market, but it is powerful
> > > enough to handle full views. And with an older kernel it works very well.
> > > 
> > > I have RRs running on 5.10 kernels, so it’s more likely a kernel issue,
> > > but I’m not able to determine if it’s caused by the kernel itself or by
> > > the way bird is using netlink.
> > > 
> > > I’m using bird 2.0.8; I didn’t try an older version.
> 
> 
> -- 
> Elen sila lumenn' omentielvo
> 
> Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
> OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
> "To err is human -- to blame it on a computer is even more so."

