Inserting a full table into the kernel FIB makes BIRD go crazy

Alarig Le Lay alarig at swordarmor.fr
Fri Sep 24 23:29:25 CEST 2021


Hello,

Now that the IPv6 bug is supposed to be fixed as of 5.8, I tried to
upgrade a router from kernel 4.14 to 5.10.

BIRD starts, but while it is inserting routes into the FIB, I get long
I/O loop cycles, and at some point BIRD can no longer keep up.
I already tried recompiling BIRD (in case of a header change or
something like that) and switching to a pre-compiled kernel; neither
had any effect.

When BIRD begins to lose track of itself, I get messages like these:
Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:44:43 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:44:43 edge04-hostzealot bird: ...
Sep 24 08:44:43 edge04-hostzealot bird: I/O loop cycle took 28703 ms for 1 events
Sep 24 08:44:43 edge04-hostzealot bird: Kernel dropped some netlink messages, will resync on next scan.
Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:45:50 edge04-hostzealot bird: Netlink: File exists
Sep 24 08:45:50 edge04-hostzealot bird: ...
Sep 24 08:45:51 edge04-hostzealot bird: I/O loop cycle took 36201 ms for 1 events
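
To see how often this happens over a longer log, a small script can pull out the "I/O loop cycle took N ms" durations and count the "Netlink: ..." errors. This is only a minimal sketch that assumes the syslog line format shown above; the function name `summarize` is hypothetical and not part of BIRD.

```python
import re
from collections import Counter

# Assumes BIRD syslog lines shaped like the ones above, e.g.
# "Sep 24 08:44:43 edge04-hostzealot bird: I/O loop cycle took 28703 ms for 1 events"
IO_LOOP = re.compile(r"I/O loop cycle took (\d+) ms")
# e.g. "... bird: Netlink: File exists" (captures the error text after "Netlink: ")
NETLINK_ERR = re.compile(r"Netlink: (.+)$")


def summarize(lines):
    """Hypothetical helper: return (slow I/O loop durations in ms, netlink error counts)."""
    durations = []
    errors = Counter()
    for line in lines:
        m = IO_LOOP.search(line)
        if m:
            durations.append(int(m.group(1)))
            continue
        m = NETLINK_ERR.search(line)
        if m:
            # Count each distinct netlink error message, e.g. "File exists"
            errors[m.group(1)] += 1
    return durations, errors
```

It can be fed any iterable of lines, for example an open log file: `summarize(open("bird.log"))`, where the file name is just a placeholder.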

Then OSPF begins to flap, and routes are recalculated based on the
remaining BGP ones:
Sep 24 08:46:54 edge04-hostzealot bird: Next hop address 185.107.95.180 resolvable through recursive route for 185.107.92.0/22
(I have a much more specific route in OSPF.)

I enabled debugging, and I can see that BIRD rescans the entire
kernel table when the “I/O loop” message appears:
Sep 24 09:07:30 edge04-hostzealot bird: kernel_grt_ipv4: 1.0.0.0/24: seen
Sep 24 09:07:30 edge04-hostzealot bird: kernel_grt_ipv4: 1.0.4.0/24: seen

It also tries to insert routes that are already installed:
Sep 24 09:08:04 edge04-hostzealot bird: kernel_grt_ipv4: 122.76.248.0/23: installing
Sep 24 09:08:04 edge04-hostzealot bird: Netlink: File exists

Then OSPF clearly goes down:
Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Inactivity timer expired for nbr 45.91.126.248 on gre4
Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Neighbor 45.91.126.248 on gre4 changed state from Full to Down
Sep 24 09:08:04 edge04-hostzealot bird: ospf_ipv4: Neighbor 45.91.126.248 on gre4 removed

Here are some more detailed logs:
https://paste.swordarmor.fr/raw/HX45
https://paste.swordarmor.fr/raw/oM9s

This server isn’t the fastest one on the market, but it is powerful
enough to handle full views, and with an older kernel it works very well.

I have RRs (route reflectors) running on 5.10 kernels, so it’s more
likely a kernel issue, but I can’t determine whether it’s caused by the
kernel itself or by the way BIRD uses netlink.

I’m using BIRD 2.0.8; I didn’t try an older version.

Thanks a lot,
-- 
Alarig Le Lay

