BIRD trying to reinsert existing kernel routes, netlink issue?

Sasha Romijn sasha at dashcare.nl
Sun Aug 2 17:36:13 CEST 2020


Hello,

I’m running into a rather complex issue where BIRD tries to reinsert
routes into the kernel routing table that are already there. I’m at
the point of suspecting a netlink bug in the 5.x Linux kernel.
However, I may be entirely off, and would like to hear what people with
more BIRD and/or netlink expertise think about it.

My setup is BIRD 2.0.7 on Ubuntu 20.04. I’ve been able to reproduce
this with the default 5.4.0-42-generic kernel as well as the Ubuntu
mainline 5.7.0-050700-generic kernel. This BIRD instance has an IBGP
peering with another BIRD 2.0.7 instance, which sends an IPv4 full
table. The IBGP peering is set up over a GRE tunnel. There is no other
routing software running. I am seeing this issue on another server too,
Ubuntu 20.04 with 5.4.0-42-generic. A third one, with Ubuntu 18.04 and
a 4.15.0-20-generic kernel, is fine.

The problem presents as follows: a few thousand routes fail insertion
into the kernel routing table, marked by bird as ‘!’:

-------------------------
# birdc show route table master4|grep \!|wc -l
5838
-------------------------

The number varies, but is on the order of a few thousand.
However, all routes seem to be in the kernel table:

-------------------------
# birdc show route count
BIRD 2.0.7 ready.
788748 of 788748 routes for 788747 networks in table master4
0 of 0 routes for 0 networks in table master6
Total: 788748 of 788748 routes for 788747 networks in 2 tables
# ip route|grep bird|wc -l
788748
-------------------------

When I wait a few minutes and check again, the set of failed prefixes
is almost entirely different. Apparently, the previously failed prefixes
have been “fixed”, but now the same problem appears with different ones.
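
One way to quantify this churn would be to snapshot the ‘!’ prefixes
twice and count the overlap. This is only a minimal sketch, not how the
numbers in this mail were produced. The function names and snapshot
file names are made up for illustration, and it assumes the birdc
output format shown in this mail (prefix in the first column, ‘!’ as a
separate field).

```shell
# Print the prefixes flagged '!' from `birdc show route` output on stdin.
# Assumes lines like:
#   81.200.176.0/20  unicast [fra1 14:47:20.892] ! (100) [AS50664i]
failed_prefixes() {
    awk '$2 == "unicast" && / ! / { print $1 }' | LC_ALL=C sort -u
}

# Count prefixes present in both snapshot files.
# Both files must be sorted, as failed_prefixes already does.
snapshot_overlap() {
    LC_ALL=C comm -12 "$1" "$2" | wc -l | tr -d ' '
}

# Hypothetical usage, a few minutes apart:
#   birdc show route table master4 | failed_prefixes > failed.1
#   sleep 300
#   birdc show route table master4 | failed_prefixes > failed.2
#   snapshot_overlap failed.1 failed.2
```

A small overlap relative to the size of either snapshot would match
the behaviour described above.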

When I check a particular sample, the route really is in the kernel
routing table already:
-------------------------
# birdc show route 81.200.176.0/20
BIRD 2.0.7 ready.
Table master4:
81.200.176.0/20      unicast [fra1 14:47:20.892] ! (100) [AS50664i]
	via 10.195.6.1 on tun-fra1

# ip route|grep 81.200.176.0/20
81.200.176.0/20 via 10.195.6.1 dev tun-fra1 proto bird metric 32

# birdc show route 81.200.176.0/20
BIRD 2.0.7 ready.
Table master4:
81.200.176.0/20      unicast [fra1 14:47:20.892] ! (100) [AS50664i]
	via 10.195.6.1 on tun-fra1
-------------------------

The logs are repeats of:
-------------------------
Aug 01 09:50:23 fra2 bird[83586]: Netlink: File exists
Aug 01 09:50:23 fra2 bird[83586]: Netlink: File exists
Aug 01 09:50:23 fra2 bird[83586]: Netlink: File exists
Aug 01 09:50:23 fra2 bird[83586]: Netlink: File exists
Aug 01 09:50:23 fra2 bird[83586]: Netlink: File exists
Aug 01 09:50:23 fra2 bird[83586]: ...
Aug 01 09:50:23 fra2 bird[83586]: I/O loop cycle took 5110 ms for 1 events
-------------------------

Seemingly, BIRD thinks these routes are not yet in the kernel routing
table and tries to insert them, which fails with EEXIST (“File exists”)
because they are already there. Later scans move the problem to
different prefixes.

If I remove the tunneled IBGP peering, and set up an upstream peering
on a directly connected non-tunneled peer, everything works fine. If I
enable both the tunneled IBGP peering and the non-tunneled upstream,
the issue does appear. The routes that are affected by the failed
insertion attempts are then from both peers.

The prefixes that are affected are almost entirely cases where multiple
prefixes exist in the DFZ with the same network address, but different
lengths. Above I mentioned seeing 81.200.176.0/20 at some point, and
81.200.176.0/24 is also in the DFZ. From two samples of failed prefixes
at different points in time, about 98% are prefixes of this kind, where
the DFZ contains the same network address with a different length too.
The v4 full table as a whole does not show this distribution, which
suggests that routes of this kind are the most likely to fail
insertion.
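
For anyone wanting to repeat this classification, here is a rough
sketch of one way to do it. It is not the script behind the 98% figure.
The function name and the two input files are hypothetical: one prefix
per line in address/length form, the first file holding the failed
prefixes and the second the full table.

```shell
# Usage: same_base_share FAILED_FILE ALL_FILE
# Reports how many failed prefixes have a network address that occurs
# in the full table with more than one prefix length.
same_base_share() {
    awk -F/ '
        # First file read (full table): count distinct lengths per address.
        NR == FNR { if (!seen[$0]++) lengths[$1]++; next }
        # Second file read (failed prefixes): tally addresses with >1 length.
        { total++; if (lengths[$1] > 1) multi++ }
        END {
            printf "%d of %d failed prefixes share their network address with another prefix length\n", multi, total
        }
    ' "$2" "$1"
}
```

Running the same function with the full table as both arguments gives
the baseline share to compare against.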

I have done some deeper digging with strace.
https://p.6core.net/p/RtQ7xSSYd560i7FioG78ebsi is an strace of BIRD
starting and loading the v4 full table from the IBGP peer. I have
filtered it for “45.189.104.0” and “GETROUTE” to trim it down, as the
full strace is 8GB. The sequence of events is:

1. Initially, bird does GETROUTE a few times, before the BGP session is
   established.
2. Near the start, at 09:06:33.208648 and 09:06:33.209003, the BGP
   session seems established, and 45.189.104.0/24 and 45.189.104.0/22
   are inserted correctly. (Acknowledgement not in paste as it does not
   contain the prefix.)
3. At 09:06:39.774454, bird tells birdc that 45.189.104.0/24 is the
   best route and was inserted correctly.
4. At 09:06:49.915838, bird sends a GETROUTE. Only the reply lines that
   contain “45.189.104.0” are in the paste. Only 45.189.104.0/24
   appears in the reply; 45.189.104.0/22 does not seem to be included.
5. At 09:07:08.455767, bird tries to insert 45.189.104.0/22 into the
   kernel routing table. This fails, because it already exists.
6. At 09:07:43.144207 bird replies to birdc that 45.189.104.0/22 failed
   to insert.
7. At 09:07:49.874282, bird does another GETROUTE. Both the /22 and the
   /24 are included in the response.
8. At 09:08:25.623269, bird tries to update 45.189.104.0/22, which
   succeeds. Note the different flags compared to 09:07:08.455767,
   specifically NLM_F_EXCL there vs NLM_F_REPLACE here, presumably
   because bird is now aware there is an existing route.
9. Future GETROUTEs return both the /24 and the /22, and bird does
   nothing.
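
A filter along these lines can reduce a large strace to a timeline like
the one above. This is a sketch only: the function name is made up, and
the strace options and file paths in the usage comment are assumptions
rather than the exact invocation behind the paste.

```shell
# Keep only strace lines that mention the given address or GETROUTE.
# $1: strace output file, $2: network address, e.g. 45.189.104.0
# Note: this is a plain substring match, so 45.189.104.0 would also
# match a longer address such as 145.189.104.0.
filter_strace() {
    grep -F -e "$2" -e "GETROUTE" "$1"
}

# Hypothetical capture and filter:
#   strace -f -tt -e trace=network -o bird.strace \
#       bird -f -c /etc/bird/bird.conf
#   filter_strace bird.strace 45.189.104.0 > bird-filtered.strace
```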

Working theory: netlink GETROUTE in at least some 5.x kernels may not
return all routes when at least some routes in the table have a next
hop on a tunnel interface, and the omissions are almost entirely
confined to cases where multiple prefixes exist with the same network
address.

Thoughts? And if that theory is correct, can we work around it in BIRD?
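
A possible way to look for missing routes from outside BIRD is to diff
BIRD’s view against a kernel dump taken with ip. This is a sketch under
assumptions: the function and file names are hypothetical, and ip may
not issue its dump request the same way BIRD’s scan does, so a clean
result would not rule the theory out.

```shell
# Print prefixes listed in the first file but absent from the second.
# $1: prefixes BIRD believes it exported, $2: prefixes from a kernel dump
missing_from_dump() {
    expected=$(mktemp) && dumped=$(mktemp) || return 1
    LC_ALL=C sort -u "$1" > "$expected"
    LC_ALL=C sort -u "$2" > "$dumped"
    LC_ALL=C comm -23 "$expected" "$dumped"
    rm -f "$expected" "$dumped"
}

# Hypothetical usage, repeated a few times to catch intermittent gaps:
#   birdc show route table master4 \
#       | awk '$2 == "unicast" { print $1 }' > bird.txt
#   ip -4 route show proto bird | awk '{ print $1 }' > kernel.txt
#   missing_from_dump bird.txt kernel.txt
```

Note that ip prints /32 host routes without the “/32” suffix, so those
will show up as false positives unless normalised first.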

A few things I’ve changed that had no effect:
- Changing the tunnel type.
- Changing the peering from IBGP to EBGP.
- Using IPv6 addresses for the tunnel endpoints, and removing the static
  route in the config below.

This is the full config I’m currently using:
-------------------------
log syslog all;
router id 141.98.136.36;

protocol kernel {
    scan time 20;
    ipv4 {
        export all;
        import none;
    };
}

# never route the tunnel through the tunnel
protocol static tunnel_endpoints {
    ipv4;
    route 45.12.69.14/32 via 141.98.136.33;
}

protocol device {
}

# ibgp peering over GRE tunnel, ipv4 full table
protocol bgp fra1 {
    local as 213279;
    source address 10.195.6.2;
    neighbor 10.195.6.1 as 213279;
    direct;

    ipv4 {
        import all;
        export none;
    };
}
-------------------------

Sasha




