Mitigations/tunables for reducing netlink loss?

Tue Sep 21 09:53:06 CEST 2021

Hello!

>    Sep 20 11:50:48 ganges bird: Kernel dropped some netlink messages, will resync on next scan.
[...]
>    Sep 20 11:51:19 ganges bird: Kernel dropped some netlink messages, will resync on next scan.

This is more or less inevitable, as the netlink manpage states:

        However, reliable transmissions from kernel to user are
        impossible in any case.  The kernel can't send a netlink message
        if the socket buffer is full: the message will be dropped and the
        kernel and the user-space process will no longer have the same
        view of kernel state.  It is up to the application to detect when
        this happens (via the ENOBUFS error returned by recvmsg(2)) and
        resynchronize.
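
To illustrate what the manpage asks of an application, here is a minimal Python sketch of a receive loop that treats ENOBUFS as "messages were lost, resync later" and keeps reading. It is not BIRD's code; `drain_netlink`, `handle_message` and `request_resync` are hypothetical names, and the socket is assumed to be non-blocking.

```python
import errno


def drain_netlink(sock, handle_message, request_resync, bufsize=65536):
    """Read everything currently queued on a non-blocking socket.

    handle_message and request_resync are hypothetical callbacks
    supplied by the caller; they are not part of any real API.
    """
    while True:
        try:
            data = sock.recv(bufsize)
        except BlockingIOError:
            # Nothing more is queued right now.
            return
        except OSError as exc:
            if exc.errno == errno.ENOBUFS:
                # The kernel dropped at least one message. We cannot know
                # which, so schedule a full resync and keep reading what
                # is still in the buffer.
                request_resync()
                continue
            raise
        if not data:
            return
        handle_message(data)
```

The loop continues after ENOBUFS because the messages that did fit into the buffer are still valid; only the view as a whole has to be re-checked by a later scan.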

This unreliability is also a good reason to have periodic table scans,
just to be sure that the kernel is in sync with BIRD.

> I'm seeing netlink drops when upstream internet churn is say more than
> 200 updates/sec or so, not huge, but quite frequent and can continue
> for minutes/hours.

Yes, this is a well-known situation. We can't do much about it in
single-threaded BIRD: the ENOBUFS error signals that the socket receive
buffer had no more room for the kernel's route update messages. (See
the more thorough explanation below.)

> Some items I've investigated so far:
> 
> Increasing net.core.rmem_max and net.core.wmem_max sysctls doesn't
> seem to help much, strace of bird doesn't indicate any EAGAIN or
> blocking when writing to the netlink sockets.

Here somebody suggests increasing net.core.rmem_default before starting 
BIRD.

https://bird.network.cz/pipermail/bird-users/2017-September/011541.html
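
For context on why rmem_default can matter where rmem_max did not: on Linux, a socket that never sets SO_RCVBUF itself starts with the size from net.core.rmem_default, while net.core.rmem_max only caps what an explicit SO_RCVBUF request can obtain. The following Python sketch shows how to inspect this on any socket. The function names are hypothetical and this says nothing about which of the two BIRD itself relies on.

```python
import socket


def receive_buffer_size(sock):
    """Return the receive buffer size the kernel currently grants."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


def try_grow_receive_buffer(sock, wanted_bytes):
    """Ask for a larger receive buffer and report what was granted.

    On Linux the kernel doubles the requested value for bookkeeping
    overhead and silently caps the request at net.core.rmem_max, so the
    result may differ from wanted_bytes.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, wanted_bytes)
    return receive_buffer_size(sock)
```

Calling `receive_buffer_size()` on a freshly created socket before and after changing the sysctl is a quick way to confirm that the new default is actually picked up.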

> strace shows some room for optimization in the prot kernel (these
> would obviously be code changes).  For example, when a route changes
> next-hop/interface, 2 netlink messages are sent, delete followed by
> add, instead of a single change/replace (this would complicate bird,
> but reduce netlink message in half for updates).

This would be feasible in a world with one single kernel and no bugs.
In practice, there have been quite a few kernel bugs that we had to
work around, and we have no useful way to detect whether a given kernel
version suffers from a given bug. (There are still people running new
BIRD on old kernels.)

> There is plenty of cpu cycles available, bird is <%1, etc...
> 
> Any pointers on tuning or config changes that may help here are
> appreciated.

Well, to be honest, I think this may be fixed by having a separate
netlink thread (which is a work-almost-in-progress), but without that
it is almost impossible. The reason is how it works now:

1) BGP receives a packet (quite a big one or several of them)
2) BGP parses the input data and for each single route:
2A) import filter is run
2B) best route in table is recalculated
2C) all exports are run; for the kernel protocol, a netlink message is sent
2D) the kernel generates a netlink message in response, confirming the
route update
(repeat this for all the data)
3) BGP is done and another socket is read. For simplicity, let's assume
it is the netlink receive socket.
4) Netlink parses the incoming messages, gets ENOBUFS and realizes
that some more updates didn't fit into the receive buffer, so it
issues that warning.
5) After a while, a netlink scan is issued, successfully checking that
all routes are there.
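
The steps above can be mimicked with a toy model in Python. This is only a sketch under simplifying assumptions: the receive buffer is counted in messages rather than bytes, every exported route produces exactly one confirmation, and `simulate_batch` is a hypothetical name, not anything from BIRD.

```python
from collections import deque


def simulate_batch(route_count, buffer_slots):
    """Model one BGP batch processed by a single-threaded loop.

    Each exported route makes the kernel queue one confirmation, but
    nothing reads the queue until the whole batch has been processed.
    Returns (delivered, dropped).
    """
    receive_buffer = deque()
    dropped = 0

    for _ in range(route_count):
        # Steps 2A-2C: filter, best-route recalculation, export to kernel.
        # Step 2D: the kernel tries to queue its confirmation.
        if len(receive_buffer) < buffer_slots:
            receive_buffer.append("confirmation")
        else:
            # Buffer full: the kernel drops the message, and the reader
            # will later see ENOBUFS.
            dropped += 1

    # Steps 3-4: only now does the loop get around to the netlink socket.
    delivered = len(receive_buffer)
    receive_buffer.clear()
    return delivered, dropped
```

In this model a batch that fits into the buffer loses nothing, while a larger one drops everything beyond `buffer_slots`, no matter how idle the CPU is. A bigger buffer only moves the threshold.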

The actual reason why BIRD shows these warnings even for tables that
only BIRD writes to is simply that it cannot read the netlink socket
while it is exporting routes from another protocol.

This will be fixed in future BIRD versions supporting multithreaded 
execution where the netlink thread should have enough time to read the 
netlink socket and the exports for netlink (and all other protocols) 
will properly queue and wait to be processed until the protocol decides 
to actually export.

Maria