Dropped netlink updates during scans
Shaun.Crampton at metaswitch.com
Thu Aug 13 18:31:53 CEST 2015
We set our scan time to 2s and then create many routes using "ip route
add". With many (say 50k) routes, the scan starts taking a second or more,
so BIRD ignores about 50% of the route updates and only picks them up on
the next scan.
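A rough model of why roughly half the updates go missing. This is a hypothetical sketch, not BIRD code. It assumes that scans start every SCAN_INTERVAL seconds and each runs for SCAN_DURATION seconds, that any notification arriving mid-scan is ignored, and that the route is learned only when the next scan starts. The names SCAN_INTERVAL, SCAN_DURATION, in_scan, learn_delay and simulate are illustrative.

```python
# Hypothetical model of notifications lost during synchronous kernel scans.
# Assumptions (not taken from BIRD): scans start every SCAN_INTERVAL seconds
# and last SCAN_DURATION seconds; a notification arriving during a scan is
# ignored, and the route is only learned when the next scan starts.

SCAN_INTERVAL = 2.0  # "scan time" of 2s from the report
SCAN_DURATION = 1.0  # "a second or more" with ~50k routes


def in_scan(t: float) -> bool:
    """True if time t (seconds) falls inside a running scan."""
    return (t % SCAN_INTERVAL) < SCAN_DURATION


def learn_delay(t: float) -> float:
    """Seconds until a route added at time t is learned under this model."""
    if not in_scan(t):
        return 0.0  # notification handled immediately
    next_scan_start = (t // SCAN_INTERVAL + 1) * SCAN_INTERVAL
    return next_scan_start - t


def simulate(n_updates: int, cycles: int = 100) -> tuple[float, float]:
    """Spread route additions evenly over `cycles` scan periods.

    Returns (fraction of notifications missed, worst-case learning delay).
    """
    span = cycles * SCAN_INTERVAL
    times = [(i + 0.5) * span / n_updates for i in range(n_updates)]
    delays = [learn_delay(t) for t in times]
    missed = sum(1 for d in delays if d > 0) / n_updates
    return missed, max(delays)


if __name__ == "__main__":
    missed, worst = simulate(1000)
    print(f"missed {missed:.0%}, worst delay {worst:.2f}s")
```

Under these assumptions the missed fraction is simply SCAN_DURATION / SCAN_INTERVAL, and a missed route can wait almost a full scan interval before it is learned.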
We're running 100ms-interval pings between the containers that we're
routing, so we see a cluster of ping times around 2s and another cluster
On 13/08/2015 17:00, "Ondrej Zajicek" <santiago at crfreenet.org> wrote:
>On Thu, Aug 13, 2015 at 03:12:26PM +0000, Shaun Crampton wrote:
>> We're using BIRD to redistribute routes that are programmed into the
>> kernel for routing to local containers or VMs. We set a scan time in the
>> kernel section of the config in order to notice when routes are removed.
>> Normally, BIRD picks up added routes extremely quickly. However,
>> if a route is added during a scan, it seems to be missed and it is not
>> picked up until the next scan, many seconds later.
>I was not aware of this issue, but the cause is pretty clear: scans
>are implemented synchronously and BIRD ignores all unrelated netlink
>messages during these scans.
>The proper solution would be to make the BIRD netlink code fully
>asynchronous, but that means rewriting half of the netlink and route
>scanning code. As a workaround, we could just queue these asynchronous
>messages and process them after scans (and other netlink operations).
>BTW, the issue is likely not limited to route scans but may happen with
>any netlink operation (like request for route change), but other
>operations are probably too quick to cause the problem in practice.
>Do you have a simple way to trigger the issue?
>Elen sila lumenn' omentielvo
>Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
>OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
>"To err is human -- to blame it on a computer is even more so."
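Ondrej's suggested workaround is to defer unrelated messages until the scan finishes instead of dropping them. It can be sketched as a hypothetical Python model. The NetlinkReader class, the (seq, payload) tuples and the DONE marker are illustrative stand-ins, not BIRD's netlink API. Real code would parse struct nlmsghdr and match on nlmsg_seq.

```python
# Sketch of the "queue asynchronous messages during a scan" workaround.
# Hypothetical model, not BIRD's netlink code: each message is a
# (seq, payload) pair. Dump replies carry the sequence number of the dump
# request; unsolicited kernel notifications (such as those generated by
# "ip route add") are modelled with seq 0.
from collections import deque
from typing import Callable, Iterable

DONE = "DONE"  # stand-in for the NLMSG_DONE message that ends a dump


class NetlinkReader:
    def __init__(self, on_async: Callable[[str], None], queue_async: bool = True):
        self.on_async = on_async        # handler for unsolicited notifications
        self.queue_async = queue_async  # False mimics the reported behaviour
        self.pending: deque = deque()   # notifications deferred during a scan

    def scan(self, seq: int, incoming: Iterable[tuple[int, str]]) -> list[str]:
        """Synchronously read the dump reply for request `seq`."""
        dumped = []
        for msg_seq, payload in incoming:
            if msg_seq == seq:
                if payload == DONE:
                    break
                dumped.append(payload)
            elif self.queue_async:
                self.pending.append(payload)  # defer instead of dropping
            # otherwise the notification is silently ignored
        self.flush()
        return dumped

    def flush(self) -> None:
        """Replay notifications that arrived while the scan was running."""
        while self.pending:
            self.on_async(self.pending.popleft())


if __name__ == "__main__":
    stream = [(7, "dump: 10.0.0.1/32"), (0, "new: 10.0.0.2/32"),
              (7, "dump: 10.0.0.3/32"), (7, DONE)]
    for queue_async in (False, True):
        seen: list[str] = []
        dump = NetlinkReader(seen.append, queue_async).scan(7, stream)
        print(queue_async, dump, seen)
```

With queue_async=False the "new" notification is lost, matching the reported symptom. With queuing, it is replayed as soon as the dump ends. A real implementation would presumably also need to bound the queue.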