netlink filtering to avoid costly FNHE table dumps on Linux

Tomas Hlavacek tmshlvck at gmail.com
Sun Jan 9 01:43:36 CET 2022


Hi Ondrej, all,

On Sat, Jan 8, 2022 at 5:56 AM Ondrej Zajicek <santiago at crfreenet.org> wrote:
> > I believe that many different types of Linux tunnels create the PMTU
> > records for all packets transmitted over the tunnel as well. And it
> > works like that for a long time - the code that creates the route
> > cache (at that time, now it is FNHE table) records has been introduced
> > in Linux 3.10 (https://elixir.bootlin.com/linux/v3.10/source/net/ipv4/ip_tunnel.c#L591).
>
> If i understand it correctly, these PMTU records can also be a result of
> regular TCP communication from/to the router even if there are no tunnels?

Yes, but in most cases the kernel should not create that many PMTU
records. Even with the 600 s expiration I would expect several thousand,
or a few hundred thousand at most. I still do not fully understand why
I saw over 130M PMTU records received by BIRD in one scan. Either
records are somehow multiplied within the dump or something was
very wrong. Anyway, I am going to analyze the kernel part in more
detail and I will raise this on LKML.

> > Regardless of what may or may not happen on the kernel side I think
> > that implementing the netlink filter in BIRD to avoid the described
> > situation makes sense. I am almost certain that my experimental fix
> > breaks other things (most likely OSPF) but I would be glad to help
> > make it right.
>
> How could OSPF be affected by filters on netlink socket?

My experimental patch actually broke kif_do_scan(). It turned out that
some (all?) link records go missing when the NETLINK_GET_STRICT_CHK
sockopt is enabled. I guess that breaks the device protocol, which in
turn breaks OSPF. In any case, OSPF did not start on the GRE interface
(it did not send or receive any messages) until I fixed kif_do_scan().
I think there is an easy way out that does not need larger changes: we
can enable NETLINK_GET_STRICT_CHK only for krt_do_scan(). I'll send a
new RFC patch shortly.
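
To illustrate the idea (this is only a rough sketch, not the patch
itself and not BIRD code): the option is set per socket, so the route
scan can use a socket with strict checking while the link/address scan
keeps a socket without it. The names route_dump_req, enable_strict_chk()
and build_route_dump_req() are made up for this example. It also assumes
that, with strict checking on, the kernel dumps the exception (FNHE)
entries only when RTM_F_CLONED is set in rtm_flags; that assumption
should be verified against the kernel source.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270
#endif
#ifndef NETLINK_GET_STRICT_CHK
#define NETLINK_GET_STRICT_CHK 12  /* in linux/netlink.h since Linux 4.20 */
#endif

/* Hypothetical request layout: netlink header followed by a route header. */
struct route_dump_req {
  struct nlmsghdr nh;
  struct rtmsg rtm;
};

/*
 * Enable strict checking of dump requests on one netlink socket.
 * Meant to be called only on the socket used for the route scan.
 * Returns 0 on success, -1 on failure (e.g. option not supported).
 */
static int enable_strict_chk(int fd)
{
  int one = 1;
  return setsockopt(fd, SOL_NETLINK, NETLINK_GET_STRICT_CHK, &one, sizeof(one));
}

/*
 * Fill in an RTM_GETROUTE dump request for the given address family.
 * RTM_F_CLONED is deliberately left unset so that (under the assumption
 * stated above) cached/exception entries are not part of the dump.
 */
static void build_route_dump_req(struct route_dump_req *req,
                                 unsigned char family, unsigned int seq)
{
  assert(req != NULL);
  memset(req, 0, sizeof(*req));
  req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtmsg));
  req->nh.nlmsg_type = RTM_GETROUTE;
  req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
  req->nh.nlmsg_seq = seq;
  req->rtm.rtm_family = family;
  req->rtm.rtm_flags = 0;
}
```

A caller would open a dedicated NETLINK_ROUTE socket for the route scan,
call enable_strict_chk() on it once, and then send the request built by
build_route_dump_req(); the socket used for link scans stays untouched.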

Best regards,
Tomas

