[PATCH master] IPv6 ECMP support fixes for linux
Mikhail Sennikovskii
mikhail.sennikovskii at profitbricks.com
Wed Feb 17 14:59:12 CET 2016
Hi,
2016-02-16 18:48 GMT+01:00 Ondrej Zajicek <santiago at crfreenet.org>:
> On Wed, Feb 10, 2016 at 04:38:03PM +0100, Mikhail Sennikovskii wrote:
> > Hello Fellows,
> >
> > as a follow-up for the recent discussion about BIRD support of IPv6 ECMP
> on Linux
> > ( http://trubka.network.cz/pipermail/bird-users/2016-January/010163.html
> ),
> > I've created a patch, which addresses this problem.
>
> Hi
>
> Thanks for the patch, i have several comments:
>
> 1) In the patch, the incoming direction (collection from kernel) is
> handled in unix/krt.c, while outgoing direction (krt_replace_rte()) is
> handled in linux/netlink.c . It would make more sense to handle both
> directions in the same place. As BIRD uses internally representation
> with one route and mpnh structure, i would prefer to move collection
> also to linux/netlink.c, so generic code in unix/krt.c would not
> care about OS-specific interface (and does not need any changes).
> Linux code would just collect next hops and then call krt_got_route()
> for the whole multipath route.
>
Agreed, will fix.
2) Function krt_replace_rte() should be smarter than just flushing the
> whole old multipath rte. It should compare old and new next hops (we
> could suppose they are sorted) and either add missing
I'm not sure for now whether it's possible to make this work,
as netlink returns EEXIST whenever one tries to add a route for the network,
already present in the routing table.
This is why removing and re-installing seemed the best solution for me.
or remove
> surplussing next hops by calling nl_send_route().
>
Just removing should work, yes.
> 3) When collecting received routes, you should avoid creating list
> of routes for the same net, and then merging such list by rt_merge_list(),
> you could simply collect next hops (struct mpnh *) for the current route.
>
I can do it this way.
>
> 4) Also rt_merge_list() uses mpnh_merge_rta() which uses rte_update_pool.
> That is probably not safe outside of nest update code. The linux netlink
> code should have its own linpool (static and shared, because netlink socket
> is also shared), so you could allocate mpnh structures from it. The linpool
> should be flushed when the route is collected and propagated to
> krt_got_route().
> So you don't need to do rta_lookup() for temporary next hops.
>
Right, will do.
> Note that such linpool could be also used for IPv4 multipath next hops in
> nl_parse_multipath() instead of nh_buffer.
>
I can add this fix as well, probably as a separate patch.
> Note that the patch still does not support the learn mode for IPv6 ECMP.
> > The support for it can be added later.
>
> Well, the learn mode is not really a problem. With the modifications
> mentioned above, learning would work automatically (it is handled by
> unix/krt.c, regardless of how route is generated).
>
> What is problematic is handling asynchronous notifications (because you
> don't know inside netlink code what are other next hops and whether you
> will get another one related to the same net).
>
Sorry, I guess I just mixed up with bird terminology.
I thought the learn mode is directly related to handling these asynchronous
notifications
and adjusting the bird routes set accordingly.
Regards,
Mikhail
>
> --
> Elen sila lumenn' omentielvo
>
> Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
> OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
> "To err is human -- to blame it on a computer is even more so."
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20160217/19a2efac/attachment.html>
More information about the Bird-users
mailing list