[PATCH] More multipath support for OSPF
Peter Christensen
pch at ordbogen.com
Thu Feb 6 21:47:03 CET 2014
On 02/06/2014 04:14 PM, Ondrej Zajicek wrote:
> On Thu, Feb 06, 2014 at 02:17:12PM +0100, Peter Christensen wrote:
>> Hi,
> Hello
>
>> I noticed that the multipath support in OSPF seems to be fairly limited.
>> Essentially I was only able to make it do multipath if I had two
>> interfaces connecting to the same router.
>> At my company, we need true multipath between multiple routers using a
>> single interface.
>> (If I needed the other, I could use LACP)
> Not if such multipath spans multiple routers (e.g. a network consists of
> several routers connected by ptp links in a circle).
True. I was just considering the simple case with two routers with two
interfaces each connecting to a switch in between. Here, LACP would work
just fine.
>
> Also note that even if you have just one interface, you still get ECMP
> if there are several paths (through different neighbor routers) to one
> router few hops away.
Apparently I didn't. I am essentially trying to make my routers balance
traffic across multiple load-balancers running OSPF with BIRD. Their
setup looks something like this (simplified):
protocol ospf {
    import none;
    export none;
    area 1 {
        interface "eth0";
        interface "lo";
    };
}
The loopback interface contains a number of anycast addresses which
appear as stubnets in OSPF. The routers see the stubnet on both
load-balancers, but pick only one (seemingly random) load-balancer when
inserting into the routing table.
If both the router and one of the load-balancers participated in the
area on two interfaces, I got a multipath route entry. I traced the flow
and found that stubnets never visited the current multipath code.
>
>> I am aware of the implications of the default multipath implementation in
>> Linux which operates on a per-packet basis, which is why we've patched
>> our kernels to do it per-flow instead.
> Really? AFAIK default Linux implementation is per-flow, not per-packet,
> unless this was changed recently.
The IPv4 multipath code in the kernel actually picks a pseudo-random
route in a round-robin fashion. The route cache would however ensure
that the flow stayed on a particular path for a while if the route was
used continuously. In Linux 3.6 the route cache was removed from the
kernel (apparently the route cache behaved badly under heavy load),
effectively turning the multipath code from per-flow to per-packet. The
IPv6 multipath code has always used a hash-based modulo-N algorithm
which ensured consistent flow-based multipath. So we basically added an
option in the kernel allowing for hash-based modulo-N multipath in
IPv4 (as an added bonus, the round-robin code required a spinlock, while
the hash-based code is lock-free). Unfortunately our implementation
disregards multipath weights, so I haven't bothered sending it to any
kernel mailing list. On the recommendation of RFC 2992 (Analysis of an
Equal-Cost Multi-Path Algorithm) I'll probably change our hash modulo-N
algorithm to a hash-threshold algorithm, which has better behavior when
gateways are added to or removed from the multipath.
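
To make the difference concrete, here is a minimal sketch in Python (not the
kernel patch, which is C and not shown here). It assumes a 32-bit flow hash
and equal-weight next hops; all function names and addresses are made up for
illustration. Modulo-N picks index hash mod N, while hash-threshold splits
the hash space into N equal regions and picks the region the hash falls into.

```python
import hashlib

def flow_hash(src, dst, sport, dport):
    """Hash a flow 4-tuple to 32 bits (SHA-256 is used only for illustration)."""
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")

def pick_modulo_n(h, nexthops):
    """Modulo-N: the next hop index is the hash modulo the number of next hops."""
    return nexthops[h % len(nexthops)]

def pick_hash_threshold(h, nexthops):
    """Hash-threshold: split the 32-bit hash space into equal regions,
    one per next hop, and pick the region that contains the hash."""
    region = (1 << 32) // len(nexthops)
    # The division leaves a small remainder at the top; clamp it to the last hop.
    return nexthops[min(h // region, len(nexthops) - 1)]

def moved_fraction(pick, before, after, flows):
    """Fraction of flows whose next hop differs between two next-hop sets."""
    moved = sum(1 for h in flows if pick(h, before) != pick(h, after))
    return moved / len(flows)
```

Calling moved_fraction() with each picker over a few thousand synthetic
flows, once with four gateways and once with one of them removed, should show
the threshold variant moving noticeably fewer flows than modulo-N. Neither
variant in this sketch honours multipath weights.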
>
>> Anyway, I seemed to have managed to make multipath work as expected - at
>> least in our setup. (Patch attached)
> Well, what is expected is the question. BIRD currently does multipath
> on the idea that multiple paths through OSPF network topology to one
> destination in one area are merged, but two same routes originated by
> two different routers are considered different destinations (which makes
> perfect sense for propagated default gateways or anycast destinations).
The way I interpret the OSPFv2 spec, a destination is simply an IP
address prefix. There may be several routes to a particular destination
through a lot of routers, but if multiple routes to that destination
exist which seem identical in quality (cost etc.), those routes are
eligible for multipath - even though those destinations are default
gateways or anycast destinations (anycast destinations are after all
indistinguishable from ordinary destinations). So at least what I expect
is that /any/ seemingly equal route to a given network should be merged
into a multipath route if ecmp is enabled.
RFC 4786 (Operation of Anycast Services) talks about using ECMP with
anycast services, obviously mentioning that per-packet load-balancing
can be problematic with anycast, and that hash-based ECMP is preferred.
In other words, combining hash-based multipath with anycast may often be
preferable, and the OSPF algorithm ought to ensure that all active
routes to the anycast destination are of equal best cost.
>
> Your patch merges such routes from different routers, but still keeps
> routes from different areas. A few months ago, Volodymyr Samodid
> commented that ECMP in OSPF should merge paths from multiple areas.
Really? From RFC 2328 (OSPF Version 2) section 16.8 (page 178):
"Each one of the multiple routes will be of the same type
(intra-area, inter-area, type 1 external or type 2 external),
cost, and will have the same associated area. However, each
route may specify a separate next hop and Advertising router."
Aren't they saying that each route in the multipath entry must share the
same associated area?
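
Read that way, the quoted rule amounts to a grouping key. The following is a
rough Python sketch of that reading, not BIRD code: the Route fields, the
type ranking and best_multipath() are all assumptions made for illustration,
and the comparison is deliberately simplified (real external route
comparison is more involved). Routes tied on type, cost and area contribute
their next hops, whatever their advertising router is.

```python
from typing import NamedTuple

class Route(NamedTuple):
    prefix: str      # destination, e.g. "192.0.2.10/32"
    rtype: str       # "intra-area", "inter-area", "ext1" or "ext2"
    cost: int
    area: int
    nexthop: str
    adv_router: str  # router ID of the advertising router

# Simplified preference order between route types (lower is better).
TYPE_RANK = {"intra-area": 0, "inter-area": 1, "ext1": 2, "ext2": 3}

def best_multipath(routes):
    """Per prefix, collect the next hops of all routes tied for best."""
    best = {}
    for r in routes:
        key = (TYPE_RANK[r.rtype], r.cost, r.area)
        cur = best.get(r.prefix)
        if cur is None or key[:2] < cur[0][:2]:
            # First route for this prefix, or strictly better type/cost.
            best[r.prefix] = (key, [r.nexthop])
        elif key == cur[0] and r.nexthop not in cur[1]:
            # Same type, cost and area: merge, ignoring adv_router.
            cur[1].append(r.nexthop)
        # Equal type and cost but a different area is not merged here.
    return {prefix: hops for prefix, (_, hops) in best.items()}
```

With two load-balancers advertising the same stubnet at the same cost in the
same area, this yields one entry with both next hops, which is the behaviour
I was expecting from the routers.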
>
> So it seems that this should be at least configurable (like 'ecmp merge
> internal <bool>', 'ecmp merge external <bool>', 'ecmp merge areas <bool>').
> The question is how detailed such configuration should be. For example,
> it may be useful to merge external routes with the same route tag, but
> not merge external routes with different ones. And what about merging
> internal and external routes together, is this useful?
>
> Any thoughts on this issue?
At least from the RFC 2328 point of view, it apparently doesn't make
sense to merge the routes across different types of routes. But I guess
that boils down to the fact that they usually have different costs.
>
>
>> Essentially, I've hooked my multipath code into ri_install_ext() and
>> ri_install_net(), where I add the equal routes if the routes share the
>> same type, metrics and OSPF area.
>> I realize that my add_nexthops() is /very/ similar to merge_nexthops()
>> in functionality, but it seemed that the top_hash_entry() could be null,
>> so I wrote a new method which did not rely on that - at the cost of more
>> calls to copy_nexthop(), I guess.
>>
>> Any thoughts?
> The implementation looks clear and simple, i will look at it thoroughly
> in a few days. On the first look i see that the patch forgot to zero
> orta->rid and perhaps orta->tag if merged routes have it different.
>
Yeah, I guess clearing rid makes sense since the route is really from
different routers. As for the tag, I'm not sure what the expected
behavior is, since it is out of the scope of the OSPFv2 spec. Maybe that
is cause enough to make it tunable whether routes with different tags
can be merged.
Another thing I've noticed myself is that I should probably also
check ORTA_NSSA, ORTA_PROP and ORTA_PREF when verifying equal-cost
routes in ri_install_ext(). ri_better_ext() is, after all, taking them
into consideration.
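
The general idea, sketched in Python rather than in BIRD's C: if the "is
better" and "is equal" checks are derived from one ranking key, they cannot
drift apart. Everything below is hypothetical, including the field names,
the order of fields in the key and the install() helper; it does not
reproduce ri_better_ext() or ri_install_ext(). It also shows the rid and tag
being cleared when merged routes disagree, where clearing the tag is only
one possible policy.

```python
from dataclasses import dataclass, field

@dataclass
class ExtRoute:
    metric1: int
    metric2: int
    area: int
    nssa: bool   # stands in for an ORTA_NSSA-style flag
    prop: bool   # stands in for an ORTA_PROP-style flag
    pref: bool   # stands in for an ORTA_PREF-style flag
    rid: int     # originating router ID, 0 meaning "none"
    tag: int
    nexthops: list = field(default_factory=list)

def rank(r):
    """All fields the comparison looks at (hypothetical order, lower wins)."""
    return (r.metric2, r.metric1, not r.pref, r.nssa, r.prop)

def is_better(new, old):
    return rank(new) < rank(old)

def is_equal(new, old):
    # Same key as is_better(), plus the same-area requirement.
    return rank(new) == rank(old) and new.area == old.area

def install(old, new):
    """Return the route to keep after comparing 'new' against 'old'."""
    if old is None or is_better(new, old):
        return new
    if is_equal(new, old):
        for nh in new.nexthops:
            if nh not in old.nexthops:
                old.nexthops.append(nh)
        if old.rid != new.rid:
            old.rid = 0  # merged route no longer has a single originator
        if old.tag != new.tag:
            old.tag = 0  # assumption: differing tags are cleared
    return old
```

Making the tag check a tunable would then only mean adding the tag to the
equality test instead of clearing it.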