BIRD and ECMP on Linux seems flaky
Arno Töll
arno.toell+bird at profitbricks.com
Mon Jan 11 18:38:39 CET 2016
Hi list,
I've been experimenting with bird's ECMP features, which were added to git
head a while back [1], on my Debian-based Linux system. I tried the setup below
with git head as of today.
I have three Linux routers running bird: two gateways (gw1 and gw2) and one
frontend. Each gateway establishes one BGP session to the frontend and
advertises fd57::1 and fd57::2 to it.
The frontend accepts both advertisements and exports both to the
kernel table. All output below comes from the bird instance on the frontend.
I configured bird like this:
log syslog { debug, trace, info, remote, warning, error, auth, fatal, bug };
router id 10.3.101.3;
# Filters
filter kernel_export {
    if net ~ [ fd57::/64{128,128} ] then accept;
    reject;
}

# BGP Filters
filter bgp_import {
    if net ~ [ fd57::/64{128,128} ] then accept;
    reject;
}
filter bgp_export {
    reject;
}

# Local devices
protocol device {
    scan time 10;
}
protocol direct {
    interface "*";
}
protocol kernel {
    import none;
    #learn;
    merge paths on;
    export filter kernel_export;
}

# BGP peers
protocol bgp 'gw1' {
    description "gw1";
    default bgp_local_pref 100;
    local fc57::3 as 65001;
    neighbor fc57::1 as 65000;
    next hop self;
    import filter bgp_import;
    export filter bgp_export;
    hold time 30;
    error wait time 5, 30;
}
protocol bgp 'gw2' {
    description "gw2";
    default bgp_local_pref 100;
    local fc57::3 as 65001;
    neighbor fc57::2 as 65000;
    next hop self;
    import filter bgp_import;
    export filter bgp_export;
    hold time 30;
    error wait time 5, 30;
}
On the system I have this address configuration:
root@debian:~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 64000 qdisc pfifo_fast state UP
group default qlen 1000
link/ether 02:01:93:0b:5a:e9 brd ff:ff:ff:ff:ff:ff
inet 10.10.216.12/24 brd 10.10.216.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fc57::3/64 scope global
valid_lft forever preferred_lft forever
inet6 fe80::1:93ff:fe0b:5ae9/64 scope link
valid_lft forever preferred_lft forever
In bird:
root@debian:~# birdc6
BIRD 1.5.0 ready.
bird> show protocols
name proto table state since info
device1 Device master up 14:53:54
direct1 Direct master up 14:53:54
kernel1 Kernel master up 14:53:54
static1 Static master up 14:53:54
gw1 BGP master up 14:53:58 Established
gw2 BGP master up 14:53:55 Established
bird> show route
fd57::2/128 via fc57::1 on eth0 [gw1 14:53:58] ! (100) [AS65000i]
via fc57::2 on eth0 [gw2 14:53:55] (100) [AS65000i]
fd57::1/128 via fc57::1 on eth0 [gw1 14:53:58] ! (100) [AS65000i]
via fc57::2 on eth0 [gw2 14:53:55] (100) [AS65000i]
fc57::/64 dev eth0 [direct1 14:53:54] * (240)
As a result, the multipath routes are installed into the kernel routing table
as expected once the sessions are up:
root@debian:~# ip -6 route show
fc57::/64 dev eth0 proto kernel metric 256
fd57::1 via fc57::1 dev eth0 proto bird metric 1024
fd57::1 via fc57::2 dev eth0 proto bird metric 1024
fd57::2 via fc57::1 dev eth0 proto bird metric 1024
fd57::2 via fc57::2 dev eth0 proto bird metric 1024
fe80::/64 dev eth0 proto kernel metric 256
However, after some time, bird seems to get confused by the routes it
installed itself, and one of the two next hops disappears from each multipath
route. This can be seen again in ip -6 route show:
root@debian:~# ip -6 route show
fc57::/64 dev eth0 proto kernel metric 256
fd57::1 via fc57::2 dev eth0 proto bird metric 1024
fd57::2 via fc57::2 dev eth0 proto bird metric 1024
fe80::/64 dev eth0 proto kernel metric 256
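To spot the collapsed state without comparing the tables by eye, a small helper along the following lines might be useful. It is only a sketch: count_bird_nexthops is a name made up here, and it assumes the one-line-per-next-hop output format shown above (other iproute2 or kernel versions may print multipath routes differently).

```shell
# Hypothetical helper: count bird-installed next hops per destination.
# Reads `ip -6 route show` output on stdin, prints "<destination> <count>".
# Assumes each next hop appears on its own line, as in the output above.
count_bird_nexthops() {
    awk '/proto bird/ { n[$1]++ } END { for (d in n) print d, n[d] }' | sort
}

# Usage on the frontend (requires iproute2):
#   ip -6 route show | count_bird_nexthops
```

With the healthy table this should report 2 for both fd57::1 and fd57::2, and 1 for each once the problem has appeared.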
In bird, however, both routes are still received:
root@ps:~# birdc6 show route all
BIRD 1.5.0 ready.
fd57::2/128 via fc57::1 on eth0 [gw1 16:25:25] ! (100) [AS65000i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path: 65000
BGP.next_hop: fc57::1 fe80::1:bdff:feab:7f12
BGP.local_pref: 100
via fc57::2 on eth0 [gw2 16:25:23] (100) [AS65000i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path: 65000
BGP.next_hop: fc57::2 fe80::1:bdff:fe05:64b7
BGP.local_pref: 100
fd57::1/128 via fc57::1 on eth0 [gw1 16:25:25] ! (100) [AS65000i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path: 65000
BGP.next_hop: fc57::1 fe80::1:bdff:feab:7f12
BGP.local_pref: 100
via fc57::2 on eth0 [gw2 16:25:23] (100) [AS65000i]
Type: BGP unicast univ
BGP.origin: IGP
BGP.as_path: 65000
BGP.next_hop: fc57::2 fe80::1:bdff:fe05:64b7
BGP.local_pref: 100
...
In the bird log, with debug output enabled, I can see:
Jan 11 15:12:46 ps bird6: gw1: Got KEEPALIVE
Jan 11 15:12:49 ps bird6: gw2: Got KEEPALIVE
Jan 11 15:12:50 ps bird6: gw1: Sending KEEPALIVE
Jan 11 15:12:52 ps bird6: gw2: Sending KEEPALIVE
Jan 11 15:12:54 ps bird6: device1: Scanning interfaces
Jan 11 15:12:54 ps bird6: kernel1: Scanning routing table
Jan 11 15:12:54 ps bird6: kernel1: fd57::1/128: will be updated
Jan 11 15:12:54 ps bird6: kernel1: fd57::1/128: already seen
Jan 11 15:12:54 ps bird6: kernel1: fd57::2/128: will be updated
Jan 11 15:12:54 ps bird6: kernel1: fd57::2/128: already seen
Jan 11 15:12:54 ps bird6: kernel1: Pruning table master
Jan 11 15:12:54 ps bird6: kernel1: fd57::2/128: updating
Jan 11 15:12:54 ps bird6: Netlink: File exists
Jan 11 15:12:54 ps bird6: kernel1: fd57::1/128: updating
Jan 11 15:12:54 ps bird6: Netlink: File exists
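To pull the affected prefixes out of a longer log, something like the sketch below could help. The function name failed_updates is made up for illustration, and the pairing rule is an assumption on my side: it treats a "Netlink: File exists" line as belonging to the most recent "updating" line before it, which matches the excerpt above but is not confirmed behaviour.

```shell
# Hypothetical helper: list prefixes whose kernel update was followed by
# "Netlink: File exists". Reads syslog lines on stdin.
# Assumption: the error line refers to the preceding "<prefix>: updating" line.
failed_updates() {
    awk '
        /: updating$/ {
            prefix = $(NF - 1)        # e.g. "fd57::2/128:"
            sub(/:$/, "", prefix)     # drop the trailing colon
            next
        }
        /Netlink: File exists/ && prefix != "" {
            print prefix
            prefix = ""
        }
    '
}

# Usage (the log path is an assumption, adjust to the local syslog setup):
#   grep bird6 /var/log/syslog | failed_updates
```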
After a while the problem fixes itself and both routes are installed again,
and then the problem reappears in the next cycle.
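To line the route changes up with the scan timestamps in the bird log, the kernel's route events can be timestamped as they arrive. This is a rough sketch, not something taken from the bird documentation; stamp_lines is a name invented here, and it relies on ip -6 monitor route from iproute2 being available.

```shell
# Hypothetical helper: prefix every line read from stdin with the current
# wall-clock time, so events can be compared with syslog timestamps.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%H:%M:%S')" "$line"
    done
}

# Usage on the frontend (runs until interrupted):
#   ip -6 monitor route | stamp_lines
```

If the next hops vanish at the same second as a "Scanning routing table" line, that would at least support the suspicion that the periodic kernel scan is involved.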
Is there a way around this, or is this actually a bug? To me it looks like
bird scans its own routes and wrongly sees only one of the two next hops.
Experimenting with "import all", "learn" etc. for the kernel protocol seems to
make no difference.
[1]
https://gitlab.labs.nic.cz/labs/bird/commit/8d9eef17713a9b38cd42bd59c4ce76c3ef6c2fc2
--
Arno Töll
GnuPG Key-ID: 0x9D80F36D