BIRD and ECMP on Linux seem flaky

Arno Töll arno.toell+bird at profitbricks.com
Mon Jan 11 18:38:39 CET 2016


Hi list,

I've been experimenting with bird's ECMP support, which was added to git head
a while back [1], on my Debian-based Linux system. I tried the setup below
with git head as of today.

I have three Linux routers running bird: two gateways, which I call gw1 and
gw2, and one frontend. Each gateway establishes one BGP session to the
frontend and advertises fd57::1 and fd57::2 to it.

The frontend accepts both advertisements and exports both to the kernel
table. All output below comes from the bird instance on the frontend. I
configured bird like this:



log syslog { debug, trace, info, remote, warning, error, auth, fatal, bug };

router id 10.3.101.3;

# Filters

filter kernel_export {
    if net ~ [ fd57::/64{128,128} ] then accept;
    reject;
}


# BGP Filters

filter bgp_import {
    if net ~ [ fd57::/64{128,128} ] then accept;
    reject;
}

filter bgp_export {
    reject;
}

# Local devices
protocol device {
    scan time 10;
}

protocol direct {
    interface "*";
}

protocol kernel {
    import none;
    #learn;
    merge paths on;
    export filter kernel_export;
}


# BGP peers

protocol bgp 'gw1' {
    description "gw1";
    default bgp_local_pref 100;
    local fc57::3 as 65001;
    neighbor fc57::1 as 65000;
    next hop self;
    import filter bgp_import;
    export filter bgp_export;
    hold time 30;
    error wait time 5, 30;
}

protocol bgp 'gw2' {
    description "gw2";
    default bgp_local_pref 100;
    local fc57::3 as 65001;
    neighbor fc57::2 as 65000;
    next hop self;
    import filter bgp_import;
    export filter bgp_export;
    hold time 30;
    error wait time 5, 30;
}

On the system I have this address configuration:

root@debian:~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 64000 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 02:01:93:0b:5a:e9 brd ff:ff:ff:ff:ff:ff
    inet 10.10.216.12/24 brd 10.10.216.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fc57::3/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::1:93ff:fe0b:5ae9/64 scope link
       valid_lft forever preferred_lft forever


In bird, the protocols and routes look like this:


root@debian:~# birdc6
BIRD 1.5.0 ready.
bird> show protocols
name     proto    table    state  since       info
device1  Device   master   up     14:53:54
direct1  Direct   master   up     14:53:54
kernel1  Kernel   master   up     14:53:54
static1  Static   master   up     14:53:54
gw1      BGP      master   up     14:53:58    Established
gw2      BGP      master   up     14:53:55    Established
bird> show route
fd57::2/128        via fc57::1 on eth0 [gw1 14:53:58] ! (100) [AS65000i]
                   via fc57::2 on eth0 [gw2 14:53:55] (100) [AS65000i]
fd57::1/128        via fc57::1 on eth0 [gw1 14:53:58] ! (100) [AS65000i]
                   via fc57::2 on eth0 [gw2 14:53:55] (100) [AS65000i]
fc57::/64          dev eth0 [direct1 14:53:54] * (240)



The result is that the multipath routes are installed into the kernel routing
table as expected once the sessions are up:

root@debian:~# ip -6 route show
fc57::/64 dev eth0  proto kernel  metric 256
fd57::1 via fc57::1 dev eth0  proto bird  metric 1024
fd57::1 via fc57::2 dev eth0  proto bird  metric 1024
fd57::2 via fc57::1 dev eth0  proto bird  metric 1024
fd57::2 via fc57::2 dev eth0  proto bird  metric 1024
fe80::/64 dev eth0  proto kernel  metric 256

However, after some time bird seems to get confused by the routes it
installed itself and removes one of the two next hops again. This shows up in
ip -6 route show:

root@debian:~# ip -6 route show
fc57::/64 dev eth0  proto kernel  metric 256
fd57::1 via fc57::2 dev eth0  proto bird  metric 1024
fd57::2 via fc57::2 dev eth0  proto bird  metric 1024
fe80::/64 dev eth0  proto kernel  metric 256
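
The difference between the two listings can be checked mechanically. The following is a minimal Python sketch, not part of bird, that groups the "via" next hops of "proto bird" routes by prefix and reports which expected next hops are missing. The function names and the EXPECTED mapping are hypothetical; the sample text is copied from the two listings above.

```python
from collections import defaultdict

def nexthops_by_prefix(ip_route_output):
    """Group 'via' next hops by prefix, using text from `ip -6 route show`."""
    hops = defaultdict(set)
    for line in ip_route_output.splitlines():
        fields = line.split()
        # Skip routes without a gateway and routes not installed by bird.
        if "via" not in fields or "proto" not in fields:
            continue
        if fields[fields.index("proto") + 1] != "bird":
            continue
        hops[fields[0]].add(fields[fields.index("via") + 1])
    return dict(hops)

def missing_nexthops(ip_route_output, expected):
    """Return {prefix: sorted list of expected next hops that are absent}."""
    seen = nexthops_by_prefix(ip_route_output)
    missing = {}
    for prefix, want in expected.items():
        lost = set(want) - seen.get(prefix, set())
        if lost:
            missing[prefix] = sorted(lost)
    return missing

# Sample input copied from the listings in this mail.
GOOD = """\
fc57::/64 dev eth0  proto kernel  metric 256
fd57::1 via fc57::1 dev eth0  proto bird  metric 1024
fd57::1 via fc57::2 dev eth0  proto bird  metric 1024
fd57::2 via fc57::1 dev eth0  proto bird  metric 1024
fd57::2 via fc57::2 dev eth0  proto bird  metric 1024
fe80::/64 dev eth0  proto kernel  metric 256
"""

BAD = """\
fc57::/64 dev eth0  proto kernel  metric 256
fd57::1 via fc57::2 dev eth0  proto bird  metric 1024
fd57::2 via fc57::2 dev eth0  proto bird  metric 1024
fe80::/64 dev eth0  proto kernel  metric 256
"""

# Assumption: both gateways should appear as next hops for both prefixes.
EXPECTED = {
    "fd57::1": {"fc57::1", "fc57::2"},
    "fd57::2": {"fc57::1", "fc57::2"},
}

if __name__ == "__main__":
    print(missing_nexthops(GOOD, EXPECTED))
    print(missing_nexthops(BAD, EXPECTED))
```

For the second listing this reports fc57::1 as missing for both prefixes, i.e. the next hop learned from gw1 is the one that disappears.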

In bird, however, both routes are still received:

root@ps:~# birdc6 show route all
BIRD 1.5.0 ready.
fd57::2/128        via fc57::1 on eth0 [gw1 16:25:25] ! (100) [AS65000i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 65000
        BGP.next_hop: fc57::1 fe80::1:bdff:feab:7f12
        BGP.local_pref: 100
                   via fc57::2 on eth0 [gw2 16:25:23] (100) [AS65000i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 65000
        BGP.next_hop: fc57::2 fe80::1:bdff:fe05:64b7
        BGP.local_pref: 100
fd57::1/128        via fc57::1 on eth0 [gw1 16:25:25] ! (100) [AS65000i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 65000
        BGP.next_hop: fc57::1 fe80::1:bdff:feab:7f12
        BGP.local_pref: 100
                   via fc57::2 on eth0 [gw2 16:25:23] (100) [AS65000i]
        Type: BGP unicast univ
        BGP.origin: IGP
        BGP.as_path: 65000
        BGP.next_hop: fc57::2 fe80::1:bdff:fe05:64b7
        BGP.local_pref: 100
...


In the bird log, with debug output enabled, I can see:

Jan 11 15:12:46 ps bird6: gw1: Got KEEPALIVE
Jan 11 15:12:49 ps bird6: gw2: Got KEEPALIVE
Jan 11 15:12:50 ps bird6: gw1: Sending KEEPALIVE
Jan 11 15:12:52 ps bird6: gw2: Sending KEEPALIVE
Jan 11 15:12:54 ps bird6: device1: Scanning interfaces
Jan 11 15:12:54 ps bird6: kernel1: Scanning routing table
Jan 11 15:12:54 ps bird6: kernel1: fd57::1/128: will be updated
Jan 11 15:12:54 ps bird6: kernel1: fd57::1/128: already seen
Jan 11 15:12:54 ps bird6: kernel1: fd57::2/128: will be updated
Jan 11 15:12:54 ps bird6: kernel1: fd57::2/128: already seen
Jan 11 15:12:54 ps bird6: kernel1: Pruning table master
Jan 11 15:12:54 ps bird6: kernel1: fd57::2/128: updating
Jan 11 15:12:54 ps bird6: Netlink: File exists
Jan 11 15:12:54 ps bird6: kernel1: fd57::1/128: updating
Jan 11 15:12:54 ps bird6: Netlink: File exists
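
To correlate the failures with prefixes over a longer log, the pattern above ("updating" directly followed by "Netlink: File exists", which is the system error text for EEXIST) can be extracted with a short script. This is only a sketch under the assumption that the two lines are always adjacent, as they are in the excerpt; the function name is hypothetical.

```python
import re

# Matches e.g. "... bird6: kernel1: fd57::2/128: updating" and captures the prefix.
UPDATING = re.compile(r": (\S+): updating$")

def failed_updates(log_text, error="Netlink: File exists"):
    """Return prefixes whose 'updating' line is directly followed by the error."""
    failed = []
    pending = None  # prefix from the previous line, if it was an 'updating' line
    for line in log_text.splitlines():
        line = line.strip()
        match = UPDATING.search(line)
        if match:
            pending = match.group(1)
        elif pending is not None and line.endswith(error):
            failed.append(pending)
            pending = None
        else:
            pending = None
    return failed

# Sample input copied from the log excerpt in this mail.
LOG = """\
Jan 11 15:12:54 ps bird6: kernel1: Pruning table master
Jan 11 15:12:54 ps bird6: kernel1: fd57::2/128: updating
Jan 11 15:12:54 ps bird6: Netlink: File exists
Jan 11 15:12:54 ps bird6: kernel1: fd57::1/128: updating
Jan 11 15:12:54 ps bird6: Netlink: File exists
"""

if __name__ == "__main__":
    print(failed_updates(LOG))
```

On the excerpt this yields fd57::2/128 and fd57::1/128, so both multipath prefixes hit the error in the same scan.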


After a while the problem fixes itself and both routes are installed again;
then it reappears in the next cycle.

Is there a way around this, or is this actually a bug? To me it looks as if
bird scans its own routes and wrongly sees only one of them. Experimenting
with "import all", "learn", etc. for the kernel protocol seems to make no
difference.




[1]
https://gitlab.labs.nic.cz/labs/bird/commit/8d9eef17713a9b38cd42bd59c4ce76c3ef6c2fc2

--
Arno Töll
GnuPG Key-ID: 0x9D80F36D


