Missing babel routes after configure - bug in bird 2.0.8?

Toke Høiland-Jørgensen toke at toke.dk
Thu Apr 29 17:55:00 CEST 2021


Nico Schottelius <nico.schottelius at ungleich.ch> writes:

> Hello,
>
> we are using direct + babel as an IGP and today something "funky"
> happened: the routes of one router disappeared and on other routers the
> babel entries contained empty router IDs in babel:

Hmm, the obvious reason for this would be if Babel no longer considers
that router reachable. The output of 'show babel neighbors' and 'show
babel routes' from both sides (while this is going on) might shed some
light on this.
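
To grab that state from both routers in one go, something along these lines might help. It is only a sketch: it assumes birdc is in the PATH and allowed to talk to the control socket, and the names BABEL_COMMANDS and collect_babel_state are made up for illustration.

```python
import subprocess

# The two commands mentioned above, plus the other Babel tables for context.
BABEL_COMMANDS = [
    "show babel interfaces",
    "show babel neighbors",
    "show babel entries",
    "show babel routes",
]


def collect_babel_state(run=subprocess.run):
    """Run each command through birdc and return {command: output text}."""
    results = {}
    for command in BABEL_COMMANDS:
        # birdc takes a command on its command line and prints the reply.
        proc = run(["birdc"] + command.split(), capture_output=True, text=True)
        results[command] = proc.stdout
    return results
```

Calling collect_babel_state() on each router while the problem is visible and saving the returned text gives a snapshot from both sides for comparison.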

> bird> show babel entries
> babel1:
> Prefix                   Router ID               Metric Seqno  Routes Sources
> 185.203.114.0/23         00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 195.141.230.102/31       <none>                       -     -       1       0
> 185.203.112.0/24         <none>                       -     -       2       0
> 147.78.195.224/27        00:00:00:00:93:4e:c3:fb      0  2204       7       0
> 193.192.225.72/31        <none>                       -     -       1       0
> 94.78.2.168/29           00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 147.78.195.0/29          <none>                       -     -       1       0
> 185.155.31.0/24          00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 45.134.132.0/29          00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 185.155.31.0/26          00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 2a0a:e5c0:1:f::/64       <none>                       -     -       1       0
> 2a0a:e5c0:1:e::/64       <none>                       -     -       1       0
> 2a0a:e5c0:1:d::/64       00:00:00:00:93:4e:c3:fb      0  2204       5       0
> 2a0a:e5c0:0:9::/64       00:00:00:00:00:00:00:2f     96     1       1       0
> 2a0a:e5c0:1:8::/64       00:00:00:00:93:4e:c3:fb      0  2204      10       0
> 2a0a:e5c0:0:6::/64       00:00:00:00:00:00:00:2e     96     2       1       0
> 2a0a:e5c0:0:5::/64       <none>                       -     -       2       0
> 2a0a:e5c0:0:4::/64       00:00:00:00:00:00:00:2f     96     1       1       0
> 2a0a:e5c0:1:4::/64       00:00:00:00:93:4e:c3:fb      0  2204       5       0
> 2a0a:e5c0:0:3::/64       00:00:00:00:00:00:00:2e     96     2       1       0
> 2a0a:e5c0:0:2::/64       <none>                       -     -       2       0
> 2a0a:e5c0:2:2::/64       00:00:00:00:93:4e:c3:fb      0  2204       5       0
> 2a0a:e5c0:10:1::/64      <none>                       -     -       1       0
>
> After restarting the router, the entries are correct now (router id
> 00:00:00:00:93:4e:c3:e3):

How long did you wait before and after restarting?
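
For comparing dumps taken before and after, a small filter that lists the prefixes with no router ID may be handy. This is a sketch that assumes the column layout of the 'show babel entries' output quoted above; the function name is hypothetical.

```python
def prefixes_without_router_id(output):
    """Return the prefixes whose Router ID column reads <none>."""
    missing = []
    for line in output.splitlines():
        # Drop mail quoting ("> ") in case the dump was pasted from an email.
        fields = line.lstrip("> ").split()
        # Data rows start with a prefix (contains "/"), then the router ID.
        if len(fields) >= 2 and "/" in fields[0] and fields[1] == "<none>":
            missing.append(fields[0])
    return missing


sample = """\
babel1:
Prefix                   Router ID               Metric Seqno  Routes Sources
185.203.114.0/23         00:00:00:00:93:4e:c3:fb      0  2204       4       0
195.141.230.102/31       <none>                       -     -       1       0
147.78.195.0/29          <none>                       -     -       1       0
"""
print(prefixes_without_router_id(sample))
# ['195.141.230.102/31', '147.78.195.0/29']
```

Running it on a dump from before the restart and one from after shows which prefixes gained a router ID in between.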

> bird> show babel entries
> babel1:
> Prefix                   Router ID               Metric Seqno  Routes Sources
> 185.203.114.0/23         00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 195.141.230.102/31       <none>                       -     -       1       0
> 185.203.112.0/24         <none>                       -     -       2       0
> 147.78.195.224/27        00:00:00:00:93:4e:c3:fb      0  2204       7       0
> 193.192.225.72/31        <none>                       -     -       1       0
> 94.78.2.168/29           00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 147.78.195.0/29          00:00:00:00:93:4e:c3:e3     96     1       1       0
> 185.155.31.0/24          00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 45.134.132.0/29          00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 185.155.31.0/26          00:00:00:00:93:4e:c3:fb      0  2204       4       0
> 2a0a:e5c0:1:f::/64       00:00:00:00:93:4e:c3:e3     96     1       1       0
> 2a0a:e5c0:1:e::/64       00:00:00:00:93:4e:c3:e3     96     1       1       0
> 2a0a:e5c0:1:d::/64       00:00:00:00:93:4e:c3:fb      0  2204       5       0
> 2a0a:e5c0:0:9::/64       00:00:00:00:00:00:00:2f     96     1       1       0
> 2a0a:e5c0:1:8::/64       00:00:00:00:93:4e:c3:fb      0  2204      10       0
> 2a0a:e5c0:0:6::/64       00:00:00:00:00:00:00:2e     96     2       1       0
> 2a0a:e5c0:0:5::/64       <none>                       -     -       2       0
>
> The "funny" part is that bird 2.0.8 on that router was up and running,
> it did see babel neighbors, it did BGP, however only after restarting
> bird, the routes were correctly received by neighboring routers.

How did you determine that it "sees babel neighbors"?

> As you can see in above table there is at least another router that is
> affected. I tried `disable babel1` and `enable babel1` on it, however
> that did not fix the problem.
>
> We did some amount of `configure` commands on the bird process, but my
> understanding was that it should be very similar to restarting it,
> without the loss of sessions. The babel protocol *did* previously export
> the routes correctly to other routers, but I am 99% sure that it did
> stop doing it, until we hard restarted bird.

What did you do to determine that? What's the output of the 'show babel
*' commands on the affected router itself?

> The babel configuration we use on most of our routers looks like this:
>
> protocol direct {
>     ipv4;
>     ipv6;
>     interface "bond0.*";
> }
>
> protocol babel {
>         interface "bond0.*" {
>                 type wired;
>         };
>
>         ipv4 {
>                 export where (source = RTS_DEVICE);
>         };
>         ipv6 {
>                 export where (source = RTS_DEVICE);
>         };
> }
>
> So we are not re-babling like babeld, but only inject local device
> routes, as we have 2 redundant routers per network.
>
> Is there a conceptual problem or is this a bug in bird 2.0.8 and in
> either way, is there anything we can do to fix it besides restarting
> bird on route propagation error?

If you're not re-exporting the babel routes you lose the resiliency of
going through additional hops, of course (i.e., if you re-export, the
routers can re-route through one another as long as one of them has a
valid route to the destination), but I assume that's what you're trying
to avoid(?), so in that case there should not be any conceptual issue
with what you're doing AFAICT...

-Toke

