Babel in Bird 1.6.0

Baptiste Jonglez baptiste at bitsofnetworks.org
Sat Apr 30 15:47:50 CEST 2016


On Sat, Apr 30, 2016 at 03:15:52PM +0200, Toke Høiland-Jørgensen wrote:
> > === Configuration ===
> >
> > When the quotes are missing around the interface name, the error message
> > is misleading.  This configuration:
> >
> >     protocol babel {
> >         interface eth0;
> >     }
> >
> > gives:
> >
> >     /etc/bird6.conf, line 240: IP address expected
> >
> > which is very strange, because an IP address does not make sense here
> > (and something like "interface 2001:db8::42" obviously gives an
> > error).
> 
> Not sure why this is. The inner workings of the configuration parsing
> still have a good portion of black magic...

Ah, I thought that this "interface" statement was specific to Babel, but
it's actually defined for all protocols.  The syntax seems fairly complex:

  http://bird.network.cz/?get_doc&f=bird-3.html#ss3.3

IP prefixes are allowed for OSPF, which explains the error message.

> > === Filtering ===
> >
> > When updating a filter that removes some routes (e.g. switching from
> > "export all" to "export none"), the routes stay in the Babel table, even
> > though they are no longer exported to the protocol:
> >
> > bird> show babel entries babel1
> > babel1:
> > Prefix                        Router ID               Metric Seqno Expires Sources
> > 2001:db8:42::/64              <pending>                                          1
> > bird> show route export babel1
> > bird>
> 
> These are routes coming from the box itself (not from peers)? They
> should expire after BABEL_HOLD_TIME (10 seconds)...

Yes, these are local routes, imported from the kernel.  I do see a
transient state of about 10 seconds, but this "pending" state happens
*after* the transient state.

Before changing the export filter:

Prefix                        Router ID               Metric Seqno Expires Sources
2001:db8:42::/64              <self>                       0     1       0       1

Just after running "configure" (the route will expire in 9 seconds):

Prefix                        Router ID               Metric Seqno Expires Sources
2001:db8:42::/64              <self>                   65535     1       9       1

After the route has expired:

Prefix                        Router ID               Metric Seqno Expires Sources
2001:db8:42::/64              <pending>                                          1

The route stays in this state for quite a while (around 5 minutes), then
disappears.

> > === Hello interval range checking ===
> >
> > The hello interval cannot be set below 1 second: the parser seems to
> > expect an integer.  The packet format encodes intervals in centiseconds,
> > so it would make sense to allow any fractional Hello interval down to 0.01
> > seconds.
> >
> > I just checked, babeld's parser allows going as low as 0.001s (which is a
> > bit scary in itself, since it generates about 1 kpps / 1.5 Mbps of control
> > traffic).
> 
> Hmm, yes. However, since the internal Bird timers run at a granularity
> of seconds only, there's not much point in having the ability to
> configure smaller values.

Fair enough, it seems reasonable.
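
For reference, a rough Python sketch of how a Hello interval maps onto the
16-bit centisecond field of the Hello TLV, together with the arithmetic
behind the 1 kpps / 1.5 Mbps figure.  The function names are made up for
illustration and are neither Bird's nor babeld's code; the packet size is
only what the quoted figures imply, not a measurement.

```python
import struct

HELLO_TLV_TYPE = 4  # Hello TLV type in the Babel packet format


def encode_hello(seqno: int, interval_s: float) -> bytes:
    """Build a Hello TLV; the interval field counts centiseconds on 16 bits."""
    centisecs = round(interval_s * 100)
    if not 1 <= centisecs <= 0xFFFF:
        raise ValueError("interval must be between 0.01 s and 655.35 s")
    # Fields: type, body length (6), reserved, seqno, interval.
    return struct.pack("!BBHHH", HELLO_TLV_TYPE, 6, 0, seqno, centisecs)


def hellos_per_second(interval_s: float) -> float:
    """Packet rate generated by a given Hello interval."""
    return 1.0 / interval_s


def implied_packet_bytes(rate_bps: float, pps: float) -> float:
    """Average packet size implied by a bit rate and a packet rate."""
    return rate_bps / 8 / pps


tlv = encode_hello(39103, 4)          # a 4-second Hello interval
pps = hellos_per_second(0.001)        # the 1 ms interval accepted by babeld
size = implied_packet_bytes(1.5e6, pps)
```

With whole-second timers, only multiples of 100 centiseconds can actually
be honoured, which is why a sub-second setting would not buy much.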

> > === Interaction between reconfiguration and timers ===
> 
> Hmm, is this consistent? Reconfiguration simply replaces the struct with
> the values in the internal data structures, so the only reason I can see
> why you would get that behaviour is because there's a TLV that happens
> to be queued at the time you reconfigure. You can verify this by looking
> at the debug output when running with TRACE level debugging enabled;
> that outputs messages when the TLVs are generated, not when they are
> sent out.

Ok, I will try to have a closer look.

> I do seem to have forgotten to generate a new hello when reconfiguring
> (RFC section 3.4.1: "Equivalently, a node SHOULD send an unscheduled
> Hello immediately after increasing its Hello interval."). I do believe
> just adding that should resolve the issues with inconsistent behaviour?

Oh, I thought you had already taken that into account (by applying the new
interval only after the next Hello message, which is another way of
ensuring neighbours don't forget us), but that's possibly the queuing
issue you mention.
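
To make the first option concrete, here is a minimal Python sketch of
sending an unscheduled Hello when the interval is increased during
reconfiguration.  The `Iface` class and `reconfigure` function are
hypothetical and only model the timer logic; they do not reflect Bird's
actual data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Iface:
    hello_interval: int                 # seconds
    next_hello: int                     # absolute time of the next scheduled Hello
    sent: list = field(default_factory=list)

    def send_hello(self, now: int) -> None:
        # Record the Hello (time, advertised interval) and reschedule.
        self.sent.append((now, self.hello_interval))
        self.next_hello = now + self.hello_interval


def reconfigure(iface: Iface, new_interval: int, now: int) -> None:
    increased = new_interval > iface.hello_interval
    iface.hello_interval = new_interval
    if increased:
        # Neighbours still expect the old, shorter interval: tell them
        # about the new one right away so that they do not time us out.
        iface.send_hello(now)
    else:
        # A shorter interval is safe; just pull the next Hello forward.
        iface.next_hello = min(iface.next_hello, now + new_interval)


iface = Iface(hello_interval=4, next_hello=10)
reconfigure(iface, 20, now=7)   # interval increased: Hello sent at t=7
```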

> > === TLV parsing error ===
> > Type 8 is the Update TLV, I think Bird complains about this particular
> > message (here seen by tcpdump):
> >
> >     Update/id ::/0 metric 65535 seqno 21902 interval infinity
> 
> I remember running into this. What happens here is that babeld sends an
> update without a preceding router_id TLV, with a wildcard address, but
> flag 0x40 set (meaning "infer the router ID from the address").
> 
> While I'm not sure what the purpose of this is (a null update with a
> null router ID with infinity metric and interval?) it *is* technically
> in spec.

Juliusz will certainly have an answer.

> I think the reason why Bird complains is that Ondrej's cleanup
> of my (admittedly messy) packet parsing code inadvertently moved the
> check for the 0x40 flag inside the case branch for AE_IP6.
> 
> If you turn on debugging you should get a "No router ID seen before
> update" message which will confirm that this is indeed the issue.

Hmm, even with "debug all" on the protocol, it only shows this:

avril 30 15:44:33 lud bird6[17966]: babel1: Packet received from fe80::e8db:78ff:fe05:8a64 via tap-fastd
avril 30 15:44:33 lud bird6[17966]: babel1: Bad TLV from fe80::e8db:78ff:fe05:8a64 via tap-fastd type 8 pos 18 - parse error
avril 30 15:44:33 lud bird6[17966]: babel1: Handling hello seqno 39103 interval 4
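
As an illustration of the parsing problem discussed above, here is a small
Python sketch of an Update TLV parser that checks the 0x40 flag outside of
the per-AE branches.  It is not Bird's code.  It assumes that the router ID
is taken from the low 64 bits of the prefix whatever the AE (so a wildcard
update yields an all-zero router ID), it ignores prefix compression, and
the interval value in the example is only a placeholder.

```python
import struct

AE_WILDCARD, AE_IP4, AE_IP6, AE_IP6_LL = 0, 1, 2, 3
FLAG_ROUTER_ID = 0x40   # "infer the router ID from the address"


def parse_update(body: bytes, router_id=None) -> dict:
    """Parse the body of an Update TLV (what follows the type and length)."""
    if len(body) < 10:
        raise ValueError("truncated Update TLV")
    ae, flags, plen, omitted, interval, seqno, metric = struct.unpack(
        "!BBBBHHH", body[:10])
    rest = body[10:]

    if ae == AE_WILDCARD:
        prefix = bytes(16)                      # ::/0, no address bytes
    elif ae == AE_IP6:
        if omitted:
            raise NotImplementedError("prefix compression is not covered")
        nbytes = (plen + 7) // 8
        if len(rest) < nbytes:
            raise ValueError("truncated prefix")
        prefix = rest[:nbytes].ljust(16, b"\0")
    else:
        raise NotImplementedError("only wildcard and IPv6 in this sketch")

    # The flag is handled here, for every AE, not inside the AE_IP6 branch.
    if flags & FLAG_ROUTER_ID:
        router_id = prefix[8:16]
    if router_id is None:
        raise ValueError("no router ID seen before update")

    return {"ae": ae, "plen": plen, "seqno": seqno, "metric": metric,
            "interval": interval, "router_id": router_id}


# Roughly the update shown by tcpdump: wildcard, flag 0x40, infinite metric.
body = struct.pack("!BBBBHHH", AE_WILDCARD, FLAG_ROUTER_ID, 0, 0,
                   0xFFFF, 21902, 0xFFFF)
update = parse_update(body)
```

With the flag check left inside the IPv6 branch, the same input would fall
through with no router ID and be rejected.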

> Most of these issues are fairly trivial fixes. I'll produce a patch once
> I'm done grokking Ondrej's code cleanup changes :)

Great :)

Thanks,
Baptiste