Issues in Babel

Thu Oct 19 13:23:45 CEST 2017

Ondrej Zajicek <santiago at crfreenet.org> writes:

> Hi Toke (and others)
>
> I updated Babel to use microsecond timers [1] and during testing i
> noticed several unrelated issues.

Ah, awesome, thanks for doing that! :)

> 1) Hello expiration was not properly implemented, as i understand
> RFC, it should reset the timer based on last packet hello interval
> and time-out repeatedly until hello map is zero. Fixed in [2].

Yeah, I think you are right. However, your fix doesn't apply
BABEL_HELLO_EXPIRY_FACTOR to the value stored in last_hello_int, which
is probably a bug, right?

> 2) In some setups unicast seqno requests were sent repeatedly and seqno
> was increased indefinitely due to last case in babel_handle_update(),
> when the RFC specifies it should be done only if unfeasible route would
> otherwise become best. Fixed in [3], by this condition handling all
> unicast seqno requests:
>
>   if (!feasible && (metric != BABEL_INFINITY) &&
>       (!best || (r == best) || (metric < best->metric)))
>     babel_unicast_seqno_request(e, s, nbr);
>
> It also changes the behavior by sending seqno request when receiving
> infeasible update if there is no previous route from that neighbor.

Yup, this seams reasonable as well.

>
> 3) Seems like the fix of hello expiration caused that when neighbor
> disappears, it is no longer removed timely and routes have to wait
> for route timeout, which is about a minute for 4 s hello interval.
>
> It seems that by RFC, we should recompute route metrics whanever rxcost
> reported by IHU is changed, so IHU expiration should be enough to make
> all related routes unreachable. Apparently, this is not done in BIRD.

Hmm, yeah, babel_expire_ihu() should probably do something similar to
babel_flush_neighbor(), i.e., loop through all selected routes from that
neighbor and re-run the route selection on them?

> 4) Seems like we do not trigger updates when a route update with just
> increased seqno due to forwarded unicast request is received. That also
> slows convergence. IMHO we should do in babel_handle_update() something
> like:
>
> if (route_is_selected() && matching_entry_in_seqno_request_cache())
>   babel_trigger_update()

Not sure we should use the request cache for this? We might not have
seen the request, but a downstream still needs the updated seqno; so
maybe just do something like:

diff --git a/proto/babel/babel.c b/proto/babel/babel.c
index c5d695c9..06e1d3ad 100644
--- a/proto/babel/babel.c
+++ b/proto/babel/babel.c
@@ -1219,6 +1219,11 @@ babel_handle_update(union babel_msg *m, struct babel_iface *ifa)
   }
   else
   {
+    /* Trigger an update if we get a seqno increase for a selected route, to
+       improve convergence time */
+    if (r->seqno < msg->seqno && r == best)
+      babel_trigger_update(p);
+
     /* Last paragraph above - update the entry */
     r->advert_metric = msg->metric;
     r->metric = metric;


> Do you think that my interpretation of these issues and the fixes are
> correct?

Think so, yeah; nice job. And thanks for taking the time to go over this :)

-Toke