[PATCH] babel: Send out low-interval hello on shutdown

Toke Høiland-Jørgensen toke at toke.dk
Wed May 4 15:38:55 CEST 2022


Ondrej Zajicek <santiago at crfreenet.org> writes:

> On Sat, Apr 23, 2022 at 10:31:26PM +0200, Toke Høiland-Jørgensen wrote:
>> Ondrej Zajicek <santiago at crfreenet.org> writes:
>> > BTW, when we added CI tests for Babel authentication, we noticed that it
>> > has rather slow convergence after reconfiguration. The reason is that
>> > when authentication changed to become non-matching, it took many missed
>> > (misauthenticated) hellos to sufficiently clean up hello_map for neighbor
>> > to go down.
>> 
>> Just to make sure I'm understanding the scenario correctly: A node is
>> reconfigured to turn on authentication, but not all peers use it; so now
>> some peers are essentially cut off (basically like if they just dropped
>> off the network).
>
> See:
>
> https://gitlab.nic.cz/labs/bird-tools/-/tree/master/netlab/cf-babel-auth
>
> We just change keys to be matching/mismatched (although enabling /
> disabling auth is a similar case)
>
>> However, their hello history remain, so their routes
>> stay active until they time out. Right?
>
> Yes
>
>
>> > Perhaps there could be some decision in iface reconfiguration that the
>> > change is significant to affect reachability of neighbors and in such
>> > case deprecate some/most items in hello_map.
>> 
>> Hmm, yeah, we could do something like that I suppose. I'm wondering if
>> it should really be stronger, though? Enabling auth on an
>> already-running instance is an increase in the "security level" of the
>> interface, so should we really be keeping unauthenticated data around at
>> all? I.e., maybe we should simply flush all neighbour entries on an
>> interface when enabling auth on that interface?
>
> I would prefer to not make it that disruptive. Also it is not just
> enabling auth, but in general changing keys.
>
>
>> Or another, slightly less disruptive, option is to flush a neighbour
>> if we receive a packet from it that fails auth, and that neighbour
>> doesn't have the 'auth_passed' flag set?
>> For existing neighbours that
>> succeeds auth, that flag should be set immediately on the next packet we
>> receive from that neighbour, whereas this would quickly clear out
>> neighbours that fail it. We could maybe speed things up further by
>> immediately issuing auth challenges to all neighbours when the config
>> changes?
>
> I thing that the current approach where there is some leeway between
> reconfiguration and dropping a neighbor is basically ok and it is better
> to just ignore failed auths than trigger some action based on that. Just
> that with full hello history it took too long. So perhaps just removing
> 3/4 of hellos from hello history?

Hmm, right, that sounds reasonable. Something like the below, then?

-Toke

diff --git a/proto/babel/babel.c b/proto/babel/babel.c
index 4a7d550f5efe..55ae5f972573 100644
--- a/proto/babel/babel.c
+++ b/proto/babel/babel.c
@@ -1861,8 +1861,25 @@ babel_reconfigure_iface(struct babel_proto *p, struct babel_iface *ifa, struct b
 
   babel_iface_update_buffers(ifa);
 
-  if ((new->auth_type != BABEL_AUTH_NONE) && (new->auth_type != old->auth_type))
-    babel_auth_reset_index(ifa);
+  if (new->auth_type != BABEL_AUTH_NONE)
+  {
+    struct babel_neighbor *n;
+
+    if (new->auth_type != old->auth_type)
+      babel_auth_reset_index(ifa);
+
+    WALK_LIST(n, ifa->neigh_list)
+    {
+      /* trim the hello history on auth change to make sure neighbours expire
+       * quickly if they no longer pass auth
+       */
+      if (n->hello_cnt < 2)
+        continue;
+
+      n->hello_cnt = MAX(2, n->hello_cnt / 4);
+      n->hello_map &= 0xFFFF >> (16 - n->hello_cnt);
+    }
+  }
 
   if (ipa_zero(ifa->next_hop_ip4) && p->ip4_channel)
     log(L_WARN "%s: Missing IPv4 next hop address for %s", p->p.name, ifa->ifname);



More information about the Bird-users mailing list