[PATCH v3] babel: Rework seqno request handling

Tue Dec 20 18:21:59 CET 2022

On Tue, Dec 20, 2022 at 05:39:19PM +0100, Toke Høiland-Jørgensen wrote:
> Ondrej Zajicek <santiago at crfreenet.org> writes:
> > But i still think this modification is worse than keeping entries for the
> > short time (without accounting for replies), for these reasons:
> >
> > 1) For variant with the short time, if the entry in the forwarding router
> > expires, then originator re-send the request, forwarding router re-forward
> > and re-create the entry, so satisfying logic still detect the reply and
> > trigger an update.
> 
> Not if the update arrives between the time we expire the entry, and the
> time we receive the retransmit. Which can easily happen if there's a
> slight timing difference, or if we're talking to a different
> implementation which has a different value for the expiry timer. So to
> be reasonably sure we always do the right thing and trigger an update,
> we really do need to keep the entry alive for longer. At the same time,
> we can't just keep a long timeout, because then we'll suppress actual
> retransmits.

OK, i agree that keeping satisfy-detection for longer than
duplicate-suppression makes sense, to avoid issues with timing
differences.

(although there is IMHO no need to keep it mixed with re-sent counter and
make it as long as full timeout of originating router)

> > 2) This change effectively eliminates 'multiple forward check' once
> > sr->count > 0. It would make sense to have 'one forward per one sr->count
> > increase' limit. But then it is pretty much equivalent to the 'short time
> > variant'
> 
> It doesn't quite eliminate it: if sr->count is >0 when a second
> forwarded request arrives, the object will get recycled and sr->count
> will get reset to 0. So there will be a new (short) period of duplicate
> suppression each time a new retransmit is performed. This would be
> identical if we removed the entry entirely after the first timeout.

You are right, i missed that.

> > 3) It makes more sense to have one interval for both duplicate detection
> > and reply detection.
> 
> But they serve different purposes: The duplicate suppression deals with
> the case where lots of neighbours all lose a prefix at the same time and
> send identical seqno requests for it. To deal with this we want a
> *short* window of time where we suppress duplicates. Whereas a reply may
> take longer to get to us, so we need a longer record to trigger the
> update, as described above.
> 
> Reusing the expiry counter mechanism is maybe not the clearest way of
> expressing this; if you prefer I can rework it to have an explicit
> separate timer for duplicate suppression?

Yes, that would be much better, and would clearly communicate the purpose.

> Something like (in babel_add_seqno_request()):
> 
> if (sr->forwarded)
> {
>   sr->expires = current_time() + BABEL_SEQNO_FORWARD_EXPIRY; /* longish - 10 S? */;
>   sr->dup_suppress = current_time() + BABEL_SEQNO_DUPLICATE_SUPPRESS_TIME; /* 1 S */;
> }
> else
> {
>   sr->expires = current_time() + BABEL_SEQNO_REQUEST_EXPIRY; /* current  value */
> }
> 
> Then we can do what you suggest and just remove the entry the first time
> we hit the expiry time if sr->forwarded (and we'd change the check on
> sr->count to look at sr->dup_suppress in add_seqno_request() of course).
> 
> WDYT?

That would be great.

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."