[PATCH v3] babel: Rework seqno request handling
Toke Høiland-Jørgensen
toke at toke.dk
Tue Dec 20 17:39:19 CET 2022
Ondrej Zajicek <santiago at crfreenet.org> writes:
> On Tue, Dec 20, 2022 at 03:22:32PM +0100, Toke Høiland-Jørgensen wrote:
>> Ondrej Zajicek <santiago at crfreenet.org> writes:
>>
>> > On Sat, Dec 17, 2022 at 09:45:59PM +0100, Toke Høiland-Jørgensen via Bird-users wrote:
>> >> The seqno request retransmission handling was tracking the destination that
>> >> a forwarded request was being sent to and always retransmitting to that
>> >> same destination. This is unnecessary because we only need to retransmit
>> >> requests we originate ourselves, not those we forward on behalf of
>> >> others; in fact retransmitting on behalf of others can lead to exponential
>> >> multiplication of requests, which would be bad.
>> >
>> > Hi
>> >
>> > Thanks for investigating the issue and the patch. It took me some time to
>> > process it, so answering now. I have some notes / suggestions:
>> >
>> > - In babel_expire_requests() the check for sr->forwarded is inside
>> > sr->count check, so BABEL_SEQNO_REQUEST_RETRY is applied for forwarded
>> > info, which does not make sense, perhaps it should be like:
>> > if (!sr->forwarded && (sr->count < BABEL_SEQNO_REQUEST_RETRY)) ...
>>
>> This was actually on purpose: we want to keep the request object around
>> for longer, so we can use it to trigger an update when it's satisfied.
>> So I figured it would be simpler to just keep the expiry logic without
>> actually transmitting anything. There's a check in
>> babel_add_seqno_request() which uses sr->count time-limit the duplicate
>> check to one update interval (which is also shorter for forwarded
>> requests).
>>
>> I'll update the comment to explain this.
>
> OK, i thought that forwarding would be supressed for retransmited seqno requests,
> but now i see there is added condition in babel_add_seqno_request() for it:
>
> (!sr->forwarded || (fwd_from && !sr->count))
>
>
> But i still think this modification is worse than keeping entries for the
> short time (without accounting for replies), for these reasons:
>
> 1) For variant with the short time, if the entry in the forwarding router
> expires, then originator re-send the request, forwarding router re-forward
> and re-create the entry, so satisfying logic still detect the reply and
> trigger an update.
Not if the update arrives between the time we expire the entry, and the
time we receive the retransmit. Which can easily happen if there's a
slight timing difference, or if we're talking to a different
implementation which has a different value for the expiry timer. So to
be reasonably sure we always do the right thing and trigger an update,
we really do need to keep the entry alive for longer. At the same time,
we can't just keep a long timeout, because then we'll suppress actual
retransmits.
> 2) This change effectively eliminates 'multiple forward check' once
> sr->count > 0. It would make sense to have 'one forward per one sr->count
> increase' limit. But then it is pretty much equivalent to the 'short time
> variant'
It doesn't quite eliminate it: if sr->count is >0 when a second
forwarded request arrives, the object will get recycled and sr->count
will get reset to 0. So there will be a new (short) period of duplicate
suppression each time a new retransmit is performed. This would be
identical if we removed the entry entirely after the first timeout.
> 3) It makes more sense to have one interval for both duplicate detection
> and reply detection.
But they serve different purposes: The duplicate suppression deals with
the case where lots of neighbours all lose a prefix at the same time and
send identical seqno requests for it. To deal with this we want a
*short* window of time where we suppress duplicates. Whereas a reply may
take longer to get to us, so we need a longer record to trigger the
update, as described above.
Reusing the expiry counter mechanism is maybe not the clearest way of
expressing this; if you prefer I can rework it to have an explicit
separate timer for duplicate suppression?
Something like (in babel_add_seqno_request()):
if (sr->forwarded)
{
sr->expires = current_time() + BABEL_SEQNO_FORWARD_EXPIRY; /* longish - 10 S? */;
sr->dup_suppress = current_time() + BABEL_SEQNO_DUPLICATE_SUPPRESS_TIME; /* 1 S */;
}
else
{
sr->expires = current_time() + BABEL_SEQNO_REQUEST_EXPIRY; /* current value */
}
Then we can do what you suggest and just remove the entry the first time
we hit the expiry time if sr->forwarded (and we'd change the check on
sr->count to look at sr->dup_suppress in add_seqno_request() of course).
WDYT?
-Toke
More information about the Bird-users
mailing list