babel RTT metric false samples

Toke Høiland-Jørgensen toke at toke.dk
Fri Apr 12 16:22:50 CEST 2024


Stephanie Wilde-Hobbs via Bird-users <bird-users at network.cz> writes:

> Hi,
>
> The babel RTT metric measurements provided by bird appears suspect for 
> my setup. The metric through a tunnel with a latency of about 5ms is 
> shown in babel as 150+ms.
>
> Can others replicate this issue? (should be easy to check for other 
> babel users since RTT measurement is on by default in recent versions)
>
> First I suspected a problem with the tunnel, but I compared bird's babel 
> RTT measurement against a long-running ping for the same time period and 
> got ~160ms measured by bird's babel implementation, and 4.6ms with a 
> 28ms maximum latency reported by pings in the same wireguard tunnel. 
> Other machines across my network also report similarly inflated RTT 
> metrics for all non-wired links.
>
> Debug logs show many RTT samples with approximately correct timestamps 
> (4-6ms) then the occasional IHU with 800-1200ms calculated instead. 
> Calculating the RTT metric by hand using babel packet logs shows that 
> the calculations are correct. By correlating two packet dumps (the 
> machines have <1ms NTP timekeeping) I can also see that the packets for 
> which high RTT is calculated have similar transit times through the 
> tunnel as other packets. Hence, I suspect the accuracy of the packet 
> timestamps recorded by bird. Is the current packet timestamping system 
> giving correct timestamps if the packet arrives while babel is 
> processing another event?

Hmm, so Babel implementation in Bird tries to get a timestamp as early
as possible after receiving the packet, and set it as late as possible
before sending out the packet. However, the former in practice means
after returning from poll(), so if the packet has been sitting around in
the OS buffer for a while before Bird gets around to process it, the
timestamp is not set until Bird is done processing it. Likewise, if the
packet sits around in a socket buffer (or in a lower-level buffer on the
sending side) after Bird has sent it out, that time will also be counted
as part of the RTT.

You may be able to figure out on which side the inflated RTTs happen by
looking at differences in timestamps between a packet capture and the
values Bird puts into the packet. The timestamps Bird writes are just
CLOCK_MONOTONIC times in microsecond resolution, so they should be
comparable to packet timestamps if you can get the offset between the
system's monotonic and real-time clocks at the time of the packet
capture.

-Toke


More information about the Bird-users mailing list