Failing DNS resolution causes BGP sessions to flap

Ondřej Caletka ondrej at caletka.cz
Mon Oct 21 16:17:50 CEST 2024


Dear BIRD users,

I have recently noticed an interesting issue. A newly set up BGP session 
between BIRD on our side and a remote party (running some other 
software) was being torn down every hour or so. Each time, the reason 
logged by BIRD was "Received: Hold timer expired".

Looking at the logs, it turned out that this was caused by a broken DNS 
resolver on the machine where BIRD is running. Whenever BIRD tried to 
resolve the host name of the RPKI Validator cache, it blocked for 
24 seconds. This was apparently long enough for the other BGP speaker 
to consider BIRD dead and take the session down (the hold timers on 
that session had been shortened from the standard values).

Oct 15 13:45:51 vrtr-4.mtg.ripe.net bird[907]: rpki_validator: Cannot resolve hostname 'rpki-validator.mtg.ripe.net': >
Oct 15 13:45:51 vrtr-4.mtg.ripe.net bird[907]: I/O loop cycle took 24009.001 ms for 1 events
Oct 15 13:45:51 vrtr-4.mtg.ripe.net bird[907]: Kernel dropped some netlink messages, will resync on next scan.
Oct 15 13:45:51 vrtr-4.mtg.ripe.net bird[907]: peer_as2852_v4: Received: Hold timer expired
Oct 15 13:46:55 vrtr-4.mtg.ripe.net bird[907]: peer_as2852_v4.ipv4: Automatic RPKI reload not active for import
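
The mechanism can be illustrated with a minimal Python sketch. This is 
not BIRD code: it only models a single-threaded loop that cannot send 
KEEPALIVEs while it is blocked. The function name simulate and the timer 
values (9 s hold, 3 s keepalive, and 90 s / 30 s for comparison) are 
assumptions chosen for illustration; only the 24-second stall is taken 
from the log above.

```python
def simulate(hold_time, keepalive_interval, stall_start, stall_length, duration):
    """Model a single-threaded speaker, one tick per second.

    Returns the second at which the peer's hold timer expires,
    or None if the session survives for the whole duration.
    All parameters are in whole seconds and are illustrative only.
    """
    last_received = 0                    # peer last heard from us at t=0
    next_keepalive = keepalive_interval  # when our next KEEPALIVE is due
    for now in range(1, duration + 1):
        # While the loop is stuck in a blocking call, nothing else runs.
        blocked = stall_start <= now < stall_start + stall_length
        if not blocked and now >= next_keepalive:
            last_received = now
            next_keepalive = now + keepalive_interval
        # The peer gives up once it has heard nothing for hold_time seconds.
        if now - last_received >= hold_time:
            return now
    return None


# Shortened timers (assumed values): the 24 s stall kills the session.
print(simulate(hold_time=9, keepalive_interval=3,
               stall_start=10, stall_length=24, duration=60))   # 18

# Longer timers (assumed values): the same stall is survived.
print(simulate(hold_time=90, keepalive_interval=30,
               stall_start=10, stall_length=24, duration=60))   # None
```

With the shortened timers the last KEEPALIVE goes out at second 9, so 
the peer gives up at second 18, well before the 24-second stall ends.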

This case was fixed by making sure the DNS resolver works, but I still 
wonder whether this is a known limitation or something that could be 
improved.

--
Best regards,

Ondřej Caletka

