Failing DNS resolution causes BGP sessions to flap
Ondřej Caletka
ondrej at caletka.cz
Mon Oct 21 16:17:50 CEST 2024
Dear BIRD users,
I have recently noticed an interesting issue. A newly set up BGP session
between BIRD on our side and a remote party (running unspecified
software) was taken down every hour or so. The reason logged for each
termination was
Received: Hold timer expired.
Looking at the logs, it turned out that this was caused by a broken DNS
resolver on the machine where BIRD is running. Whenever BIRD tried to
resolve the host name of the RPKI Validator cache, its I/O loop was
blocked for about 24 seconds. This was apparently enough for the other
BGP speaker to consider it dead and take the session down (the hold
timers were shortened from the standard values there).
Oct 15 13:45:51 vrtr-4.mtg.ripe.net bird[907]: rpki_validator: Cannot resolve hostname 'rpki-validator.mtg.ripe.net': >
Oct 15 13:45:51 vrtr-4.mtg.ripe.net bird[907]: I/O loop cycle took 24009.001 ms for 1 events
Oct 15 13:45:51 vrtr-4.mtg.ripe.net bird[907]: Kernel dropped some netlink messages, will resync on next scan.
Oct 15 13:45:51 vrtr-4.mtg.ripe.net bird[907]: peer_as2852_v4: Received: Hold timer expired
Oct 15 13:46:55 vrtr-4.mtg.ripe.net bird[907]: peer_as2852_v4.ipv4: Automatic RPKI reload not active for import
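To illustrate the mechanism, here is a minimal Python sketch. It is not
BIRD code and says nothing about how BIRD is implemented internally; all
names in it (slow_resolve, keepalive_sender, max_keepalive_gap,
cache.example) are made up for the example, and time.sleep() stands in
for a resolver call that hangs. It shows how one blocking call inside a
single-threaded event loop delays a periodic keepalive timer, and how
handing the same call to a worker thread (asyncio.to_thread, Python 3.9
or newer) keeps the timer on schedule. The delays are scaled down from
seconds to fractions of a second.

```python
import asyncio
import time


def slow_resolve(hostname, delay):
    """Hypothetical stand-in for a blocking resolver call that hangs
    for `delay` seconds and then fails."""
    time.sleep(delay)
    raise OSError(f"cannot resolve {hostname}")


async def keepalive_sender(interval, stamps, count):
    """Record the time at which each keepalive would have been sent."""
    for _ in range(count):
        await asyncio.sleep(interval)
        stamps.append(time.monotonic())


async def max_keepalive_gap(offload, delay=0.5, interval=0.1, count=8):
    """Return the longest gap between two consecutive keepalives."""
    stamps = []
    sender = asyncio.create_task(keepalive_sender(interval, stamps, count))
    await asyncio.sleep(interval * 1.5)  # let the first keepalive go out
    try:
        if offload:
            # Resolver runs in a worker thread; the event loop stays free.
            await asyncio.to_thread(slow_resolve, "cache.example", delay)
        else:
            # Resolver runs inside the event loop and blocks every timer.
            slow_resolve("cache.example", delay)
    except OSError:
        pass  # the lookup failing is expected here
    await sender
    return max(b - a for a, b in zip(stamps, stamps[1:]))


if __name__ == "__main__":
    blocked = asyncio.run(max_keepalive_gap(offload=False))
    offloaded = asyncio.run(max_keepalive_gap(offload=True))
    print(f"longest keepalive gap, blocking lookup:  {blocked:.2f} s")
    print(f"longest keepalive gap, offloaded lookup: {offloaded:.2f} s")
```

With the blocking lookup, the longest gap between keepalives is at least
as long as the resolver delay; if that gap exceeds the peer's hold time,
the peer will drop the session, which matches what the log above shows.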
This case was fixed by making sure the DNS resolver works, but I still
wonder whether this is a known limitation or whether it is something
that could be improved.
--
Best regards,
Ondřej Caletka