BGP keepalive packets delayed during reconfigure
Damian Zaremba
damian at damianzaremba.co.uk
Fri Mar 5 14:41:40 CET 2021
Hi,
I recently observed a case where bird 2.0.7 stopped emitting keepalive
packets to configured neighbors during a protocol reconfigure.
To replicate this condition I have constructed a single BGP session that
is fed a large number of filtered routes, the configuration structure
resembles;
protocol static -> table source -> protocol pipe [import filters] ->
table neighbor -> protocol bgp
Changing a route attribute in the source table and calling `reconfigure`
results in the following observations;
1. Bird logs on the sender side show a slow reconfigure;
2021-03-05 10:28:19.872 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:28:34.886 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:28:53.772 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:28:56.259 <INFO> Reconfiguring
2021-03-05 10:45:13.606 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:45:13.611 <WARN> I/O loop cycle took 966448 ms for 1016 events
2021-03-05 10:45:13.612 <INFO> Reconfigured
2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
2021-03-05 10:45:13.612 <RMT> bgp1: Received: Hold timer expired
2021-03-05 10:46:06.106 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:46:24.541 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:46:42.282 <TRACE> bgp1: Sending KEEPALIVE
2. Packet capture shows the client receiving no keepalives and then
multiple close together;
$ cat bgp.pcap | pbgpp --filter-message-type=KEEPALIVE --formatter=LINE -
1614940133.773725 KEEPALIVE None None None
1614940138.282074 KEEPALIVE None None None
1614940154.308565 KEEPALIVE None None None
1614940174.343514 KEEPALIVE None None None
1614940190.374193 KEEPALIVE None None None
1614941113.607580 KEEPALIVE None None None
1614941166.106192 KEEPALIVE None None None
1614941166.106316 KEEPALIVE None None None
1614941183.134392 KEEPALIVE None None None
1614941184.541810 KEEPALIVE None None None
1614941200.157543 KEEPALIVE None None None
1614941202.282814 KEEPALIVE None None None
1614941214.180433 KEEPALIVE None None None
1614941217.732128 KEEPALIVE None None None
1614941234.209164 KEEPALIVE None None None
1614941236.416377 KEEPALIVE None None None
1614941250.233110 KEEPALIVE None None None
1614941253.668823 KEEPALIVE None None None
1614941268.260864 KEEPALIVE None None None
1614941269.668392 KEEPALIVE None None None
1614941285.289222 KEEPALIVE None None None
1614941287.120920 KEEPALIVE None None None
1614941301.314094 KEEPALIVE None None None
1614941302.270840 KEEPALIVE None None None
1614941319.342323 KEEPALIVE None None None
1614941321.92028 KEEPALIVE None None None
1614941335.366858 KEEPALIVE None None None
1614941337.417243 KEEPALIVE None None None
1614941352.392059 KEEPALIVE None None None
1614941356.43815 KEEPALIVE None None None
1614941369.420711 KEEPALIVE None None None
1614941373.705122 KEEPALIVE None None None
1614941386.445406 KEEPALIVE None None None
1614941391.885447 KEEPALIVE None None None
1614941403.473187 KEEPALIVE None None None
1614941407.172799 KEEPALIVE None None None
1614941419.498776 KEEPALIVE None None None
The 6th packet is 15min after the 5th packet then they spread back out
to the intervals seen in the steady state.
3. Client trips off on timers until we get to a converged state on the
sender;
2021-03-05 10:29:53 <RMT> bgp1: Error: Hold timer expired
2021-03-05 10:34:05 <RMT> bgp1: Error: Hold timer expired
2021-03-05 10:39:38 <RMT> bgp1: Error: Hold timer expired
2021-03-05 10:43:48 <RMT> bgp1: Error: Hold timer expired
I suspect this is down to the scheduling of BGP packets more generally
as we also observed the case where a route addition into a bgp protocol
(as reported by trace) was not seen as an UPDATE by the client for
multiple minutes during a reconfigure.
- Damian
More information about the Bird-users
mailing list