BGP keepalive packets delayed during reconfigure

Damian Zaremba damian at damianzaremba.co.uk
Fri Mar 5 14:41:40 CET 2021


Hi,

I recently observed a case where bird 2.0.7 stopped emitting keepalive 
packets to configured neighbors during a protocol reconfigure.

To replicate this condition I have constructed a single BGP session that 
is fed a large number of filtered routes, the configuration structure 
resembles;

protocol static -> table source -> protocol pipe [import filters] -> 
table neighbor -> protocol bgp


Changing a route attribute in the source table and calling `reconfigure` 
results in the following observations;

1. Bird logs on the sender side show a slow reconfigure;

2021-03-05 10:28:19.872 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:28:34.886 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:28:53.772 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:28:56.259 <INFO> Reconfiguring
2021-03-05 10:45:13.606 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:45:13.611 <WARN> I/O loop cycle took 966448 ms for 1016 events
2021-03-05 10:45:13.612 <INFO> Reconfigured
2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
2021-03-05 10:45:13.612 <RMT> bgp1: Received: Hold timer expired
2021-03-05 10:46:06.106 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:46:24.541 <TRACE> bgp1: Sending KEEPALIVE
2021-03-05 10:46:42.282 <TRACE> bgp1: Sending KEEPALIVE


2. Packet capture shows the client receiving no keepalives and then 
multiple close together;

$ cat bgp.pcap | pbgpp --filter-message-type=KEEPALIVE --formatter=LINE -
1614940133.773725    KEEPALIVE    None    None    None
1614940138.282074    KEEPALIVE    None    None    None
1614940154.308565    KEEPALIVE    None    None    None
1614940174.343514    KEEPALIVE    None    None    None
1614940190.374193    KEEPALIVE    None    None    None
1614941113.607580    KEEPALIVE    None    None    None
1614941166.106192    KEEPALIVE    None    None    None
1614941166.106316    KEEPALIVE    None    None    None
1614941183.134392    KEEPALIVE    None    None    None
1614941184.541810    KEEPALIVE    None    None    None
1614941200.157543    KEEPALIVE    None    None    None
1614941202.282814    KEEPALIVE    None    None    None
1614941214.180433    KEEPALIVE    None    None    None
1614941217.732128    KEEPALIVE    None    None    None
1614941234.209164    KEEPALIVE    None    None    None
1614941236.416377    KEEPALIVE    None    None    None
1614941250.233110    KEEPALIVE    None    None    None
1614941253.668823    KEEPALIVE    None    None    None
1614941268.260864    KEEPALIVE    None    None    None
1614941269.668392    KEEPALIVE    None    None    None
1614941285.289222    KEEPALIVE    None    None    None
1614941287.120920    KEEPALIVE    None    None    None
1614941301.314094    KEEPALIVE    None    None    None
1614941302.270840    KEEPALIVE    None    None    None
1614941319.342323    KEEPALIVE    None    None    None
1614941321.92028    KEEPALIVE    None    None    None
1614941335.366858    KEEPALIVE    None    None    None
1614941337.417243    KEEPALIVE    None    None    None
1614941352.392059    KEEPALIVE    None    None    None
1614941356.43815    KEEPALIVE    None    None    None
1614941369.420711    KEEPALIVE    None    None    None
1614941373.705122    KEEPALIVE    None    None    None
1614941386.445406    KEEPALIVE    None    None    None
1614941391.885447    KEEPALIVE    None    None    None
1614941403.473187    KEEPALIVE    None    None    None
1614941407.172799    KEEPALIVE    None    None    None
1614941419.498776    KEEPALIVE    None    None    None

The 6th packet is 15min after the 5th packet then they spread back out 
to the intervals seen in the steady state.

3. Client trips off on timers until we get to a converged state on the 
sender;

2021-03-05 10:29:53 <RMT> bgp1: Error: Hold timer expired
2021-03-05 10:34:05 <RMT> bgp1: Error: Hold timer expired
2021-03-05 10:39:38 <RMT> bgp1: Error: Hold timer expired
2021-03-05 10:43:48 <RMT> bgp1: Error: Hold timer expired

I suspect this is down to the scheduling of BGP packets more generally 
as we also observed the case where a route addition into a bgp protocol 
(as reported by trace) was not seen as an UPDATE by the client for 
multiple minutes during a reconfigure.

- Damian



More information about the Bird-users mailing list