BGP keepalive packets delayed during reconfigure
Maria Matejka
maria.matejka at nic.cz
Fri Mar 5 14:59:43 CET 2021
Hello!
This is simply a result of a large configuration file. BIRD took more
than 16 minutes to parse and apply new config. As BIRD is
single-threaded, nothing is done while parsing the new config.
This is going to be fixed by using more CPU threads. Some preview of a
version doing exactly this with configuration may be found in GIT in
commit c78f88c91c7b75f530d70d503fa011c20f16f278.
Use with caution, we have not finished all tests yet.
Maria
On 3/5/21 2:41 PM, Damian Zaremba wrote:
> Hi,
>
> I recently observed a case where bird 2.0.7 stopped emitting keepalive
> packets to configured neighbors during a protocol reconfigure.
>
> To replicate this condition I have constructed a single BGP session that
> is fed a large number of filtered routes, the configuration structure
> resembles;
>
> protocol static -> table source -> protocol pipe [import filters] ->
> table neighbor -> protocol bgp
>
>
> Changing a route attribute in the source table and calling `reconfigure`
> results in the following observations;
>
> 1. Bird logs on the sender side show a slow reconfigure;
>
> 2021-03-05 10:28:19.872 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:28:34.886 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:28:53.772 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:28:56.259 <INFO> Reconfiguring
> 2021-03-05 10:45:13.606 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:45:13.611 <WARN> I/O loop cycle took 966448 ms for 1016
> events
> 2021-03-05 10:45:13.612 <INFO> Reconfigured
> 2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
> 2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
> 2021-03-05 10:45:13.612 <TRACE> bgp1: Got KEEPALIVE
> 2021-03-05 10:45:13.612 <RMT> bgp1: Received: Hold timer expired
> 2021-03-05 10:46:06.106 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:46:24.541 <TRACE> bgp1: Sending KEEPALIVE
> 2021-03-05 10:46:42.282 <TRACE> bgp1: Sending KEEPALIVE
>
>
> 2. Packet capture shows the client receiving no keepalives and then
> multiple close together;
>
> $ cat bgp.pcap | pbgpp --filter-message-type=KEEPALIVE --formatter=LINE -
> 1614940133.773725 KEEPALIVE None None None
> 1614940138.282074 KEEPALIVE None None None
> 1614940154.308565 KEEPALIVE None None None
> 1614940174.343514 KEEPALIVE None None None
> 1614940190.374193 KEEPALIVE None None None
> 1614941113.607580 KEEPALIVE None None None
> 1614941166.106192 KEEPALIVE None None None
> 1614941166.106316 KEEPALIVE None None None
> 1614941183.134392 KEEPALIVE None None None
> 1614941184.541810 KEEPALIVE None None None
> 1614941200.157543 KEEPALIVE None None None
> 1614941202.282814 KEEPALIVE None None None
> 1614941214.180433 KEEPALIVE None None None
> 1614941217.732128 KEEPALIVE None None None
> 1614941234.209164 KEEPALIVE None None None
> 1614941236.416377 KEEPALIVE None None None
> 1614941250.233110 KEEPALIVE None None None
> 1614941253.668823 KEEPALIVE None None None
> 1614941268.260864 KEEPALIVE None None None
> 1614941269.668392 KEEPALIVE None None None
> 1614941285.289222 KEEPALIVE None None None
> 1614941287.120920 KEEPALIVE None None None
> 1614941301.314094 KEEPALIVE None None None
> 1614941302.270840 KEEPALIVE None None None
> 1614941319.342323 KEEPALIVE None None None
> 1614941321.92028 KEEPALIVE None None None
> 1614941335.366858 KEEPALIVE None None None
> 1614941337.417243 KEEPALIVE None None None
> 1614941352.392059 KEEPALIVE None None None
> 1614941356.43815 KEEPALIVE None None None
> 1614941369.420711 KEEPALIVE None None None
> 1614941373.705122 KEEPALIVE None None None
> 1614941386.445406 KEEPALIVE None None None
> 1614941391.885447 KEEPALIVE None None None
> 1614941403.473187 KEEPALIVE None None None
> 1614941407.172799 KEEPALIVE None None None
> 1614941419.498776 KEEPALIVE None None None
>
> The 6th packet is 15min after the 5th packet then they spread back out
> to the intervals seen in the steady state.
>
> 3. Client trips off on timers until we get to a converged state on the
> sender;
>
> 2021-03-05 10:29:53 <RMT> bgp1: Error: Hold timer expired
> 2021-03-05 10:34:05 <RMT> bgp1: Error: Hold timer expired
> 2021-03-05 10:39:38 <RMT> bgp1: Error: Hold timer expired
> 2021-03-05 10:43:48 <RMT> bgp1: Error: Hold timer expired
>
> I suspect this is down to the scheduling of BGP packets more generally
> as we also observed the case where a route addition into a bgp protocol
> (as reported by trace) was not seen as an UPDATE by the client for
> multiple minutes during a reconfigure.
>
> - Damian
>
More information about the Bird-users
mailing list