BGP Keepalive timer wedging
Chris Caputo
ccaputo at alt.net
Wed Aug 20 17:00:04 CEST 2014
On Wed, 20 Aug 2014, Ondrej Zajicek wrote:
> On Wed, Aug 20, 2014 at 01:02:27AM +0000, Chris Caputo wrote:
> > At the Seattle IX we are using BIRD 1.4.4 for our native (non-VM) route
> > servers.
> >
> > With one particular IPv4 peer, on two different route servers, I am seeing
> > "Keepalive timer" count down to zero and then becoming wedged/stalled.
> > Tcpdump fails to show a keepalive message being sent, while it does show
> > them being received from the peer.
> ...
> > with Hold timer getting updated over time, but the Keepalive timer doesn't
> > change after it has its initial countdown to zero. The peer eventually
> > signals "ex: Received: Hold timer expired" once it goes 180 seconds
> > without a BGP update, since it also hasn't gotten any keepalive messages.
> >
> > I've looked at the code and haven't found a problem. The other 64
> > similarly configured peers on the route server are working fine.
> >
> > Has anyone seen this or have any suggestions?
>
> Hi
>
> I would guess that the problem is in the TCP connection to the peer - BGP
> packets are sent, not acknowledged, TX queue became full and TX hook is
> not called anymore (Keepalive timer is restarted in TX hook when
> previously scheduled Keepalive is sent). You should check whether other
> packets are propagated (e.g. updates from both sides), esp. when the
> connection is already in keepalive 0/60 state.
Ondrej,
You are correct:
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0  42340 206.81.80.2:179         206.81.80.xx:35237      ESTABLISHED
I should have caught that.
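For anyone who runs into the same symptom, the Send-Q check can be scripted. What follows is only a rough sketch in Python, assuming Linux `netstat -tn` style output with Send-Q in the third column. The function name `stuck_bgp_sockets` and the `min_send_q` threshold are made up for illustration and are not part of BIRD.

```python
def stuck_bgp_sockets(netstat_output, min_send_q=1, bgp_port=179):
    """Return (local, foreign, send_q) tuples for ESTABLISHED TCP sockets
    on the BGP port whose Send-Q is at least min_send_q bytes.

    Assumes the column layout: Proto Recv-Q Send-Q Local Foreign State.
    """
    hits = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not a TCP socket row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if not fields[2].isdigit():
            continue
        send_q = int(fields[2])
        local, foreign, state = fields[3], fields[4], fields[5]
        # The port is whatever follows the last colon (also works for tcp6).
        ports = (local.rsplit(":", 1)[-1], foreign.rsplit(":", 1)[-1])
        if (state == "ESTABLISHED"
                and str(bgp_port) in ports
                and send_q >= min_send_q):
            hits.append((local, foreign, send_q))
    return hits


# The socket line from this thread, used as sample input.
sample = (
    "Proto Recv-Q Send-Q Local Address Foreign Address State\n"
    "tcp 0 42340 206.81.80.2:179 206.81.80.xx:35237 ESTABLISHED\n"
)
for local, foreign, send_q in stuck_bgp_sockets(sample):
    print(f"{local} -> {foreign}: {send_q} bytes queued, not yet acknowledged")
```

In practice the input would be the captured output of `netstat -tn` on the route server. A Send-Q that stays large across several runs points to the peer not acknowledging data, which matches the wedged Keepalive timer described above. A single nonzero reading is not conclusive, since data in flight also shows up there briefly.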
Thank you,
Chris