BGP graceful restart and BFD

Ondrej Zajicek santiago at crfreenet.org
Fri Dec 2 17:11:47 CET 2016


On Fri, Dec 02, 2016 at 11:42:25AM +0100, Vincent Bernat wrote:
> Hey!
> 
> I am trying to make BGP graceful restart work. First, I noticed that BGP
> graceful restart can only work if BIRD does not cleanly close the BGP
> session. Otherwise, an administrative shutdown is sent and the other end
> (also BIRD) removes all routes and doesn't consider this a graceful
> restart.
> 
>     2016-12-02 10:09:24 <RMT> R1: Received: Administrative shutdown
>     2016-12-02 10:09:24 <TRACE> R1: BGP session closed
>     2016-12-02 10:09:24 <TRACE> R1: State changed to stop
>     2016-12-02 10:09:24 <TRACE> R1 > removed [sole] 203.0.113.0/24 via 192.0.2.1 on eth0
> 
> Is that expected behavior?

Hi

There are three different cases:

1) regular (administrative) shutdown/restart
2) planned graceful restart (e.g. software version update)
3) unplanned graceful restart (e.g. software crash and respawn)

The regular shutdown command does (1), so a regular BGP session shutdown
is expected. Case (3) should work without many problems. But there is no
explicit support for case (2); you have to use kill -9, as we are missing
a command that explicitly activates graceful restart.
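To illustrate why case (1) defeats graceful restart on the other side, here is a minimal C sketch of the receiving speaker's decision, assuming the rule from RFC 4724: stale routes are retained only if graceful restart was negotiated and the session ended without a NOTIFICATION. The enum and function names are hypothetical and not taken from BIRD's source.

```c
#include <stdbool.h>

/* How the session with the restarting side ended, as seen by the peer.
   Hypothetical names, for illustration only. */
enum session_end {
  END_NOTIFICATION_CEASE, /* case (1): regular shutdown sends a Cease NOTIFICATION */
  END_TCP_CLOSED          /* cases (2)/(3) today: process killed, OS closes the socket */
};

enum peer_action {
  FLUSH_ROUTES,     /* withdraw everything learned from the neighbor */
  KEEP_STALE_ROUTES /* act as graceful restart helper */
};

/* Assumed RFC 4724 helper rule: keep routes as stale only if graceful
   restart was negotiated and no NOTIFICATION closed the session. */
enum peer_action on_session_end(enum session_end how, bool gr_negotiated)
{
  if (!gr_negotiated || how == END_NOTIFICATION_CEASE)
    return FLUSH_ROUTES;
  return KEEP_STALE_ROUTES;
}
```

Under this rule, the Cease NOTIFICATION logged above as "Administrative shutdown" leads to FLUSH_ROUTES, while a killed process whose socket is closed by the OS leads to KEEP_STALE_ROUTES.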


> The second problem I ran into is when using BFD. If I kill -9 bird, BFD
> quickly detects the problem and shuts down the BGP session. It is
> not considered a graceful restart either.

We should have better handling of the C-bit in BFD (for example, we
currently behave the same regardless of the neighbor's C-bit value). But
there is still a fundamental limitation in having BFD in the control
plane, or even in the same process.

There is one potential solution for case (2): we could explicitly shut
down BFD sessions when graceful restart is requested. As graceful restart
is just an advisory mechanism, BGP should survive the shutdown of the BFD
session, and then regular BGP graceful restart should work.

Case (3) is more problematic. RFC 5882 specifies that, with the C-bit
zero, the helper should avoid aborting graceful restart when the BFD
session fails. But that works only if the graceful restart is detected
before the BFD session failure is. I guess that may work in some cases
(e.g. bird is killed and the OS immediately closes the TCP socket of the
BGP session, which is detected by the other side).
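The helper-side reasoning above can be summarized as a small decision function. This is a simplified C sketch under stated assumptions: the enum, struct, and function names are hypothetical, BFD AdminDown stands in for the proposed explicit BFD shutdown in case (2), and "graceful restart already detected" is reduced to a single flag. It does not describe how BIRD currently handles the C-bit.

```c
#include <stdbool.h>

/* Hypothetical model of why the BFD session to a neighbor went down. */
enum bfd_down_reason {
  BFD_ADMIN_DOWN,    /* neighbor deliberately shut BFD down (proposed for case 2) */
  BFD_DETECT_TIMEOUT /* detection time expired (e.g. crash in case 3) */
};

/* What the helper knows about the neighbor at that moment (hypothetical). */
struct helper_view {
  bool remote_c_bit;   /* C (Control Plane Independent) bit last received */
  bool gr_in_progress; /* helper already noticed the restart, e.g. TCP closed */
};

/* Returns true if the helper should tear down the BGP session and flush
   routes, false if graceful restart should be left running. */
bool bfd_down_aborts_gr(enum bfd_down_reason why, const struct helper_view *h)
{
  if (why == BFD_ADMIN_DOWN)
    return false; /* BGP is assumed to survive a deliberate BFD shutdown */
  if (h->remote_c_bit)
    return true;  /* BFD independent of control plane: treat as forwarding failure */
  /* C-bit zero: BFD shares fate with the control plane, so graceful restart
     is kept, but only if the restart was detected before the BFD failure. */
  return !h->gr_in_progress;
}
```

The last branch captures the race in case (3): if BFD times out before the helper notices the restart (for example, before the TCP close arrives), the function returns true and graceful restart is lost.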


> Unrelated to BGP restart but related to BFD, if one BGP peer has a
> temporary network issue, BFD will quickly close the session and then
> require a startup delay for the session. When the network outage is
> solved and one peer tries to reconnect, the session is rejected because
> of this startup delay:
> 
>     2016-12-02 11:03:55 <TRACE> R1: State changed to start
>     2016-12-02 11:03:55 <TRACE> R1: Startup delayed by 60 seconds due to errors
>     2016-12-02 11:04:02 <TRACE> R1: Incoming connection from 192.0.2.1 (port 49205) rejected
>     2016-12-02 11:04:07 <TRACE> R1: Incoming connection from 192.0.2.1 (port 36449) rejected
> 
> The delay can be configured to a lower value, but is this the expected
> behavior?

Yes. It was designed so that any error that tears down an established
BGP session is followed by this delay, so you don't get session flapping
with a full BGP feed every few seconds.
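A sketch of such an error-triggered startup delay, assuming the delay starts at a minimum, doubles on consecutive errors up to a maximum, and resets after an error-free period. The 60/300 second values match the 60-second delay in the log above and what we understand to be BIRD's defaults for its error wait time, but the doubling factor, struct, and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical model of an error-triggered startup delay. */
struct error_backoff {
  unsigned min_wait;  /* seconds, e.g. 60 */
  unsigned max_wait;  /* seconds, e.g. 300 */
  unsigned last_wait; /* 0 if there was no recent error */
};

/* Called when an established session is torn down by an error; returns
   the delay (in seconds) before the next connection attempt. */
unsigned backoff_on_error(struct error_backoff *b)
{
  assert(b->min_wait > 0 && b->min_wait <= b->max_wait);
  unsigned next = b->last_wait ? b->last_wait * 2 : b->min_wait;
  if (next > b->max_wait)
    next = b->max_wait; /* cap the growth */
  b->last_wait = next;
  return next;
}

/* Called after a long enough error-free period: start over at min_wait. */
void backoff_forget(struct error_backoff *b)
{
  b->last_wait = 0;
}
```

With min_wait = 60 and max_wait = 300, consecutive errors produce delays of 60, 120, 240, and then 300 seconds, which bounds how often a full feed can be re-sent to a flapping peer.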

Note that this should not happen when the neighbor performed a graceful
restart, as BGP stays in BSS_CONNECT in that case.


> The current code is:
> 
>     acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) &&
>       (p->start_state >= BSS_CONNECT) && (!p->incoming_conn.sk);
> 
> Could this be changed to?
> 
>     acc = (p->p.proto_state == PS_START || p->p.proto_state == PS_UP) &&
>       (p->start_state >= BSS_DELAY) && (!p->incoming_conn.sk);

That would just eliminate the delay for incoming connections altogether.
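The effect of the proposed change can be checked with a small model of the two conditions. This C sketch assumes the ordering BSS_PREPARE < BSS_DELAY < BSS_CONNECT; the constant values, struct, and function names are stand-ins, not BIRD's actual definitions.

```c
#include <stdbool.h>

/* Assumed ordering of start states and protocol states (hypothetical
   stand-ins for BIRD's BSS_* and PS_* constants). */
enum { BSS_PREPARE = 0, BSS_DELAY = 1, BSS_CONNECT = 2 };
enum { PS_DOWN, PS_START, PS_UP, PS_STOP };

struct peer_model {
  int proto_state;
  int start_state;
  bool incoming_open; /* stands in for p->incoming_conn.sk != NULL */
};

/* Current condition: rejects incoming connections during the startup delay. */
bool accept_current(const struct peer_model *p)
{
  return (p->proto_state == PS_START || p->proto_state == PS_UP) &&
         p->start_state >= BSS_CONNECT && !p->incoming_open;
}

/* Proposed condition: BSS_DELAY already passes, so the delay is bypassed. */
bool accept_proposed(const struct peer_model *p)
{
  return (p->proto_state == PS_START || p->proto_state == PS_UP) &&
         p->start_state >= BSS_DELAY && !p->incoming_open;
}
```

With start_state == BSS_DELAY (the "Startup delayed" state in the log above), the current condition rejects the incoming connection and the proposed one accepts it, so the delay would only ever apply to outgoing connections.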

-- 
Elen sila lumenn' omentielvo

Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."


More information about the Bird-users mailing list