bird OSPF lsupd bug (FULL/EXCHANGE problem)

Alexander V. Chernikov melifaro at yandex-team.ru
Tue Sep 10 21:00:39 CEST 2013


Hello list!

There is a problem in the OSPFv2/v3 LSUpdate flooding code that triggers
an incorrect neighbor state machine transition.

The problem is triggered under the following OSPF instability conditions:
a) bird falls back to the Init state
b) the DR's router LSA seqnum increases immediately after that
c) some problem (CoPP, a policer, high CPU load) prevents the DR from
sending DBD packets quickly.

In that case the following can happen:
* Both the local and the remote FSM state are EXCHANGE
* Only a small number of LSAs have been sent so far (so we have no
outstanding LSRequests other than the DR's router LSA)
* The DR is a bit slow to send the next portion
* That router LSA is received via another neighbor (so our LSR list
becomes empty)
* We change state to FULL while the other side is stuck in EXCHANGE.

(So in practice, when OSPF is flapping, we can end up with up to 50% of
neighbors stuck in EXCHANGE state from the DR's point of view.)
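
To make the sequence above easier to follow, here is a minimal standalone
sketch of the idea behind the attached patch. It is not BIRD code: the
enum, struct and function names are all hypothetical, the LS request list
is reduced to a counter, and LoadingDone is assumed to move the neighbor
straight to FULL. The `guarded` flag switches between the old check
(empty list only) and the proposed one (empty list and state LOADING).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified neighbor states (not BIRD's real definitions). */
enum neigh_state { NEIGH_EXCHANGE, NEIGH_LOADING, NEIGH_FULL };

/* Hypothetical neighbor record. */
struct neigh {
  enum neigh_state state;
  int pending_requests;   /* stand-in for the length of the LS request list */
};

/* Assumption for this sketch: LoadingDone always moves the neighbor to FULL. */
void neigh_loading_done(struct neigh *n)
{
  n->state = NEIGH_FULL;
}

/*
 * Called when an LSA we had requested from neighbor n arrives through
 * flooding from some other neighbor, so the request can be dropped.
 * guarded == false: fire LoadingDone as soon as the list is empty.
 * guarded == true:  fire LoadingDone only if we are already in LOADING.
 */
void lsa_satisfied_by_flood(struct neigh *n, bool guarded)
{
  assert(n->pending_requests > 0);
  n->pending_requests--;

  if (n->pending_requests == 0 && (!guarded || n->state == NEIGH_LOADING))
    neigh_loading_done(n);
}

/*
 * Replays the scenario: we are still in EXCHANGE with the DR, the only
 * outstanding request is the DR's router LSA, and that LSA shows up via
 * another neighbor. Returns the resulting state towards the DR.
 */
enum neigh_state run_scenario(bool guarded)
{
  struct neigh dr = { .state = NEIGH_EXCHANGE, .pending_requests = 1 };

  lsa_satisfied_by_flood(&dr, guarded);
  return dr.state;
}
```

In this model run_scenario(false) ends in NEIGH_FULL although the DBD
exchange never finished, while run_scenario(true) stays in
NEIGH_EXCHANGE until the exchange completes normally.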



-------------- next part --------------
diff --git a/proto/ospf/lsupd.c b/proto/ospf/lsupd.c
index a5da425..55b7971 100644
--- a/proto/ospf/lsupd.c
+++ b/proto/ospf/lsupd.c
@@ -205,7 +205,7 @@ ospf_lsupd_flood(struct proto_ospf *po,
 	    en->lsa_body = NULL;
 	    DBG("Removing from lsreq list for neigh %R\n", nn->rid);
 	    ospf_hash_delete(nn->lsrqh, en);
-	    if (EMPTY_SLIST(nn->lsrql))
+	    if ((EMPTY_SLIST(nn->lsrql)) && (nn->state == NEIGHBOR_LOADING))
 	      ospf_neigh_sm(nn, INM_LOADDONE);
 	    continue;
 	    break;
@@ -216,7 +216,7 @@ ospf_lsupd_flood(struct proto_ospf *po,
 	    en->lsa_body = NULL;
 	    DBG("Removing from lsreq list for neigh %R\n", nn->rid);
 	    ospf_hash_delete(nn->lsrqh, en);
-	    if (EMPTY_SLIST(nn->lsrql))
+	    if ((EMPTY_SLIST(nn->lsrql)) && (nn->state == NEIGHBOR_LOADING))
 	      ospf_neigh_sm(nn, INM_LOADDONE);
 	    break;
 	  default:

