bird OSPF lsupd bug (FULL/EXCHANGE problem)
Alexander V. Chernikov
melifaro at yandex-team.ru
Tue Sep 10 21:00:39 CEST 2013
Hello list!
There is a problem in the OSPFv2/v3 LSUpdate flooding code that triggers
an incorrect neighbor state machine transition.
The problem is triggered under the following OSPF instability conditions:
a) bird falls back to the Init state
b) the DR's router LSA seqnum increases immediately after that
c) some problem (like CoPP/policer/high CPU load) prevents the DR from
sending DBD packets quickly.
In that case the following can happen:
* Our local/remote FSM state is EXCHANGE
* A small number of LSAs has been sent (so we have no outstanding
LSRequests other than the DR's router LSA)
* The DR is a bit slow in sending the next portion
* That router LSA is received via another neighbor (so our LSR list
becomes empty)
* We change state to FULL while the other side is stuck in the EXCHANGE state.
(So in practice we can end up with up to 50% of neighbors stuck in the
EXCHANGE state (from the DR's point of view) when OSPF is flapping.)
-------------- next part --------------
diff --git a/proto/ospf/lsupd.c b/proto/ospf/lsupd.c
index a5da425..55b7971 100644
--- a/proto/ospf/lsupd.c
+++ b/proto/ospf/lsupd.c
@@ -205,7 +205,7 @@ ospf_lsupd_flood(struct proto_ospf *po,
en->lsa_body = NULL;
DBG("Removing from lsreq list for neigh %R\n", nn->rid);
ospf_hash_delete(nn->lsrqh, en);
- if (EMPTY_SLIST(nn->lsrql))
+ if ((EMPTY_SLIST(nn->lsrql)) && (nn->state == NEIGHBOR_LOADING))
ospf_neigh_sm(nn, INM_LOADDONE);
continue;
break;
@@ -216,7 +216,7 @@ ospf_lsupd_flood(struct proto_ospf *po,
en->lsa_body = NULL;
DBG("Removing from lsreq list for neigh %R\n", nn->rid);
ospf_hash_delete(nn->lsrqh, en);
- if (EMPTY_SLIST(nn->lsrql))
+ if ((EMPTY_SLIST(nn->lsrql)) && (nn->state == NEIGHBOR_LOADING))
ospf_neigh_sm(nn, INM_LOADDONE);
break;
default: