OSPF stuck in 2-way state
Kenth Eriksson
Kenth.Eriksson at infinera.com
Mon Oct 7 14:23:56 CEST 2019
On Thu, 2019-10-03 at 19:04 +0200, Ondrej Zajicek wrote:
> CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
>
>
> On Thu, Oct 03, 2019 at 06:36:05PM +0200, Ondrej Zajicek wrote:
> > > me.dr is 0 because ifa->drip is 0, but should it have declared itself
> > > as an eligible DR?
> >
> > Oh, you meant DR IP, not neghbor IP. I think that it is correct - the
> > process should start with the node idea of DR IP, which is initially
> > zero (RFC 2328 9.4. the first paragraph).
>
> Just reviewed the election code and it seems consistent to the algorithm
> in RFC 2328. If no electable neighbor is there and me.dr / me.bdr is
> initially zero, then it should elect itself as BDR in the first round
> (also elect itself as DR by the ndr==NULL condition), and elect itself
> as DR in the second election.
>
> Could you add some debug output to check the value of DR and BDR (both
> ID and IP) after first and possibly second round?
>
Collected some more logs when the state machine becomes stuck. It
appears as the eth1 has been DR when this stuck state happens. Note
that interface transitions from Down to DROther in one step and then
maintains that state even after prio is changed from 0 to 5.
2019-10-04 19:34:17.443 <TRACE> ospfv2_1 < interface eth1 changes link
2019-10-04 19:34:17.443 <TRACE> ospfv2_1: Interface eth1 changed state from DR to Loopback
2019-10-04 19:34:17.443 <TRACE> ospfv2_1: Removing interface eth1 (10.210.138.68/30) from area 2.2.2.2
2019-10-04 19:34:17.443 <TRACE> ospfv2_1: Neighbor 10.210.138.70 on eth1 changed state from Full to Down
2019-10-04 19:34:17.443 <TRACE> ospfv2_1: Neighbor 10.210.138.70 on eth1 removed
...
2019-10-04 19:34:17.443 <TRACE> kernel1 < interface eth1 changes link
2019-10-04 19:34:17.443 <TRACE> kernel2 < interface eth1 changes link
2019-10-04 19:34:17.443 <TRACE> direct1 < interface eth1 changes link
...
2019-10-04 19:34:22.562 <TRACE> ospfv2_1: Interface eth1 changed state from Loopback to Waiting
2019-10-04 19:34:22.611 <TRACE> ospfv2_1: HELLO packet received from nbr 10.210.138.70 on eth1
2019-10-04 19:34:22.611 <TRACE> ospfv2_1: New neighbor 10.210.138.70 on eth1, IP address 10.210.138.70
2019-10-04 19:34:22.611 <TRACE> ospfv2_1: Neighbor 10.210.138.70 on eth1 changed state from Down to Init
2019-10-04 19:34:22.611 <TRACE> ospfv2_1: Neighbor 10.210.138.70 on eth1 changed state from Init to 2-Way
...
2019-10-04 19:34:32.564 <TRACE> ospfv2_1: HELLO packet sent via eth1
2019-10-04 19:34:32.570 <TRACE> ospfv2_1: DBDES packet received from nbr 10.210.138.70 on eth1
2019-10-04 19:34:32.570 <TRACE> ospfv2_1: length 32
2019-10-04 19:34:32.570 <TRACE> ospfv2_1: router 10.210.138.70
2019-10-04 19:34:32.570 <TRACE> ospfv2_1: mtu 1500
2019-10-04 19:34:32.570 <TRACE> ospfv2_1: imms I M MS
2019-10-04 19:34:32.570 <TRACE> ospfv2_1: ddseq 948166447
2019-10-04 19:34:32.570 <TRACE> ospfv2_1: DBDES packet ignored - lesser state than ExStart
...
2019-10-04 19:34:47.370 <TRACE> ospfv2_1: HELLO packet sent via eth1
2019-10-04 19:34:47.370 <TRACE> ospfv2_1: Interface eth1 changed state from Waiting to Down
2019-10-04 19:34:47.370 <TRACE> ospfv2_1: Removing interface eth1 (10.210.138.68/30) from area 2.2.2.2
2019-10-04 19:34:47.370 <TRACE> ospfv2_1: Neighbor 10.210.138.70 on eth1 changed state from 2-Way to Down
2019-10-04 19:34:47.370 <TRACE> ospfv2_1: Neighbor 10.210.138.70 on eth1 removed
...
2019-10-04 19:35:40.066 <TRACE> ospfv2_1: Interface eth1 changed state from Down to DROther
2019-10-04 19:35:40.066 <TRACE> ospfv2_1: HELLO packet sent via eth1
2019-10-04 19:35:40.076 <TRACE> ospfv2_1: HELLO packet received from nbr 10.210.138.70 on eth1
2019-10-04 19:35:40.076 <TRACE> ospfv2_1: New neighbor 10.210.138.70 on eth1, IP address 10.210.138.70
2019-10-04 19:35:40.076 <TRACE> ospfv2_1: Neighbor 10.210.138.70 on eth1 changed state from Down to Init
2019-10-04 19:35:40.076 <TRACE> ospfv2_1: Neighbor 10.210.138.70 on eth1 changed state from Init to 2-Way
2019-10-04 19:35:40.109 <INFO> Reconfiguring
2019-10-04 19:35:40.109 <TRACE> ospfv2_1: Changing priority of eth1 from 0 to 5
2019-10-04 19:35:40.109 <TRACE> ospfv2_1: Reconfigured
...
2019-10-04 19:35:50.066 <TRACE> ospfv2_1: HELLO packet sent via eth1
2019-10-04 19:35:50.072 <TRACE> ospfv2_1: DBDES packet received from nbr 10.210.138.70 on eth1
2019-10-04 19:35:50.072 <TRACE> ospfv2_1: length 32
2019-10-04 19:35:50.072 <TRACE> ospfv2_1: router 10.210.138.70
2019-10-04 19:35:50.072 <TRACE> ospfv2_1: mtu 1500
2019-10-04 19:35:50.072 <TRACE> ospfv2_1: imms I M MS
2019-10-04 19:35:50.072 <TRACE> ospfv2_1: ddseq 948166575
2019-10-04 19:35:50.072 <TRACE> ospfv2_1: DBDES packet ignored - lesser state than ExStart
Next some more logs collected with LOCAL_DEBUG. The final call of
ospf_dr_election results in both DR and BDR 0. From that point, no moreelection attempts are done.
ospfv2_1: Neighbor? on iface eth1
ospfv2_1: Iface eth1 can_do_adj=0
OSPF: RX hook called (iface eth1, src 10.210.138.70, dst 10.210.138.69)
Neighbor state machine for 10.210.138.70 on eth1, event HelloReceived
SM on eth1. Event is 'WaitTimer'
(B)DR election.
1: ndr 00000000, bdr bfbb8100
2: ndr bfbb8100, bdr 00000000
DR=10.210.138.69, BDR=0.0.0.0
Neighbor state machine for 10.210.138.70 on eth1, event AdjOK?
ospfv2_1: Iface eth1 can_do_adj=1
OSPF: RX hook called (iface eth1, src 10.210.138.70, dst 10.210.138.69)
Neighbor state machine for 10.210.138.70 on eth1, event HelloReceived
Neighbor state machine for 10.210.138.70 on eth1, event NegotiationDone
OSPF: RX hook called (iface eth1, src 10.210.138.70, dst 10.210.138.69)
Neighbor state machine for 10.210.138.70 on eth1, event HelloReceived
Neighbor state machine for 10.210.138.70 on eth1, event ExchangeDone
...
OSPF: RX hook called (iface eth1, src 10.210.138.70, dst 224.0.0.6)
Neighbor state machine for 10.210.138.70 on eth1, event HelloReceived
Deleting LSA (Type: 2001 Id: 10.210.138.69 Rt: 10.210.138.69) from lsrtl for neighbor 10.210.138.70
Deleting LSA (Type: 2002 Id: 10.210.138.69 Rt: 10.210.138.69) from lsrtl for neighbor 10.210.138.70
SM on eth1. Event is 'LoopInd'
Neighbor state machine for 10.210.138.70 on eth1, event KillNbr
SM on eth1. Event is 'NeighborChange'
OSPF: RX hook called (iface eth1, src 10.210.138.70, dst 10.210.138.69)
OSPF: RX hook called (iface eth1, src 10.210.138.70, dst 224.0.0.5)
SM on eth1. Event is 'UnloopInd'
SM on eth1. Event is 'InterfaceUp'
OSPF: RX hook called (iface eth1, src 10.210.138.70, dst 224.0.0.5)
Allocating OSPF hash of order 6: 64 hash_entries, 0 low, 256 high
Allocating OSPF hash of order 6: 64 hash_entries, 0 low, 256 high
Neighbor state machine for 10.210.138.70 on eth1, event HelloReceived
Neighbor state machine for 10.210.138.70 on eth1, event 2-WayReceived
SM on eth1. Event is 'NeighborChange'
ospfv2_1: Neighbor? on iface eth1
ospfv2_1: Iface eth1 can_do_adj=0
SM on eth1. Event is 'InterfaceDown'
Neighbor state machine for 10.210.138.70 on eth1, event KillNbr
SM on eth1. Event is 'NeighborChange'
SM on eth1. Event is 'InterfaceUp'
OSPF: RX hook called (iface eth1, src 10.210.138.70, dst 224.0.0.5)
Allocating OSPF hash of order 6: 64 hash_entries, 0 low, 256 high
Allocating OSPF hash of order 6: 64 hash_entries, 0 low, 256 high
Neighbor state machine for 10.210.138.70 on eth1, event HelloReceived
Neighbor state machine for 10.210.138.70 on eth1, event 2-WayReceived
SM on eth1. Event is 'NeighborChange'
(B)DR election.
1: ndr 00000000, bdr 00000000
2: ndr 00000000, bdr 00000000
DR=0.0.0.0, BDR=0.0.0.0
ospfv2_1: Iface eth1 can_do_adj=0
> --
> Elen sila lumenn' omentielvo
>
> Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
> OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3,
> wwwkeys.pgp.net)
> "To err is human -- to blame it on a computer is even more so."
More information about the Bird-users
mailing list