Bug in OSPF
Ondrej Zajicek
santiago at crfreenet.org
Thu Aug 21 21:16:08 CEST 2008
Hello
After some time I looked again at the bug in OSPF that I ran into before:
"Sometimes the link between two routers hangs. Bird reports for
example ptp/exstart at one side and ptp/exchange at the other side. Or
full/ptp at one side and nothing at the other side."
I observed three cases:
1) The first node is stuck in init/ptp; the second node doesn't even see the
first node as a neighbour. According to tcpdump, the first node doesn't send
hello packets.
2) The first node (192.168.36.130) is stuck in loading/ptp.
tcpdump:
...
11:54:38.126821 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:38.127048 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:39.126493 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:39.128045 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:38.127613 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:38.129563 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:38.131034 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
11:54:39.126067 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:39.129339 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:39.130246 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
11:54:40.125492 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:40.126112 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:40.126615 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:40.127747 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:40.129674 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
11:54:41.126490 IP 192.168.36.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:41.128826 IP 192.168.36.2 > 224.0.0.5: OSPFv2, Hello (1), length: 48
11:54:41.126068 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Request (3), length: 36
11:54:41.129284 IP 192.168.36.2 > 192.168.36.130: OSPFv2, LS-Update (4), length: 88
11:54:41.130056 IP 192.168.36.130 > 192.168.36.2: OSPFv2, LS-Ack (5), length: 44
...
3) The first node (192.168.36.130) oscillates between loading/ptp and exstart/ptp.
We can see that the first node also doesn't send hello packets.
tcpdump:
14:24:07.822578 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:07.823062 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:07.823588 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:08.824388 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:08.825092 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 692
14:24:08.825660 IP 192.168.36.130 > 192.168.37.130: OSPFv2, LS-Request (3), length: 60
14:24:08.826215 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:08.827254 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 692
14:24:08.829550 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:09.822583 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:09.822828 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:09.823636 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:10.820982 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:10.821474 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:10.821816 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 692
14:24:10.822213 IP 192.168.36.130 > 192.168.37.130: OSPFv2, LS-Request (3), length: 60
14:24:10.823820 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 692
14:24:10.824445 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:11.822592 IP 192.168.37.130 > 224.0.0.5: OSPFv2, Hello (1), length: 48
14:24:11.822836 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
14:24:11.823629 IP 192.168.36.130 > 192.168.37.130: OSPFv2, Database Description (2), length: 32
14:24:12.821025 IP 192.168.37.130 > 192.168.36.130: OSPFv2, Database Description (2), length: 32
I assume that the problem is in the timer handling code, which breaks when
the system time changes (for example, due to an aggressive NTP daemon). When I
manually fiddled with the system time, I often triggered the problem.
Here are two patches to fix this problem. The first patch is probably better
but doesn't work on Linux 2.4.x. The second patch should work on it.
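To illustrate the general idea (this is only a rough sketch, not the content
of the attached patches): if timers are armed against a clock that cannot be
stepped, a change of the system time should no longer delay or prematurely
fire them. The names now_monotonic(), sketch_timer, sketch_timer_start() and
sketch_timer_expired() are hypothetical and do not come from the BIRD sources;
the sketch assumes a POSIX system that provides CLOCK_MONOTONIC.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: seconds since an arbitrary fixed point in the past.
 * CLOCK_MONOTONIC is not affected when the wall clock is set manually or
 * stepped by an NTP daemon. */
static int64_t now_monotonic(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);   /* assumption: the clock is available on this system */
    (void) rc;
    return (int64_t) ts.tv_sec;
}

/* Hypothetical timer record: stores an absolute expiry time taken from the
 * same clock that is later used to check it. */
struct sketch_timer {
    int64_t expires;
};

/* Arm the timer to fire 'after' seconds from 'now'. */
static void sketch_timer_start(struct sketch_timer *t, int64_t now, int64_t after)
{
    t->expires = now + after;
}

/* Return nonzero once 'now' has reached the stored expiry time. */
static int sketch_timer_expired(const struct sketch_timer *t, int64_t now)
{
    return now >= t->expires;
}
```

With a wall-clock time source, setting the system time back by an hour could
leave a hello timer armed for an hour longer than intended, which would match
the missing hello packets in the traces above. Passing now_monotonic() as
'now' in both calls avoids that dependency on the system time.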
--
Elen sila lumenn' omentielvo
Ondrej 'SanTiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monotonic_clock1.patch
Type: text/x-diff
Size: 1428 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20080821/b4c03f2b/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: monotonic_clock2.patch
Type: text/x-diff
Size: 1535 bytes
Desc: not available
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20080821/b4c03f2b/attachment-0001.patch>