more understanding of the bird hang on ifdown/ifup

Florian Lohoff flo at rfc822.org
Wed Jun 20 12:24:02 CEST 2007


Hi,
I debugged further: I built a version with debugging symbols and
put an assert into the code where we detect the out-of-sequence
netlink message. My understanding is that we run krt_if_scan on a
regular basis. While it is still reading, we go through the notification
chain and end up sending another netlink message before krt_if_scan has
pulled all netlink messages. nl_get_reply then gets an out-of-sequence
netlink message and drops it, although nl_get_scan would need it as the
end-of-messages marker. So, on returning from our notification chain,
nl_get_scan ends up polling for the next message, which has already been
removed via nl_send_route -> nl_exchange -> nl_get_reply as
out-of-sequence. As the netlink FD is blocking, we deadlock here.
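
The sequence above can be modelled without a real netlink socket. The
following is only a toy sketch, not BIRD code: the FIFO, the message
types and the toy_* names are all made up for illustration, and an empty
queue stands in for the point where a blocking read would hang.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message model; not the real struct nlmsghdr. */
enum msg_type { MSG_LINK, MSG_DONE, MSG_ACK };
struct msg { unsigned seq; enum msg_type type; };

/* Toy stand-in for the blocking netlink FD: a small FIFO. */
enum { QCAP = 8 };
struct sock { struct msg q[QCAP]; size_t head, tail; };

static void sock_push(struct sock *s, unsigned seq, enum msg_type t)
{
    assert(s->tail < QCAP);
    s->q[s->tail++] = (struct msg){ seq, t };
}

/* Returns false at the point where a real blocking read would hang. */
static bool sock_pop(struct sock *s, struct msg *out)
{
    if (s->head == s->tail)
        return false;
    *out = s->q[s->head++];
    return true;
}

/* Nested request/reply: drops every message that is not its own reply. */
static bool toy_exchange(struct sock *s, unsigned seq)
{
    struct msg m;
    sock_push(s, seq, MSG_ACK);      /* the kernel answers the nested request */
    while (sock_pop(s, &m)) {
        if (m.seq == seq)
            return true;
        /* out of sequence: dropped, including the scan's MSG_DONE */
    }
    return false;
}

/* Scan loop: true if MSG_DONE was seen, false if it would block forever. */
static bool toy_scan(struct sock *s, bool notify_during_scan)
{
    struct msg m;
    sock_push(s, 1, MSG_LINK);       /* dump reply, part 1 */
    sock_push(s, 1, MSG_DONE);       /* end-of-messages marker */
    while (sock_pop(s, &m)) {
        if (m.type == MSG_DONE)
            return true;
        if (notify_during_scan)
            toy_exchange(s, 2);      /* notification chain sends a route */
    }
    return false;
}
```

With notify_during_scan set, the nested exchange eats the MSG_DONE that
belongs to the scan, and toy_scan runs into the empty queue.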

One solution I see would be to convert all nl_get_scan users to first
read ALL messages before starting to process them, which could mean a lot
of memory usage, especially with full BGP routing, where we would need to
read all routes first.
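
A rough sketch of that two-phase idea, again on a made-up in-memory
queue rather than the real netlink code (all names here are
hypothetical). The buffer holding the whole dump is exactly where the
memory cost would show up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message model; not the real struct nlmsghdr. */
enum msg_type { MSG_LINK, MSG_DONE, MSG_ACK };
struct msg { unsigned seq; enum msg_type type; };

/* Toy stand-in for the blocking netlink FD: a small FIFO. */
enum { QCAP = 16 };
struct sock { struct msg q[QCAP]; size_t head, tail; };

static void sock_push(struct sock *s, unsigned seq, enum msg_type t)
{
    assert(s->tail < QCAP);
    s->q[s->tail++] = (struct msg){ seq, t };
}

static bool sock_pop(struct sock *s, struct msg *out)
{
    if (s->head == s->tail)
        return false;                /* a real blocking read would hang */
    *out = s->q[s->head++];
    return true;
}

/* Nested request/reply that drops out-of-sequence messages, as before. */
static bool toy_exchange(struct sock *s, unsigned seq)
{
    struct msg m;
    sock_push(s, seq, MSG_ACK);
    while (sock_pop(s, &m))
        if (m.seq == seq)
            return true;
    return false;
}

/* Returns the number of processed messages, or -1 if the scan got stuck. */
static int toy_scan_buffered(struct sock *s, size_t n_links)
{
    struct msg buf[QCAP];            /* grows with the size of the dump */
    struct msg m;
    size_t n = 0;
    bool done = false;

    for (size_t i = 0; i < n_links; i++)
        sock_push(s, 1, MSG_LINK);
    sock_push(s, 1, MSG_DONE);

    /* Phase 1: pull everything up to the end-of-messages marker. */
    while (!done && sock_pop(s, &m)) {
        if (m.type == MSG_DONE) {
            done = true;
        } else {
            assert(n < QCAP);
            buf[n++] = m;
        }
    }
    if (!done)
        return -1;

    /* Phase 2: process; nested exchanges can no longer eat scan messages. */
    for (size_t i = 0; i < n; i++)
        if (!toy_exchange(s, 2 + (unsigned)i))
            return -1;
    return (int)n;
}
```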

Another solution would be some generic polling/callback-based
approach, or perhaps replacing netlink.c with an API wrapper around
libnl (http://people.suug.ch/~tgr/libnl/)
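
One way the callback variant could look, as a sketch only: a single
reader hands every message to whoever registered for its sequence
number, so nothing is dropped just because a different request happens
to be waiting. The dispatcher below is hypothetical and uses neither
BIRD's nor libnl's API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message: a sequence number plus some payload. */
struct msg { unsigned seq; int payload; };

typedef void (*msg_handler)(const struct msg *m, void *ctx);

/* One entry per outstanding request, keyed by its sequence number. */
enum { MAX_PENDING = 4 };
struct pending { unsigned seq; msg_handler fn; void *ctx; };
struct dispatcher { struct pending slot[MAX_PENDING]; size_t n; };

/* Register a handler for all messages carrying the given sequence number. */
static void dispatcher_expect(struct dispatcher *d, unsigned seq,
                              msg_handler fn, void *ctx)
{
    assert(d->n < MAX_PENDING);
    d->slot[d->n++] = (struct pending){ seq, fn, ctx };
}

/* Route one message to its owner; returns 1 if delivered, 0 if unknown. */
static int dispatcher_deliver(const struct dispatcher *d, const struct msg *m)
{
    for (size_t i = 0; i < d->n; i++) {
        if (d->slot[i].seq == m->seq) {
            d->slot[i].fn(m, d->slot[i].ctx);
            return 1;
        }
    }
    return 0;
}

/* Example handlers: count scan messages, sum up reply payloads. */
static void count_msg(const struct msg *m, void *ctx)
{
    (void)m;
    ++*(int *)ctx;
}

static void sum_payload(const struct msg *m, void *ctx)
{
    *(int *)ctx += m->payload;
}
```

A scan would register itself for its dump sequence number and a route
update for its own; interleaved replies then reach the right handler
regardless of who is currently reading from the socket.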

(gdb) bt
#0  0xb7e3e947 in raise () from /lib/tls/libc.so.6
#1  0xb7e400c9 in abort () from /lib/tls/libc.so.6
#2  0xb7e3805f in __assert_fail () from /lib/tls/libc.so.6
#3  0x08078a09 in nl_get_reply () at netlink.c:135
#4  0x08078b0d in nl_exchange (pkt=0xbfe3f2b4) at netlink.c:193
#5  0x08079562 in nl_send_route (p=0x80905a0, e=0x809921c, new=0) at netlink.c:542
#6  0x080795d2 in krt_set_notify (p=0x80905a0, n=0x80981c4, new=0x0, old=0x809921c)
    at netlink.c:561
#7  0x0807614e in krt_notify (P=0x80905a0, net=0x80981c4, new=0x0, old=0x809921c,
    attrs=0x0) at krt.c:698
#8  0x08049910 in do_rte_announce (a=0x8095e40, net=0x80981c4, new=0x80991cc,
    old=0x809921c, tmpa=0x0, class=4100)
    at /home/flo/p/root/bird-1.0.11/./nest/rt-table.c:227
#9  0x08049521 in rte_announce (tab=0x808f4b0, net=0x80981c4, new=0x80991cc,
    old=0x809921c, tmpa=0x0) at /home/flo/p/root/bird-1.0.11/./nest/rt-table.c:261
#10 0x08049b94 in rte_recalculate (table=0x808f4b0, net=0x80981c4, p=0x80906c8,
    new=0x80991cc, tmpa=0x0) at /home/flo/p/root/bird-1.0.11/./nest/rt-table.c:368
#11 0x08049ff1 in rte_update (table=0x808f4b0, net=0x80981c4, p=0x80906c8,
    new=0x80991cc) at /home/flo/p/root/bird-1.0.11/./nest/rt-table.c:514
#12 0x0804fd41 in dev_ifa_notify (p=0x80906c8, c=1, ad=0x8095c80)
    at /home/flo/p/root/bird-1.0.11/./nest/rt-dev.c:69
#13 0x0804e962 in ifa_send_notify (p=0x80906c8, c=1, a=0x8095c80)
    at /home/flo/p/root/bird-1.0.11/./nest/iface.c:148
#14 0x0804e87e in ifa_notify_change (c=1, a=0x8095c80)
    at /home/flo/p/root/bird-1.0.11/./nest/iface.c:159
#15 0x0804ea7d in if_notify_change (c=1, i=0x8095b38)
    at /home/flo/p/root/bird-1.0.11/./nest/iface.c:211
#16 0x0804ed24 in if_update (new=0xbfe3f5ec)
    at /home/flo/p/root/bird-1.0.11/./nest/iface.c:280
#17 0x08078f5c in nl_parse_link (h=0x80919b8, scan=1) at netlink.c:334
#18 0x080792a1 in krt_if_scan (p=0x8090778) at netlink.c:445
#19 0x08074fd5 in kif_scan (t=0x8095af8) at krt.c:94
#20 0x08075004 in kif_force_scan () at krt.c:102
#21 0x08076018 in krt_scan (t=0x8095a48) at krt.c:655
#22 0x08073005 in tm_shot () at io.c:298
#23 0x08074886 in io_loop () at io.c:1126
#24 0x080774ef in main (argc=Cannot access memory at address 0x67ec
) at main.c:462 


-- 
Florian Lohoff                  flo at rfc822.org             +49-171-2280134
	Those who would give up a little freedom to get a little 
          security shall soon have neither - Benjamin Franklin