BGP password behaviour

Dan Rimal danrimal at gmail.com
Sun Apr 28 17:48:32 CEST 2013


Hello,

today i have had very strange experience with bird. Few days ago (approx 
6 days) i have established bgp session with bgp password attribute. 
Since established this session, i have got messages in system log like this:

kernel: [2956936.985539] MD5 Hash failed for (195.81.xxx.yy, 
179)->(195.81.xxx.yy, 46595)
kernel: [2956936.985761] huh, entered softirq 3 NET_RX ffffffff814e7060 
preempt_count 00000100, exited with 00000101?

But peer was UP, routes was installed, everything seemed OK. I didn't 
notice this messages for few days althoug messages was added very often.

After few days, it changes to:

Apr 28 00:33:59 r1 kernel: [3432467.222576] BUG: scheduling while 
atomic: bird/6356/0x00000001
Apr 28 00:33:59 r1 kernel: [3432467.222656] Modules linked in: 
binfmt_misc ip_vs nf_conntrack libcrc32c 8021q garp stp llc dummy 
ipmi_devintf ipmi_si ipmi_msghandler mlx4_en mlx4_core joydev i2c_i801 
i2c_core iTCO_wdt iTCO_vendor_support ixgbe i7core_edac ioatdma mdio eda
c_core igb dca raid1 [last unloaded: scsi_wait_scan]
Apr 28 00:33:59 r1 kernel: [3432467.223323] Pid: 6356, comm: bird Not 
tainted 3.3.1-3.fc16.x86_64 #1
Apr 28 00:33:59 r1 kernel: [3432467.223423] Call Trace:
Apr 28 00:33:59 r1 kernel: [3432467.223510] [<ffffffff815eaec3>] 
__schedule_bug+0x68/0x6c
Apr 28 00:33:59 r1 kernel: [3432467.223582] [<ffffffff815f2b52>] 
__schedule+0x6f2/0x7b0
Apr 28 00:33:59 r1 kernel: [3432467.223658] [<ffffffff815f2f2f>] 
schedule+0x3f/0x60
Apr 28 00:33:59 r1 kernel: [3432467.223746] [<ffffffff815f20dc>] 
schedule_hrtimeout_range_clock+0x12c/0x170
Apr 28 00:33:59 r1 kernel: [3432467.223928] [<ffffffff8107da00>] ? 
update_rmtp+0x70/0x70
Apr 28 00:33:59 r1 kernel: [3432467.224052] [<ffffffff8107e724>] ? 
hrtimer_start_range_ns+0x14/0x20
Apr 28 00:33:59 r1 kernel: [3432467.224132] [<ffffffff815f2133>] 
schedule_hrtimeout_range+0x13/0x20
Apr 28 00:33:59 r1 kernel: [3432467.224251] [<ffffffff81193cc9>] 
poll_schedule_timeout+0x49/0x70
Apr 28 00:33:59 r1 kernel: [3432467.224367] [<ffffffff81194656>] 
do_select+0x586/0x6d0
Apr 28 00:33:59 r1 kernel: [3432467.224494] [<ffffffff8107f8bf>] ? 
up+0x2f/0x50
Apr 28 00:33:59 r1 kernel: [3432467.224583] [<ffffffff81193dd0>] ? 
poll_freewait+0xe0/0xe0
Apr 28 00:33:59 r1 kernel: [3432467.224663] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.224743] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.224813] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.224911] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.224991] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.225066] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.225136] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.225206] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.225311] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 00:33:59 r1 kernel: [3432467.225405] [<ffffffff8119498c>] 
core_sys_select+0x1ec/0x370
Apr 28 00:33:59 r1 kernel: [3432467.225491] [<ffffffff8101b6f9>] ? 
read_tsc+0x9/0x20
Apr 28 00:33:59 r1 kernel: [3432467.225567] [<ffffffff810a4edd>] ? 
ktime_get_ts+0xad/0xe0
Apr 28 00:33:59 r1 kernel: [3432467.225728] [<ffffffff81194bd0>] 
sys_select+0xc0/0x100
Apr 28 00:33:59 r1 kernel: [3432467.225866] [<ffffffff815fbfe9>] 
system_call_fastpath+0x16/0x1b




"MD5 hash failed" and "huh, entered.." message was completely replaced 
by BUG message with high frequency. After another three hours bird 
received signal HUP (logrotate) and it was deadly for bird, see two last 
lines:



Apr 28 03:32:01 r1 kernel: [3443133.541261] BUG: scheduling while 
atomic: bird/6356/0x00000001
Apr 28 03:32:01 r1 kernel: [3443133.541341] Modules linked in: 
binfmt_misc ip_vs nf_conntrack libcrc32c 8021q garp stp llc dummy 
ipmi_devintf ipmi_si ipmi_msghandler mlx4_en mlx4_core joydev i2c_i801 
i2c_core iTCO_wdt iTCO_vendor_support ixgbe i7core_edac ioatdma mdio eda
c_core igb dca raid1 [last unloaded: scsi_wait_scan]
Apr 28 03:32:01 r1 kernel: [3443133.541697] Pid: 6356, comm: bird Not 
tainted 3.3.1-3.fc16.x86_64 #1
Apr 28 03:32:01 r1 kernel: [3443133.541768] Call Trace:
Apr 28 03:32:01 r1 kernel: [3443133.541838] [<ffffffff815eaec3>] 
__schedule_bug+0x68/0x6c
Apr 28 03:32:01 r1 kernel: [3443133.541913] [<ffffffff815f2b52>] 
__schedule+0x6f2/0x7b0
Apr 28 03:32:01 r1 kernel: [3443133.542003] [<ffffffff815f2f2f>] 
schedule+0x3f/0x60
Apr 28 03:32:01 r1 kernel: [3443133.542081] [<ffffffff815f20dc>] 
schedule_hrtimeout_range_clock+0x12c/0x170
Apr 28 03:32:01 r1 kernel: [3443133.542224] [<ffffffff8107da00>] ? 
update_rmtp+0x70/0x70
Apr 28 03:32:01 r1 kernel: [3443133.542294] [<ffffffff8107e724>] ? 
hrtimer_start_range_ns+0x14/0x20
Apr 28 03:32:01 r1 kernel: [3443133.542367] [<ffffffff815f2133>] 
schedule_hrtimeout_range+0x13/0x20
Apr 28 03:32:01 r1 kernel: [3443133.542441] [<ffffffff81193cc9>] 
poll_schedule_timeout+0x49/0x70
Apr 28 03:32:01 r1 kernel: [3443133.542534] [<ffffffff81194656>] 
do_select+0x586/0x6d0
Apr 28 03:32:01 r1 kernel: [3443133.542608] [<ffffffff8151f330>] ? 
ip_forward_options+0x1f0/0x1f0
Apr 28 03:32:01 r1 kernel: [3443133.542681] [<ffffffff81193dd0>] ? 
poll_freewait+0xe0/0xe0
Apr 28 03:32:01 r1 kernel: [3443133.542754] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.542826] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.542897] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.542986] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.543064] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.543134] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.543201] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.543285] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.543354] [<ffffffff81193ec0>] ? 
__pollwait+0xf0/0xf0
Apr 28 03:32:01 r1 kernel: [3443133.543431] [<ffffffff8119498c>] 
core_sys_select+0x1ec/0x370
Apr 28 03:32:01 r1 kernel: [3443133.543503] [<ffffffff811be1a4>] ? 
fsnotify+0x1e4/0x290
Apr 28 03:32:01 r1 kernel: [3443133.543579] [<ffffffff8101b6f9>] ? 
read_tsc+0x9/0x20
Apr 28 03:32:01 r1 kernel: [3443133.543650] [<ffffffff810a4edd>] ? 
ktime_get_ts+0xad/0xe0
Apr 28 03:32:01 r1 kernel: [3443133.543728] [<ffffffff81194bd0>] 
sys_select+0xc0/0x100
Apr 28 03:32:01 r1 kernel: [3443133.543797] [<ffffffff815fbfe9>] 
system_call_fastpath+0x16/0x1b
Apr 28 03:32:01 r1 kernel: [3443133.620802] bird[6356]: segfault at 
7fb756e54000 ip 00007fb7566f0319 sp 00007fffaf69c350 error 6
Apr 28 03:32:01 r1 kernel: [3443133.620970] note: bird[6356] exited with 
preempt_count 1



I don't know exactly if this bug is kernel related or bird related, i 
don't know if BUG message is related to his precedesor "MD5 hash failed" 
message. I read a little bit about "BUG: scheduling while atomic".. 
always was related to kernel code, not userspace.. it is a bit 
confusing. After that, i have disabled md5 auth from affected BGP 
session and messages are completely gone.

Environment: Fedora 16 / 3.3.1-3.fc16.x86_64, BIRD 1.3.9 with this 
patch: http://marc.info/?l=bird-users&m=136172837801057&w=2



Regards,

Daniel



More information about the Bird-users mailing list