Strange MD5 Auth problem in BIRD 1.3.8
Michael Vallaly
bird at nolatency.com
Thu Oct 10 23:49:41 CEST 2013
Hi,
I recently ran into an interesting problem with socket option buffers
and their use by BIRD on Linux 3.6, and I hope someone can shed some
light on it.
I quite frequently see the following in our logs when enabling/starting
BGP sessions that are configured to use MD5 auth:
<snip>
Sep 24 23:12:46 rtr2 bird: sk_set_md5_auth_int: setsockopt: No such file or directory
</snip>
These never seem to cause any functional problems, but they seemed
strange and may be related to my new, ongoing issue. ;)
## My functionality-impacting problem ##
Yesterday, after some upstream BGP peers had connectivity issues (Hold
timer expired), all of my previously working BGP sessions (using MD5
auth) attempted to reconnect and gave me the following in the logs:
<snip>
Oct 9 18:02:08 rtr2 bird: plxhq: Error: Hold timer expired
Oct 9 18:02:08 rtr2 bird: plxhq: BGP session closed
Oct 9 18:02:08 rtr2 bird: plxhq: State changed to flush
Oct 9 18:02:08 rtr2 bird: plxhq: State changed to stop
Oct 9 18:02:08 rtr2 bird: sk_set_md5_auth_int: setsockopt: No such file or directory
Oct 9 18:02:08 rtr2 bird: plxhq: Down
Oct 9 18:02:08 rtr2 bird: plxhq: Starting
Oct 9 18:02:08 rtr2 bird: sk_set_md5_auth_int: setsockopt: Cannot allocate memory
</snip>
At that point the BGP session fails to establish/start, and all
subsequent BGP sessions that are started (with MD5 auth) also fail
with the same message.
Looking through the BIRD code, it seems BIRD issues setsockopt() calls
to update the TCP socket with the MD5 parameters.
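For anyone not familiar with the mechanism, here is a minimal Python sketch of how a TCP MD5 key is installed on Linux through setsockopt(TCP_MD5SIG). This is not BIRD's code (BIRD is written in C). The helper names pack_tcp_md5sig and set_md5_key are made up for illustration, the option number and struct layout are taken from <linux/tcp.h>, and only IPv4 peers are handled.

```python
import socket
import struct

TCP_MD5SIG = 14             # option number from <linux/tcp.h>
TCP_MD5SIG_MAXKEYLEN = 80   # maximum key length accepted by the kernel
SOCKADDR_STORAGE_LEN = 128  # sizeof(struct sockaddr_storage) on Linux


def pack_tcp_md5sig(peer_ipv4: str, key: bytes) -> bytes:
    """Build a struct tcp_md5sig for an IPv4 peer (hypothetical helper)."""
    if len(key) > TCP_MD5SIG_MAXKEYLEN:
        raise ValueError("MD5 key longer than TCP_MD5SIG_MAXKEYLEN")
    # struct sockaddr_in: family (host byte order), port (unused, 0), address.
    addr = struct.pack("=HH", socket.AF_INET, 0) + socket.inet_aton(peer_ipv4)
    addr = addr.ljust(SOCKADDR_STORAGE_LEN, b"\0")
    # Two bytes of flags/padding, 16-bit key length, 32 bits of padding.
    header = struct.pack("=BBHI", 0, 0, len(key), 0)
    return addr + header + key.ljust(TCP_MD5SIG_MAXKEYLEN, b"\0")


def set_md5_key(sock: socket.socket, peer_ipv4: str, key: bytes) -> None:
    """Install the key for one peer; an empty key asks the kernel to remove it."""
    sock.setsockopt(socket.IPPROTO_TCP, TCP_MD5SIG,
                    pack_tcp_md5sig(peer_ipv4, key))
```

A call such as set_md5_key(listener, "192.0.2.1", b"secret") raises OSError if the kernel rejects the request, with errno carrying the same kind of error text that shows up in the log lines above.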
After digging around in the Linux system, it seems I was
running out of socket option memory buffers (duh!).
I was able to "fix" this by issuing:
<snip>
echo 40960 > /proc/sys/net/core/optmem_max # Defaults to 20480
</snip>
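To compare this limit across the otherwise identical machines, a small Python sketch like the following could read the current value. The function name read_optmem_max is hypothetical; the /proc path is the same one used in the command above.

```python
from pathlib import Path

OPTMEM_MAX_PATH = "/proc/sys/net/core/optmem_max"


def read_optmem_max(path: str = OPTMEM_MAX_PATH) -> int:
    """Return the per-socket option memory limit in bytes (hypothetical helper)."""
    return int(Path(path).read_text().strip())
```

Running print(read_optmem_max()) on each router shows whether any of them already differs from the default.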
Is this expected? Any insight on how to properly size the socket option
memory buffers used by bird? Is this some sort of a socket buffer leak?
<snip>
bird> show memory
BIRD memory usage
Routing tables: 307 MB
Route attributes: 106 MB
ROA tables: 192 B
Protocols: 388 kB
Total: 413 MB
$ uptime
16:45:03 up 343 days, 1:23, 1 user, load average: 0.00, 0.03, 0.05
</snip>
I have multiple identical machines running the same
OS/software/configuration, and so far only one of
them has shown this behavior.
Thanks!
-Mike
--
Michael Vallaly <mvallaly at nolatency.com>