Socket error: TCP_MD5SIG: Cannot allocate memory

Michael Vallaly bird at nolatency.com
Tue Aug 25 22:45:47 CEST 2015


For context on my end: we hit this issue on physical hardware (64bit)
with Intel 1Gbit NICs (no offloading).

We only noticed this after a long uptime (> 180 days), during which we
likely had < 40 BGP session flaps on our end via BIRD.

optmem_max: Maximum ancillary buffer size allowed per socket. Ancillary
data is a sequence of struct cmsghdr structures with appended data. The
default size is 10240 bytes.
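
To check the value on a given box, and to keep a raised value across
reboots, something like the sketch below should do. The 40960 figure is
just the value from the workaround quoted further down, and the file name
under /etc/sysctl.d is an arbitrary choice, not something from this thread.

```shell
#!/bin/sh
# Print the current limit (prints a note instead if /proc is not readable).
cat /proc/sys/net/core/optmem_max 2>/dev/null || echo "optmem_max not readable"

# Build the sysctl.conf-style line for a given value.
optmem_conf_line() {
    printf 'net.core.optmem_max = %s\n' "$1"
}

# Preview only; nothing is changed on the system here.
optmem_conf_line 40960

# To actually apply it (as root), one option would be:
#   sysctl -w net.core.optmem_max=40960
#   optmem_conf_line 40960 > /etc/sysctl.d/90-optmem.conf   # hypothetical file name
```

Writing to /proc directly, as in the quoted workaround, has the same
runtime effect as sysctl -w; only the sysctl.d file survives a reboot.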

According to Eric Dumazet back in 2012 [1]: 

<snip>
There is no limit on number of MD5 keys an application can attach to a
tcp socket.

This patch adds a per tcp socket limit based
on /proc/sys/net/core/optmem_max

With current default optmem_max values, this allows about 150 keys on
64bit arches, and 88 keys on 32bit arches.
</snip>

Maybe we are getting multiple/duplicate MD5 keys assigned to the TCP
session somehow?  
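
As a rough sanity check on those numbers: the quoted figures imply about
136 bytes charged per key on 64bit (20480 / 150) and about 116 on 32bit
(10240 / 88). Those per-key sizes are back-derived assumptions, not taken
from the kernel headers. Under that assumption, the ~200 sessions Brian
mentions would not fit under one socket's limit if all keys ended up on a
single socket, which nobody in this thread has confirmed.

```shell
#!/bin/sh
# ASSUMPTION: bytes charged per MD5 key, back-derived from the quoted
# "about 150 keys" (64bit) figure with optmem_max = 20480.
BYTES_PER_KEY=136

# How many keys fit under a given optmem_max ($1) at a given key size ($2).
keys_that_fit() {
    echo $(( $1 / $2 ))
}

# Use the live value if readable, else the 20480 mentioned in this thread.
optmem=$(cat /proc/sys/net/core/optmem_max 2>/dev/null || echo 20480)

echo "optmem_max=$optmem: about $(keys_that_fit "$optmem" "$BYTES_PER_KEY") keys per socket"
echo "200 keys would need about $(( 200 * BYTES_PER_KEY )) bytes"
```

Under the same assumption, doubling optmem_max to 40960 would raise the
ceiling to about 300 keys.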

-Mike

[1] https://patchwork.ozlabs.org/patch/138861/

On Tue, 25 Aug 2015 15:48:44 -0400
Brian Rak <brak at gameservers.com> wrote:

> I haven't tried the optmem_max option, but I did some more experimenting.
> 
> We have a virtual machine running a nearly identical BIRD config that's 
> not showing this issue.
> 
> The machine with the issue is physical, and has a Mellanox ConnectX 
> NIC.  I'm wondering if there's some limitation with TCP offload there 
> that's responsible.  Disabling TCP offload didn't seem to help though.
> 
> On 8/24/2015 4:59 PM, Michael Vallaly wrote:
> > I saw this problem back in 2013 with BIRD 1.3.6 on 3.6+ kernels.
> > (Re: Strange MD5 Auth problem in BIRD 1.3.8)
> >
> > AFAIK it was related to kernel socket option memory (or lack thereof)
> > and I can only surmise it was related to some sort of memory leak.
> > Ondrej Zajicek seemed to think this was an issue in the kernel itself,
> > but I wasn't able to prove that definitively.
> >
> > I was able to work around it (without rebooting) by:
> >
> > <snip>
> > echo 40960 > /proc/sys/net/core/optmem_max  # Defaults to 20480
> > </snip>
> >
> > That seemed to defer the issue long enough for us to reboot and
> > not run into it constantly.
> >
> > If anyone else has any details or info, I would still be interested in
> > the root-cause analysis and hopefully permanent fix.
> >
> > -Mike
> >
> > On Mon, 24 Aug 2015 15:59:06 -0400
> > Brian Rak <brak at gameservers.com> wrote:
> >
> >> I have a machine running BIRD 1.4.5, and I'm seeing a lot of these
> >> messages when I start it up:
> >>
> >> 2015-08-24 15:54:26 <ERR> xxxx: Socket error: TCP_MD5SIG: Cannot
> >> allocate memory
> >> 2015-08-24 15:54:26 <ERR> yyyy: Socket error: TCP_MD5SIG: Cannot
> >> allocate memory
> >>
> >> It also seems like the sessions that report that error do not come up,
> >> and show a status of 'Error: Kernel MD5 auth failed'.
> >>
> >> I'm only trying to configure around 200 BGP sessions here, most of which
> >> are advertising a very small number of prefixes.
> >>
> >> I don't really see any tunable settings here, any suggestions as to how
> >> I can correct this?
> >
> 


-- 
Michael Vallaly <mvallaly at nolatency.com>


More information about the Bird-users mailing list