Socket error: TCP_MD5SIG: Cannot allocate memory
Brian Rak
brak at gameservers.com
Tue Aug 25 22:48:24 CEST 2015
> With current default optmem_max values, this allows about 150 keys on
> 64bit arches, and 88 keys on 32bit arches.
I think you just solved it for me at least... I was seeing issues after
~147 BGP sessions, so running into this limit would make perfect sense.
I could reproduce it at will just by restarting BIRD, so my issue at
least wasn't any sort of leak.
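
As a rough sanity check of the numbers above, here is a minimal Python sketch. It assumes a per-key cost of about 136 bytes, which is only back-calculated from the quoted "about 150 keys" figure and the 20480-byte optmem_max default mentioned further down the thread; the real per-key size depends on the kernel version, and the function names are made up for illustration.

```python
# Rough estimate only. ASSUMED_KEY_BYTES is an assumption back-calculated
# from the figures quoted in this thread (20480 // 150 == 136); it is not
# read from the kernel and may differ between kernel versions.
ASSUMED_KEY_BYTES = 136


def read_optmem_max(path="/proc/sys/net/core/optmem_max"):
    """Return the current optmem_max value in bytes (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


def estimated_md5_keys(optmem_max, key_bytes=ASSUMED_KEY_BYTES):
    """Approximate number of TCP MD5 keys that fit under optmem_max."""
    return optmem_max // key_bytes
```

Calling estimated_md5_keys(read_optmem_max()) on the affected machine gives a ballpark ceiling; anything else charged against the same per-socket budget would lower it a little, which would be consistent with failures starting slightly below 150 sessions.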
On 8/25/2015 4:45 PM, Michael Vallaly wrote:
> For context on my end: this issue was experienced on physical hardware
> (64bit) with Intel 1Gbit NICs (no offloading).
>
> We only noticed this after some length of time (> 180 days), during
> which we likely had < 40 BGP session flaps on our end via Bird.
>
> optmem_max: Maximum ancillary buffer size allowed per socket. Ancillary
> data is a sequence of struct cmsghdr structures with appended data. The
> default size is 10240 bytes.
>
> According to Eric Dumazet back in 2012 [1]:
>
> <snip>
> There is no limit on number of MD5 keys an application can attach to a
> tcp socket.
>
> This patch adds a per tcp socket limit based
> on /proc/sys/net/core/optmem_max
>
> With current default optmem_max values, this allows about 150 keys on
> 64bit arches, and 88 keys on 32bit arches.
> </snip>
>
> Maybe we are getting multiple/duplicate MD5 keys assigned to the TCP
> session somehow?
>
> -Mike
>
> [1] https://patchwork.ozlabs.org/patch/138861/
>
> On Tue, 25 Aug 2015 15:48:44 -0400
> Brian Rak <brak at gameservers.com> wrote:
>
>> I haven't tried the optmem_max option, but I did some more experimenting.
>>
>> We have a virtual machine running a nearly identical BIRD config that's
>> not showing this issue.
>>
>> The machine with the issue is physical, and has a Mellanox ConnectX
>> NIC. I'm wondering if there's some limitation with TCP offload there
>> that's responsible. Disabling TCP offload didn't seem to help though.
>>
>> On 8/24/2015 4:59 PM, Michael Vallaly wrote:
>>> I saw this problem back in 2013 on Bird 1.3.6 and 3.6+ kernels.
>>> (Re: Strange MD5 Auth problem in BIRD 1.3.8)
>>>
>>> AFAIK it was related to kernel socket option memory (or lack thereof)
>>> and I can only surmise it was related to some sort of memory leak.
>>> Ondrej Zajicek seemed to think this was an issue in the kernel itself,
>>> but I wasn't able to prove that definitively.
>>>
>>> I was able to work around it (without rebooting) by:
>>>
>>> <snip>
>>> echo 40960 > /proc/sys/net/core/optmem_max # Defaults to 20480
>>> </snip>
>>>
>>> This seemed to defer the issue long enough for us to reboot, so we
>>> no longer ran into it constantly.
>>>
>>> If anyone else has any details or info, I would still be interested in
>>> the root-cause analysis and hopefully permanent fix.
>>>
>>> -Mike
>>>
>>> On Mon, 24 Aug 2015 15:59:06 -0400
>>> Brian Rak <brak at gameservers.com> wrote:
>>>
>>>> I have a machine running BIRD 1.4.5, and I'm seeing a lot of these
>>>> messages when I start it up:
>>>>
>>>> 2015-08-24 15:54:26 <ERR> xxxx: Socket error: TCP_MD5SIG: Cannot allocate memory
>>>> 2015-08-24 15:54:26 <ERR> yyyy: Socket error: TCP_MD5SIG: Cannot allocate memory
>>>>
>>>> It also seems like the sessions that report that error do not come up,
>>>> and show a status of 'Error: Kernel MD5 auth failed'.
>>>>
>>>> I'm only trying to configure around 200 BGP sessions here, most of which
>>>> are advertising a very small number of prefixes.
>>>>
>>>> I don't really see any tunable settings here, any suggestions as to how
>>>> I can correct this?
>
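
For sizing the workaround to the roughly 200 sessions in the original report, something like the following Python sketch could be used. The 136-byte per-key figure is again an assumption inferred from the numbers quoted in this thread, and the 25% headroom and 4096-byte rounding step are arbitrary illustrative choices, not kernel requirements.

```python
import math

# Assumption: per-key cost inferred from "about 150 keys" at a 20480-byte
# optmem_max; the actual size depends on the kernel version.
ASSUMED_KEY_BYTES = 136


def suggested_optmem_max(sessions, key_bytes=ASSUMED_KEY_BYTES,
                         headroom=1.25, step=4096):
    """Suggest an optmem_max value (bytes) for a number of MD5 sessions.

    headroom and step are illustrative defaults chosen for this sketch.
    """
    needed = math.ceil(sessions * key_bytes * headroom)
    # Round up to the next multiple of step for a tidy sysctl value.
    return ((needed + step - 1) // step) * step


if __name__ == "__main__":
    value = suggested_optmem_max(200)
    print("echo %d > /proc/sys/net/core/optmem_max" % value)
```

With these assumptions, 200 sessions come out comfortably above the 20480 default and below the 40960 value used in the workaround quoted above.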