BIRD memory usage
Pavel Tvrdík
pavel.tvrdik at nic.cz
Mon Sep 12 09:16:57 CEST 2016
Hi, Justin.
On 2016-09-09 10:45, Justin Cattle wrote:
> Hi Pavel,
>
> This is looking good for us :)
> It's been in the lab for 3 days across 25 hosts, and memory usage
> looks absolutely static after process start.
>
> We have a couple of canary hosts in production too, and they are
> showing the same results.
>
> Before installing the patched version, the process on this host
> was using about 17G of virtual memory; it is now at about 80M, which
> is a nice optimisation ;-)
Good!
> I am planning to roll this out over production next week if possible.
> I can report back in a few weeks if you like, but it certainly seems
> like this is resolved.
My colleague Ondra Zajicek pointed out that the solution could lead to
reading from freed memory. I made a fixup commit (d9c6d180) on top of
the branch krt-export-filtr-fix. Please apply that commit too.
https://gitlab.labs.nic.cz/labs/bird/commit/d9c6d180e41c7246ccbde8ae4d828d87daa12cf4
It should fix the bug in a better way.
>
> Thanks again for your help on this - we really appreciate it :)
>
You're welcome :)
Cheers,
Pavel
> Cheers,
> Just
> On 7 September 2016 at 09:08, Pavel Tvrdík <pavel.tvrdik at nic.cz>
> wrote:
>
>> Hi, Just.
>>
>> On 2016-09-06 22:50, Justin Cattle wrote:
>>
>>> I found some time to package using a patch to the latest 1.6.0
>>> release, created from a diff of origin/krt-export-filtr-fix against
>>> v1.6.0-34-g768d013 [ seems to be the top three commits ].
>>
>> Yes, the top three commits, exactly!
>>
>>> I hope that's valid. That patch applied without issue, and I wrapped
>>> it into a debian patch.
>>>
>>> I've installed on a few hosts, and I'll report back tomorrow if I get
>>> a chance.
>>
>> Great!
>>
>>> Thanks again for the speedy code :)
>>>
>>> Here's my debian package patch for reference:
>>>
>>> cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch
>>> filter/tree: prefer xmalloc/xfree to malloc/free
>>> rt-table: fix kernel protocol export filter memory bug
>>> Index: bird-1.6.0/filter/tree.c
>>> ===================================================================
>>> --- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000
>>> +++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100
>>> @@ -82,7 +82,7 @@
>>> if (len <= 1024)
>>> buf = alloca(len * sizeof(struct f_tree *));
>>> else
>>> - buf = malloc(len * sizeof(struct f_tree *));
>>> + buf = xmalloc(len * sizeof(struct f_tree *));
>>>
>>> /* Convert a degenerated tree into an sorted array */
>>> i = 0;
>>> @@ -94,7 +94,7 @@
>>> root = build_tree_rec(buf, 0, len);
>>>
>>> if (len > 1024)
>>> - free(buf);
>>> + xfree(buf);
>>>
>>> return root;
>>> }
>>> Index: bird-1.6.0/nest/rt-table.c
>>> ===================================================================
>>> --- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100
>>> +++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100
>>> @@ -60,6 +60,21 @@
>>> static inline void rt_schedule_prune(rtable *tab);
>>>
>>> +static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
>>> +
>>> +static inline void
>>> +rte_update_lock(void)
>>> +{
>>> + rte_update_nest_cnt++;
>>> +}
>>> +
>>> +static inline void
>>> +rte_update_unlock(void)
>>> +{
>>> + if (!--rte_update_nest_cnt)
>>> + lp_flush(rte_update_pool);
>>> +}
>>> +
>>> static inline struct ea_list *
>>> make_tmp_attrs(struct rte *rt, struct linpool *pool)
>>> {
>>> @@ -609,10 +624,18 @@
>>> if (!rte_is_valid(best0))
>>> return NULL;
>>>
>>> + /* This non-static function could be called from outside rt-table.c file and
>>> + * we need to ensure that a temporary allocated linpool memory @rte_update_pool
>>> + * will be freed */
>>> + rte_update_lock();
>>> +
>>> best = export_filter(ah, best0, rt_free, tmpa, silent);
>>>
>>> if (!best || !rte_is_reachable(best))
>>> + {
>>> + rte_update_unlock();
>>> return best;
>>> + }
>>>
>>> for (rt0 = best0->next; rt0; rt0 = rt0->next)
>>> {
>>> @@ -646,6 +669,8 @@
>>> if (best != best0)
>>> *rt_free = best;
>>>
>>> + rte_update_unlock();
>>> +
>>> return best;
>>> }
>>>
>>> @@ -1097,21 +1122,6 @@
>>> rte_free_quick(old);
>>> }
>>>
>>> -static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
>>> -
>>> -static inline void
>>> -rte_update_lock(void)
>>> -{
>>> - rte_update_nest_cnt++;
>>> -}
>>> -
>>> -static inline void
>>> -rte_update_unlock(void)
>>> -{
>>> - if (!--rte_update_nest_cnt)
>>> - lp_flush(rte_update_pool);
>>> -}
>>> -
>>> static inline void
>>> rte_hide_dummy_routes(net *net, rte **dummy)
>>> {
>>
>> Looks fine :)
>>
>> Cheers,
>> Just
>> On 6 September 2016 at 18:03, Justin Cattle <j at ocado.com> wrote:
>>
>> Hi Pavel,
>>
>> Thanks for quick response! I will try that as soon as I can,
>> hopefully in the next couple of days.
>> I'll report back as soon as I know.
>>
>> Cheers,
>> Just
>>
>> On 6 September 2016 at 16:46, Pavel Tvrdík <pavel.tvrdik at nic.cz>
>> wrote:
>> Hi Justin,
>>
>> On 2016-09-05 16:21, Justin Cattle wrote:
>> Hi,
>>
>> A colleague of mine reported a memory usage issue with the bird daemon
>> last year, which resulted in a request for a core dump, but we never
>> followed it up.
>> I'd like to re-open this discussion and see if anything can be done to
>> fix it.
>>
>> I'll provide some information regarding a production environment,
>> where the problem is most obvious. But any further details and
>> diagnostics will have to come from our lab environment.
>> Please note, in production we mostly run 1.5, but in the lab we are on
>> 1.6, however we see the same symptoms in both environments on both
>> versions.
>>
>> The symptoms are twofold, but potentially related - greater than
>> expected memory usage reported by the bird daemon itself for the
>> number of routes, but also massively more memory actually used by the
>> daemon process.
>>
>> When the process is started, we see "normal" memory usage, which then
>> seems to grow indefinitely in distinct steps, separated by a period of
>> a few hours.
>>
>> In production, this consumes most of the 32G of memory until the
>> kernel oom-killer intervenes.
>>
>> Production:
>>
>> BIRD 1.5.0 ready.
>> bird> show memory
>> BIRD memory usage
>> Routing tables: 1405 MB
>> Route attributes: 84 kB
>> ROA tables: 192 B
>> Protocols: 45 kB
>> Total: 1405 MB
>>
>> bird> show route count
>> 2273 of 2273 routes for 1142 networks
>>
>> # ps u -p 3441
>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>> bird 3441 0.1 55.4 18275124 18241540 ? Ssl Aug10 73:39 /usr/sbin/bird -f -u bird -g bird
>>
>> ..so that's ~1.4G reported by bird, and ~18G actually consumed by the
>> process.
>>
>> Lab:
>>
>> BIRD 1.6.0 ready.
>> bird> show mem
>> BIRD memory usage
>> Routing tables: 693 MB
>> Route attributes: 28 kB
>> ROA tables: 192 B
>> Protocols: 41 kB
>> Total: 693 MB
>>
>> bird> show route count
>> 175 of 175 routes for 91 networks
>>
>> # ps u -p 29085
>> USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
>> bird 29085 0.0 14.9 4994852 4915032 ? Ssl Aug05 19:41 /usr/sbin/bird -f -u bird -g bird
>>
>> Thanks for this report. I successfully reproduced this weird behavior
>> too. Configuring the kernel protocol with an export filter triggers
>> a memory leak bug. I prepared fixing commits in the branch
>> `krt-export-filtr-fix'
>>
>> https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix [1]
>>
>> Can you please download it and confirm that the bug is fixed?
>>
>> Best,
>> Pavel
>>
>> ..so that's ~0.7G reported by bird, and ~5G actually consumed by the
>> process.
>>
>> I also attached the bird config from the lab.
>>
>> Any help is much appreciated!
>> Thanks.
>>
>> Cheers,
>> Just
>> Notice: This email is confidential and may contain copyright
>> material
>> of members of the Ocado Group. Opinions and views expressed in this
>> message may not necessarily reflect the opinions and views of the
>> members of the Ocado Group.
>>
>> If you are not the intended recipient, please notify us immediately
>> and delete all copies of this message. Please note that it is your
>> responsibility to scan this message for viruses.
>>
>> Fetch and Sizzle are trading names of Speciality Stores Limited and
>> Fabled is a trading name of Marie Claire Beauty Limited, both
>> members
>> of the Ocado Group.
>>
>> References to the “Ocado Group” are to Ocado Group plc
>> (registered
>> in England and Wales with number 7098618) and its subsidiary
>> undertakings (as that expression is defined in the Companies Act
>> 2006)
>> from time to time. The registered office of Ocado Group plc is
>> Titan
>> Court, 3 Bishops Square, Hatfield Business Park, Hatfield, Herts.
>> AL10
>> 9NE.
>>
>
> Links:
> ------
> [1] https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix