BIRD memory usage

Pavel Tvrdík pavel.tvrdik at nic.cz
Wed Sep 7 10:08:06 CEST 2016


Hi Just,

On 2016-09-06 22:50, Justin Cattle wrote:
> I found some time to package using a patch to the latest 1.6.0
> release, created from a diff of origin/krt-export-filtr-fix against
> v1.6.0-34-g768d013  [ seems to be the top three commits ].

Yes, the top three commits, exactly!

> I hope that's valid.  That patch applied without issue, and I wrapped
> it into a debian patch.
> 
> I've installed on a few hosts, and I'll report back tomorrow if I get
> a chance.

Great!

> 
> Thanks again for the speedy code :)
> 
> Here's my debian package patch for reference:
> 
> cat bird-1.6.0/debian/patches/001-krt-export-filtr-fix.patch
> filter/tree: prefer xmalloc/xfree to malloc/free
> rt-table: fix kernel protocol export filter memory bug
> Index: bird-1.6.0/filter/tree.c
> ===================================================================
> --- bird-1.6.0.orig/filter/tree.c 2013-11-23 12:29:53.000000000 +0000
> +++ bird-1.6.0/filter/tree.c 2016-09-06 21:30:15.435090279 +0100
> @@ -82,7 +82,7 @@
>    if (len <= 1024)
>      buf = alloca(len * sizeof(struct f_tree *));
>    else
> -    buf = malloc(len * sizeof(struct f_tree *));
> +    buf = xmalloc(len * sizeof(struct f_tree *));
> 
>    /* Convert a degenerated tree into an sorted array */
>    i = 0;
> @@ -94,7 +94,7 @@
>    root = build_tree_rec(buf, 0, len);
> 
>    if (len > 1024)
> -    free(buf);
> +    xfree(buf);
> 
>    return root;
>  }
> Index: bird-1.6.0/nest/rt-table.c
> ===================================================================
> --- bird-1.6.0.orig/nest/rt-table.c 2016-04-29 10:13:23.000000000 +0100
> +++ bird-1.6.0/nest/rt-table.c 2016-09-06 21:30:15.435090279 +0100
> @@ -60,6 +60,21 @@
>  static inline void rt_schedule_prune(rtable *tab);
> 
> +static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
> +
> +static inline void
> +rte_update_lock(void)
> +{
> +  rte_update_nest_cnt++;
> +}
> +
> +static inline void
> +rte_update_unlock(void)
> +{
> +  if (!--rte_update_nest_cnt)
> +    lp_flush(rte_update_pool);
> +}
> +
>  static inline struct ea_list *
>  make_tmp_attrs(struct rte *rt, struct linpool *pool)
>  {
> @@ -609,10 +624,18 @@
>    if (!rte_is_valid(best0))
>      return NULL;
> 
> +  /* This non-static function could be called from outside rt-table.c file and
> +   * we need to ensure that a temporary allocated linpool memory @rte_update_pool
> +   * will be freed */
> +  rte_update_lock();
> +
>    best = export_filter(ah, best0, rt_free, tmpa, silent);
> 
>    if (!best || !rte_is_reachable(best))
> +  {
> +    rte_update_unlock();
>      return best;
> +  }
> 
>    for (rt0 = best0->next; rt0; rt0 = rt0->next)
>    {
> @@ -646,6 +669,8 @@
>    if (best != best0)
>      *rt_free = best;
> 
> +  rte_update_unlock();
> +
>    return best;
>  }
> 
> @@ -1097,21 +1122,6 @@
>      rte_free_quick(old);
>  }
> 
> -static int rte_update_nest_cnt; /* Nesting counter to allow recursive updates */
> -
> -static inline void
> -rte_update_lock(void)
> -{
> -  rte_update_nest_cnt++;
> -}
> -
> -static inline void
> -rte_update_unlock(void)
> -{
> -  if (!--rte_update_nest_cnt)
> -    lp_flush(rte_update_pool);
> -}
> -
>  static inline void
>  rte_hide_dummy_routes(net *net, rte **dummy)
>  {

Looks fine :)
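
For anyone reading this in the archive, the idea behind the rt-table.c
change can be sketched in isolation. This is only a minimal illustration
under stated assumptions, not BIRD code: `tmp_pool`, `pool_alloc`,
`pool_flush`, `inner_export` and `outer_update` are hypothetical names
made up for the example. Only the lock/unlock shape (a nesting counter
that flushes the temporary pool when it drops back to zero) mirrors
rte_update_lock() and rte_update_unlock() from the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a temporary linear pool (not BIRD's linpool). */
struct tmp_pool {
  size_t used;              /* bytes handed out since the last flush */
  unsigned char buf[1024];  /* fixed backing storage for the sketch */
};

struct tmp_pool update_pool;  /* plays the role of rte_update_pool */
int update_nest_cnt;          /* plays the role of rte_update_nest_cnt */

/* Bump allocation: returns NULL when the pool is exhausted. */
void *pool_alloc(struct tmp_pool *p, size_t len)
{
  if (len > sizeof(p->buf) - p->used)
    return NULL;
  void *ptr = p->buf + p->used;
  p->used += len;
  return ptr;
}

/* Releases everything allocated from the pool at once. */
void pool_flush(struct tmp_pool *p)
{
  p->used = 0;
}

void update_lock(void)
{
  update_nest_cnt++;
}

/* Only the outermost unlock flushes the temporary memory. */
void update_unlock(void)
{
  if (!--update_nest_cnt)
    pool_flush(&update_pool);
}

/* An entry point that may be called on its own or from outer_update().
 * Returns the pool usage observed just before it unlocks. */
size_t inner_export(void)
{
  update_lock();
  pool_alloc(&update_pool, 64);
  size_t used = update_pool.used;
  update_unlock();
  return used;
}

/* A caller that already holds the lock; the nested call must not flush.
 * Returns the pool usage observed after the nested call returned. */
size_t outer_update(void)
{
  update_lock();
  pool_alloc(&update_pool, 32);
  inner_export();
  size_t used = update_pool.used;
  update_unlock();
  return used;
}
```

If inner_export() took no lock of its own, a direct call from outside
would leave its 64 bytes in the pool until some unrelated update flushed
it, which is the kind of growth reported in this thread. With the lock,
a direct call flushes on return, while a nested call leaves the flush to
the outermost caller.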

> 
> Cheers,
> Just
> On 6 September 2016 at 18:03, Justin Cattle <j at ocado.com> wrote:
> 
>> Hi Pavel,
>> 
>> Thanks for quick response! I will try that as soon as I can,
>> hopefully in the next couple of days.
>> I'll report back as soon as I know.
>> 
>> Cheers,
>> Just
>> 
>> On 6 September 2016 at 16:46, Pavel Tvrdík <pavel.tvrdik at nic.cz> wrote:
>> Hi Justin,
>> 
>> On 2016-09-05 16:21, Justin Cattle wrote:
>> Hi,
>> 
>> A colleague of mine reported a memory usage issue with the bird daemon
>> last year, which resulted in a request for a core dump, but we never
>> followed it up.
>> I'd like to re-open this discussion and see if anything can be done to
>> fix it.
>> 
>> I'll provide some information regarding a production environment,
>> where the problem is most obvious.  But any further details and
>> diagnostics will have to come from our lab environment.
>> Please note, in production we mostly run 1.5, but in the lab we are on
>> 1.6; however, we see the same symptoms in both environments on both
>> versions.
>> 
>> The symptoms are twofold, but potentially related: greater than
>> expected memory usage reported by the bird daemon itself for the
>> number of routes, but also massively more memory actually used by the
>> daemon process.
>> 
>> When the process is started, we see "normal" memory usage, which then
>> seems to grow indefinitely in distinct steps, separated by a period of
>> a few hours.
>> 
>> In production, this consumes most of the 32G of memory until the
>> kernel oom-killer intervenes.
>> 
>> Production:
>> 
>> BIRD 1.5.0 ready.
>> 
>> bird> show memory
>> 
>> BIRD memory usage
>> 
>> Routing tables:   1405 MB
>> 
>> Route attributes:   84 kB
>> 
>> ROA tables:        192  B
>> 
>> Protocols:          45 kB
>> 
>> Total:            1405 MB
>> 
>> bird> show route count
>> 
>> 2273 of 2273 routes for 1142 networks
>> 
>> # ps u  -p 3441
>> 
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>
>> bird      3441  0.1 55.4 18275124 18241540 ?   Ssl  Aug10  73:39 /usr/sbin/bird -f -u bird -g bird
>> 
>> ..so that's ~1.4G reported by bird, and ~18G actually consumed by the
>> process.
>> 
>> Lab:
>> 
>> BIRD 1.6.0 ready.
>> 
>> bird> show mem
>> 
>> BIRD memory usage
>> 
>> Routing tables:    693 MB
>> 
>> Route attributes:   28 kB
>> 
>> ROA tables:        192  B
>> 
>> Protocols:          41 kB
>> 
>> Total:             693 MB
>> 
>> bird> show route count
>> 
>> 175 of 175 routes for 91 networks
>> 
>> # ps u -p 29085
>> 
>> USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
>>
>> bird     29085  0.0 14.9 4994852 4915032 ?     Ssl  Aug05  19:41 /usr/sbin/bird -f -u bird -g bird
> 
>  Thanks for this report. I was able to reproduce this behavior.
> Running the kernel protocol with an export filter causes a memory
> leak. I have prepared fixes in the branch
> `krt-export-filtr-fix'
> 
> https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix [1]
> 
> Can you please download it and confirm that the bug is fixed?
> 
> Best,
> Pavel
> 
>> ..so that's ~0.7G reported by bird, and ~5G actually consumed by the
>> process.
>> 
>> I also attached the bird config from the lab.
>> 
>> Any help is much appreciated!
>> Thanks.
>> 
>> Cheers,
>> Just
>> Notice:  This email is confidential and may contain copyright
>> material
>> of members of the Ocado Group. Opinions and views expressed in this
>> message may not necessarily reflect the opinions and views of the
>> members of the Ocado Group.
>> 
>> If you are not the intended recipient, please notify us immediately
>> and delete all copies of this message. Please note that it is your
>> responsibility to scan this message for viruses.
>> 
>> Fetch and Sizzle are trading names of Speciality Stores Limited and
>> Fabled is a trading name of Marie Claire Beauty Limited, both
>> members
>> of the Ocado Group.
>> 
>> References to the “Ocado Group” are to Ocado Group plc
>> (registered
>> in England and Wales with number 7098618) and its subsidiary
>> undertakings (as that expression is defined in the Companies Act
>> 2006)
>> from time to time.  The registered office of Ocado Group plc is
>> Titan
>> Court, 3 Bishops Square, Hatfield Business Park, Hatfield, Herts.
>> AL10
>> 9NE.
> 
> Links:
> ------
> [1] https://gitlab.labs.nic.cz/labs/bird/commits/krt-export-filtr-fix

