Route attributes overwritten during reconfiguration
Lennert Buytenhek
buytenh at wantstofly.org
Fri Feb 17 20:57:41 CET 2017
On Fri, Feb 17, 2017 at 02:22:11PM +0100, Ondrej Zajicek wrote:
> > I work with João and I managed to cut our (huge) config down to a
> > minimal reproducer. Our setup does something like this:
>
> Hi
>
> Thanks for trying to track the problem.
It was fun to do. :-) Thanks for the reply!
> > There seems to be a problem related to the memory management of
> > strings used in per static route implicit filter expressions when a
> > reconfiguration happens.
>
> Does the problem disappear if you replace strings with numbers in your
> test case?
I had another look at this, and it's a lot easier to make it crash with
strings for some reason, but if you look purely at valgrind warnings, it
seems that you can trigger those without using strings. Sorry for the
initial misinformation.
I reproduced the valgrind warnings with this config file:
===
router id 1.2.3.4;
log stderr all;
debug protocols all;
protocol device {
scan time 3600;
}
function function1() {
print 1234;
}
protocol static {
route 1234::/32 via "lo" {
function1();
};
}
===
When I run this with the following script:
===
#!/bin/sh
valgrind ./bird -c b.conf -d &
sleep 0.75
kill -HUP %1
sleep 0.1
kill -HUP %1
sleep 0.1
kill %1
===
I get this (again, warnings on the second reconfigure):
===
# ./test.sh
==15175== Memcheck, a memory error detector
==15175== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==15175== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==15175== Command: ./bird -c b.conf -d
==15175==
2017-02-17 21:39:52 <TRACE> device1: Initializing
2017-02-17 21:39:52 <TRACE> static1: Initializing
2017-02-17 21:39:52 <TRACE> device1: Starting
2017-02-17 21:39:52 <TRACE> device1: Scanning interfaces
2017-02-17 21:39:52 <TRACE> device1: Connected to table master
2017-02-17 21:39:52 <TRACE> device1: State changed to feed
2017-02-17 21:39:52 <TRACE> static1: Starting
2017-02-17 21:39:52 <TRACE> static1: Connected to table master
2017-02-17 21:39:52 <TRACE> static1: State changed to feed
2017-02-17 21:39:52 <INFO> Started
2017-02-17 21:39:52 <TRACE> device1: State changed to up
2017-02-17 21:39:52 <TRACE> static1 < interface lo goes up
2017-02-17 21:39:52 <INFO> 1234
2017-02-17 21:39:52 <TRACE> static1 > added [best] 1234::/32 dev lo
2017-02-17 21:39:52 <TRACE> static1 < interface eth0 goes up
2017-02-17 21:39:52 <TRACE> static1: State changed to up
2017-02-17 21:39:52 <INFO> Reconfiguration requested by SIGHUP
2017-02-17 21:39:52 <INFO> Reconfiguring
2017-02-17 21:39:52 <TRACE> device1: Reconfigured
2017-02-17 21:39:52 <TRACE> static1: Reconfigured
2017-02-17 21:39:52 <INFO> Reconfigured
2017-02-17 21:39:52 <INFO> Reconfiguration requested by SIGHUP
2017-02-17 21:39:52 <INFO> Reconfiguring
2017-02-17 21:39:52 <TRACE> device1: Reconfigured
==15175== Invalid read of size 2
==15175== at 0x805A499: i_same (filter.c:1539)
==15175== by 0x805A893: i_same (filter.c:1632)
==15175== by 0x8085D4C: static_match.isra.13 (static.c:501)
==15175== by 0x8085F95: static_reconfigure (static.c:557)
==15175== by 0x8053FB4: protos_commit (proto.c:420)
==15175== by 0x8095F42: config_do_commit (conf.c:271)
==15175== by 0x8096172: config_commit (conf.c:361)
==15175== by 0x809D5C3: async_config (main.c:238)
==15175== by 0x8099B37: io_loop (io.c:2135)
==15175== by 0x804A733: main (main.c:885)
==15175== Address 0x4217a46 is 3,326 bytes inside a block of size 4,088 free'd
==15175== at 0x4028F4F: free (vg_replace_malloc.c:446)
==15175== by 0x809E3E9: lp_free (mempool.c:211)
==15175== by 0x80A2020: pool_free (resource.c:81)
==15175== by 0x80A20AF: rfree (resource.c:165)
==15175== by 0x8095E30: config_do_commit (conf.c:250)
==15175== by 0x8096172: config_commit (conf.c:361)
==15175== by 0x809D5C3: async_config (main.c:238)
==15175== by 0x8099B37: io_loop (io.c:2135)
==15175== by 0x804A733: main (main.c:885)
==15175==
==15175== Invalid read of size 2
==15175== at 0x805A4A7: i_same (filter.c:1541)
==15175== by 0x805A893: i_same (filter.c:1632)
==15175== by 0x8085D4C: static_match.isra.13 (static.c:501)
==15175== by 0x8085F95: static_reconfigure (static.c:557)
==15175== by 0x8053FB4: protos_commit (proto.c:420)
==15175== by 0x8095F42: config_do_commit (conf.c:271)
==15175== by 0x8096172: config_commit (conf.c:361)
==15175== by 0x809D5C3: async_config (main.c:238)
==15175== by 0x8099B37: io_loop (io.c:2135)
==15175== by 0x804A733: main (main.c:885)
==15175== Address 0x4217a44 is 3,324 bytes inside a block of size 4,088 free'd
==15175== at 0x4028F4F: free (vg_replace_malloc.c:446)
==15175== by 0x809E3E9: lp_free (mempool.c:211)
==15175== by 0x80A2020: pool_free (resource.c:81)
==15175== by 0x80A20AF: rfree (resource.c:165)
==15175== by 0x8095E30: config_do_commit (conf.c:250)
==15175== by 0x8096172: config_commit (conf.c:361)
==15175== by 0x809D5C3: async_config (main.c:238)
==15175== by 0x8099B37: io_loop (io.c:2135)
==15175== by 0x804A733: main (main.c:885)
==15175==
==15175== Invalid read of size 4
==15175== at 0x805A677: i_same (filter.c:1626)
==15175== by 0x805A893: i_same (filter.c:1632)
==15175== by 0x8085D4C: static_match.isra.13 (static.c:501)
==15175== by 0x8085F95: static_reconfigure (static.c:557)
==15175== by 0x8053FB4: protos_commit (proto.c:420)
==15175== by 0x8095F42: config_do_commit (conf.c:271)
==15175== by 0x8096172: config_commit (conf.c:361)
==15175== by 0x809D5C3: async_config (main.c:238)
==15175== by 0x8099B37: io_loop (io.c:2135)
==15175== by 0x804A733: main (main.c:885)
==15175== Address 0x4217a48 is 3,328 bytes inside a block of size 4,088 free'd
==15175== at 0x4028F4F: free (vg_replace_malloc.c:446)
==15175== by 0x809E3E9: lp_free (mempool.c:211)
==15175== by 0x80A2020: pool_free (resource.c:81)
==15175== by 0x80A20AF: rfree (resource.c:165)
==15175== by 0x8095E30: config_do_commit (conf.c:250)
==15175== by 0x8096172: config_commit (conf.c:361)
==15175== by 0x809D5C3: async_config (main.c:238)
==15175== by 0x8099B37: io_loop (io.c:2135)
==15175== by 0x804A733: main (main.c:885)
[...]
===
This is with a build for an older distro for i686 (32 bit), but I can
also reproduce the issue on the newest stock distro bird package (1.6.3)
that ships with Fedora 25 for x86_64:
===
[...]
2017-02-17 21:51:32 <INFO> Reconfigured
2017-02-17 21:51:32 <INFO> Reconfiguration requested by SIGHUP
2017-02-17 21:51:32 <INFO> Reconfiguring
2017-02-17 21:51:32 <TRACE> device1: Reconfigured
==15956== Invalid read of size 2
==15956== at 0x1201FA: ??? (in /usr/sbin/bird)
==15956== by 0x120669: ??? (in /usr/sbin/bird)
==15956== by 0x142FCC: ??? (in /usr/sbin/bird)
==15956== by 0x143194: ??? (in /usr/sbin/bird)
==15956== by 0x119A2C: ??? (in /usr/sbin/bird)
==15956== by 0x14E934: ??? (in /usr/sbin/bird)
==15956== by 0x14ED5A: ??? (in /usr/sbin/bird)
==15956== by 0x15625F: ??? (in /usr/sbin/bird)
==15956== by 0x15263F: ??? (in /usr/sbin/bird)
==15956== by 0x1105FD: ??? (in /usr/sbin/bird)
==15956== by 0x5078400: (below main) (in /usr/lib64/libc-2.24.so)
==15956== Address 0x542846a is 1,338 bytes inside a block of size 4,096 free'd
==15956== at 0x4C2ED4A: free (vg_replace_malloc.c:530)
==15956== by 0x15719F: ??? (in /usr/sbin/bird)
==15956== by 0x15AF21: ??? (in /usr/sbin/bird)
==15956== by 0x15AFB1: ??? (in /usr/sbin/bird)
==15956== by 0x14E84C: ??? (in /usr/sbin/bird)
==15956== by 0x14ED5A: ??? (in /usr/sbin/bird)
==15956== by 0x15625F: ??? (in /usr/sbin/bird)
==15956== by 0x15263F: ??? (in /usr/sbin/bird)
==15956== by 0x1105FD: ??? (in /usr/sbin/bird)
==15956== by 0x5078400: (below main) (in /usr/lib64/libc-2.24.so)
==15956== Block was alloc'd at
==15956== at 0x4C2DB9D: malloc (vg_replace_malloc.c:299)
==15956== by 0x15DD3A: ??? (in /usr/sbin/bird)
==15956== by 0x15730A: ??? (in /usr/sbin/bird)
==15956== by 0x15735C: ??? (in /usr/sbin/bird)
==15956== by 0x15607E: ??? (in /usr/sbin/bird)
==15956== by 0x156198: ??? (in /usr/sbin/bird)
==15956== by 0x14F206: ??? (in /usr/sbin/bird)
==15956== by 0x155AE6: ??? (in /usr/sbin/bird)
==15956== by 0x1103C6: ??? (in /usr/sbin/bird)
==15956== by 0x5078400: (below main) (in /usr/lib64/libc-2.24.so)
[...]
===
> > The crash goes away if I apply this patch (not the right fix, I'm sure):
> >
> > ===
> > diff --git a/proto/static/static.c b/proto/static/static.c
> > index 0c088cd..a56771f 100644
> > --- a/proto/static/static.c
> > +++ b/proto/static/static.c
> > @@ -498,7 +498,8 @@ static_same_dest(struct static_route *x, struct static_route *y)
> > static inline int
> > static_same_rte(struct static_route *x, struct static_route *y)
> > {
> > - return static_same_dest(x, y) && i_same(x->cmds, y->cmds);
> > +// return static_same_dest(x, y) && i_same(x->cmds, y->cmds);
> > + return 0;
> > }
> >
> >
> > ===
> >
> > I think that the reconfiguration code finds the old and the new implicit
> > filter expression to be the same code, and therefore hangs onto the old
> > version, but then the underlying memory for some of the old objects gets
> > freed. Or something like that.
>
> Well, the comparison in static_same_rte() is just for decision whether
> the route should be propagated again to the (main) routing table.
>
> Internal structures of static protocols (like struct static_route) are
> replaced anyways.
I see. Well, I was hoping to avoid going too deeply into bird internals
(I've already been labeled our resident ornithologist), but I can dig
deeper if the problem doesn't reproduce on your end for some reason.
Thanks again!
Cheers,
Lennert
More information about the Bird-users
mailing list