[PATCH] Add the Babel routing protocol to Bird
Ondrej Zajicek
santiago at crfreenet.org
Thu Oct 8 23:49:20 CEST 2015
On Thu, Oct 08, 2015 at 05:12:36PM +0200, Toke Høiland-Jørgensen wrote:
> Ondrej Zajicek <santiago at crfreenet.org> writes:
>
> > I am sorry i didn't got around to review it sooner, but doing a proper
> > review for a whole new (and unfamiliar) protocol takes nontrivial
> > amount of time.
> >
> > Generally, the code looks good and i am determined to finally merge
> > it, but it definitely needs more refinement before that.
>
> Hi Ondrej
>
> Awesome! Many thanks for your detailed comments, and thank you for going
> easy on my (lack of) C skills :)
>
> I'll go through and do another revision and resubmit. Some answers /
> additional questions inline below.
>
> > First, there is an issue with route selection. Proper route selection in BIRD
> > should work this way:
> >
> > 1) Babel chooses the best route from received routes, say selected_in
> > 2) If selected_in changes, it is propagated to the core by rte_update()
> > 3) Core compares routes from multiple protocols and one is exported to the Babel
> > 4) This route is either Babel own route or external route
> > 5) In babel_rt_notify(), Babel receives that route, finds appropriate internal
> > route (or create a new one) and stores it in say selected_out.
> > 6) Even if the received selected_out is an external route, Babel
> > do not stop propagating selected_in to the core
> > 7) selected_out is propagated to the network in periodic updates.
> > If selected_out changes, it is an event for triggered updates.
>
> Ah, thank you! I struggled a great deal with figuring out how this was
> supposed to work. I'll fix that, and also try to write this up in a form
> that is suitable for the documentation and include that in the next
> version of the patch.
The new RIP code uses this approach for route propagation (For OSPF it is
not directly applicable as it is link-state protocol, and BGP does not
use internal routing table so it is much simpler in this regard).
You could check rip_rte_update(), rip_rte_announce() and rip_rt_notify().
There, selected_in is stored implicitly as the first position in the list
of routes, while selected_out is directly the content of rip_entry.
Also one thing i forgot to notice. For generated external routes, you
initialize metric to 0 (r->metric = 0 in babel_rt_notify()). You should
use ea_get_int(attrs, EA_BABEL_METRIC, 0) to allow user set it in
export filters.
Default metric probably should be more than zero (zero makes sense
for /128 local address, while more than zero for locally reachable
network prefix).
> Speaking of documentation: What kind of documentation do you expect in
> the patch? Obviously I'll include user's documentation on the
> configuration options, but some of the other protocols also has API
> documentation of (some of) their function hooks, and it's not
> immediately obvious to me what the purpose of those are? Should I add
> that for Babel as well?
There are three things in devel documentation:
1) Initial overall description. This is IMHO most important part,
should be expanded and should describe overall interactions in the code.
2) (mostly) one-line description for structure fields (and constants)
in header files (aligned usually to 40. column by tabs). Although these
are not used to generate developer documentation, i generally consider
them more useful than function description.
3) Documentation of functions. It is true that for protocols this is less
important than for public lib or nest functions, but in the case of
non-trivial / non-obvious functions it is a good idea. I would suggest to
document esp. functions manipulating with internal structures,
implementing some protocol processes and timer/event hooks. It is usually
useless to document the usual protocol glue (e.g. babel_start,
babel_make_tmp_attrs, babel_get_route_info), unless it does some
nontrivial things (like babel_rt_notify()).
> > Second, can we really expect that TLVs in received packets are
> > aligned? Although all TLVs looks like suited to 32-bit alignment, i
> > did not found mentioned it in Babel RFC and availability of Pad1, PadN
> > TLVs and per-byte compressible addresses suggest that they generally
> > are not aligned.
> >
> > In that case, we have to use get_uXX()/put_uXX() functions instead of
> > directly accessing babel_tlv_* structure fields. Or we could copy each
> > TLV to a local buffer before calling ntoh+handle callbacka, as each
> > TLV internally is aligned, and enforce alignment for outgoing packets.
> > I would generally prefer the first approach, but the second approach
> > is simpler way to fix it.
>
> No, I don't think they are necessarily 32-bit aligned. I struggled with
> alignment issues for the 64-bit values (and got it to work by using the
> __attribute__((packed)) attribute in the structure definitions). I'll
> try to figure out how to do this properly in general.
There are at least three options:
1) Use __attribute__((packed)) or some other GCC magic for the whole
structure. Although it should reduce alignment of all internal fields
to 1, i am not 100% sure whether it also guarantees that the whole
structure may be unaligned. Also each access would use more generic
code on some archs.
2) Just copy each TLV before processing to a local buffer, which
makes it aligned.
3) Use get_uXX()/put_uXX() in handle/send functions. That would also
eliminate a need for separate ntoh/hton functions. You could use
something like:
#define GET_U32(p,f) get_u32(((char *) p) + OFFSETOF(typeof(*p),f))
#define GET_U16(p,f) get_u16(((char *) p) + OFFSETOF(typeof(*p),f))
to access fields by GET_U16(tlv, seqno) instead of tlv->seqno.
BTW, using babel_tlv_header is also a bit problematic, as some
architectures enforce minimal structure alignment ant therefore
size (to e.g. 4). Probably declaring it packed should be enough.
> > Rest are comments specific to lines in the patch:
>
> Most of these seems obvious how to fix; one or two questions below:
>
> >> diff --git a/nest/route.h b/nest/route.h
> >> ...
> >> @@ -538,6 +546,7 @@ extern struct protocol *attr_class_to_protocol[EAP_MAX];
> >> #define DEF_PREF_RIP 120 /* RIP */
> >> #define DEF_PREF_BGP 100 /* BGP */
> >> #define DEF_PREF_PIPE 70 /* Routes piped from other tables */
> >> +#define DEF_PREF_BABEL 42 /* Babel */
> >
> > Default could be higher, e.g. 130
>
> OK. What do these actually mean? Is it simply that a higher priority
> wins if two protocols have the same route?
Yes.
>
> >> +enum babel_iface_type_t {
> >> + BABEL_IFACE_TYPE_WIRED,
> >> + BABEL_IFACE_TYPE_WIRELESS,
> >> + BABEL_IFACE_TYPE_MAX
> >> +};
> >
> > I would prefer to have explicit values here. Perhaps also reserve 0
> > for undefined?
> Well, undefined == wired in this case. I.e. use the simple metric
> computation mechanism unless explicitly told to treat the interface as
> wireless.
>
> I don't suppose Bird has a way of telling automatically what type of
> interface we are dealing with?
Well, definitely not now.
> >> +static u16
> >> +babel_compute_rxcost(struct babel_neighbor *bn)
> >> +{
> >> + struct babel_iface *bif = bn->bif;
> >> + struct babel_proto *p = bif->proto;
> >> + u8 n, missed;
> >> + u16 map=bn->hello_map;
> >> +
> >> + if(!map) return BABEL_INFINITY;
> >> + n = __builtin_popcount(map); // number of bits set
> >
> > This should be defined as our inline function in lib/bitopts.h, then
> > used here.
>
> So just alias __builtin_popcount to something in bitopts.h?
Yes, may be u32_popcount().
> Not entirely sure what compilers you are targeting; that macro seems to be
> GCC-specific, but is also supported in LLVM since 1.5:
> http://llvm.org/releases/1.5/docs/ReleaseNotes.html
We are mainly targetting GCC, but i guess we could support LLVM with some minor
tweaks. Sane GCC extensions could be used, things like __buildin_* should be
used through some define.
> >> +int
> >> +babel_handle_ihu(struct babel_tlv_header *hdr, struct babel_parse_state *state)
> >> +{
> >> ...
> >> + struct babel_neighbor *bn = babel_get_neighbor(bif, state->saddr);
> >> + bn->txcost = tlv->rxcost;
> >> + bn->ihu_expiry = now + 1.5*(tlv->interval/100);
> >
> > Please avoid floating point ops.
>
> So, substitute 3/2 for 1.5?
Well, obviously not (3/2)*(tlv->interval/100), but (tlv->interval*3/2)/100
> >> +static void
> >> +babel_iface_linkdown(struct babel_iface *bif)
> >> +{
> >> + struct babel_neighbor *bn;
> >> + struct babel_route *r;
> >> + node *n;
> >> + WALK_LIST(bn, bif->neigh_list) {
> >> + WALK_LIST(n, bn->routes) {
> >> + r = SKIP_BACK(struct babel_route, neigh_route, n);
> >> + r->metric = BABEL_INFINITY;
> >> + r->expires = now + r->expiry_interval;
> >> + babel_select_route(r->e);
> >> + }
> >> + }
> >> +}
> >
> > Is there any reason why route handing is different in this case and in
> > kill_iface() case?
>
> My assumption was that linkdown means that the interface is likely to
> come back up, so I wanted to keep the routes and be able to refresh them
> when it does (and keep the hello seqno etc so the other hosts on the
> link don't refresh their metrics completely if the interface goes down
> for a short time). This came about while playing around with a laptop
> that was connected by wifi and ethernet to the same network (and running
> babel on both), then repeatedly unplugging and re-plugging the ethernet
> cable.
I will look at that.
> >> +static void
> >> +babel_copy_config(struct proto_config *dest, struct proto_config *src)
> >
> > You could just remove this function.
>
> Yeah, well, was planning to actually make it do something useful instead
> (i.e. support proper reconfiguring).
This function is not related to reconfiguration, it is related to protocol
templates, these do not make much sense for non-BGP protocols.
> >> +void
> >> +babel_hton_seqno_request(struct babel_tlv_header *hdr)
> >> +{
> >> + struct babel_tlv_seqno_request *tlv = (struct babel_tlv_seqno_request *)hdr;
> >> + tlv->seqno = htons(tlv->seqno);
> >> + tlv->router_id = htobe64(tlv->router_id);
> >> +}
> >
> > There are some issues with portability of names and including
> > appropriate header for htobe64() / be64toh(). Perhaps this should be
> > handled somewhere in lib/.
>
> Not sure what you mean here? Are you referring to htobe64() being
> glibc-specific? An how do you suggest to handle it? :)
Well, according to man page, these functions are not standardized
(although widely available), be64toh is named betoh64 on OpenBSD,
and different header file should be included on Linux and BSDs.
We could have lib/endian.h, which can include appropriate headers
and define aliases.
> >> +struct babel_tlv_header *
> >> +babel_add_tlv_size(struct babel_iface *bif, u16 type, int len)
> >> +{
> >> + struct babel_header *hdr = bif->current_buf;
> >> + struct babel_tlv_header *tlv;
> >> + int pktlen = sizeof(struct babel_header)+hdr->length;
> >> + if(pktlen+len > bif->max_pkt_len) {
> >> + babel_send_queue(bif);
> >> + pktlen = sizeof(struct babel_header)+hdr->length;
> >> + }
> >
> > This seems broken - TLV is added to current_buf, which may be either
> > tlv_buf or just sock->tbuf based on multicast vs unicast. But if there
> > si not enough space, then babel_send_queue() is called, which assumes
> > that data are in tlv_buf, and copies it to sock->tbuf.
>
> Ah, was trying to be careful with this; obviously didn't work. Will give
> it another go.
BTW, why is babel_send_queue() / babel_copy_tlv() so complicated?
If i understand it correctly, you have one TLV buffer (used for
accumulation of TLVs) which has the same size as socket TX buffer,
therefore you can (if TX buffer is empty) just copy whole TLV buffer
to TX buffer and send the packet.
> >> +int
> >
> >> + bif->sock->rbsize = 10240;
> >> + bif->sock->iface = bif->iface;
> >> + bif->sock->tbuf = mb_alloc( bif->pool, bif->iface->mtu);
> >
> > Why not just use:
> >
> > bif->sock->rbsize = MIN(512, bif->iface->mtu);
>
> You mean MAX?
Yes, of course :-)
BTW, i missed one answer - in:
> +static void
> +babel_select_route(struct babel_entry *e)
> +{
> + struct babel_proto *p = e->proto;
> + net *n = net_get(p->p.table, e->n.prefix, e->n.pxlen);
> + struct babel_route *r, *cur = e->selected;
> +
> + /* try to find the best feasible route */
> + WALK_LIST(r, e->routes)
> + if((!cur || r->metric < cur->metric)
> + && is_feasible(babel_find_source(e, r->router_id),
> + r->seqno, r->advert_metric))
> + cur = r;
Why feasibility is checked here? I would expect that feasibility is checked when
the route is received, but all accepted to e->routes stays feasible.
Is that true or the routes in e->routes may later turn infeasible?
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 181 bytes
Desc: Digital signature
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20151008/af054a4b/attachment.asc>
More information about the Bird-users
mailing list