Merging bird and bird6

Thu Jul 21 23:47:14 CEST 2011

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ondrej Zajicek wrote:
> On Tue, Jul 12, 2011 at 04:47:06AM +0400, Alexander V. Chernikov wrote:
>> To show "overall" view we have to describe what we will add and what
>> will be required from BIRD first.
> 
> 
> Thanks for the great overview. Sorry for a late answer, it took
> a while for me to get into MPLS and think about it.
> 
Sorry for a late answer, too. ETIME issues :(
>> * UNDER THE HOOD
>> *** KERNEL INTERACTION ***
> 
> 
> So essentially, there are three kinds of routes:
> 
>  * Standard IP routes with IP nexthops, which we already support.
> 
>  * MPLS routes, keyed by MPLS label and with MPLS action (NHLFE), these
> form a MPLS routing table (ILM). I will call these MPLS routes.
> 
>  * IP routes with MPLS action, used for encapsulation of incoming
> IP packets (FTN mapping), these share a routing table with standard IP
> routes (because, depending on which route is chosen, a packet is either
> forwarded in a standard way, or encapsulated to MPLS). I will call these
> encapsulating routes.
> 
> 
> If i understand your mail correctly, you use some EAP_ADDITIONAL
> external attribute to represent encapsulating routes and use some new
> hook to attach these routes by third party protocol. I think this is not
> a good idea - to be semantically consistent, i think encapsulation
> routes should be represented by routes with new destination type
> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
> in new struct rta_mpls (or rta_nhlfe), which would be extension of
> struct rta (containing struct rta in the first field and NHLFE after
> that). Such structure could be easily passed as struct rta and functions
> from rt-attr.c can work with that, with some minor modifications
> (allocating, freeing and printing) dispatched based on dest field.
> Otherwise, they are very similar to standard IP routes and probably
> would need just some minor tweaks (and obviously kernel protocol support).
> 
> Therefore, such encapsulating route should be generated in a standard
> way as a new route - by rte_update, with LDP (or some other protocol)
> as true originator (in rta->proto and rta->source). I will comment
> that later.
Understood. This is much better than calling some protocol hooks directly.
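
A minimal sketch of the "struct rta as first member" layout described above, to check that dest-based dispatch is enough for allocating and printing. The struct fields here are simplified stand-ins rather than BIRD's real definitions, and rta_size(), rta_clone() and rta_format() are hypothetical helper names, not existing rt-attr.c functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins, NOT BIRD's real definitions. */
enum rt_dest { RTD_ROUTER, RTD_MPLS };  /* RTD_MPLS: the proposed new type */

struct rta {
  enum rt_dest dest;    /* destination type, used for dispatch */
  uint32_t gw;          /* next hop, simplified to one integer */
};

struct rta_mpls {
  struct rta a;         /* must stay the first member */
  uint32_t out_label;   /* stand-in for the whole NHLFE */
};

/* Number of bytes to allocate or copy, dispatched on the dest field. */
static size_t rta_size(const struct rta *a)
{
  return (a->dest == RTD_MPLS) ? sizeof(struct rta_mpls) : sizeof(struct rta);
}

/* Copy an attribute set, keeping the NHLFE tail when there is one. */
static struct rta *rta_clone(const struct rta *a)
{
  size_t n = rta_size(a);
  struct rta *c = malloc(n);
  assert(c != NULL);
  memcpy(c, a, n);
  return c;
}

/* Print the attributes; the cast is valid because a is the first member. */
static int rta_format(const struct rta *a, char *buf, size_t len)
{
  if (a->dest == RTD_MPLS) {
    const struct rta_mpls *m = (const struct rta_mpls *) a;
    return snprintf(buf, len, "via %u push %u",
                    (unsigned) a->gw, (unsigned) m->out_label);
  }
  return snprintf(buf, len, "via %u", (unsigned) a->gw);
}
```

Code that only knows struct rta keeps working on both kinds of routes; only the size, copy and print paths need to look at dest.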
> 
> 
> MPLS routes could use the same struct rta_mpls as encapsulating routes,
> but struct network (their fib_node) contains MPLS label instead of IP address.
> As MPLS label is small (and complex action is outside) i don't see any problems
> in reusing the ip_addr prefix. Most things would work without modifications.
> There should be AF field in struct rtable and struct rte to distinguish
> routes.
Storing AF in rte makes much more sense if we use separate AFs for
inet/inet6.
> 
> 
> Therefore there would be two types of routing tables - IP and MPLS. I
> don't think it is a good idea to mix these. This may look inconsistent
> with idea of embedding IPv4 to IPv6, but IP protocols are much more
> similar, have a natural way to embed one in the other, have similar
> roles and protocol structure. MPLS routing table could be used for LDP -
> kernel interaction (routes imported from LDP and exported to kernel).
> This solves your Case 2 without any hacks.
So, from user point of view, I define
table xxx; for both ipv4 and IPv6 routes and
mpls table yyy; for MPLS routing table?
There should be base MPLS rtable (mpls_default, for example) as in IP.
We can also add a hack to automatically subscribe protocols to the MPLS
routing table by type and other attributes. For example, every LDP
instance gets connected to an MPLS table (default or defined in config).
A kernel protocol instance gets connected to the MPLS table only if its IP
table is the default one (GRT) or the 'mpls table' keyword is supplied
explicitly. What about VPNv4/VPNv6? The same approach?

Btw, how will we distinguish inet/inet6 rtes? (I'm talking about MP-BGP
/ IPv4-mapped cases)

> 
>> Case 1:
>> Route update can happen differently: we can install updated route IFF
>> * LDP label exists
>> * IGP nexthop is one of advertised LDP neighbour nexthops.
> 
> I think it is possible to handle all these cases and protocol
> interaction in an elegant way. LDP protocol, instead of just import and
> export to one table, could be connected to more tables, with different
> meanings. There are four interactions of LDP protocol - generating MPLS
> routes, generating encapsulating routes, importing label requests (can
> be handled as routes) [*] and tracking IGP table (to update nexthops of
> generated routes). These all can be handled as import or export of
> routes to proper tables. Standard table connection (to IP table) could
> be used for import (from LDP) of generated encapsulating routes and
> export (to LDP) of label requests. Another connection to MPLS table
> would be used for import (from LDP) of generated MPLS routes, and the
> last one is used for tracking IGP changes:
> 
> protocol ldp {
> 	export all; # label requests
> 	import all; # encapsulating routes
> 	mpls import all;  # MPLS routes
> 	# it is probably pointless to have configurable filters for IGP tracking
> 
> 	table t1; # table for import label requests and export encapsulating routes
> 	mpls table t2; # table for MPLS routes
> 	igp table t1; # table for tracking IGP routes, usually (and by default) the same as main table.
> }
> 
> [*] when i wrote that i thought that labels are distributed just by LDP
> and the purpose of label request is to propagate the label through LDP
> area. i didn't notice that BGP/MPLS also distributes labels so they
> need to know assigned labels. So the idea would need some modifications.
Not sure this will work. Since t1 is an IP table, the cases where we need
to request a specific label for:
* AToM
* RSVP-TE tunnels
will not work, since there are no prefixes that can be mapped to such a
request.

> (I assume that LDP generates encapsulating routes as a true originator,
> as i wrote before, not just attaching some attribute to the existing
> route.)
> 
> So my idea of your Case 1 scenario is like this:
> 
> In both subcases (LDP LMAP arrives and internal table with LMAPs changed;
> rt_notify() on 'tracked IGP connection' is used to signal that
> tracked table changed), the same procedure is executed:
> 
> Internal LMAP table is examined, tracked IGP table is examined. If both
> are ready (for given prefix), appropriate encapsulating and MPLS routes
> are generated and propagated using rte_update(), otherwise nothing is
> generated and the previously generated route is withdrawn (rte_update()
> with NULL is called) (or perhaps an unreachable route is generated if
> LMAP is here but IGP route is missing). Simple and elegant.
.. and in case of label release we should remove label only and keep
original route
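
A minimal sketch of the per-prefix decision described above (examine the LMAPs, examine the tracked IGP route, then announce or withdraw). All type and function names are hypothetical, and identifying an LDP neighbour by a single nexthop value is a simplifying assumption. The LDP_UNREACHABLE branch is the optional variant mentioned in the quoted text, not a settled behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of re-evaluating one prefix. */
enum ldp_action { LDP_WITHDRAW, LDP_ANNOUNCE, LDP_UNREACHABLE };

/* One label mapping: 'label' was advertised by the neighbour 'peer'. */
struct lmap { uint32_t label; uint32_t peer; };

/* Best IGP route for the prefix, reduced to its nexthop. */
struct igp_route { uint32_t nexthop; };

/*
 * Run the same procedure whether an LMAP arrived or the tracked table
 * changed. On LDP_ANNOUNCE, *out_label holds the label to use.
 */
static enum ldp_action ldp_decide(const struct lmap *lmaps, size_t n,
                                  const struct igp_route *igp,
                                  uint32_t *out_label)
{
  assert(out_label != NULL);

  if (n == 0)
    return LDP_WITHDRAW;        /* no label: keep only the original route */
  if (igp == NULL)
    return LDP_UNREACHABLE;     /* LMAP is here but the IGP route is missing */

  /* Install only if the IGP nexthop is one of the advertising neighbours. */
  for (size_t i = 0; i < n; i++)
    if (lmaps[i].peer == igp->nexthop) {
      *out_label = lmaps[i].label;
      return LDP_ANNOUNCE;
    }

  return LDP_WITHDRAW;
}
```

In a real protocol LDP_ANNOUNCE would map to rte_update() with the generated routes and LDP_WITHDRAW to rte_update() with NULL.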
> If the encapsulated routes are saved
> 
> Case 2 scenarios are trivial - just standard updates.
> 
> 
> There are some tricky parts of IGP tracking - it is problematic
> to use standard RA_OPTIMAL update for this purpose, because if
> generated encapsulating routes are imported to the same table,
> these would probably become the optimal ones and IGP routes would be
> shadowed. Solution would be to use RA_ANY, and ignore notifications
> containing encapsulating routes, similarly 'examining the tracked
> IGP table' means looking up the fib node and find the best route,
> ignoring encapsulating ones.
> 
> For implementation of this behavior, there are two minor changes that
> need to be done to the rt table code: First, currently accept_ra_types
> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
> property of an announce hook (as LDP would have two hooks with
> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for 
> both in rte_recalculate should be moved after the route list
> is updated/relinked.
Agreed. Distinguishing RA_OPTIMAL and RA_ANY in the current code is not a
trivial task and requires understanding of the internals. Either the
announce type should be passed to the announce hook, or a new hook should
be added for the RA_ANY event. The latter is more appropriate IMHO, since
RA_ANY is used by the pipe protocol only. The kernel protocol should track
RA_ANY protocol hooks, looking for the update source (LDP / RSVP), and
re-install the appropriate routes. The only downside is the situation where
LDP signalling starts faster than the IGP. In that case we will get 3
updates instead of one (at least in RTSOCK):
* RTM_ADD for original prefix
* RTM_DEL for this prefix (as part of krt_set_notify())
* RTM_ADD for modified prefix

RTM_CHANGE can be used in notify, but still: this gives 2 updates
instead of one.

> 
> 
> BTW, this whole dependency 'IGP table -> LDP function' is a bit similar
> to situation with recursive nexthops in IBGP, where IGP change also
> leads to change of IBGP route nexthop. In IBGP case it is handled
> automatically by rtable code (see rta_set_recursive_next_hop()
> discussion in route.h, hostcache and hostentry), LDP situation is a bit
> different, but perhaps the same mechanism could be extended to call
> protocol hook instead of just updating the nexthop. This mechanism is useful if
> protocol waits for a change of a result of some recursive lookup in tracked
> table. But the LDP situation is much simpler, it just waits for an exact match
> change in tracked table.
> 
> 
>> * There is no need to call all other protocols since they should not be
>> interested in such update
> 
> Not true, other protocols may have filters that change their answer if
> you make some changes to route attributes. Ignoring that would lead to
> inconsistencies in route propagation.
> 
>> * By upgrading FIB / rtable:
>> If (from the user's point of view) config tables will not be AF_ bound
>> (e.g. IPv4+IPv6), we will have to enhance the FIB API.
>>
>> My vision is the following:
>> * make fib AF_ bound, specifying AF and sizeof(object) at fib_init (or
>> fib2_init)
>> * pass pointers to all fib_* related functions instead of addresses
>> * do compare by memcmp() for searching (and use AF-dependent hash based
>> on value passed in _init)
>> * Pass AF in appropriate protocol hooks
> 
> As i wrote above, if we consider just IP (v4 and v6) and MPLS routes,
> i think that fixed size fib would be enough. But problems are with
> VPNvX AFs.
> 
> Originally i thought that having FIB / rtable with VPNvX routes is not a
> good idea - these AFs are just some wire representation of multiple 
> independent IP spaces, and we already have better representation of that
>  - just multiple routing tables. Having both these representations seemed
> unnecessary and would require some conversion between the parts that
> request the first representation and the parts that request the second
> one. But not having VPNvX routes is also cumbersome - protocols that
> use these have to be bound to multiple routing tables through some
> multiplexer. So it is probably easier to have tables with VPNvX
> AFs.
> 
> 
> Therefore, it is probably a good idea to extend FIBs in a way you
> suggested, with minor details changed. FIB / rtables would be uniform
> (AF_ bound), but there are just three AFs (IP, MPLS, VPN); IPv4 and IPv6
> could be handled as one AF, embedded (the same for VPNv4 and VPNv6). To
> minimize code changes, struct fib_node would have ip_addr prefix, but
> might be allocated larger. 
Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
enough for holding IPv6 address? This can bump memory consumption for
setups with several full-views significantly.
> 
> Because each protocol and each its announce_hook have appropriate role,
> it is IMHO unnecessary to have AF in protocol hooks, but there should be
> check whether protocol/announce_hook is connected to appropriate rtable.
> 

To summarize required changes (please correct me):
1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
* rtable
* fib
* rte
3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
to struct fib to hold this value.
4) Move to memcmp() in fib_find / fib_get
5) Set up default rtable for every supported AF. Connect protocol
instances to such default AFs based on protocol types
...
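
A minimal sketch of items 2) to 4), assuming a FIB that stores the AF and the key size at init time and compares keys with memcmp(). Everything here is a hypothetical stand-in for the real fib code: the FIB_AF_* names (prefixed to avoid clashing with socket AF_ macros), the fixed bucket count, and the generic FNV-1a byte hash used instead of an AF-dependent one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum fib_af { FIB_AF_IP, FIB_AF_MPLS, FIB_AF_VPN };
#define FIB_BUCKETS 64

struct fib_node {
  struct fib_node *next;
  unsigned pxlen;
  unsigned char key[];          /* key_size bytes, allocated with the node */
};

struct fib {
  enum fib_af af;               /* address family this FIB is bound to */
  size_t key_size;              /* sizeof(AF object), fixed at init */
  struct fib_node *buckets[FIB_BUCKETS];
};

static void fib2_init(struct fib *f, enum fib_af af, size_t key_size)
{
  assert(key_size > 0);
  memset(f, 0, sizeof *f);
  f->af = af;
  f->key_size = key_size;
}

/* FNV-1a over the key bytes, mixed with the prefix length. */
static unsigned fib_hash(const struct fib *f, const void *key, unsigned pxlen)
{
  const unsigned char *p = key;
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < f->key_size; i++)
    h = (h ^ p[i]) * 16777619u;
  return (h ^ pxlen) % FIB_BUCKETS;
}

/* Keys are passed by pointer and compared with memcmp(). */
static struct fib_node *fib_find(const struct fib *f, const void *key,
                                 unsigned pxlen)
{
  struct fib_node *n = f->buckets[fib_hash(f, key, pxlen)];
  for (; n; n = n->next)
    if (n->pxlen == pxlen && memcmp(n->key, key, f->key_size) == 0)
      return n;
  return NULL;
}

/* Find the node or create it. */
static struct fib_node *fib_get(struct fib *f, const void *key, unsigned pxlen)
{
  struct fib_node *n = fib_find(f, key, pxlen);
  if (n)
    return n;
  n = malloc(sizeof *n + f->key_size);
  assert(n != NULL);
  n->pxlen = pxlen;
  memcpy(n->key, key, f->key_size);
  unsigned b = fib_hash(f, key, pxlen);
  n->next = f->buckets[b];
  f->buckets[b] = n;
  return n;
}
```

Since the key size is a per-FIB value, a table that only holds short keys would not pay for the largest address type.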

Most of these are more or less trivial changes not MPLS-bound (VPNv4/6
can be used in case of bird used as RR in MPLS network, for example).
Should I supply patches for these? What are your plans for the commit
roadmap?


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (FreeBSD)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk4onmIACgkQwcJ4iSZ1q2lOcwCfT3CcT/bsxIlg1UiiArLWPq4k
w9EAnAzx7YifSgszTpHBcdwAvf01KI7S
=6LzQ
-----END PGP SIGNATURE-----