Merging bird and bird6

Alexander V. Chernikov melifaro at ipfw.ru
Tue Jul 26 11:10:44 CEST 2011


On 22.07.2011 14:52, Ondrej Zajicek wrote:
> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>> don't think it is a good idea to mix these. This may look inconsistent
>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>> similar, have a natural way to embed one in the other, have similar
>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>> kernel interaction (routes imported from LDP and exported to kernel).
>>> This solves your Case 2 without any hacks.
>> So, from the user's point of view, I define
>> table xxx; for both IPv4 and IPv6 routes, and
>> mpls table yyy; for the MPLS routing table?
>
> Yes.
>
>> There should be a base MPLS rtable (mpls_default, for example) as in IP.
>> We can also add a hack to automatically subscribe protocols to the MPLS
>> routing table by type and other attributes. For example, every LDP
>> instance gets connected to an MPLS table (default or defined in config).
>> A kernel protocol instance gets connected to an MPLS table only if its IP
>> table is the default one (GRT) or the 'mpls table' keyword is supplied
>> explicitly. What about VPNv4/VPNv6? The same approach?
>
> Perhaps even the default MPLS table should be explicitly configured [*] (as i
> guess not many BIRD users would use MPLS). Protocols requiring an MPLS table
> would fail if it is not configured; protocols with optional MPLS support
> (kernel, static?) just do not connect to MPLS in that case. The same approach
> for the VPNvX table.
>
> [*] probably like: mpls table XXX default;
Maybe it's better to turn on "general" MPLS support, e.g. 'mpls support;'
or just 'mpls;', instead of promoting some table to be the default?
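For illustration, the two styles under discussion might look like this in
bird.conf (hypothetical syntax, neither variant is implemented):

```
# Variant 1: designate one MPLS table as the default
mpls table mtab default;

# Variant 2: a single global switch, implying a default MPLS table
mpls;
```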
>
>>   Btw, how will we distinguish inet/inet6 rtes? (I'm talking about the
>> MP-BGP / IPv4-mapped cases)
>
> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
> similar purposes in IP stack. But this should not be checked directly
> in protocols, there should be some macros in lib/ipv6.h for that.
>
>>> [*] when i wrote that i thought that labels are distributed just by LDP
>>> and the purpose of label request is to propagate the label through the LDP
>>> area. i didn't notice that BGP/MPLS also distributes labels, so they
>>> need to know assigned labels. So the idea would need some modifications.
>> Not sure this will work. Since t1 is an IP table, cases where we need to
>> request a specific label for:
>> * AToM
>> * RSVP-TE tunnels
>> will not work, since there are no prefixes that can be mapped to such a
>> request.
>
> You are probably right. I originally thought about some specific
> 'request table' (where requests coded as routes with specific AF),
> but perhaps there should be used some other mechanism / other protocol
> hook. But it should be generic enough (some bus, allows at least more
> 'producers' and perhaps more 'consumers').
Okay, I see this as follows:
A new rtable hook, service_hook, with a uint32_t bitmask specifying the
request classes we are responsible for:
/* Defined classes */
#define RCLASS_LABEL 0x01 /* MPLS label request */

Some request function:

int
request_data(rtable *t, struct service_request *req, void **buf,
             size_t *bufsize);

struct service_request {
    uint32_t    request;  /* Single request class set */
    uint32_t    subclass; /* Subclass specific to the request */
    proto       *p;       /* Caller protocol */
    char        data[0];  /* Request-specific data follows */
};

The function loops through all registered hooks for the given _class_,
checking for a reply, until SR_OK or SR_FAIL is returned. It is up to the
protocol hook to check the subclass.
#define SR_OK      0x01 /* Request successful */
#define SR_FAIL    0x02 /* Request failed */
#define SR_NEXT    0x03 /* Request skipped */
#define SR_UNAVAIL 0x04 /* No providers for this request */

As a result, the caller gets SR_UNAVAIL if no provider was able to serve
the request, or SR_OK/SR_FAIL otherwise.

The caller can either set up the buffer itself, passing a pointer to the
buffer pointer and a pointer to the buffer size to the function, or ask
the provider to allocate the data by setting *buf to NULL and *bufsize
to 0.

struct service_reply {   /* Returned in the reply buffer */
   uint32_t    request;
   uint32_t    subclass;
   proto       *p;        /* Protocol providing the data */
   char        data[0];   /* Request-specific data */
};



>
>>> Internal LMAP table is examined, tracked IGP table is examined. If both
>>> are ready (for given prefix), appropriate encapsulating and MPLS routes
>>> are generated and propagated using rte_update(), otherwise nothing is
>>> generated and the previously generated route is withdrawn (rte_update()
>>> with NULL is called) (or perhaps an unreachable route is generated if
>>> LMAP is here but IGP route is missing). Simple and elegant.
>> ... and in case of label release we should remove only the label and keep
>> the original route
>
> Yes.
>
>>> There are some tricky parts of IGP tracking - it is problematic
>>> to use the standard RA_OPTIMAL update for this purpose, because if
>>> generated encapsulating routes are imported to the same table,
>>> they would probably become the optimal ones and the IGP routes would
>>> be shaded. The solution would be to use RA_ANY and ignore notifications
>>> containing encapsulating routes; similarly, 'examining the tracked
>>> IGP table' means looking up the fib node and finding the best route,
>>> ignoring encapsulating ones.
>>>
>>> For implementation of this behavior, there are two minor changes that
>>> need to be made to the rt table code: First, currently accept_ra_types
>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol; it needs to be a
>>> property of an announce hook (as LDP would have two hooks with
>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
>>> both in rte_recalculate should be moved after the route list
>>> is updated/relinked.
>
>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in the current code is not a
>> trivial task and requires understanding of the internals. Either the
>> announce type should be passed to the announce hook, or a new hook should
>> be added for the RA_ANY event. The latter is more appropriate IMHO, since
>> RA_ANY is used by the pipe protocol only.
>
> I thought about that when i created RA_ANY and chose this approach.
> Probably the best way is just to change rt_notify to take the appropriate
> struct announce_hook as a second argument instead of struct rtable.
> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
> some protocol-specific data. As (probably) all protocols are in-tree,
> doing some wide but trivial changes is not a problem.
>
>> The kernel protocol should track RA_ANY protocol hooks,
>> looking for the update source (LDP / RSVP), and re-install the
>> appropriate routes.
>
> I think kernel protocol should use RA_OPTIMAL as usual. This kind
> of RA_ANY usage is for protocols that export routes to the same
> table they listen (so 'source' routes would be shaded by their
> routes). These routes (LDP / RSVP) should have just highest
> priority.
>
>> The only downside is the situation when LDP signalling starts faster
>> than the IGP. In that case we will get 3 updates instead of one (at
>> least via RTSOCK):
>> * RTM_ADD for original prefix
>> * RTM_DEL for this prefix (as part of krt_set_notify())
>> * RTM_ADD for modified prefix
>>
>> RTM_CHANGE can be used in the notify, but still: this gives 2 updates
>> instead of one.
>
> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
> are propagated synchronously depth-first:
>
> OSPF --RA_ANY-->  LDP
> LDP --RA_OPTIMAL-->  kernel
> OSPF --RA_OPTIMAL-->  kernel
>
Still, I can't understand how exactly I can modify an announced IP route.
(From the FreeBSD kernel's point of view, an encapsulated route is a
usual route with an attribute attached. From the Linux point of view this
should be more or less the same, since an IP route lookup has to be done
for an incoming packet anyway, and doing several different lookups is not
the best idea.) I've got the RA_ANY hook called for a new route (and I
should know that it is actually RA_OPTIMAL without some complex logic!);
what should I do next?

> But it is true that this is very dependent on the internal implementation
> of route propagation. The first idea i had was to use separate
> tables for original and labeled routes (with just RA_OPTIMAL hooks),
> but that looks too cumbersome for users, and the ability to push a better
> route to the same (input) table has other possible usages.
>
>>> Therefore, it is probably a good idea to extend FIBs in a way you
>>> suggested, with minor details changed. FIB / rtables would be uniform
>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and IPv6
>>> could be handled as one AF, embedded, the same for VPNv4 and VPNv6). To
>>> minimize code changes, struct fib_node would have ip_addr prefix, but
>>> might be allocated larger.
>> Okay, so for an IPv4+IPv6-enabled daemon we will allocate an ip_addr
>> large enough to hold an IPv6 address? This can bump memory consumption
>> significantly for setups with several full views.
>
> It increases memory consumption, but not so much in relative terms - for
> each struct network there is at least one struct rte, and in both of them
> there is just one ip_addr and both structures are nontrivial. So this
> relative increase would be about 1.15-1.2. Really big users would
> probably keep the current split setup.
Okay, it's much easier from the developer's point of view. If you're not
afraid of your users :)
>
>>> Because each protocol and each its announce_hook have appropriate role,
>>> it is IMHO unnecessary to have AF in protocol hooks, but there should be
>>> check whether protocol/announce_hook is connected to appropriate rtable.
>>>
>>
>> To summarize required changes (please correct me):
>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>> * rtable
>> * fib
>> * rte
>> 3) Add fib2_init with the sizeof(AF object) supplied. Add an appropriate
>> field to struct fib to hold this value.
>> 4) Move to memcmp() in fib_find / fib_get
>> 5) Set up a default rtable for every supported AF. Connect protocol
>> instances to these default tables based on protocol type.
>
> 1a) other changes in rte_recalculate() related to propagation
> (clean up the table before calling RA_ANY hook).
>
> 1) and 1a) i will do myself and send you the patch, and also make
> some trivial example for exporting to the same table.
>
> 2) i am not sure if there is a reason to put explicit AF info
> to struct fib, AF compatibility could be handled on higher level
> (struct rtable in general, other direct users probably use just
> one AF).
No problem, I misinterpreted "FIB / rtables would be uniform (AF_
bound)" as "FIB / rtable needs AF info in structure fields".
>
> 3) and hashing callback (and perhaps fib_route, but not sure if this is
> needed).
>
> 4) probably encapsulate that to some static inline key_equal() function.
>
> 5) see my related note above. Protocol binding to tables should check AFs.
>
> more:
>
> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous mail:
>
>>> i think encapsulation
>>> routes should be represented by routes with new destination type
>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>> struct rta (containing struct rta in the first field and NHLFE after
>>> that). Such structure could be easily passed as struct rta and functions
>>> from rt-attr.c can work with that, with some minor modifications
>>> (allocating, freeing and printing) dispatched based on the dest field.
>
>>> This rta could be used without changes also for MPLS routes.

I'll try to send you patches for all of this, as I see it, in several days.
>
>
>> Most of this are more or less trivial changes not MPLS-bound (VPNv4/6
>> can be used in case of bird used as RR in MPLS network, for example).
>> Should I supply patches for these? What are your plans about commit
>> routemap ?
>
> I will create a GIT branch 'mpls' and merge these patches into that branch
> soon. When we have some major release, we could merge the 'mpls' branch
> into master if there is sufficient usage (i think that even just
> static and kernel protocol support for MPLS would be a good example
> usage). Other protocols (LDP, ...) should probably be merged when they
> are reasonably ready.
Will this branch be available from the official git repo? It is not
accessible (from its web interface, at least).


Btw, some bird/LDP "status" report:

bird> show ldp neighbour
     Peer LDP Ident: 10.2.33.4:0; Local LDP Ident 10.0.0.88:0
          TCP connection: 10.2.33.4.11212 - 0.0.0.0.0
          State: Operational; Msgs sent/rcvd: 21/61; Downstream
          Up time: 00:02:27
          LDP discovery sources:
            em1, Src IP addr: 10.1.5.4
     Peer LDP Ident: 10.2.33.3:0; Local LDP Ident 10.0.0.88:0
          TCP connection: 10.2.33.3.11009 - 0.0.0.0.0
          State: Operational; Msgs sent/rcvd: 29/60; Downstream
          Up time: 00:02:20
          LDP discovery sources:
            em2, Src IP addr: 10.1.6.3
bird> show ldp bindings
   lib entry: 10.2.0.0/30
       local binding:  label: 25
       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
       remote binding:  lsr: 10.2.33.3:0, label: 23
   lib entry: 10.1.6.0/24
       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
       remote binding:  lsr: 10.2.33.4:0, label: 25
   lib entry: 10.0.0.0/24
       remote binding:  lsr: 10.2.33.3:0, label: 19
       remote binding:  lsr: 10.2.33.4:0, label: 23
   lib entry: 10.2.0.2/32
       local binding:  label: 26
       remote binding:  lsr: 10.2.33.4:0, label: 16
       remote binding:  lsr: 10.2.33.3:0, label: 24
   lib entry: 10.1.4.0/24
       local binding:  label: 29
       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
   lib entry: 10.1.5.0/24
       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
   lib entry: 1.2.3.5/32
       remote binding:  lsr: 10.2.33.3:0, label: 20
       remote binding:  lsr: 10.2.33.4:0, label: 21
   lib entry: 10.1.33.0/24
       local binding:  label: 28
       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
   lib entry: 10.2.33.3/32
       local binding:  label: 31
       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
   lib entry: 10.2.33.4/32
       local binding:  label: 27
       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
       remote binding:  lsr: 10.2.33.3:0, label: 25
   lib entry: 10.1.6.88/32
       remote binding:  lsr: 10.2.33.3:0, label: 18
       remote binding:  lsr: 10.2.33.4:0, label: 19
   lib entry: 10.0.0.88/32
       remote binding:  lsr: 10.2.33.4:0, label: 17
       remote binding:  lsr: 10.2.33.3:0, label: 16
   lib entry: 10.1.5.88/32
       remote binding:  lsr: 10.2.33.3:0, label: 21
       remote binding:  lsr: 10.2.33.4:0, label: 18
bird> show ldp forwardingtable
Local  Outgoing       Prefix             Bytes Label    Outgoing   Next Hop
Label  Label or VC    or Tunnel Id       Switched       interface
20     SWAP           10.2.0.0/30        0              ?          10.1.5.4
21     SWAP           10.2.0.2/32        0              ?          10.1.5.4
22     SWAP           10.2.33.4/32       0              ?          10.1.5.4
23     SWAP           10.1.33.0/24       0              ?          10.1.5.4
24     SWAP           10.1.4.0/24        0              ?          10.1.5.4
25     SWAP           10.2.0.0/30        0              ?          10.1.5.4
26     SWAP           10.2.0.2/32        0              ?          10.1.5.4
27     SWAP           10.2.33.4/32       0              ?          10.1.5.4
28     SWAP           10.1.33.0/24       0              ?          10.1.5.4
29     SWAP           10.1.4.0/24        0              ?          10.1.5.4
30     SWAP           10.2.33.3/32       0              ?          10.1.6.3
31     SWAP           10.2.33.3/32       0              ?          10.1.6.3

