Merging bird and bird6

Alexander V. Chernikov melifaro at ipfw.ru
Sun Jul 31 10:38:42 CEST 2011


Alexander V. Chernikov wrote:
> On 22.07.2011 14:52, Ondrej Zajicek wrote:
>> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>>> don't think it is a good idea to mix these. This may look inconsistent
>>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>>> similar, have a natural way to embed one in the other, have similar
>>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>>> kernel interaction (routes imported from LDP and exported to kernel).
>>>> This solves your Case 2 without any hacks.
>>> So, from user point of view, I define
>>> table xxx; for both ipv4 and IPv6 routes and
>>> mpls table yyy; for MPLS routing table?
>>
>> Yes.

Attached is a patch that permits fibs to be used for any address family.
Please treat it as a proof-of-concept patch for review. It works in my
setup, but I haven't tested it in production, and the netlink code is not
tested at all.

Some notes:
* struct fib has to carry an address type field, because fib_get and
other functions take a pointer to the fib, not to the rtable
* Because the address length is variable, we store the address inside the
fib node this way:

|--------------------|
|  struct fib_node   |
|  *addr         --------\
|--------------------|   |
|  some user data    |   |
|                    |   |
|--------------------|   |
| address data   <-------/
|                    |
|--------------------|
* Since we now have a pointer to the address data instead of the data
(ip_addr) itself, all 9000 places with "%I/%d" need to be changed, so more
general fib_print and fib2_print functions are implemented

* Several net_* calls were converted to fib_*
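
To make the node layout above concrete, here is a minimal sketch of how
such a node could be allocated. It is not taken from the attached patch:
struct fib_sketch, struct fib_node_sketch, the fib_sketch_* functions and
the 8-byte alignment are all hypothetical names and assumptions chosen for
illustration. The memcmp() comparison follows item 4 of the summary quoted
further down.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified node header (the real struct fib_node differs). */
struct fib_node_sketch {
  struct fib_node_sketch *next;
  void *addr;                 /* points at the address bytes at the end of this allocation */
  unsigned char pxlen;        /* prefix length */
};

/* Hypothetical per-table description. */
struct fib_sketch {
  size_t node_size;           /* header plus user data, as seen by the fib user */
  size_t addr_size;           /* bytes per address for this table's address family */
  size_t addr_offset;         /* node_size rounded up to SKETCH_ALIGN */
};

#define SKETCH_ALIGN 8u       /* assumed alignment for the trailing address data */

void fib_sketch_init(struct fib_sketch *f, size_t node_size, size_t addr_size)
{
  assert(node_size >= sizeof(struct fib_node_sketch));
  f->node_size = node_size;
  f->addr_size = addr_size;
  f->addr_offset = (node_size + SKETCH_ALIGN - 1) / SKETCH_ALIGN * SKETCH_ALIGN;
}

/* One allocation holds header, user data and address; addr points into it. */
struct fib_node_sketch *
fib_sketch_node_new(const struct fib_sketch *f, const void *addr, unsigned char pxlen)
{
  struct fib_node_sketch *n = calloc(1, f->addr_offset + f->addr_size);
  if (!n)
    return NULL;
  n->addr = (char *) n + f->addr_offset;
  memcpy(n->addr, addr, f->addr_size);
  n->pxlen = pxlen;
  return n;
}

/* Address-family-agnostic key comparison: prefix length plus raw address bytes. */
int fib_sketch_key_equal(const struct fib_sketch *f, const struct fib_node_sketch *n,
                         const void *addr, unsigned char pxlen)
{
  return n->pxlen == pxlen && memcmp(n->addr, addr, f->addr_size) == 0;
}
```

Since the address lives in the same allocation as the node, a single
free() on the node releases both, and the table only needs to remember
addr_size to support a new address family.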



Btw, some IPv4/IPv6 merging questions/thoughts:
* 'show route' will show a complete mess for a table with both v4 and v6
routes. Some sorting or an 'afi ipv4|ipv6' filter has to be implemented.
* fill_in_sockaddr|get_sockaddr from io.c are somewhat inconsistent:
fill_* uses the OS-dependent set_inaddr to fill in the actual address data,
but get_* calls memcpy and ipa_ntoh directly instead of using the existing
OS-dependent get_inaddr. Moreover, the set_ and get_ implementations are
the same for Linux and BSD (and, AFAIR, they should be the same for other
UNIX-like systems, at least for IPv4/IPv6)
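
The symmetry I have in mind for the sockaddr helpers looks roughly like
the sketch below. It is IPv4-only and uses plain POSIX types; the *_sketch
names and signatures are hypothetical and are not the actual functions
from io.c. The point is only that both directions go through one pair of
address helpers, so a fill followed by a get returns the original values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical address helpers: the only place that knows the wire byte order. */
void set_inaddr_sketch(struct in_addr *ia, uint32_t host_order_addr)
{
  ia->s_addr = htonl(host_order_addr);
}

uint32_t get_inaddr_sketch(const struct in_addr *ia)
{
  return ntohl(ia->s_addr);
}

/* Fill a sockaddr from a host-order address and port, via set_inaddr_sketch. */
void fill_in_sockaddr_sketch(struct sockaddr_in *sa, uint32_t addr, unsigned port)
{
  memset(sa, 0, sizeof(*sa));
  sa->sin_family = AF_INET;
  sa->sin_port = htons((uint16_t) port);
  set_inaddr_sketch(&sa->sin_addr, addr);
}

/* Read address and port back, via get_inaddr_sketch (no direct memcpy). */
void get_sockaddr_sketch(const struct sockaddr_in *sa, uint32_t *addr, unsigned *port)
{
  assert(sa->sin_family == AF_INET);
  if (addr)
    *addr = get_inaddr_sketch(&sa->sin_addr);
  if (port)
    *port = ntohs(sa->sin_port);
}
```

With this shape, an IPv6 variant would only need a second pair of address
helpers; the fill/get functions themselves would stay parallel.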



>>
>>> There should be base MPLS rtable (mpls_default, for example) as in IP.
>>> We can also add a hack for automatically subscribe protocols for MPLS
>>> routing table by type and other attributes. For example, every LDP
>>> instance gets connected to an MPLS table (default or defined in config).
>>> Kernel protocol instance gets connected to MPLS table only if its IP
>>> table is the default one (GRT) or 'mpls table' keyword is supplied
>>> explicitly. What about VPNv4/VPNv6? The same approach?
>>
>> Perhaps even default MPLS table should be explicitly configured [*]
>> (as i guess
>> not many BIRD users would use MPLS). Protocols requiring MPLS table would
>> fail if it is not configured, protocol with optional MPLS support
>> (kernel,
>> static?) just do not connect to MPLS in that case. The same approach
>> for VPNvX table.
>>
>> [*] probably like: mpls table XXX default;
> Maybe it's better to turn on "general" mpls support?
> e.g. 'mpls support;' or just 'mpls;' instead of propagating some table
> to be default?
>>
>>>   Btw, how we will distinguish inet/inet6 rtes? (I'm talking about
>>> MP-BGP
>>> / IPv4-mapped cases)
>>
>> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
>> similar purposes in IP stack. But this should not be checked directly
>> in protocols, there should be some macros in lib/ipv6.h for that.
>>
>>>> [*] when i wrote that i thought that labels are distributed just by LDP
>>>> and the purpose of label request is to propagate the label through LDP
>>>> area. i didn't notice that BGP/MPLS also distributes labels so they
>>>> need to know assigned labels. So the idea would need some
>>>> modifications.
>>> Not sure this will work. Since t1 is an IP table cases when we need to
>>> request specific label for:
>>> * AToM
>>> * RSVP-TE tunnels
>>> will not work since there are no prefixes that can be mapped to such
>>> request.
>>
>> You are probably right. I originally thought about some specific
>> 'request table' (where requests coded as routes with specific AF),
>> but perhaps there should be used some other mechanism / other protocol
>> hook. But it should be generic enough (some bus, allows at least more
>> 'producers' and perhaps more 'consumers').
> Okay, i see this as follows:
> New rtable hook, service_hook, with uint32_t bitmask specifying request
> classes we are responsible for:
> /* Defined classes */
> #define RCLASS_LABEL 0x01 /* MPLS label request */
> 
> Some request function:
> int
> request_data(rtable *t, struct service_request *req,
>              void **buf, size_t *bufsize);
> 
> struct service_request {
>    uint32_t    request; /* Single request class set */
>    uint32_t    subclass; /* Subclass specific for request */
>    proto       *p; /* caller protocol */
>    char        data[0]; /* request-specific data follows */
> };
> 
> function loops thru all registered hooks for given _class_ checking for
> reply until SR_OK or SR_FAIL is returned. It is up to protocol hook to
> check subclass.
> #define SR_OK      0x01 /* Request successful */
> #define SR_FAIL    0x02 /* Request failed */
> #define SR_NEXT    0x03 /* Request skipped */
> #define SR_UNAVAIL 0x04 /* No providers for this request */
> 
> As a result, the caller gets SR_UNAVAIL if no provider was able to
> serve the request, or SR_OK|SR_FAIL otherwise.
> 
> The caller can set up the buffer itself and pass a pointer to the buffer
> pointer and a pointer to the buffer size, or ask the provider to allocate
> the data by setting *buf to NULL and *bufsize to 0.
> 
> struct service_reply { /* is returned in reply buffer */
>   uint32_t    request;
>   uint32_t    subclass;
>   proto       *p; /* protocol, providing data */
>   char        data[0]; /* request-specific data */
> };
> 
> 
> 
>>
>>>> Internal LMAP table is examined, tracked IGP table is examined. If both
>>>> are ready (for given prefix), appropriate encapsulating and MPLS routes
>>>> are generated and propagated using rte_update(), otherwise nothing is
>>>> generated and the previously generated route is withdrawn (rte_update()
>>>> with NULL is called) (or perhaps an unreachable route is generated if
>>>> LMAP is here but IGP route is missing). Simple and elegant.
>>> .. and in case of label release we should remove label only and keep
>>> original route
>>
>> Yes.
>>
>>>> There are some tricky parts of IGP tracking - it is problematic
>>>> to use standard RA_OPTIMAL update for this purpose, because if
>>>> generated encapsulating routes are imported to the same table,
>>>> these probably became the optimal ones and IGP routes would be
>>>> shaded. Solution would be to use RA_ANY, and ignore notifications
>>>> containing encapsulating routes, similarly 'examining the tracked
>>>> IGP table' means looking up the fib node and find the best route,
>>>> ignoring encapsulating ones.
>>>>
>>>> For implementation of this behavior, there are two minor changes that
>>>> needs to be done to the rt table code: First, currently accept_ra_types
>>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
>>>> property of an announce hook (as LDP would have two hooks with
>>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
>>>> both in rte_recalculate should be moved after the route list
>>>> is updated/relinked.
>>
>>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in current code is not a
>>> trivial task and requires internals understanding. Either announce type
>>> should be passed to announce hook or new hook should be added for RA_ANY
>>>   event. The latter is more appropriate IMHO since RA_ANY is used by
>>> pipe
>>> protocol only.
>>
>> I thought about that when i created RA_ANY and have chosen this approach.
>> Probably best way is just to change rt_notify to have appropriate
>> struct announce_hook as a second argument instead of struct rtable.
>> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
>> some protocol-specific data. As (probably) all protocols are in-tree,
>> doing some wide but trivial changes is not a problem.
>>
>>> Kernel protocol should track RA_ANY protocol hooks
>>> looking for update source (LDP / RSVP) and re-install appropriate
>>> routes.
>>
>> I think kernel protocol should use RA_OPTIMAL as usual. This kind
>> of RA_ANY usage is for protocols that export routes to the same
>> table they listen (so 'source' routes would be shaded by their
>> routes). These routes (LDP / RSVP) should have just highest
>> priority.
>>
>>> The only downside is situation when LDP signalling starts faster
>>> than IGP. In that case we will get 3 updates instead of one (at least in
>>> RTSOCK):
>>> * RTM_ADD for original prefix
>>> * RTM_DEL for this prefix (as part of krt_set_notify())
>>> * RTM_ADD for modified prefix
>>>
>>> RTM_CHANGE can be used in notify, but still: this gives 2 updates
>>> instead of one.
>>
>> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
>> are propagated synchronously depth-first:
>>
>> OSPF --RA_ANY-->  LDP
>> LDP --RA_OPTIMAL-->  kernel
>> OSPF --RA_OPTIMAL-->  kernel
>>
> Still I can't understand how exactly I can modify an announced IP route
> (still, from FreeBSD kernel point of view encapsulated route is a usual
> route with an attribute attached. From Linux point of view this should
> be more or less the same since an IP route lookup have to be done for
> incoming packet anyway and doing several different lookups is not a best
> idea). I've got RA_ANY hook called for a new route (and I should know
> that it is actually RA_OPTIMAL without some complex logic!), what I
> should do next ?
> 
>> But it is true that this is much dependent on internal implementation
>> of route propagation. The first idea i had was to use separate
>> tables for original and labeled routes (when just RA_OPTIMAL hooks),
>> but that looks too cumbersome for users and ability to push a better
>> route to the same (input) table has other possible usages.
>>
>>>> Therefore, it is probably a good idea to extend FIBs in a way you
>>>> suggested, with minor details changed. FIB / rtables would be uniform
>>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and
>>>> IPv6
>>>> could be handled as one AF, embedded, the same for VPNv4 and VPNv6). To
>>>> minimize code changes, struct fib_node would have ip_addr prefix, but
>>>> might be allocated larger.
>>> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
>>> enough for holding IPv6 address? This can bump memory consumption for
>>> setups with several full-views significantly.
>>
>> It increases memory consumption, but not so much in a relative view - for
>> each struct network there is at least one struct rte and in both of them
>> there is just one ip_addr and both structures are nontrivial. So this
>> relative increase would be about 1.15-1.2. Really big users would
>> probably keep current split setting.
> Okay, it's much easier from developer point of view. If you're not
> afraid of your users :)
>>
>>>> Because each protocol and each its announce_hook have appropriate role,
>>>> it is IMHO unnecessary to have AF in protocol hooks, but there
>>>> should be
>>>> check whether protocol/announce_hook is connected to appropriate
>>>> rtable.
>>>>
>>>
>>> To summarize required changes (please correct me):
>>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>>> * rtable
>>> * fib
>>> * rte
>>> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
>>> to struct fib to hold this value.
>>> 4) Move to memcmp() in fib_find / fib_get
>>> 5) Set up default rtable for every supported AF. Connect protocol
>>> instances to such default AFs based on protocol types
>>
>> 1a) other changes in rte_recalculate() related to propagation
>> (clean up the table before calling RA_ANY hook).
>>
>> 1) and 1a) i will do myself and send you the patch, and also make
>> some trivial example for exporting to the same table.
>>
>> 2) i am not sure if there is a reason to put explicit AF info
>> to struct fib, AF compatibility could be handled on higher level
>> (struct rtable in general, other direct users probably use just
>> one AF).
> No problem, I misinterpreted "FIB / rtables would be uniform (AF_
> bound)" as "FIB / rtable needs AF info in structure fields"
>>
>> 3) and hashing callback (and perhaps fib_route, but not sure if this is
>> needed).
>>
>> 4) probably encapsulate that to some static inline key_equal() function.
>>
>> 5) see my related note above. Protocol binding to tables should check
>> AFs.
>>
>> more:
>>
>> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous
>> mail:
>>
>>>> i think encapsulation
>>>> routes should be represented by routes with new destination type
>>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>>> struct rta (containing struct rta in the first field and NHLFE after
>>>> that). Such structure could be easily passed as struct rta and
>>>> functions
>>>> from rt-attr.c can work with that, with some minor modifications
>>>> (allocating, freeing and printing) dispatched based on dest field.
>>
>>>> This rta could be used without changes also for MPLS routes.
> 
> I'll try to send you patches for all these as I see it in several days.
>>
>>
>>> Most of this are more or less trivial changes not MPLS-bound (VPNv4/6
>>> can be used in case of bird used as RR in MPLS network, for example).
>>> Should I supply patches for these? What are your plans about commit
>>> routemap ?
>>
>> I create GIT branch 'mpls' and would merge these patches to that branch
>> soon. When we will have some major release, we could merge 'mpls' branch
>> to master if there is some sufficient usage (i think that even just
>> static and kernel protocol support for MPLS would be a good example
>> usage). Other protocols (LDP, ...) probably should be merged when they
>> are reasonably ready.
> Will this branch be available from the official git repo? It is not
> accessible (from its web interface at least).
> 
> 
> Btw, some bird/LDP "status" report:
> 
> bird> show ldp neighbour
>     Peer LDP Ident: 10.2.33.4:0; Local LDP Ident 10.0.0.88:0
>          TCP connection: 10.2.33.4.11212 - 0.0.0.0.0
>          State: Operational; Msgs sent/rcvd: 21/61; Downstream
>          Up time: 00:02:27
>          LDP discovery sources:
>            em1, Src IP addr: 10.1.5.4
>     Peer LDP Ident: 10.2.33.3:0; Local LDP Ident 10.0.0.88:0
>          TCP connection: 10.2.33.3.11009 - 0.0.0.0.0
>          State: Operational; Msgs sent/rcvd: 29/60; Downstream
>          Up time: 00:02:20
>          LDP discovery sources:
>            em2, Src IP addr: 10.1.6.3
> bird> show ldp bindings
>   lib entry: 10.2.0.0/30
>       local binding:  label: 25
>       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>       remote binding:  lsr: 10.2.33.3:0, label: 23
>   lib entry: 10.1.6.0/24
>       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>       remote binding:  lsr: 10.2.33.4:0, label: 25
>   lib entry: 10.0.0.0/24
>       remote binding:  lsr: 10.2.33.3:0, label: 19
>       remote binding:  lsr: 10.2.33.4:0, label: 23
>   lib entry: 10.2.0.2/32
>       local binding:  label: 26
>       remote binding:  lsr: 10.2.33.4:0, label: 16
>       remote binding:  lsr: 10.2.33.3:0, label: 24
>   lib entry: 10.1.4.0/24
>       local binding:  label: 29
>       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>   lib entry: 10.1.5.0/24
>       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>   lib entry: 1.2.3.5/32
>       remote binding:  lsr: 10.2.33.3:0, label: 20
>       remote binding:  lsr: 10.2.33.4:0, label: 21
>   lib entry: 10.1.33.0/24
>       local binding:  label: 28
>       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>   lib entry: 10.2.33.3/32
>       local binding:  label: 31
>       remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>   lib entry: 10.2.33.4/32
>       local binding:  label: 27
>       remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>       remote binding:  lsr: 10.2.33.3:0, label: 25
>   lib entry: 10.1.6.88/32
>       remote binding:  lsr: 10.2.33.3:0, label: 18
>       remote binding:  lsr: 10.2.33.4:0, label: 19
>   lib entry: 10.0.0.88/32
>       remote binding:  lsr: 10.2.33.4:0, label: 17
>       remote binding:  lsr: 10.2.33.3:0, label: 16
>   lib entry: 10.1.5.88/32
>       remote binding:  lsr: 10.2.33.3:0, label: 21
>       remote binding:  lsr: 10.2.33.4:0, label: 18
> bird> show ldp forwardingtable
> Local  Outgoing       Prefix             Bytes Label    Outgoing   Next Hop
> Label  Label or VC    or Tunnel Id       Switched       interface
> 20     SWAP           10.2.0.0/30        0              ? 10.1.5.4
> 21     SWAP           10.2.0.2/32        0              ? 10.1.5.4
> 22     SWAP           10.2.33.4/32       0              ? 10.1.5.4
> 23     SWAP           10.1.33.0/24       0              ? 10.1.5.4
> 24     SWAP           10.1.4.0/24        0              ? 10.1.5.4
> 25     SWAP           10.2.0.0/30        0              ? 10.1.5.4
> 26     SWAP           10.2.0.2/32        0              ? 10.1.5.4
> 27     SWAP           10.2.33.4/32       0              ? 10.1.5.4
> 28     SWAP           10.1.33.0/24       0              ? 10.1.5.4
> 29     SWAP           10.1.4.0/24        0              ? 10.1.5.4
> 30     SWAP           10.2.33.3/32       0              ? 10.1.6.3
> 31     SWAP           10.2.33.3/32       0              ? 10.1.6.3
> 
> 
>>
> 
> 

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bird_rtables_20110731.diff
URL: <http://trubka.network.cz/pipermail/bird-users/attachments/20110731/84129826/attachment-0001.diff>

