Merging bird and bird6

Tapio Haapala tapio.haapala at f-solutions.fi
Sun Jul 31 12:35:39 CEST 2011


It is sad to see how much effort is going into creating new problems and 
bugs by trying to merge a chicken and a dog together. Our system has over 
50 routers, plus over 500 routers seen via OSPF. There are still a bunch of 
bugs that cause desynchronization problems and domino effects and that need 
to be fixed. Having two separate engines was purely a good thing when we 
are talking about production networks. In fact, on many large production 
networks the IPv4 and IPv6 routers run on different machines to limit problems.


31.7.2011 11:38, Alexander V. Chernikov wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Alexander V. Chernikov wrote:
>> On 22.07.2011 14:52, Ondrej Zajicek wrote:
>>> On Fri, Jul 22, 2011 at 01:47:14AM +0400, Alexander V. Chernikov wrote:
>>>>> Therefore there would be two types of routing tables - IP and MPLS. I
>>>>> don't think it is a good idea to mix these. This may look inconsistent
>>>>> with idea of embedding IPv4 to IPv6, but IP protocols are much more
>>>>> similar, have a natural way to embed one in the other, have similar
>>>>> roles and protocol structure. MPLS routing table could be used to LDP -
>>>>> kernel interaction (routes imported from LDP and exported to kernel).
>>>>> This solves your Case 2 without any hacks.
>>>> So, from user point of view, I define
>>>> table xxx; for both ipv4 and IPv6 routes and
>>>> mpls table yyy; for MPLS routing table?
>>> Yes.
> Patch permitting fibs to be used for any address family attached.
> It should be considered as PoC patch for review. It works for my setup,
> but I haven't tested it in production. netlink is not tested at all.
>
> Some notes:
> * fib has to have address type field (due to fib_get and other functions
> using pointer to fib, not rtable)
> * Due to address variable length we store it inside fib node this way:
>
> |--------------------|
> |  struct fib_node   |
> |  *addr         --------\
> |--------------------|   |
> |  some user data    |   |
> |                    |   |
> |--------------------|   |
> | address data<-------/
> |                    |
> |--------------------|
> * Since we've got a pointer to the address data instead of the data (ip_addr)
> itself, all 9000 places with "%I/%d" need to be changed, so more
> general fib_print and fib2_print functions are implemented
>
> * Several net_* calls were converted to fib_*
>
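To make the node layout in the notes above concrete, here is a rough sketch in plain C. It is not the code from the patch: struct demo_fib_node, demo_node_new and demo_key_equal are hypothetical names, and the real struct fib_node in BIRD looks different. It only shows the idea of one allocation holding header, user data and address bytes, with the header pointing at the address.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node header; the real struct fib_node in BIRD differs. */
struct demo_fib_node {
    struct demo_fib_node *next;  /* hash chain link */
    void *addr;                  /* points at the address bytes after the user data */
    uint8_t addr_len;            /* address length in bytes (4 for IPv4, 16 for IPv6) */
    uint8_t pxlen;               /* prefix length in bits */
};

/*
 * Allocate header + user data + address as one block and copy the address in.
 * user_size is the size of the whole user structure, header included.
 */
static struct demo_fib_node *
demo_node_new(size_t user_size, const void *addr, uint8_t addr_len, uint8_t pxlen)
{
    assert(user_size >= sizeof(struct demo_fib_node));

    struct demo_fib_node *n = calloc(1, user_size + addr_len);
    if (!n)
        return NULL;

    n->addr = (char *) n + user_size;   /* address lives right after the user data */
    n->addr_len = addr_len;
    n->pxlen = pxlen;
    memcpy(n->addr, addr, addr_len);
    return n;
}

/* Keys are equal when address length, prefix length and raw bytes all match. */
static int
demo_key_equal(const struct demo_fib_node *n, const void *addr,
               uint8_t addr_len, uint8_t pxlen)
{
    return n->addr_len == addr_len && n->pxlen == pxlen &&
           memcmp(n->addr, addr, addr_len) == 0;
}
```

Because the address is reached through a pointer and compared as raw bytes, the same node code can carry a 4-byte or a 16-byte key; the price is that every place that used to print an embedded ip_addr directly has to go through a helper.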
>
>
> Btw, some IPv4/IPv6 merging questions/thoughts:
> * show route will show a complete mess for a table with both v4 and v6
> routes. Some sorting or 'afi ipv4|ipv6' has to be implemented.
> * fill_in_sockaddr|get_sockaddr from io.c are somewhat inconsistent:
> fill_* uses the OS-dependent set_inaddr to fill the actual address data, but
> get_* uses direct calls to memcpy and ipa_ntoh instead of the existing
> OS-dependent get_inaddr. Moreover, the set_ and get_ implementations are the
> same for Linux and BSD (and they should be the same for other UNIX-like
> systems AFAIR, at least for IPv4/IPv6)
>
>
>
>>>> There should be a base MPLS rtable (mpls_default, for example) as in IP.
>>>> We can also add a hack to automatically subscribe protocols to the MPLS
>>>> routing table by type and other attributes. For example, every LDP
>>>> instance gets connected to an MPLS table (default or defined in config).
>>>> A kernel protocol instance gets connected to the MPLS table only if its IP
>>>> table is the default one (GRT) or the 'mpls table' keyword is supplied
>>>> explicitly. What about VPNv4/VPNv6? The same approach?
>>> Perhaps even the default MPLS table should be explicitly configured [*]
>>> (as I guess not many BIRD users would use MPLS). Protocols requiring an
>>> MPLS table would fail if it is not configured; protocols with optional
>>> MPLS support (kernel, static?) would just not connect to MPLS in that
>>> case. The same approach for the VPNvX table.
>>>
>>> [*] probably like: mpls table XXX default;
>> Maybe it's better to turn on "general" mpls support?
>> e.g. 'mpls support;' or just 'mpls;' instead of propagating some table
>> to be default?
>>>> Btw, how will we distinguish inet/inet6 rtes? (I'm talking about
>>>> MP-BGP / IPv4-mapped cases)
>>> I planned to use IPv4-mapped prefix (::ffff:0:0/96), which is used for
>>> similar purposes in IP stack. But this should not be checked directly
>>> in protocols, there should be some macros in lib/ipv6.h for that.
>>>
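A helper of the kind mentioned above, testing whether a 128-bit address lies inside the IPv4-mapped prefix ::ffff:0:0/96, might look roughly like this. The names demo_ip6, demo_is_v4_mapped and demo_map_v4 are made up for this sketch; it does not show the actual contents of lib/ipv6.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 128-bit address stored in network byte order. */
struct demo_ip6 {
    uint8_t b[16];
};

/* First 96 bits of ::ffff:0:0/96: ten zero bytes followed by 0xff 0xff. */
static const uint8_t demo_v4_mapped_prefix[12] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff
};

/* Nonzero if the address lies inside ::ffff:0:0/96. */
static inline int
demo_is_v4_mapped(const struct demo_ip6 *a)
{
    assert(a != NULL);
    return memcmp(a->b, demo_v4_mapped_prefix, sizeof(demo_v4_mapped_prefix)) == 0;
}

/* Embed an IPv4 address (4 bytes, network order) into the mapped prefix. */
static inline struct demo_ip6
demo_map_v4(const uint8_t v4[4])
{
    struct demo_ip6 a;
    memcpy(a.b, demo_v4_mapped_prefix, sizeof(demo_v4_mapped_prefix));
    memcpy(a.b + 12, v4, 4);
    return a;
}
```

Keeping the test in one place means protocol code never compares against the prefix bytes itself, which is the point made in the quoted paragraph.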
>>>>> [*] When I wrote that, I thought that labels are distributed just by LDP
>>>>> and the purpose of a label request is to propagate the label through the
>>>>> LDP area. I didn't notice that BGP/MPLS also distributes labels, so they
>>>>> need to know the assigned labels. So the idea would need some
>>>>> modifications.
>>>> Not sure this will work. Since t1 is an IP table, the cases where we need
>>>> to request a specific label for:
>>>> * AToM
>>>> * RSVP-TE tunnels
>>>> will not work, since there are no prefixes that can be mapped to such a
>>>> request.
>>> You are probably right. I originally thought about some specific
>>> 'request table' (where requests coded as routes with specific AF),
>>> but perhaps there should be used some other mechanism / other protocol
>>> hook. But it should be generic enough (some bus, allows at least more
>>> 'producers' and perhaps more 'consumers').
>> Okay, I see this as follows:
>> New rtable hook, service_hook, with a uint32_t bitmask specifying the
>> request classes we are responsible for:
>> /* Defined classes */
>> #define RCLASS_LABEL 0x01 /* MPLS label request */
>>
>> Some request function:
>> int
>> request_data(rtable *t, struct service_request *req, void **buf,
>>              size_t *bufsize);
>>
>> struct service_request {
>>     uint32_t    request; /* Single request class set */
>>     uint32_t    subclass; /* Subclass specific for request */
>>     proto       *p; /* caller protocol */
>>     char        data[0]; /* request-specific data follows */
>> };
>>
>> The function loops through all registered hooks for the given _class_,
>> checking for a reply until SR_OK or SR_FAIL is returned. It is up to the
>> protocol hook to check the subclass.
>> #define SR_OK      0x01 /* Request successful */
>> #define SR_FAIL    0x02 /* Request failed */
>> #define SR_NEXT    0x03 /* Request skipped */
>> #define SR_UNAVAIL 0x04 /* No providers for this request */
>>
>> As a result, the caller gets SR_UNAVAIL if no provider was able to
>> serve the request, or SR_OK|SR_FAIL.
>>
>> The caller can set up the buffer itself and pass a pointer to the buffer
>> pointer and a pointer to the buffer size to the function, or ask the
>> provider to allocate the data for it by setting *buf to NULL and *bufsize to 0.
>>
>> struct service_reply { /* is returned in reply buffer */
>>    uint32_t    request;
>>    uint32_t    subclass;
>>    proto       *p; /* protocol, providing data */
>>    char        data[0]; /* request-specific data */
>> };
>>
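The dispatch loop described above could look roughly like the following. This is a self-contained sketch, not BIRD code: struct demo_hook, demo_request_data, the two example providers and the simplified struct service_request (no rtable, no proto pointer, a single uint32_t as reply instead of a buffer) are assumptions made for illustration. Only RCLASS_LABEL and the SR_* values are taken from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RCLASS_LABEL 0x01   /* MPLS label request */

#define SR_OK      0x01     /* Request successful */
#define SR_FAIL    0x02     /* Request failed */
#define SR_NEXT    0x03     /* Request skipped */
#define SR_UNAVAIL 0x04     /* No providers for this request */

/* Simplified request; the proposal also carries the caller protocol and data. */
struct service_request {
    uint32_t request;    /* single request class */
    uint32_t subclass;   /* checked by the provider hook */
};

/* Hypothetical registered hook: class bitmask plus a callback. */
struct demo_hook {
    uint32_t classes;
    int (*serve)(const struct service_request *req, uint32_t *reply);
    struct demo_hook *next;
};

/* Ask each hook registered for the class until one answers SR_OK or SR_FAIL. */
static int
demo_request_data(struct demo_hook *hooks, const struct service_request *req,
                  uint32_t *reply)
{
    assert(req != NULL);

    for (struct demo_hook *h = hooks; h; h = h->next) {
        if (!(h->classes & req->request))
            continue;                 /* hook does not handle this class */

        int rv = h->serve(req, reply);
        if (rv == SR_OK || rv == SR_FAIL)
            return rv;                /* definitive answer, stop here */
        /* SR_NEXT: this provider skipped the request, try the next one */
    }
    return SR_UNAVAIL;                /* nobody served the request */
}

/* Example provider that never handles anything. */
static int
demo_skip(const struct service_request *req, uint32_t *reply)
{
    (void) req;
    (void) reply;
    return SR_NEXT;
}

/* Example provider handing out a fixed label (25) for subclass 0 only. */
static int
demo_label(const struct service_request *req, uint32_t *reply)
{
    if (req->subclass != 0)
        return SR_NEXT;
    *reply = 25;
    return SR_OK;
}
```

With demo_skip registered before demo_label, a label request with subclass 0 falls through the first hook and is answered by the second; a request that every hook skips, or whose class nobody registered for, comes back as SR_UNAVAIL.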
>>
>>
>>>>> Internal LMAP table is examined, tracked IGP table is examined. If both
>>>>> are ready (for given prefix), appropriate encapsulating and MPLS routes
>>>>> are generated and propagated using rte_update(), otherwise nothing is
>>>>> generated and the previously generated route is withdrawn (rte_update()
>>>>> with NULL is called) (or perhaps an unreachable route is generated if
>>>>> LMAP is here but IGP route is missing). Simple and elegant.
>>>> .. and in case of label release we should remove label only and keep
>>>> original route
>>> Yes.
>>>
>>>>> There are some tricky parts of IGP tracking - it is problematic
>>>>> to use standard RA_OPTIMAL update for this purpose, because if
>>>>> generated encapsulating routes are imported to the same table,
>>>>> these would probably become the optimal ones and IGP routes would be
>>>>> shaded. Solution would be to use RA_ANY, and ignore notifications
>>>>> containing encapsulating routes, similarly 'examining the tracked
>>>>> IGP table' means looking up the fib node and find the best route,
>>>>> ignoring encapsulating ones.
>>>>>
>>>>> For the implementation of this behavior, there are two minor changes that
>>>>> need to be made to the rt table code: First, currently accept_ra_types
>>>>> (RA_OPTIMAL/RA_ANY) is a property of a protocol, it needs to be a
>>>>> property of an announce hook (as LDP would have two hooks with
>>>>> RA_OPTIMAL and one hook with RA_ANY). Second, rte_announce() for
>>>>> both in rte_recalculate should be moved after the route list
>>>>> is updated/relinked.
>>>> Agreed. Distinguishing RA_OPTIMAL and RA_ANY in the current code is not a
>>>> trivial task and requires understanding of the internals. Either the
>>>> announce type should be passed to the announce hook, or a new hook should
>>>> be added for the RA_ANY event. The latter is more appropriate IMHO, since
>>>> RA_ANY is used by the pipe protocol only.
>>> I thought about that when i created RA_ANY and have chosen this approach.
>>> Probably best way is just to change rt_notify to have appropriate
>>> struct announce_hook as a second argument instead of struct rtable.
>>> struct announce_hook would contain RA_ANY/RA_OPTIMAL and possibly
>>> some protocol-specific data. As (probably) all protocols are in-tree,
>>> doing some wide but trivial changes is not a problem.
>>>
>>>> Kernel protocol should track RA_ANY protocol hooks
>>>> looking for update source (LDP / RSVP) and re-install appropriate
>>>> routes.
>>> I think kernel protocol should use RA_OPTIMAL as usual. This kind
>>> of RA_ANY usage is for protocols that export routes to the same
>>> table they listen (so 'source' routes would be shaded by their
>>> routes). These routes (LDP / RSVP) should have just highest
>>> priority.
>>>
>>>> The only downside is situation when LDP signalling starts faster
>>>> than IGP. In that case we will get 3 updates instead of one (at least in
>>>> RTSOCK):
>>>> * RTM_ADD for original prefix
>>>> * RTM_DEL for this prefix (as part of krt_set_notify())
>>>> * RTM_ADD for modified prefix
>>>>
>>>> RTM_CHANGE can be used in notify, but still: this gives 2 updates
>>>> instead of one.
>>> No, because RA_ANY is handled strictly before RA_OPTIMAL and routes
>>> are propagated synchronously depth-first:
>>>
>>> OSPF --RA_ANY-->   LDP
>>> LDP --RA_OPTIMAL-->   kernel
>>> OSPF --RA_OPTIMAL-->   kernel
>>>
>> Still, I can't understand how exactly I can modify an announced IP route
>> (from the FreeBSD kernel's point of view, an encapsulated route is a usual
>> route with an attribute attached. From Linux's point of view this should
>> be more or less the same, since an IP route lookup has to be done for an
>> incoming packet anyway and doing several different lookups is not the best
>> idea). I've got the RA_ANY hook called for a new route (and I should know
>> that it is actually RA_OPTIMAL without some complex logic!); what
>> should I do next?
>>
>>> But it is true that this depends heavily on the internal implementation
>>> of route propagation. The first idea I had was to use separate
>>> tables for original and labeled routes (with just RA_OPTIMAL hooks),
>>> but that looks too cumbersome for users and ability to push a better
>>> route to the same (input) table has other possible usages.
>>>
>>>>> Therefore, it is probably a good idea to extend FIBs in the way you
>>>>> suggested, with minor details changed. FIB / rtables would be uniform
>>>>> (AF_ bound), but there are just three AFs (IP, MPLS, VPN) - IPv4 and
>>>>> IPv6 could be handled as one AF, embedded (the same for VPNv4 and
>>>>> VPNv6). To minimize code changes, struct fib_node would have an ip_addr
>>>>> prefix, but might be allocated larger.
>>>> Okay, so for IPv4+IPv6-enabled daemon we will allocate an ip_addr large
>>>> enough for holding IPv6 address? This can bump memory consumption for
>>>> setups with several full-views significantly.
>>> It increases memory consumption, but not so much in relative terms - for
>>> each struct network there is at least one struct rte, and in both of them
>>> there is just one ip_addr, and both structures are nontrivial. So this
>>> relative increase would be about 1.15-1.2. Really big users would
>>> probably keep the current split setup.
>> Okay, it's much easier from developer point of view. If you're not
>> afraid of your users :)
>>>>> Because each protocol and each of its announce_hooks has an appropriate
>>>>> role, it is IMHO unnecessary to have AF in protocol hooks, but there
>>>>> should be a check whether the protocol/announce_hook is connected to an
>>>>> appropriate rtable.
>>>>>
>>>> To summarize required changes (please correct me):
>>>> 1) Differentiate between RA_ANY and RA_OPTIMAL (new hook, possibly)
>>>> 2) Add 3 AFs (AF_IP, AF_MPLS, AF_VPN) to the following structures:
>>>> * rtable
>>>> * fib
>>>> * rte
>>>> 3) Add fib2_init with sizeof(AF object) supplied. Add appropriate field
>>>> to struct fib to hold this value.
>>>> 4) Move to memcmp() in fib_find / fib_get
>>>> 5) Set up default rtable for every supported AF. Connect protocol
>>>> instances to such default AFs based on protocol types
>>> 1a) other changes in rte_recalculate() related to propagation
>>> (clean up the table before calling RA_ANY hook).
>>>
>>> 1) and 1a) i will do myself and send you the patch, and also make
>>> some trivial example for exporting to the same table.
>>>
>>> 2) i am not sure if there is a reason to put explicit AF info
>>> to struct fib, AF compatibility could be handled on higher level
>>> (struct rtable in general, other direct users probably use just
>>> one AF).
>> No problem, I misinterpreted "FIB / rtables would be uniform (AF_
>> bound)" as "FIB / rtable needs AF info in structure fields"
>>> 3) and hashing callback (and perhaps fib_route, but not sure if this is
>>> needed).
>>>
>>> 4) probably encapsulate that to some static inline key_equal() function.
>>>
>>> 5) see my related note above. Protocol binding to tables should check
>>> AFs.
>>>
>>> more:
>>>
>>> 6) RTD_MPLS in dest field, struct rta_mpls, as i wrote in the previous
>>> mail:
>>>
>>>>> i think encapsulation
>>>>> routes should be represented by routes with new destination type
>>>>> (RTD_MPLS in dest field of struct rta) and whole NHLFE should be stored
>>>>> in new struct rta_mpls (or rta_nhlfe), which would be extension of
>>>>> struct rta (containing struct rta in the first field and NHLFE after
>>>>> that). Such a structure could be easily passed as struct rta, and
>>>>> functions from rt-attr.c can work with that, with some minor modifications
>>>>> (allocating, freeing and printing) dispatched based on the dest field.
>>>>> This rta could be used without changes also for MPLS routes.
>> I'll try to send you patches for all these as I see it in several days.
>>>
>>>> Most of these are more or less trivial changes that are not MPLS-bound
>>>> (VPNv4/6 can be used when bird is used as an RR in an MPLS network, for
>>>> example). Should I supply patches for these? What are your plans for the
>>>> commit roadmap?
>>> I create GIT branch 'mpls' and would merge these patches to that branch
>>> soon. When we have a major release, we could merge the 'mpls' branch
>>> into master if there is sufficient usage (I think that even just
>>> static and kernel protocol support for MPLS would be a good example
>>> usage). Other protocols (LDP, ...) should probably be merged when they
>>> are reasonably ready.
>> Will this branch be available from the official git repo? It is not
>> accessible (from its web interface at least).
>>
>>
>> Btw, some bird/LDP "status" report:
>>
>> bird>  show ldp neighbour
>>      Peer LDP Ident: 10.2.33.4:0; Local LDP Ident 10.0.0.88:0
>>           TCP connection: 10.2.33.4.11212 - 0.0.0.0.0
>>           State: Operational; Msgs sent/rcvd: 21/61; Downstream
>>           Up time: 00:02:27
>>           LDP discovery sources:
>>             em1, Src IP addr: 10.1.5.4
>>      Peer LDP Ident: 10.2.33.3:0; Local LDP Ident 10.0.0.88:0
>>           TCP connection: 10.2.33.3.11009 - 0.0.0.0.0
>>           State: Operational; Msgs sent/rcvd: 29/60; Downstream
>>           Up time: 00:02:20
>>           LDP discovery sources:
>>             em2, Src IP addr: 10.1.6.3
>> bird>  show ldp bindings
>>    lib entry: 10.2.0.0/30
>>        local binding:  label: 25
>>        remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>>        remote binding:  lsr: 10.2.33.3:0, label: 23
>>    lib entry: 10.1.6.0/24
>>        remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>>        remote binding:  lsr: 10.2.33.4:0, label: 25
>>    lib entry: 10.0.0.0/24
>>        remote binding:  lsr: 10.2.33.3:0, label: 19
>>        remote binding:  lsr: 10.2.33.4:0, label: 23
>>    lib entry: 10.2.0.2/32
>>        local binding:  label: 26
>>        remote binding:  lsr: 10.2.33.4:0, label: 16
>>        remote binding:  lsr: 10.2.33.3:0, label: 24
>>    lib entry: 10.1.4.0/24
>>        local binding:  label: 29
>>        remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>>        remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>>    lib entry: 10.1.5.0/24
>>        remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>>        remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>>    lib entry: 1.2.3.5/32
>>        remote binding:  lsr: 10.2.33.3:0, label: 20
>>        remote binding:  lsr: 10.2.33.4:0, label: 21
>>    lib entry: 10.1.33.0/24
>>        local binding:  label: 28
>>        remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>>        remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>>    lib entry: 10.2.33.3/32
>>        local binding:  label: 31
>>        remote binding:  lsr: 10.2.33.3:0, label: ImpNULL
>>    lib entry: 10.2.33.4/32
>>        local binding:  label: 27
>>        remote binding:  lsr: 10.2.33.4:0, label: ImpNULL
>>        remote binding:  lsr: 10.2.33.3:0, label: 25
>>    lib entry: 10.1.6.88/32
>>        remote binding:  lsr: 10.2.33.3:0, label: 18
>>        remote binding:  lsr: 10.2.33.4:0, label: 19
>>    lib entry: 10.0.0.88/32
>>        remote binding:  lsr: 10.2.33.4:0, label: 17
>>        remote binding:  lsr: 10.2.33.3:0, label: 16
>>    lib entry: 10.1.5.88/32
>>        remote binding:  lsr: 10.2.33.3:0, label: 21
>>        remote binding:  lsr: 10.2.33.4:0, label: 18
>> bird>  show ldp forwardingtable
>> Local  Outgoing       Prefix             Bytes Label    Outgoing   Next Hop
>> Label  Label or VC    or Tunnel Id       Switched       interface
>> 20     SWAP           10.2.0.0/30        0              ? 10.1.5.4
>> 21     SWAP           10.2.0.2/32        0              ? 10.1.5.4
>> 22     SWAP           10.2.33.4/32       0              ? 10.1.5.4
>> 23     SWAP           10.1.33.0/24       0              ? 10.1.5.4
>> 24     SWAP           10.1.4.0/24        0              ? 10.1.5.4
>> 25     SWAP           10.2.0.0/30        0              ? 10.1.5.4
>> 26     SWAP           10.2.0.2/32        0              ? 10.1.5.4
>> 27     SWAP           10.2.33.4/32       0              ? 10.1.5.4
>> 28     SWAP           10.1.33.0/24       0              ? 10.1.5.4
>> 29     SWAP           10.1.4.0/24        0              ? 10.1.5.4
>> 30     SWAP           10.2.33.3/32       0              ? 10.1.6.3
>> 31     SWAP           10.2.33.3/32       0              ? 10.1.6.3
>>
>>
>>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.14 (FreeBSD)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk41FJEACgkQwcJ4iSZ1q2kZNwCfZHk19PuXn2esNZ/KrvXOir5v
> zTMAoKe78CsexI0pPJ4li50e8teBCcpa
> =yqPo
> -----END PGP SIGNATURE-----


-- 
All amounts stated in this message are exclusive of VAT unless otherwise stated alongside the amount in question.

--
F-Solutions Oy

Tapio Haapala

PL 7, 90571 Oulu
GSM   040-0998371
Skype burner-
IRC   Burner at ircnet




