
2. Core

2.1 Forwarding Information Base

FIB is a data structure designed for storage of routes indexed by their network prefixes. It supports insertion, deletion, searching by prefix, `routing' (in CIDR sense, that is searching for a longest prefix matching a given IP address) and (which makes the structure very tricky to implement) asynchronous reading, that is enumerating the contents of a FIB while other modules add, modify or remove entries.

Internally, each FIB is represented as a collection of nodes of type fib_node indexed using a sophisticated hashing mechanism. We use two-stage hashing: first we calculate a 16-bit primary hash key that is independent of the hash table size, then we reduce the primary key modulo the table size to get the real hash key used for determining the bucket containing the node. The lists of nodes in each bucket are sorted according to the primary hash key, hence if we keep the total number of buckets a power of two, re-hashing of the structure keeps the relative order of the nodes.
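The two-stage scheme can be illustrated with a minimal sketch. The helper names bucket_of() and splits_cleanly() are hypothetical and do not come from the BIRD sources; the sketch only assumes what the paragraph above states, namely a 16-bit primary key reduced modulo a power-of-two table size.

```c
#include <stdint.h>

/* Hypothetical helper: map a 16-bit primary key to a bucket index in a
 * table of (1 << order) buckets, i.e. "primary key modulo table size". */
unsigned bucket_of(uint16_t primary, unsigned order)
{
  return primary & ((1u << order) - 1);   /* same as primary % (1 << order) */
}

/* When the table doubles, a node either stays in bucket b or moves to
 * bucket b + old_size. It never mixes with nodes of other old buckets. */
int splits_cleanly(uint16_t primary, unsigned order)
{
  unsigned old_b = bucket_of(primary, order);
  unsigned new_b = bucket_of(primary, order + 1);
  return new_b == old_b || new_b == old_b + (1u << order);
}
```

Because every old bucket is split into exactly two new ones, each new bucket list is a subsequence of an already sorted list, so no re-sorting is needed after growing the table.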

To get the asynchronous reading consistent over node deletions, we need to keep a list of readers for each node. When a node gets deleted, its readers are automatically moved to the next node in the table.
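A minimal sketch of this reader bookkeeping follows. It models the table as a single linked list, and all names (struct node, struct reader, reader_put(), node_delete()) are hypothetical stand-ins rather than the real fib_node and fib_iterator definitions.

```c
#include <stddef.h>

struct reader;                       /* a suspended iterator (hypothetical) */

struct node {
  struct node *next;                 /* next node in reading order */
  struct reader *readers;            /* readers currently parked here */
  int key;
};

struct reader {
  struct reader *next;               /* next reader parked at the same node */
  struct node *at;                   /* node where iteration will resume */
};

/* Park a reader at a node, in the spirit of FIB_ITERATE_PUT(). */
void reader_put(struct reader *r, struct node *n)
{
  r->at = n;
  r->next = n->readers;
  n->readers = r;
}

/* Unlink node n from the list starting at *head and move its readers
 * to the following node (or mark them finished when there is none). */
void node_delete(struct node **head, struct node *n)
{
  struct node **pp = head;
  while (*pp && *pp != n)
    pp = &(*pp)->next;
  if (!*pp)
    return;                          /* not in this list */
  *pp = n->next;

  while (n->readers) {
    struct reader *r = n->readers;
    n->readers = r->next;
    if (n->next)
      reader_put(r, n->next);
    else {
      r->at = NULL;                  /* nothing left to read */
      r->next = NULL;
    }
  }
}
```

The sketch also shows why a suspended iterator must not be destroyed: the node keeps a pointer to it in its reader list.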

Basic FIB operations are performed by functions defined by this module. Enumeration of FIB contents is accomplished by using the FIB_WALK() macro, or FIB_ITERATE_START() if you want to do it asynchronously.

For simple iteration just place the body of the loop between FIB_WALK() and FIB_WALK_END(). You can't modify the FIB during the iteration (you can modify data in the node, but not add or remove nodes).

If you need more freedom, you can use the FIB_ITERATE_*() group of macros. First, you initialize an iterator with FIB_ITERATE_INIT(). Then you can put the loop body in between FIB_ITERATE_START() and FIB_ITERATE_END(). In addition, the iteration can be suspended by calling FIB_ITERATE_PUT(). This'll link the iterator inside the FIB. While suspended, you may modify the FIB, exit the current function, etc. To resume the iteration, enter the loop again. You can use FIB_ITERATE_UNLINK() to unlink the iterator (while iteration is suspended) in cases like premature end of FIB iteration.

Note that the iterator must not be destroyed while the iteration is suspended; the FIB would then contain a pointer to invalid memory. Therefore, after each FIB_ITERATE_INIT() or FIB_ITERATE_PUT() there must be either FIB_ITERATE_START() or FIB_ITERATE_UNLINK() before the iterator is destroyed.


Function

void fib_init (struct fib * f, pool * p, uint addr_type, uint node_size, uint node_offset, uint hash_order, fib_init_fn init) -- initialize a new FIB

Arguments

struct fib * f

the FIB to be initialized (the structure itself being allocated by the caller)

pool * p

pool to allocate the nodes in

uint addr_type

-- undescribed --

uint node_size

node size to be used (each node consists of a standard header fib_node followed by user data)

uint node_offset

-- undescribed --

uint hash_order

initial hash order (a binary logarithm of hash table size), 0 to use default order (recommended)

fib_init_fn init

pointer to a function to be called to initialize a newly created node

Description

This function initializes a newly allocated FIB and prepares it for use.


Function

void * fib_find (struct fib * f, const net_addr * a) -- search for FIB node by prefix

Arguments

struct fib * f

FIB to search in

const net_addr * a

network prefix to search for

Description

Search for a FIB node corresponding to the given prefix, return a pointer to it or NULL if no such node exists.


Function

void * fib_get (struct fib * f, const net_addr * a) -- find or create a FIB node

Arguments

struct fib * f

FIB to work with

const net_addr * a

network prefix to find or create

Description

Search for a FIB node corresponding to the given prefix and return a pointer to it. If no such node exists, create it.


Function

void * fib_route (struct fib * f, const net_addr * n) -- CIDR routing lookup

Arguments

struct fib * f

FIB to search in

const net_addr * n

network address

Description

Search for a FIB node with longest prefix matching the given network, that is a node which a CIDR router would use for routing that network.
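One common way to build a longest-prefix lookup on top of an exact-match table is to probe successively shorter prefixes. The sketch below shows that technique for IPv4 only; it is an illustration, not a description of how fib_route() is implemented, and the names struct route, px_mask(), find_exact() and route_lookup() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct route { uint32_t prefix; unsigned pxlen; };   /* hypothetical */

/* Mask keeping the top pxlen bits of an IPv4 address. */
uint32_t px_mask(unsigned pxlen)
{
  return pxlen ? ~(uint32_t)0 << (32 - pxlen) : 0;
}

/* Exact-match lookup; a linear scan stands in for a hash lookup. */
const struct route *find_exact(const struct route *tab, size_t n,
                               uint32_t prefix, unsigned pxlen)
{
  for (size_t i = 0; i < n; i++)
    if (tab[i].pxlen == pxlen && tab[i].prefix == prefix)
      return &tab[i];
  return NULL;
}

/* Longest-prefix match: try /32, /31, ... /0 and stop at the first hit. */
const struct route *route_lookup(const struct route *tab, size_t n,
                                 uint32_t addr)
{
  for (int len = 32; len >= 0; len--) {
    const struct route *r = find_exact(tab, n, addr & px_mask(len), len);
    if (r)
      return r;
  }
  return NULL;
}
```

With entries for 10.0.0.0/8 and 10.1.0.0/16, a lookup of 10.1.2.3 returns the /16 entry, a lookup of 10.2.3.4 returns the /8 entry, and a lookup of 192.168.0.1 returns NULL.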


Function

void fib_delete (struct fib * f, void * E) -- delete a FIB node

Arguments

struct fib * f

FIB to delete from

void * E

entry to delete

Description

This function removes the given entry from the FIB, taking care of all the asynchronous readers by shifting them to the next node in the canonical reading order.


Function

void fib_free (struct fib * f) -- delete a FIB

Arguments

struct fib * f

FIB to be deleted

Description

This function deletes a FIB -- it frees all memory associated with it and all its entries.


Function

void fib_check (struct fib * f) -- audit a FIB

Arguments

struct fib * f

FIB to be checked

Description

This debugging function audits a FIB by checking its internal consistency. Use when you suspect somebody of corrupting innocent data structures.

2.2 Routing tables

Routing tables are probably the most important structures BIRD uses. They hold all the information about known networks, the associated routes and their attributes.

There are multiple routing tables (a primary one together with any number of secondary ones if requested by the configuration). Each table is basically a FIB containing entries describing the individual destination networks. For each network (represented by structure net), there is a one-way linked list of route entries (rte). The first entry on the list is the best one (i.e., the one we currently use for routing); the order of the other ones is undetermined.

The rte contains information about the route. There are net and src, which together form a key identifying the route in a routing table. There is a pointer to a rta structure (see the route attribute module for a precise explanation) holding the route attributes, which are primary data about the route. There are several technical fields used by routing table code (route id, REF_* flags). There is also the pflags field, holding protocol-specific flags. They are not used by routing table code, but by protocol-specific hooks. In contrast to route attributes, they are not primary data and their validity is also limited to the routing table.

There are several mechanisms that allow automatic update of routes in one routing table (dst) as a result of changes in another routing table (src). They handle issues of recursive next hop resolving, flowspec validation and RPKI validation.

The first such mechanism is handling of recursive next hops. A route in the dst table has an indirect next hop address, which is resolved through a route in the src table (which may also be the same table) to get an immediate next hop. This is implemented using structure hostcache attached to the src table, which contains hostentry structures for each tracked next hop address. These structures are linked from recursive routes in dst tables, possibly multiple routes sharing one hostentry (as many routes may have the same indirect next hop). There is also a trie in the hostcache, which matches all prefixes that may influence resolving of tracked next hops.

When a best route changes in the src table, the hostcache is notified using rt_notify_hostcache(), which immediately checks using the trie whether the change is relevant and, if it is, schedules asynchronous hostcache recomputation. The recomputation is done by rt_update_hostcache() (called from rt_event() of the src table); it walks through all hostentries and resolves them (by rt_update_hostentry()). It also updates the trie. If a change in hostentry resolution is found, it schedules asynchronous nexthop recomputation of the associated dst table. That is done by rt_next_hop_update() (called from rt_event() of the dst table); it iterates over all routes in the dst table and re-examines their hostentries for changes. Note that in contrast to the hostcache update, the next hop update can be interrupted by the main loop. These two full-table walks (over the hostcache and the dst table) are necessary due to the absence of direct lookups (route -> affected nexthop, nexthop -> its route).

The second mechanism is for flowspec validation, where validity of flowspec routes depends on resolving their network prefixes in IP routing tables. This is similar to the recursive next hop mechanism, but simpler as there are no intermediate hostcache and hostentries (because flows are less likely to share a common net prefix than routes are to share a common next hop). In the src table, there is a list of dst tables (list flowspec_links); this list is updated by flowspec channels (by rt_flowspec_link() and rt_flowspec_unlink() during channel start/stop). Each dst table has its own trie of prefixes that may influence validation of flowspec routes in it (flowspec_trie).

When a best route changes in the src table, rt_flowspec_notify() immediately checks all dst tables from the list using their tries to see whether the change is relevant for them. If it is, then an asynchronous re-validation of flowspec routes in the dst table is scheduled. That is also done by function rt_next_hop_update(), like nexthop recomputation above. It iterates over all flowspec routes and re-validates them. It also recalculates the trie.

Note that in contrast to the hostcache update, here the trie is recalculated during the rt_next_hop_update(), which may be interleaved with IP route updates. The trie is flushed at the beginning of recalculation, which means that such updates may use partial trie to see if they are relevant. But it works anyway! Either affected flowspec was already re-validated and added to the trie, then IP route change would match the trie and trigger a next round of re-validation, or it was not yet re-validated and added to the trie, but will be re-validated later in this round anyway.

The third mechanism is used for RPKI re-validation of IP routes and it is the simplest. It is just a list of subscribers in the src table, which are notified when any change happens, but only after a settle time. Also, in the RPKI case the dst is not a table, but a channel, which refeeds routes through a filter.


Function

int net_roa_check (rtable * tab, const net_addr * n, u32 asn) -- check validity of route origination in a ROA table

Arguments

rtable * tab

ROA table

const net_addr * n

network prefix to check

u32 asn

AS number of network prefix

Description

Implements RFC 6483 route validation for the given network prefix. The procedure is to find all candidate ROAs - ROAs whose prefixes cover the given network prefix. If there is no candidate ROA, return ROA_UNKNOWN. If there is a candidate ROA with matching ASN and maxlen field greater than or equal to the given prefix length, return ROA_VALID. Otherwise, return ROA_INVALID. If caller cannot determine origin AS, 0 could be used (in that case ROA_VALID cannot happen). Table tab must have type NET_ROA4 or NET_ROA6, network n must have type NET_IP4 or NET_IP6, respectively.
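The decision procedure described above can be sketched for IPv4 as follows. The ROA table is modelled as a plain array, struct roa and roa_check() are hypothetical names, and the numeric values of the result constants are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

/* Result codes; the numeric values here are illustrative. */
enum { ROA_UNKNOWN = 0, ROA_VALID = 1, ROA_INVALID = 2 };

/* Hypothetical ROA record: covering prefix, its length, maxlen and ASN. */
struct roa { uint32_t prefix; unsigned pxlen; unsigned maxlen; uint32_t asn; };

/* Does the ROA prefix cover the given (prefix, pxlen)? */
static int covers(const struct roa *r, uint32_t prefix, unsigned pxlen)
{
  uint32_t mask = r->pxlen ? ~(uint32_t)0 << (32 - r->pxlen) : 0;
  return r->pxlen <= pxlen && (prefix & mask) == r->prefix;
}

int roa_check(const struct roa *tab, size_t n,
              uint32_t prefix, unsigned pxlen, uint32_t asn)
{
  int candidates = 0;

  for (size_t i = 0; i < n; i++) {
    if (!covers(&tab[i], prefix, pxlen))
      continue;
    candidates = 1;                       /* at least one covering ROA */
    /* asn == 0 means "origin unknown", which can never be valid. */
    if (asn && tab[i].asn == asn && tab[i].maxlen >= pxlen)
      return ROA_VALID;
  }
  return candidates ? ROA_INVALID : ROA_UNKNOWN;
}
```

Given a single ROA for 10.0.0.0/8 with maxlen 16 and ASN 65001, the prefix 10.1.0.0/16 from AS 65001 is valid, 10.1.1.0/24 from the same AS is invalid (too specific), and 192.168.0.0/16 is unknown because no ROA covers it.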


Function

rte * rte_find (net * net, struct rte_src * src) -- find a route

Arguments

net * net

network node

struct rte_src * src

route source

Description

The rte_find() function returns a route for destination net which is from route source src.


Function

rte * rte_get_temp (rta * a, struct rte_src * src) -- get a temporary rte

Arguments

rta * a

attributes to assign to the new route (a rta; in case it's un-cached, rte_update() will create a cached copy automatically)

struct rte_src * src

route source

Description

Create a temporary rte and bind it with the attributes a.


Function

rte * rte_cow_rta (rte * r, linpool * lp) -- get a private writable copy of rte with writable rta

Arguments

rte * r

a route entry to be copied

linpool * lp

a linpool from which to allocate rta

Description

rte_cow_rta() takes a rte and prepares it and the associated rta for modification. There are three possibilities: First, both rte and rta are private copies; in that case they are returned unchanged. Second, rte is a private copy, but rta is cached; in that case rta is duplicated using rta_do_cow(). Third, rte is shared and rta is cached; in that case both structures are duplicated by rte_do_cow() and rta_do_cow().

Note that in the second case, cached rta loses one reference, while private copy created by rta_do_cow() is a shallow copy sharing indirect data (eattrs, nexthops, ...) with it. To work properly, original shared rta should have another reference during the life of created private copy.

Result

a pointer to the new writable rte with writable rta.


Function

void rte_announce (rtable * tab, uint type, net * net, rte * new, rte * old, rte * new_best, rte * old_best) -- announce a routing table change

Arguments

rtable * tab

table the route has been added to

uint type

type of route announcement (RA_UNDEF or RA_ANY)

net * net

network in question

rte * new

the new or changed route

rte * old

the previous route replaced by the new one

rte * new_best

the new best route for the same network

rte * old_best

the previous best route for the same network

Description

This function gets a routing table update and announces it to all protocols that are connected to the same table by their channels.

There are two ways in which routing table changes are announced. First, there is a change of just one route in net (which may have caused a change of the best route of the network). In this case new and old describe the changed route and new_best and old_best describe best routes. Other routes are not affected, but in a sorted table the order of other routes might change.

Second, there is a bulk change of multiple routes in net, with shared best route selection. In such a case separate route changes are described using type of RA_ANY, with new and old specifying the changed route, while new_best and old_best are NULL. After that, another notification is done where new_best and old_best are filled (may be the same), but new and old are NULL.

The function announces the change to all associated channels. For each channel, an appropriate preprocessing is done according to channel ra_mode. For example, RA_OPTIMAL channels receive just changes of best routes.

In general, we first call preexport() hook of a protocol, which performs basic checks on the route (each protocol has a right to veto or force accept of the route before any filter is asked). Then we consult an export filter of the channel and verify the old route in an export map of the channel. Finally, the rt_notify() hook of the protocol gets called.

Note that there are also calls of rt_notify() hooks due to feed, but that is done outside of scope of rte_announce().


Function

void rte_free (rte * e) -- delete a rte

Arguments

rte * e

rte to be deleted

Description

rte_free() deletes the given rte from the routing table it's linked to.


Function

void rte_update2 (struct channel * c, const net_addr * n, rte * new, struct rte_src * src) -- enter a new update to a routing table

Arguments

struct channel * c

channel doing the update

const net_addr * n

network the update applies to

rte * new

a rte representing the new route or NULL for route removal.

struct rte_src * src

protocol originating the update

Description

This function is called by the routing protocols whenever they discover a new route or wish to update/remove an existing route. The right announcement sequence is to build route attributes first (either un-cached with aflags set to zero or a cached one using rta_lookup(); in this case please note that you need to increase the use count of the attributes yourself by calling rta_clone()), call rte_get_temp() to obtain a temporary rte, fill in all the appropriate data and finally submit the new rte by calling rte_update().

src specifies the protocol that originally created the route and the meaning of protocol-dependent data of new. If new is not NULL, src has to be the same value as new->attrs->proto. p specifies the protocol that called rte_update(). In most cases it is the same protocol as src. rte_update() stores p in new->sender.

When rte_update() gets any route, it automatically validates it (checks whether the network and next hop address are valid IP addresses and also whether a normal routing protocol doesn't try to smuggle a host or link scope route to the table), converts all protocol dependent attributes stored in the rte to temporary extended attributes, consults import filters of the protocol to see if the route should be accepted and/or its attributes modified, and finally stores the temporary attributes back to the rte.

Now, having a "public" version of the route, we automatically find any old route defined by the protocol src for network n, replace it by the new one (or removing it if new is NULL), recalculate the optimal route for this destination and finally broadcast the change (if any) to all routing protocols by calling rte_announce().

All memory used for attribute lists and other temporary allocations is taken from a special linear pool rte_update_pool and freed when rte_update() finishes.


Function

void rt_refresh_begin (rtable * t, struct channel * c) -- start a refresh cycle

Arguments

rtable * t

related routing table

struct channel * c

related channel

Description

This function starts a refresh cycle for given routing table and announce hook. The refresh cycle is a sequence where the protocol sends all its valid routes to the routing table (by rte_update()). After that, all protocol routes (more precisely routes with c as sender) not sent during the refresh cycle but still in the table from the past are pruned. This is implemented by marking all related routes as stale by REF_STALE flag in rt_refresh_begin(), then marking all related stale routes with REF_DISCARD flag in rt_refresh_end() and then removing such routes in the prune loop.
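The mark-and-sweep nature of the refresh cycle can be sketched on a toy table. All names below (struct ex_route, EX_STALE, EX_DISCARD and the four functions) are hypothetical stand-ins; in particular, route_update() merely clears the stale mark, whereas a real update replaces the route in the table.

```c
#include <stddef.h>

#define EX_STALE   0x1u   /* stand-in for REF_STALE */
#define EX_DISCARD 0x2u   /* stand-in for REF_DISCARD */

struct ex_route { int net; int sender; unsigned flags; int used; };

/* rt_refresh_begin() analogue: mark every route of the sender as stale. */
void refresh_begin(struct ex_route *t, size_t n, int sender)
{
  for (size_t i = 0; i < n; i++)
    if (t[i].used && t[i].sender == sender)
      t[i].flags |= EX_STALE;
}

/* Update analogue: a re-announced route is no longer stale. */
void route_update(struct ex_route *t, size_t n, int sender, int net)
{
  for (size_t i = 0; i < n; i++)
    if (t[i].used && t[i].sender == sender && t[i].net == net)
      t[i].flags &= ~EX_STALE;
}

/* rt_refresh_end() analogue: whatever is still stale gets discarded. */
void refresh_end(struct ex_route *t, size_t n, int sender)
{
  for (size_t i = 0; i < n; i++)
    if (t[i].used && t[i].sender == sender && (t[i].flags & EX_STALE))
      t[i].flags |= EX_DISCARD;
}

/* Prune loop analogue: remove discarded routes, return how many went. */
size_t prune(struct ex_route *t, size_t n)
{
  size_t removed = 0;
  for (size_t i = 0; i < n; i++)
    if (t[i].used && (t[i].flags & EX_DISCARD)) {
      t[i].used = 0;
      removed++;
    }
  return removed;
}
```

Routes of other senders are never touched, so a refresh by one channel cannot remove routes imported by another.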


Function

void rt_refresh_end (rtable * t, struct channel * c) -- end a refresh cycle

Arguments

rtable * t

related routing table

struct channel * c

related channel

Description

This function ends a refresh cycle for given routing table and announce hook. See rt_refresh_begin() for description of refresh cycles.


Function

void rte_dump (rte * e) -- dump a route

Arguments

rte * e

rte to be dumped

Description

This function dumps contents of a rte to debug output.


Function

void rt_dump (rtable * t) -- dump a routing table

Arguments

rtable * t

routing table to be dumped

Description

This function dumps contents of a given routing table to debug output.


Function

void rt_dump_all (void) -- dump all routing tables

Description

This function dumps contents of all routing tables to debug output.


Function

void rt_init (void) -- initialize routing tables

Description

This function is called during BIRD startup. It initializes the routing table module.


Function

void rt_prune_table (rtable * tab) -- prune a routing table

Arguments

rtable * tab

routing table to be pruned

Description

The prune loop scans routing tables and removes routes belonging to flushing protocols, discarded routes and also stale network entries. It is called from rt_event(). The event is rescheduled if the current iteration does not finish the table. The pruning is directed by the prune state (prune_state), specifying whether the prune cycle is scheduled or running, and there is also a persistent pruning iterator (prune_fit).

The prune loop is used also for channel flushing. For this purpose, the channels to flush are marked before the iteration and notified after the iteration.


Function

struct f_trie * rt_lock_trie (rtable * tab) -- lock a prefix trie of a routing table

Arguments

rtable * tab

routing table with prefix trie to be locked

Description

The prune loop may rebuild the prefix trie and invalidate f_trie_walk_state structures. Therefore, asynchronous walks should lock the prefix trie using this function. That allows the prune loop to rebuild the trie, but postpones its freeing until all walks are done (unlocked by rt_unlock_trie()).

Returns the current trie, which is now locked. The value should be passed back to rt_unlock_trie() for unlocking.


Function

void rt_unlock_trie (rtable * tab, struct f_trie * trie) -- unlock a prefix trie of a routing table

Arguments

rtable * tab

routing table with prefix trie to be unlocked

struct f_trie * trie

value returned by matching rt_lock_trie()

Description

Call this for a trie locked by rt_lock_trie() once the walk over the trie is done. It may free the trie and schedule the next trie pruning.


Function

void rt_lock_table (rtable * r) -- lock a routing table

Arguments

rtable * r

routing table to be locked

Description

Lock a routing table, because it's in use by a protocol, preventing it from being freed when it gets undefined in a new configuration.


Function

void rt_unlock_table (rtable * r) -- unlock a routing table

Arguments

rtable * r

routing table to be unlocked

Description

Unlock a routing table formerly locked by rt_lock_table(), that is decrease its use count and delete it if it's scheduled for deletion by configuration changes.
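The interplay between the use count and deferred deletion can be sketched as follows. struct ex_table and its fields are hypothetical, and the freed flag only marks the point where a real implementation would release the table.

```c
/* Hypothetical table header: use count plus a "scheduled for deletion" flag. */
struct ex_table { unsigned use_count; int deleted; int freed; };

/* rt_lock_table() analogue: one more user of the table. */
void ex_lock(struct ex_table *t)
{
  t->use_count++;
}

/* rt_unlock_table() analogue: the last user to leave a table that the
 * configuration no longer defines triggers its removal. */
void ex_unlock(struct ex_table *t)
{
  if (--t->use_count == 0 && t->deleted)
    t->freed = 1;          /* a real implementation would free memory here */
}
```

A table that is still defined in the configuration survives even when its use count drops to zero, because the deleted flag is not set.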


Function

void rt_commit (struct config * new, struct config * old) -- commit new routing table configuration

Arguments

struct config * new

new configuration

struct config * old

original configuration or NULL if it's boot time config

Description

Scan differences between old and new configuration and modify the routing tables according to these changes. If new defines a previously unknown table, create it; if it omits a table existing in old, schedule it for deletion (it gets deleted when all protocols disconnect from it by calling rt_unlock_table()); if it exists in both configurations, leave it unchanged.


Function

int rt_feed_channel (struct channel * c) -- advertise all routes to a channel

Arguments

struct channel * c

channel to be fed

Description

This function performs one pass of advertisement of routes to a channel that is in the ES_FEEDING state. It is called by the protocol code as long as it has something to do. (We avoid transferring all the routes in a single pass in order not to monopolize CPU time.)


Function

void rt_feed_channel_abort (struct channel * c) -- abort protocol feeding

Arguments

struct channel * c

channel

Description

This function is called by the protocol code when the protocol stops or ceases to exist during the feeding.


Function

net * net_find (rtable * tab, net_addr * addr) -- find a network entry

Arguments

rtable * tab

a routing table

net_addr * addr

address of the network

Description

net_find() looks up the given network in routing table tab and returns a pointer to its net entry or NULL if no such network exists.


Function

net * net_get (rtable * tab, net_addr * addr) -- obtain a network entry

Arguments

rtable * tab

a routing table

net_addr * addr

address of the network

Description

net_get() looks up the given network in routing table tab and returns a pointer to its net entry. If no such entry exists, it's created.


Function

rte * rte_cow (rte * r) -- copy a route for writing

Arguments

rte * r

a route entry to be copied

Description

rte_cow() takes a rte and prepares it for modification. The exact action taken depends on the flags of the rte -- if it's a temporary entry, it's just returned unchanged, else a new temporary entry with the same contents is created.

The primary use of this function is inside the filter machinery -- when a filter wants to modify rte contents (to change the preference or to attach another set of attributes), it must ensure that the rte is not shared with anyone else (and especially that it isn't stored in any routing table).

Result

a pointer to the new writable rte.

2.3 Route attribute cache

Each route entry carries a set of route attributes. Several of them vary from route to route, but most attributes are usually common for a large number of routes. To conserve memory, we've decided to store only the varying ones directly in the rte and hold the rest in a special structure called rta which is shared among all the rte's with these attributes.

Each rta contains all the static attributes of the route (i.e., those which are always present) as structure members and a list of dynamic attributes represented by a linked list of ea_list structures, each of them consisting of an array of eattr's containing the individual attributes. An attribute can be specified more than once in the ea_list chain and in such case the first occurrence overrides the others. This semantics is used especially when someone (for example a filter) wishes to alter values of several dynamic attributes, but it wants to preserve the original attribute lists maintained by another module.

Each eattr contains an attribute identifier (split to protocol ID and per-protocol attribute ID), protocol dependent flags, a type code (consisting of several bit fields describing attribute characteristics) and either an embedded 32-bit value or a pointer to an adata structure holding attribute contents.

There exist two variants of rta's -- cached and un-cached ones. Un-cached rta's can have arbitrarily complex structure of ea_list's and they can be modified by any module in the route processing chain. Cached rta's have their attribute lists normalized (that means at most one ea_list is present and its values are sorted in order to speed up searching), they are stored in a hash table to make fast lookup possible and they are provided with a use count to allow sharing.

Routing tables always contain only cached rta's.


Function

struct nexthop * nexthop_merge (struct nexthop * x, struct nexthop * y, int rx, int ry, int max, linpool * lp) -- merge nexthop lists

Arguments

struct nexthop * x

list 1

struct nexthop * y

list 2

int rx

reusability of list x

int ry

reusability of list y

int max

max number of nexthops

linpool * lp

linpool for allocating nexthops

Description

The nexthop_merge() function takes two nexthop lists x and y and merges them, eliminating possible duplicates. The input lists must be sorted and the result is sorted too. The number of nexthops in result is limited by max. New nodes are allocated from linpool lp.

The arguments rx and ry specify whether corresponding input lists may be consumed by the function (i.e. their nodes reused in the resulting list), in that case the caller should not access these lists after that. To eliminate issues with deallocation of these lists, the caller should use some form of bulk deallocation (e.g. stack or linpool) to free these nodes when the resulting list is no longer needed. When reusability is not set, the corresponding lists are not modified nor linked from the resulting list.
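A minimal sketch of the merge, covering only the case where neither input list may be reused, is shown below. struct ex_nh, its integer key (standing in for whatever ordering the real nexthop comparison uses) and ex_merge() are hypothetical, and a caller-supplied array stands in for the linpool.

```c
#include <stddef.h>

struct ex_nh { int key; struct ex_nh *next; };   /* hypothetical nexthop */

/* Merge sorted lists x and y into a new sorted list without duplicates,
 * copying at most max nodes into buf. Neither input list is modified,
 * as when rx and ry are both unset. */
struct ex_nh *ex_merge(const struct ex_nh *x, const struct ex_nh *y,
                       int max, struct ex_nh *buf)
{
  struct ex_nh *root = NULL, **tail = &root;

  while ((x || y) && max-- > 0) {
    const struct ex_nh *pick;

    if (!y || (x && x->key < y->key)) {
      pick = x; x = x->next;
    } else if (!x || y->key < x->key) {
      pick = y; y = y->next;
    } else {                       /* equal keys: take one, skip both */
      pick = x; x = x->next; y = y->next;
    }

    buf->key = pick->key;          /* copy into freshly "allocated" node */
    buf->next = NULL;
    *tail = buf;
    tail = &buf->next;
    buf++;
  }
  return root;
}
```

Merging the lists (1, 5) and (3, 5) with max set to 4 yields (1, 3, 5); with max set to 2 the result is truncated to (1, 3).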


Function

eattr * ea_find (ea_list * e, unsigned id) -- find an extended attribute

Arguments

ea_list * e

attribute list to search in

unsigned id

attribute ID to search for

Description

Given an extended attribute list, ea_find() searches for a first occurrence of an attribute with specified ID, returning either a pointer to its eattr structure or NULL if no such attribute exists.


Function

eattr * ea_walk (struct ea_walk_state * s, uint id, uint max) -- walk through extended attributes

Arguments

struct ea_walk_state * s

walk state structure

uint id

start of attribute ID interval

uint max

length of attribute ID interval

Description

Given an extended attribute list, ea_walk() walks through the list looking for first occurrences of attributes with ID in specified interval from id to (id + max - 1), returning pointers to found eattr structures, storing its walk state in s for subsequent calls.

The function ea_walk() is supposed to be called in a loop, with the walk state structure s initially zeroed and filled with the initial extended attribute list, returning one found attribute in each call or NULL when no other attribute exists. The extended attribute list or the arguments should not be modified between calls. The maximum value of max is 128.


Function

uintptr_t ea_get_int (ea_list * e, unsigned id, uintptr_t def) -- fetch an integer attribute

Arguments

ea_list * e

attribute list

unsigned id

attribute ID

uintptr_t def

default value

Description

This function is a shortcut for retrieving a value of an integer attribute by calling ea_find() to find the attribute, extracting its value or returning a provided default if no such attribute is present.
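The "first occurrence wins" lookup and the default-value shortcut can be sketched on a simplified model of the attribute chain. struct ex_attr, struct ex_list, ex_find() and ex_get_int() are hypothetical and much simpler than the real eattr and ea_list types.

```c
#include <stddef.h>

struct ex_attr { unsigned id; unsigned value; };

struct ex_list {
  struct ex_list *next;            /* older segment, overridden by this one */
  size_t count;
  const struct ex_attr *attrs;
};

/* ea_find() analogue: first occurrence of id in the chain, or NULL. */
const struct ex_attr *ex_find(const struct ex_list *l, unsigned id)
{
  for (; l; l = l->next)
    for (size_t i = 0; i < l->count; i++)
      if (l->attrs[i].id == id)
        return &l->attrs[i];
  return NULL;
}

/* ea_get_int() analogue: value of the attribute, or def when absent. */
unsigned ex_get_int(const struct ex_list *l, unsigned id, unsigned def)
{
  const struct ex_attr *a = ex_find(l, id);
  return a ? a->value : def;
}
```

Prepending a one-element segment that redefines an attribute changes what the lookup returns for the new chain, while the original segment stays untouched and still yields the old value when searched on its own.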


Function

void ea_do_prune (ea_list * e)

Arguments

ea_list * e

-- undescribed --

Description

for this reason.


Function

void ea_sort (ea_list * e) -- sort an attribute list

Arguments

ea_list * e

list to be sorted

Description

This function takes an ea_list chain and sorts the attributes within each of its entries.

If an attribute occurs multiple times in a single ea_list, ea_sort() leaves only the first (the only significant) occurrence.


Function

unsigned ea_scan (ea_list * e) -- estimate attribute list size

Arguments

ea_list * e

attribute list

Description

This function calculates an upper bound of the size of a given ea_list after merging with ea_merge().


Function

void ea_merge (ea_list * e, ea_list * t) -- merge segments of an attribute list

Arguments

ea_list * e

attribute list

ea_list * t

buffer to store the result to

Description

This function takes a possibly multi-segment attribute list and merges all of its segments to one.

The primary use of this function is for ea_list normalization: first call ea_scan() to determine how much memory the result will take, then allocate a buffer (usually using alloca()), merge the segments with ea_merge() and finally sort and prune the result by calling ea_sort().
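The normalization sequence (scan, merge, sort and prune) can be sketched on a simplified attribute model. The nz_* names are hypothetical, sizes are counted in attributes rather than bytes, and a stable insertion sort is used here purely for brevity.

```c
#include <stddef.h>

struct nz_attr { unsigned id; unsigned value; };
struct nz_seg  { struct nz_seg *next; size_t count; const struct nz_attr *attrs; };

/* ea_scan() analogue: upper bound on the number of merged attributes. */
size_t nz_scan(const struct nz_seg *s)
{
  size_t n = 0;
  for (; s; s = s->next)
    n += s->count;
  return n;
}

/* ea_merge() analogue: copy all segments, first segment first, into out. */
size_t nz_merge(const struct nz_seg *s, struct nz_attr *out)
{
  size_t n = 0;
  for (; s; s = s->next)
    for (size_t i = 0; i < s->count; i++)
      out[n++] = s->attrs[i];
  return n;
}

/* ea_sort() analogue: stable sort by id, then keep only the first
 * occurrence of each id. Returns the new attribute count. */
size_t nz_sort(struct nz_attr *a, size_t n)
{
  for (size_t i = 1; i < n; i++) {          /* insertion sort is stable */
    struct nz_attr t = a[i];
    size_t j = i;
    while (j > 0 && a[j - 1].id > t.id) {
      a[j] = a[j - 1];
      j--;
    }
    a[j] = t;
  }

  size_t w = 0;
  for (size_t i = 0; i < n; i++)
    if (w == 0 || a[w - 1].id != a[i].id)
      a[w++] = a[i];
  return w;
}
```

Stability of the sort matters: it keeps the occurrence from the earliest segment in front of later duplicates, so pruning retains exactly the value that a lookup on the unmerged chain would have returned.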


Function

int ea_same (ea_list * x, ea_list * y) -- compare two ea_list's

Arguments

ea_list * x

attribute list

ea_list * y

attribute list

Description

ea_same() compares two normalized attribute lists x and y and returns 1 if they contain the same attributes, 0 otherwise.


Function

void ea_show (struct cli * c, const eattr * e) -- print an eattr to CLI

Arguments

struct cli * c

destination CLI

const eattr * e

attribute to be printed

Description

This function takes an extended attribute represented by its eattr structure and prints it to the CLI according to the type information.

If the protocol defining the attribute provides its own get_attr() hook, it's consulted first.


Function

void ea_dump (ea_list * e) -- dump an extended attribute

Arguments

ea_list * e

attribute to be dumped

Description

ea_dump() dumps contents of the extended attribute given to the debug output.


Function

uint ea_hash (ea_list * e) -- calculate an ea_list hash key

Arguments

ea_list * e

attribute list

Description

ea_hash() takes an extended attribute list and calculates a hopefully uniformly distributed hash value from its contents.


Function

ea_list * ea_append (ea_list * to, ea_list * what) -- concatenate ea_list's

Arguments

ea_list * to

destination list (can be NULL)

ea_list * what

list to be appended (can be NULL)

Description

This function appends the ea_list what at the end of ea_list to and returns a pointer to the resulting list.


Function

rta * rta_lookup (rta * o) -- look up a rta in attribute cache

Arguments

rta * o

an un-cached rta

Description

rta_lookup() gets an un-cached rta structure and returns its cached counterpart. It starts with examining the attribute cache to see whether there exists a matching entry. If such an entry exists, it's returned and its use count is incremented, else a new entry is created with use count set to 1.

The extended attribute lists attached to the rta are automatically converted to the normalized form.


Function

void rta_dump (rta * a) -- dump route attributes

Arguments

rta * a

attribute structure to dump

Description

This function takes a rta and dumps its contents to the debug output.


Function

void rta_dump_all (void) -- dump attribute cache

Description

This function dumps the whole contents of route attribute cache to the debug output.


Function

void rta_init (void) -- initialize route attribute cache

Description

This function is called during initialization of the routing table module to set up the internals of the attribute cache.


Function

rta * rta_clone (rta * r) -- clone route attributes

Arguments

rta * r

a rta to be cloned

Description

rta_clone() takes a cached rta and returns its identical cached copy. Currently it works by just returning the original rta with its use count incremented.


Function

void rta_free (rta * r) -- free route attributes

Arguments

rta * r

a rta to be freed

Description

If you stop using a rta (for example when deleting a route which uses it), you need to call rta_free() to notify the attribute cache that the attribute is no longer in use and can be freed if you were the last user (which rta_free() tests by inspecting the use count).
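The use-count discipline described for rta_lookup(), rta_clone() and rta_free() can be sketched with a toy cache. This is only an illustration under stated assumptions: struct toy_rta, the fixed-size array and all toy_* names are hypothetical and do not reflect BIRD's real rta layout or hashing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a cached attribute block; not BIRD's rta. */
struct toy_rta {
  int uc;        /* use count; 0 means the slot is free */
  int key;       /* stands in for the attribute contents */
};

#define TOY_CACHE_SIZE 8
static struct toy_rta toy_cache[TOY_CACHE_SIZE];

/* Return the cached entry matching key, creating it with use count 1 if absent. */
static struct toy_rta *
toy_rta_lookup(int key)
{
  struct toy_rta *free_slot = NULL;

  for (int i = 0; i < TOY_CACHE_SIZE; i++)
  {
    if (toy_cache[i].uc > 0 && toy_cache[i].key == key)
    {
      toy_cache[i].uc++;
      return &toy_cache[i];
    }
    if (toy_cache[i].uc == 0 && !free_slot)
      free_slot = &toy_cache[i];
  }

  if (!free_slot)
    return NULL;               /* the toy cache is full */

  free_slot->key = key;
  free_slot->uc = 1;
  return free_slot;
}

/* Cloning a cached entry just takes another reference. */
static struct toy_rta *
toy_rta_clone(struct toy_rta *r)
{
  assert(r && r->uc > 0);
  r->uc++;
  return r;
}

/* Drop one reference; the slot becomes reusable when the last user leaves. */
static void
toy_rta_free(struct toy_rta *r)
{
  assert(r && r->uc > 0);
  r->uc--;
}
```

Every successful toy_rta_lookup() or toy_rta_clone() must be balanced by one toy_rta_free(), which mirrors the rule stated above for the real functions.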

2.4 Routing protocols

Introduction

The routing protocols are the bird's heart, and a fair amount of code is dedicated to managing them and to providing support functions for them. (-: Actually, this is the reason why the directory with sources of the core code is called nest :-).

When talking about protocols, one needs to distinguish between protocols and protocol instances. A protocol exists exactly once, regardless of whether it's configured or not, and it can have an arbitrary number of instances corresponding to its "incarnations" requested by the configuration file. Each instance is completely autonomous, has its own configuration, its own status, its own set of routes and its own set of interfaces it works on.

A protocol is represented by a protocol structure containing all the basic information (protocol name, default settings and pointers to most of the protocol hooks). All these structures are linked in the protocol_list list.

Each instance has its own proto structure describing all its properties: protocol type, configuration, a resource pool where all resources belonging to the instance live, various protocol attributes (take a look at the declaration of proto in protocol.h), protocol states (see below for what they mean), connections to routing tables, filters attached to the protocol and finally a set of pointers to the rest of protocol hooks (they are the same for all instances of the protocol, but in order to avoid extra indirections when calling the hooks from the fast path, they are stored directly in proto). The instance is always linked in both the global instance list (proto_list) and a per-status list (either active_proto_list for running protocols, initial_proto_list for protocols being initialized or flush_proto_list when the protocol is being shut down).

The protocol hooks are described in the next chapter. For more information about configuration of protocols, please refer to the configuration chapter and also to the description of the protos_commit() function.

Protocol states

As startup and shutdown of each protocol are complex processes which can be affected by lots of external events (user's actions, reconfigurations, behavior of neighboring routers etc.), we have decided to supervise them by a pair of simple state machines -- the protocol state machine and a core state machine.

The protocol state machine corresponds to internal state of the protocol and the protocol can alter its state whenever it wants to. There are the following states:

PS_DOWN

The protocol is down and waits for being woken up by calling its start() hook.

PS_START

The protocol is waiting for connection with the rest of the network. It's active, it has resources allocated, but it still doesn't want any routes since it doesn't know what to do with them.

PS_UP

The protocol is up and running. It communicates with the core, delivers routes to tables and wants to hear announcement about route changes.

PS_STOP

The protocol has been shut down (either by being asked by the core code to do so or due to having encountered a protocol error).

Unless the protocol is in the PS_DOWN state, it can decide to change its state by calling the proto_notify_state function.

At any time, the core code can ask the protocol to shut itself down by calling its shutdown() hook.
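The protocol state machine above can be sketched as a small enum plus a notification function. This is a minimal illustration, not BIRD code: the TOY_PS_* constants, struct toy_proto and both functions are hypothetical, and the numeric values of the states are arbitrary.

```c
#include <assert.h>

/* Toy mirror of the four protocol states named in the text; values are arbitrary. */
enum toy_proto_state { TOY_PS_DOWN, TOY_PS_START, TOY_PS_UP, TOY_PS_STOP };

struct toy_proto {
  enum toy_proto_state state;
};

/* Core side: wake up a protocol that is down; hook_result models what start() returned. */
static void
toy_core_start(struct toy_proto *p, enum toy_proto_state hook_result)
{
  assert(p->state == TOY_PS_DOWN);
  p->state = hook_result;
}

/*
 * Protocol side: request a state change. Returns 1 if the change was
 * recorded, 0 if it was refused because the protocol is down.
 */
static int
toy_notify_state(struct toy_proto *p, enum toy_proto_state new_state)
{
  if (p->state == TOY_PS_DOWN)
    return 0;
  p->state = new_state;
  return 1;
}
```

A protocol that is down can only be woken by the core; once running, it may move itself between states, including back to down.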

Functions of the protocol module

The protocol module provides the following functions:


Function

struct channel * proto_find_channel_by_table (struct proto * p, struct rtable * t) -- find channel connected to a routing table

Arguments

struct proto * p

protocol instance

struct rtable * t

routing table

Description

Returns pointer to channel or NULL


Function

struct channel * proto_find_channel_by_name (struct proto * p, const char * n) -- find channel by its name

Arguments

struct proto * p

protocol instance

const char * n

channel name

Description

Returns pointer to channel or NULL


Function

struct channel * proto_add_channel (struct proto * p, struct channel_config * cf) -- connect protocol to a routing table

Arguments

struct proto * p

protocol instance

struct channel_config * cf

channel configuration

Description

This function creates a channel between the protocol instance p and the routing table specified in the configuration cf, making the protocol hear all changes in the table and allowing the protocol to update routes in the table.

The channel is linked in the protocol channel list and when active also in the table channel list. Channels are allocated from the global resource pool (proto_pool) and they are automatically freed when the protocol is removed.


Function

void channel_request_feeding (struct channel * c) -- request feeding routes to the channel

Arguments

struct channel * c

given channel

Description

Sometimes all routes need to be sent to the channel again. This is called feeding and can be requested by this function. The request moves the channel export state to ES_FEEDING for the duration of the feed; when the feed completes, the state switches back to ES_READY. This function can be called even when feeding is already running, in which case the feed is restarted.
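The feeding behavior, including the restart of a feed that is already running, can be sketched as follows. All names here (struct toy_channel, TOY_ES_*, toy_request_feeding(), toy_feed_step()) are hypothetical; the real channel keeps far more state and feeds routes asynchronously.

```c
#include <assert.h>

enum toy_export_state { TOY_ES_READY, TOY_ES_FEEDING };

struct toy_channel {
  enum toy_export_state export_state;
  int feed_pos;      /* index of the next route to send */
  int route_count;   /* number of routes in the toy table */
};

/* Request a (re)feed: start from the first route, even if a feed is running. */
static void
toy_request_feeding(struct toy_channel *c)
{
  c->feed_pos = 0;
  c->export_state = TOY_ES_FEEDING;
}

/*
 * Send one route. Switches back to READY once the whole table was sent.
 * Returns 1 while the feed continues, 0 when it has completed.
 */
static int
toy_feed_step(struct toy_channel *c)
{
  assert(c->export_state == TOY_ES_FEEDING);

  if (c->feed_pos < c->route_count)
    c->feed_pos++;

  if (c->feed_pos >= c->route_count)
  {
    c->export_state = TOY_ES_READY;
    return 0;
  }
  return 1;
}
```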


Function

void proto_setup_mpls_map (struct proto * p, uint rts, int hooks) -- automatically setup FEC map for protocol

Arguments

struct proto * p

affected protocol

uint rts

RTS_* value for generated MPLS routes

int hooks

whether to update rte_insert / rte_remove hooks

Description

Add, remove or reconfigure the MPLS FEC map of the protocol p, depending on whether an MPLS channel exists, and set up the rte_insert / rte_remove hooks with default MPLS handlers. It is a convenience function meant to be called from the protocol start and configure hooks, after reconfiguration of channels. For shutdown, use proto_shutdown_mpls_map(). If the caller uses its own rte_insert / rte_remove hooks, it can disable the hook update and set them up manually.


Function

void proto_shutdown_mpls_map (struct proto * p, int hooks) -- automatically shutdown FEC map for protocol

Arguments

struct proto * p

affected protocol

int hooks

whether to update rte_insert / rte_remove hooks

Description

Remove MPLS FEC map of the protocol p during protocol shutdown.


Function

void * proto_new (struct proto_config * cf) -- create a new protocol instance

Arguments

struct proto_config * cf

-- undescribed --

Description

When a new configuration has been read in, the core code starts initializing all the protocol instances configured by calling their init() hooks with the corresponding instance configuration. The initialization code of the protocol is expected to create a new instance according to the configuration by calling this function and then modifying the default settings to values wanted by the protocol.


Function

void * proto_config_new (struct protocol * pr, int class) -- create a new protocol configuration

Arguments

struct protocol * pr

protocol the configuration will belong to

int class

SYM_PROTO or SYM_TEMPLATE

Description

Whenever the configuration file says that a new instance of a routing protocol should be created, the parser calls proto_config_new() to create a configuration entry for this instance (a structure starting with the proto_config header containing all the generic items followed by protocol-specific ones). Also, the configuration entry gets added to the list of protocol instances kept in the configuration.

The function is also used to create protocol templates (when class SYM_TEMPLATE is specified). The only difference is that templates are not added to the list of protocol instances and therefore not initialized during protos_commit().


Function

void proto_copy_config (struct proto_config * dest, struct proto_config * src) -- copy a protocol configuration

Arguments

struct proto_config * dest

destination protocol configuration

struct proto_config * src

source protocol configuration

Description

Whenever a new instance of a routing protocol is created from a template, proto_copy_config() is called to copy the contents of the source protocol configuration to the new protocol configuration. The name, class and node in the protos list of dest are kept intact. The copy_config() protocol hook is used to copy protocol-specific data.


Function

void protos_preconfig (struct config * c) -- pre-configuration processing

Arguments

struct config * c

new configuration

Description

This function calls the preconfig() hooks of all routing protocols available to prepare them for reading of the new configuration.


Function

void protos_commit (struct config * new, struct config * old, int force_reconfig, int type) -- commit new protocol configuration

Arguments

struct config * new

new configuration

struct config * old

old configuration or NULL if it's boot time config

int force_reconfig

force restart of all protocols (used for example when the router ID changes)

int type

type of reconfiguration (RECONFIG_SOFT or RECONFIG_HARD)

Description

Scan differences between old and new configuration and adjust all protocol instances to conform to the new configuration.

When a protocol exists in the new configuration but not in the original one, it's immediately started. If this would collide with another running protocol, the new protocol is temporarily stopped by the locking mechanism.

When a protocol exists in the old configuration, but it doesn't in the new one, it's shut down and deleted after the shutdown completes.

When a protocol exists in both configurations, the core decides whether it's possible to reconfigure it dynamically - it checks all the core properties of the protocol (changes in filters are ignored if type is RECONFIG_SOFT) and if they match, it asks the reconfigure() hook of the protocol to see if the protocol is able to switch to the new configuration. If it isn't possible, the protocol is shut down and a new instance is started with the new configuration after the shutdown is completed.
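The three cases above reduce to a small decision function. The sketch below is an assumption-laden summary rather than the real protos_commit() logic: the enum, the function name and the boolean inputs are all hypothetical, and the collision/locking case is left out.

```c
#include <assert.h>

enum toy_action {
  TOY_ACT_START,        /* only in the new configuration */
  TOY_ACT_SHUTDOWN,     /* only in the old configuration */
  TOY_ACT_RECONFIGURE,  /* switched to the new configuration on the fly */
  TOY_ACT_RESTART       /* shut down, then started with the new configuration */
};

/*
 * Decide what happens to one protocol instance during a commit.
 * core_props_match: the core properties are compatible (for a soft
 *   reconfiguration the caller would ignore filter changes here).
 * hook_accepts: what the protocol's reconfigure() hook returned.
 */
static enum toy_action
toy_commit_action(int in_old, int in_new, int force_reconfig,
                  int core_props_match, int hook_accepts)
{
  assert(in_old || in_new);   /* the instance must exist somewhere */

  if (!in_old)
    return TOY_ACT_START;
  if (!in_new)
    return TOY_ACT_SHUTDOWN;
  if (!force_reconfig && core_props_match && hook_accepts)
    return TOY_ACT_RECONFIGURE;
  return TOY_ACT_RESTART;
}
```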

2.5 Graceful restart recovery

Graceful restart of a router is a process in which the routing plane (e.g. BIRD) restarts but both the forwarding plane (e.g. kernel routing table) and routing neighbors keep proper routes, and therefore uninterrupted packet forwarding is maintained.

BIRD implements graceful restart recovery by deferring export of routes to protocols until routing tables are refilled with the expected content. After start, protocols generate routes as usual, but routes are not propagated to them, until protocols report that they generated all routes. After that, graceful restart recovery is finished and the export (and the initial feed) to protocols is enabled.

When the need for graceful restart recovery is detected during initialization, enabled protocols are marked with the gr_recovery flag before start. Such protocols then decide how to proceed with graceful restart; participation is voluntary. Protocols can lock the recovery for each channel with channel_graceful_restart_lock() (state stored in the gr_lock flag), which means that they want to postpone the end of the recovery until they converge and then unlock it. They can also set gr_wait before advancing to PS_UP, which means that the core should defer route export to that channel until the end of the recovery. This should be done by protocols that expect their neighbors to keep the proper routes (kernel table, BGP sessions with BGP graceful restart capability).

The graceful restart recovery is finished when either all graceful restart locks are unlocked or when graceful restart wait timer fires.
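The two ways the recovery can end (last lock released, or wait timer firing) can be sketched with a lock counter. This is a toy model: struct toy_gr and the toy_gr_*() functions are hypothetical, and the timer is represented by an explicit call to toy_gr_timeout().

```c
#include <assert.h>

struct toy_gr {
  int active;   /* 1 while the recovery is in progress */
  int locks;    /* number of channels currently holding a lock */
};

/* Start the active phase of the recovery. */
static void
toy_gr_init(struct toy_gr *g)
{
  g->active = 1;
  g->locks = 0;
}

/* Finish the recovery and clear all related state. */
static void
toy_gr_done(struct toy_gr *g)
{
  g->active = 0;
  g->locks = 0;
}

/* A channel postpones the end of the recovery. */
static void
toy_gr_lock(struct toy_gr *g)
{
  assert(g->active);
  g->locks++;
}

/* A channel has converged; the last unlock finishes the recovery. */
static void
toy_gr_unlock(struct toy_gr *g)
{
  if (!g->active)
    return;                 /* the timer already ended the recovery */
  assert(g->locks > 0);
  if (--g->locks == 0)
    toy_gr_done(g);
}

/* The wait timer fired: finish even though some locks may remain. */
static void
toy_gr_timeout(struct toy_gr *g)
{
  if (g->active)
    toy_gr_done(g);
}
```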


Function

void graceful_restart_recovery (void) -- request initial graceful restart recovery

Description

Called by the platform initialization code if the need for recovery after graceful restart is detected during boot. It has to be called before protos_commit().


Function

void graceful_restart_init (void) -- initialize graceful restart

Description

When graceful restart recovery was requested, the function starts an active phase of the recovery and initializes the graceful restart wait timer. The function has to be called after protos_commit().


Function

void graceful_restart_done (timer *t UNUSED) -- finalize graceful restart

Arguments

timer *t UNUSED

-- undescribed --

Description

When there are no locks on graceful restart, the function finalizes the graceful restart recovery. Protocols postponing route export until the end of the recovery are awakened and the export to them is enabled. All other related state is cleared. The function is also called when the graceful restart wait timer fires (even if there are still some locks).


Function

void channel_graceful_restart_lock (struct channel * c) -- lock graceful restart by channel

Arguments

struct channel * c

-- undescribed --

Description

This function allows a protocol to postpone the end of graceful restart recovery until it converges. The lock is removed when the protocol calls channel_graceful_restart_unlock() or when the channel is closed.

The function has to be called during the initial phase of graceful restart recovery and only for protocols that are part of graceful restart (i.e. their gr_recovery is set), which means it should be called from protocol start hooks.


Function

void channel_graceful_restart_unlock (struct channel * c) -- unlock graceful restart by channel

Arguments

struct channel * c

-- undescribed --

Description

This function releases a lock taken by channel_graceful_restart_lock(). It is also called automatically when the protocol holding the lock goes down.


Function

void protos_dump_all (void) -- dump status of all protocols

Description

This function dumps status of all existing protocol instances to the debug output. It involves printing of general status information such as protocol states, its position on the protocol lists and also calling of a dump() hook of the protocol to print the internals.


Function

void proto_build (struct protocol * p) -- make a single protocol available

Arguments

struct protocol * p

the protocol

Description

After the platform specific initialization code uses protos_build() to add all the standard protocols, it should call proto_build() for all platform specific protocols to inform the core that they exist.


Function

void protos_build (void) -- build a protocol list

Description

This function is called during BIRD startup to insert all standard protocols to the global protocol list. Insertion of platform specific protocols (such as the kernel syncer) is in the domain of competence of the platform dependent startup code.


Function

void proto_set_message (struct proto * p, char * msg, int len) -- set administrative message to protocol

Arguments

struct proto * p

protocol

char * msg

message

int len

message length (-1 for NULL-terminated string)

Description

The function sets an administrative message (string) related to a protocol state change. It is called by the nest code for manual enable/disable/restart commands, and by protocol-specific code when the protocol state change is initiated by the protocol. Using a NULL message clears the last message. The message string may be either NULL-terminated or have an explicit length.


Function

void channel_notify_limit (struct channel * c, struct channel_limit * l, int dir, u32 rt_count)

Arguments

struct channel * c

channel

struct channel_limit * l

limit being hit

int dir

limit direction (PLD_*)

u32 rt_count

the number of routes

Description

The function is called by the route processing core when limit l is breached. It activates the limit and takes appropriate action according to l->action.


Function

void proto_notify_state (struct proto * p, uint state) -- notify core about protocol state change

Arguments

struct proto * p

protocol the state of which has changed

uint state

-- undescribed --

Description

Whenever a state of a protocol changes due to some event internal to the protocol (i.e., not inside a start() or shutdown() hook), it should immediately notify the core about the change by calling proto_notify_state() which will write the new state to the proto structure and take all the actions necessary to adapt to the new state. State change to PS_DOWN immediately frees resources of protocol and might execute start callback of protocol; therefore, it should be used at tail positions of protocol callbacks.

2.6 Protocol hooks

Each protocol can provide a rich set of hook functions referred to by pointers in either the proto or protocol structure. They are called by the core whenever it wants the protocol to perform some action or to notify the protocol about any change of its environment. All of the hooks can be set to NULL which means to ignore the change or to take a default action.


Function

void preconfig (struct protocol * p, struct config * c) -- protocol preconfiguration

Arguments

struct protocol * p

a routing protocol

struct config * c

new configuration

Description

The preconfig() hook is called before parsing of a new configuration.


Function

void postconfig (struct proto_config * c) -- instance post-configuration

Arguments

struct proto_config * c

instance configuration

Description

The postconfig() hook is called for each configured instance after parsing of the new configuration is finished.


Function

struct proto * init (struct proto_config * c) -- initialize an instance

Arguments

struct proto_config * c

instance configuration

Description

The init() hook is called by the core to create a protocol instance according to supplied protocol configuration.

Result

a pointer to the instance created


Function

int reconfigure (struct proto * p, struct proto_config * c) -- request instance reconfiguration

Arguments

struct proto * p

an instance

struct proto_config * c

new configuration

Description

The core calls the reconfigure() hook whenever it wants to ask the protocol for switching to a new configuration. If the reconfiguration is possible, the hook returns 1. Otherwise, it returns 0 and the core will shut down the instance and start a new one with the new configuration.

After the protocol confirms reconfiguration, it must no longer keep any references to the old configuration since the memory it's stored in can be re-used at any time.


Function

void dump (struct proto * p) -- dump protocol state

Arguments

struct proto * p

an instance

Description

This hook dumps the complete state of the instance to the debug output.


Function

int start (struct proto * p) -- request instance startup

Arguments

struct proto * p

protocol instance

Description

The start() hook is called by the core when it wishes to start the instance. Multitable protocols should lock their tables here.

Result

new protocol state


Function

int shutdown (struct proto * p) -- request instance shutdown

Arguments

struct proto * p

protocol instance

Description

The shutdown() hook is called by the core when it wishes to shut the instance down for some reason.

Result

new protocol state


Function

void cleanup (struct proto * p) -- request instance cleanup

Arguments

struct proto * p

protocol instance

Description

The cleanup() hook is called by the core when the protocol becomes hungry/down, i.e. all protocol ahooks and routes are flushed. Multitable protocols should unlock their tables here.


Function

void get_status (struct proto * p, byte * buf) -- get instance status

Arguments

struct proto * p

protocol instance

byte * buf

buffer to be filled with the status string

Description

This hook is called by the core if it wishes to obtain a brief one-line user friendly representation of the status of the instance to be printed by the show protocols command.


Function

void get_route_info (rte * e, byte * buf, ea_list * attrs) -- get route information

Arguments

rte * e

a route entry

byte * buf

buffer to be filled with the resulting string

ea_list * attrs

extended attributes of the route

Description

This hook is called to fill the buffer buf with a brief user friendly representation of metrics of a route belonging to this protocol.


Function

int get_attr (eattr * a, byte * buf, int buflen) -- get attribute information

Arguments

eattr * a

an extended attribute

byte * buf

buffer to be filled with attribute information

int buflen

a length of the buf parameter

Description

The get_attr() hook is called by the core to obtain a user friendly representation of an extended route attribute. It can either leave the whole conversion to the core (by returning GA_UNKNOWN), fill in only attribute name (and let the core format the attribute value automatically according to the type field; by returning GA_NAME) or doing the whole conversion (used in case the value requires extra care; return GA_FULL).
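The three return modes of get_attr() can be illustrated with a toy hook. Everything below is hypothetical: the TOY_GA_* constants only mirror the names GA_UNKNOWN, GA_NAME and GA_FULL (their real values are not assumed), and struct toy_eattr, the attribute IDs and the output strings are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum toy_ga_result { TOY_GA_UNKNOWN, TOY_GA_NAME, TOY_GA_FULL };

/* Hypothetical, stripped-down extended attribute. */
struct toy_eattr {
  int id;
  unsigned value;
};

#define TOY_ATTR_METRIC 1   /* value is formatted by the core */
#define TOY_ATTR_TAG    2   /* value needs special formatting */

static int
toy_get_attr(const struct toy_eattr *a, char *buf, int buflen)
{
  assert(a && buf && buflen > 0);

  switch (a->id)
  {
  case TOY_ATTR_METRIC:
    /* Supply only the name; the core prints the value by type. */
    snprintf(buf, (size_t) buflen, "toy_metric");
    return TOY_GA_NAME;

  case TOY_ATTR_TAG:
    /* Do the whole conversion, name and value. */
    snprintf(buf, (size_t) buflen, "toy_tag: 0x%08x", a->value);
    return TOY_GA_FULL;

  default:
    /* Not ours: leave everything to the core. */
    return TOY_GA_UNKNOWN;
  }
}
```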


Function

void if_notify (struct proto * p, unsigned flags, struct iface * i) -- notify instance about interface changes

Arguments

struct proto * p

protocol instance

unsigned flags

interface change flags

struct iface * i

the interface in question

Description

This hook is called whenever any network interface changes its status. The change is described by a combination of status bits (IF_CHANGE_xxx) in the flags parameter.


Function

void ifa_notify (struct proto * p, unsigned flags, struct ifa * a) -- notify instance about interface address changes

Arguments

struct proto * p

protocol instance

unsigned flags

address change flags

struct ifa * a

the interface address

Description

This hook is called to notify the protocol instance about an interface acquiring or losing one of its addresses. The change is described by a combination of status bits (IF_CHANGE_xxx) in the flags parameter.


Function

void rt_notify (struct proto * p, net * net, rte * new, rte * old, ea_list * attrs) -- notify instance about routing table change

Arguments

struct proto * p

protocol instance

net * net

a network entry

rte * new

new route for the network

rte * old

old route for the network

ea_list * attrs

extended attributes associated with the new entry

Description

The rt_notify() hook is called to inform the protocol instance about changes in the connected routing table, that is a route old belonging to network net being replaced by a new route new with extended attributes attrs. Either new or old or both can be NULL if the corresponding route doesn't exist.

If the type of route announcement is RA_OPTIMAL, it is an announcement of optimal route change, new stores the new optimal route and old stores the old optimal route.

If the type of route announcement is RA_ANY, it is an announcement of any route change, new stores the new route and old stores the old route from the same protocol.

p->accept_ra_types specifies which kind of route announcements protocol wants to receive.


Function

void neigh_notify (neighbor * neigh) -- notify instance about neighbor status change

Arguments

neighbor * neigh

a neighbor cache entry

Description

The neigh_notify() hook is called by the neighbor cache whenever a neighbor changes its state, that is it gets disconnected or a sticky neighbor gets connected.


Function

int preexport (struct proto * p, rte ** e, ea_list ** attrs, struct linpool * pool) -- pre-filtering decisions before route export

Arguments

struct proto * p

protocol instance the route is going to be exported to

rte ** e

the route in question

ea_list ** attrs

extended attributes of the route

struct linpool * pool

linear pool for allocation of all temporary data

Description

The preexport() hook is called as the first step of exporting a route from a routing table to the protocol instance. It can modify route attributes and force acceptance or rejection of the route before the user-specified filters are run. See rte_announce() for a complete description of the route distribution process.

The standard use of this hook is to reject routes having originated from the same instance and to set default values of the protocol's metrics.

Result

1 if the route has to be accepted, -1 if rejected and 0 if it should be passed to the filters.
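The standard use described above (reject own routes, fill in a default metric, otherwise defer to the filters) might look like the following sketch. The structures and field names are hypothetical and much simpler than the real proto and rte; only the return convention (1, -1, 0) comes from the text.

```c
#include <assert.h>

/* Hypothetical, minimal stand-ins for a protocol instance and a route. */
struct toy_export_proto {
  unsigned default_metric;
};

struct toy_export_rte {
  const struct toy_export_proto *src;  /* instance the route originated from */
  unsigned metric;
  int metric_set;                      /* 0 if no metric was assigned yet */
};

/* Returns -1 to reject, 1 to force acceptance, 0 to let the filters decide. */
static int
toy_preexport(const struct toy_export_proto *p, struct toy_export_rte *e)
{
  assert(p && e);

  if (e->src == p)
    return -1;               /* never export our own routes back to us */

  if (!e->metric_set)
  {
    e->metric = p->default_metric;
    e->metric_set = 1;
  }

  return 0;                  /* pass the route on to the user-specified filters */
}
```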


Function

int rte_recalculate (struct rtable * table, struct network * net, struct rte * new, struct rte * old, struct rte * old_best) -- prepare routes for comparison

Arguments

struct rtable * table

a routing table

struct network * net

a network entry

struct rte * new

new route for the network

struct rte * old

old route for the network

struct rte * old_best

old best route for the network (may be NULL)

Description

This hook is called when a route change (from old to new for a net entry) is propagated to a table. It may be used to prepare routes for comparison by rte_better() in the best route selection. new may or may not be in net->routes list, old is not there.

Result

1 if the ordering implied by rte_better() changes enough that a full best route calculation has to be done, 0 otherwise.


Function

int rte_better (rte * new, rte * old) -- compare metrics of two routes

Arguments

rte * new

the new route

rte * old

the original route

Description

This hook gets called when the routing table contains two routes for the same network which have originated from different instances of a single protocol and it wants to select which one is preferred over the other one. Protocols usually decide according to route metrics.

Result

1 if new is better (more preferred) than old, 0 otherwise.
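A metric-based comparison of this kind can be sketched as below. The route structure and the rule that a lower metric wins are assumptions made for the example; each protocol defines its own notion of preference. toy_select_best() is a hypothetical caller showing how the 1/0 result would be used.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical route carrying a single protocol metric. */
struct toy_metric_rte {
  unsigned metric;
};

/* Returns 1 if cand is preferred over cur, 0 otherwise (lower metric wins). */
static int
toy_rte_better(const struct toy_metric_rte *cand, const struct toy_metric_rte *cur)
{
  assert(cand && cur);
  return cand->metric < cur->metric;
}

/* Pick the preferred route from an array; ties keep the earlier route. */
static const struct toy_metric_rte *
toy_select_best(const struct toy_metric_rte *routes, size_t n)
{
  const struct toy_metric_rte *best = NULL;

  for (size_t i = 0; i < n; i++)
    if (!best || toy_rte_better(&routes[i], best))
      best = &routes[i];

  return best;
}
```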


Function

int rte_same (rte * e1, rte * e2) -- compare two routes

Arguments

rte * e1

route

rte * e2

route

Description

The rte_same() hook tests whether the routes e1 and e2 belonging to the same protocol instance have identical contents. Contents of rta, all the extended attributes and rte preference are checked by the core code, no need to take care of them here.

Result

1 if e1 is identical to e2, 0 otherwise.


Function

void rte_insert (net * n, rte * e) -- notify instance about route insertion

Arguments

net * n

network

rte * e

route

Description

This hook is called whenever a rte belonging to the instance is accepted for insertion to a routing table.

Please avoid using this function in new protocols.


Function

void rte_remove (net * n, rte * e) -- notify instance about route removal

Arguments

net * n

network

rte * e

route

Description

This hook is called whenever a rte belonging to the instance is removed from a routing table.

Please avoid using this function in new protocols.

2.7 Interfaces

The interface module keeps track of all network interfaces in the system and their addresses.

Each interface is represented by an iface structure which carries interface capability flags (IF_MULTIACCESS, IF_BROADCAST etc.), MTU, interface name and index and finally a linked list of network prefixes assigned to the interface, each one represented by struct ifa.

The interface module keeps a `soft-up' state for each iface which is a conjunction of link being up, the interface being of a `sane' type and at least one IP address assigned to it.
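The `soft-up' conjunction can be written down directly. The structure below is a hypothetical reduction of iface to the three inputs the text names; how BIRD decides that a type is `sane' is not modeled.

```c
#include <assert.h>

/* Hypothetical reduction of an interface to the inputs of the soft-up test. */
struct toy_iface {
  int link_up;      /* the link is up */
  int sane_type;    /* the interface is of a usable type */
  int addr_count;   /* number of IP addresses assigned */
};

/* Soft-up holds only if all three conditions are met at once. */
static int
toy_iface_soft_up(const struct toy_iface *i)
{
  assert(i && i->addr_count >= 0);
  return i->link_up && i->sane_type && i->addr_count > 0;
}
```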


Function

void ifa_dump (struct ifa * a) -- dump interface address

Arguments

struct ifa * a

interface address descriptor

Description

This function dumps contents of an ifa to the debug output.


Function

void if_dump (struct iface * i) -- dump interface

Arguments

struct iface * i

interface to dump

Description

This function dumps all information associated with a given network interface to the debug output.


Function

void if_dump_all (void) -- dump all interfaces

Description

This function dumps information about all known network interfaces to the debug output.


Function

void if_delete (struct iface * old) -- remove interface

Arguments

struct iface * old

interface

Description

This function is called by the low-level platform dependent code whenever it notices that an interface has disappeared. It is just a shorthand for if_update().


Function

struct iface * if_update (struct iface * new) -- update interface status

Arguments

struct iface * new

new interface status

Description

if_update() is called by the low-level platform dependent code whenever it notices an interface change.

There exist two types of interface updates -- synchronous and asynchronous ones. In the synchronous case, the low-level code calls if_start_update(), scans all interfaces reported by the OS, uses if_update() and ifa_update() to pass them to the core and then it finishes the update sequence by calling if_end_update(). When working asynchronously, the sysdep code calls if_update() and ifa_update() whenever it notices a change.

if_update() will automatically notify all other modules about the change.
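One plausible way to implement the synchronous sequence is mark-and-sweep: clear a per-interface flag at the start, set it on every reported interface, and drop whatever was not reported at the end. The sketch below assumes that scheme; it is not taken from the text, and the table, the flag names and the toy_if_*() functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_IFACES 8

struct toy_scan_iface {
  char name[16];
  int present;   /* slot holds a known interface */
  int updated;   /* reported during the current scan */
};

static struct toy_scan_iface toy_ifaces[TOY_MAX_IFACES];

/* Begin a scan: nothing has been reported yet. */
static void
toy_if_start_update(void)
{
  for (int i = 0; i < TOY_MAX_IFACES; i++)
    toy_ifaces[i].updated = 0;
}

/* Report one interface; returns its entry or NULL if it cannot be stored. */
static struct toy_scan_iface *
toy_if_update(const char *name)
{
  struct toy_scan_iface *slot = NULL;

  assert(name);
  if (strlen(name) >= sizeof(toy_ifaces[0].name))
    return NULL;

  for (int i = 0; i < TOY_MAX_IFACES; i++)
  {
    struct toy_scan_iface *f = &toy_ifaces[i];
    if (f->present && !strcmp(f->name, name))
    {
      f->updated = 1;
      return f;
    }
    if (!f->present && !slot)
      slot = f;
  }

  if (!slot)
    return NULL;
  strcpy(slot->name, name);   /* length was checked above */
  slot->present = slot->updated = 1;
  return slot;
}

/* End a scan: forget interfaces that were not reported. Returns how many were removed. */
static int
toy_if_end_update(void)
{
  int removed = 0;

  for (int i = 0; i < TOY_MAX_IFACES; i++)
    if (toy_ifaces[i].present && !toy_ifaces[i].updated)
    {
      toy_ifaces[i].present = 0;
      removed++;
    }
  return removed;
}
```

In the asynchronous case only toy_if_update() would be called, one change at a time, without the surrounding start/end pair.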


Function

void if_feed_baby (struct proto * p) -- advertise interfaces to a new protocol

Arguments

struct proto * p

protocol to feed

Description

When a new protocol starts, this function sends it a series of notifications about all existing interfaces.


Function

struct iface * if_find_by_index (unsigned idx) -- find interface by ifindex

Arguments

unsigned idx

ifindex

Description

This function finds an iface structure corresponding to an interface of the given index idx. Returns a pointer to the structure or NULL if no such structure exists.


Function

struct iface * if_find_by_name (const char * name) -- find interface by name

Arguments

const char * name

interface name

Description

This function finds an iface structure corresponding to an interface of the given name name. Returns a pointer to the structure or NULL if no such structure exists.


Function

struct ifa * ifa_update (struct ifa * a) -- update interface address

Arguments

struct ifa * a

new interface address

Description

This function adds address information to a network interface. It's called by the platform dependent code during the interface update process described under if_update().


Function

void ifa_delete (struct ifa * a) -- remove interface address

Arguments

struct ifa * a

interface address

Description

This function removes address information from a network interface. It's called by the platform dependent code during the interface update process described under if_update().


Function

void if_init (void) -- initialize interface module

Description

This function is called during BIRD startup to initialize all data structures of the interface module.

2.8 MPLS

The MPLS subsystem manages MPLS labels and handles their allocation to MPLS-aware routing protocols. These labels are then attached to IP or VPN routes representing label switched paths -- LSPs. MPLS labels are also used in special MPLS routes (which use labels as network address) that are exported to MPLS routing table in kernel. The MPLS subsystem consists of MPLS domains (struct mpls_domain), MPLS channels (struct mpls_channel) and FEC maps (struct mpls_fec_map).

The MPLS domain represents one MPLS label address space, implements the label allocator, and handles associated configuration and management. The domain is declared in the configuration (struct mpls_domain_config). There might be multiple MPLS domains representing separate label spaces, but in most cases one domain is enough. MPLS-aware protocols and routing tables are associated with a specific MPLS domain.

The MPLS domain has configurable label ranges (struct mpls_range); by default it has two ranges: static (16-1000) and dynamic (1000-10000). When a protocol wants to allocate labels, it first acquires a handle (struct mpls_handle) for a specific range using mpls_new_handle(), and then it allocates labels from that handle with mpls_new_label(). When no longer needed, labels are freed by mpls_free_label() and the handle is released by mpls_free_handle(). Note that all labels and handles must be freed manually.
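The handle and label lifecycle described above can be sketched with a toy model. This is not BIRD's API: every toy_* name is hypothetical, the real mpls_*() functions take different arguments, and treating a range as half-open (lo inclusive, hi exclusive) is an assumption made here for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: the toy_* names are hypothetical and do not mirror
 * the signatures of BIRD's mpls_new_handle() or mpls_new_label(). */

#define TOY_LABEL_MAX 10000

struct toy_range { unsigned lo, hi; };   /* assumed half-open: lo .. hi-1 */

struct toy_domain {
  bool used[TOY_LABEL_MAX];              /* one flag per label value */
};

struct toy_handle {
  struct toy_domain *domain;
  struct toy_range range;
  unsigned label_count;                  /* labels currently held */
};

/* Bind a handle to one range of one domain */
static void toy_new_handle(struct toy_handle *h, struct toy_domain *d, struct toy_range r)
{
  assert(r.lo < r.hi && r.hi <= TOY_LABEL_MAX);
  h->domain = d;
  h->range = r;
  h->label_count = 0;
}

/* Return the lowest free label in the handle's range, or 0 if exhausted */
static unsigned toy_new_label(struct toy_handle *h)
{
  for (unsigned l = h->range.lo; l < h->range.hi; l++)
    if (!h->domain->used[l]) {
      h->domain->used[l] = true;
      h->label_count++;
      return l;
    }
  return 0;
}

/* Give a label back; nothing is freed automatically */
static void toy_free_label(struct toy_handle *h, unsigned label)
{
  assert(label < TOY_LABEL_MAX && h->domain->used[label] && h->label_count > 0);
  h->domain->used[label] = false;
  h->label_count--;
}

/* Release the handle; refuses (returns false) while labels are outstanding */
static bool toy_free_handle(struct toy_handle *h)
{
  if (h->label_count)
    return false;
  h->domain = NULL;
  return true;
}
```

In this sketch a caller would create a handle for the dynamic range, draw labels from it, and has to return every label before the handle itself can be released, mirroring the manual cleanup rule stated above.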

Both the MPLS domain and the MPLS range are reference counted, so when deconfigured they can be freed as soon as all labels and ranges are freed. Users are expected to hold a reference to an MPLS domain for the whole time they use something from that domain (e.g. mpls_handle), but releasing the reference to a range while holding an associated handle is OK.

The MPLS channel is a subclass of a generic protocol channel. It has two distinct purposes: to handle per-protocol MPLS configuration (e.g. which MPLS domain is associated with the protocol, which label range is used by the protocol), and to announce MPLS routes to a routing table (as a regular protocol channel).

The FEC map is a helper structure that maps forwarding equivalent classes (FECs) to MPLS labels. It is an internal matter of a routing protocol how to assign meaning to allocated labels, announce LSP routes and associated MPLS routes (i.e. ILM entries). But the common behavior is implemented in the FEC map, which can be used by the protocols that work with IP-prefix-based FECs.

The FEC map keeps hash tables of FECs (struct mpls_fec) based on network prefix, next hop eattr and assigned label. It has three general labeling policies: static assignment (MPLS_POLICY_STATIC), per-prefix policy (MPLS_POLICY_PREFIX), and aggregating policy (MPLS_POLICY_AGGREGATE). In the per-prefix policy, each distinct LSP is a separate FEC and uses a separate label, which is kept even if the next hop of the LSP changes. In the aggregating policy, LSPs with the same next hop form one FEC and use one label, but when the next hop (or remote label) of such an LSP changes, the LSP must be moved to a different FEC and assigned a different label. There is also a special VRF policy (MPLS_POLICY_VRF) applicable to L3VPN protocols, which uses one label for all routes from a VRF, while replacing the original next hop with a lookup in the VRF.
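The difference between the per-prefix and aggregating policies comes down to which part of a route serves as the FEC key. The following is a minimal sketch under that reading; the toy_* and TOY_* names are hypothetical, prefixes and next hops are plain strings, and a linear array stands in for the hash tables BIRD actually uses.

```c
#include <assert.h>
#include <string.h>

/* Toy model: names and string keys are hypothetical; BIRD's FEC map
 * hashes real network prefixes and next hop attributes instead. */

enum toy_policy { TOY_POLICY_PREFIX, TOY_POLICY_AGGREGATE };

#define TOY_MAX_FECS 16

struct toy_fec_map {
  enum toy_policy policy;
  const char *key[TOY_MAX_FECS];   /* FEC key: a prefix or a next hop */
  unsigned label[TOY_MAX_FECS];    /* label assigned to that FEC */
  unsigned count;
  unsigned next_label;             /* next unallocated label */
};

/* Find the FEC for a route, creating one with a fresh label if needed.
 * The key strings are stored by pointer and must outlive the map. */
static unsigned toy_fec_label(struct toy_fec_map *m, const char *prefix, const char *nexthop)
{
  /* Per-prefix: one FEC per prefix. Aggregate: one FEC per next hop. */
  const char *k = (m->policy == TOY_POLICY_PREFIX) ? prefix : nexthop;

  for (unsigned i = 0; i < m->count; i++)
    if (!strcmp(m->key[i], k))
      return m->label[i];

  assert(m->count < TOY_MAX_FECS);
  m->key[m->count] = k;
  m->label[m->count] = m->next_label++;
  return m->label[m->count++];
}
```

With the per-prefix key, a next hop change leaves the label untouched; with the aggregate key, the same change lands the route in a different FEC and therefore under a different label.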

The overall process works this way: A protocol that wants to announce an LSP route does so by announcing, e.g., an IP route with the EA_MPLS_POLICY attribute. After the route is accepted by filters (which may also change the policy attribute or set a static label), mpls_handle_rte() is called from rte_update2(). It applies the selected labeling policy, finds an existing FEC or creates a new one (which includes allocating a new label and announcing the related MPLS route by mpls_announce_fec()), and attaches the FEC label to the LSP route. After that, the LSP route is stored in the routing table by rte_recalculate(). Changes in routing tables trigger the mpls_rte_insert() and mpls_rte_remove() hooks, which refcount FEC structures and possibly trigger removal of FECs and withdrawal of MPLS routes.

TODO: - special handling of reserved labels

2.9 Neighbor cache

Most routing protocols need to associate their internal state data with neighboring routers, and to check whether an address given as the next hop attribute of a route is really an address of a directly connected host and which interface it is connected through. Also, they often need to be notified when a neighbor ceases to exist or when their long awaited neighbor becomes connected. The neighbor cache is there to solve all these problems.

The neighbor cache maintains a collection of neighbor entries. Each entry represents one IP address corresponding to either our directly connected neighbor or our own end of the link (when the scope of the address is set to SCOPE_HOST) together with per-neighbor data belonging to a single protocol. A neighbor entry may be bound to a specific interface, which is required for link-local IP addresses and optional for global IP addresses.

Neighbor cache entries are stored in a hash table, which is indexed by the triple (protocol, IP, requested-iface), so if both regular and iface-bound neighbors are requested, they are represented by two neighbor cache entries. Active entries are also linked in a per-interface list (allowing quick processing of interface change events). Inactive entries exist only when the protocol has explicitly requested it via the NEF_STICKY flag because it wishes to be notified when the node becomes a neighbor again. Such entries are instead linked in a special list, which is walked whenever an interface changes its state to up. The VRF association of a neighbor entry is implied by the respective protocol.
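To illustrate why a regular and an iface-bound request end up as two entries, here is a small sketch of a table keyed by the full triple. All toy_* names are hypothetical, protocols and interfaces are reduced to integer ids, addresses are IPv4 in host byte order, and the hash function is an arbitrary choice. Unlike neigh_find(), this toy lookup always creates a missing entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: every name here is hypothetical, not BIRD's implementation. */

#define TOY_BUCKETS 8
#define TOY_MAX_ENTRIES 16

struct toy_neigh {
  int proto;               /* owning protocol id */
  uint32_t ip;             /* neighbor address */
  int iface;               /* requested interface id, 0 = not bound */
  struct toy_neigh *next;  /* next entry in the same bucket */
};

struct toy_cache {
  struct toy_neigh *bucket[TOY_BUCKETS];
  struct toy_neigh pool[TOY_MAX_ENTRIES];   /* static storage for entries */
  unsigned used;
};

/* Mix all three parts of the key into a bucket index */
static unsigned toy_hash(int proto, uint32_t ip, int iface)
{
  uint32_t h = ip ^ ((uint32_t) proto * 2654435761u) ^ ((uint32_t) iface * 40503u);
  return (h ^ (h >> 16)) % TOY_BUCKETS;
}

/* Find the entry for the exact triple, or create it */
static struct toy_neigh *toy_neigh_get(struct toy_cache *c, int proto, uint32_t ip, int iface)
{
  unsigned b = toy_hash(proto, ip, iface);

  for (struct toy_neigh *n = c->bucket[b]; n; n = n->next)
    if (n->proto == proto && n->ip == ip && n->iface == iface)
      return n;

  assert(c->used < TOY_MAX_ENTRIES);
  struct toy_neigh *n = &c->pool[c->used++];
  n->proto = proto;
  n->ip = ip;
  n->iface = iface;
  n->next = c->bucket[b];
  c->bucket[b] = n;
  return n;
}
```

Because the requested interface is part of the key, asking for the same address once without an interface and once bound to an interface yields two distinct entries, while repeating either request returns the entry created earlier.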

Besides the already mentioned NEF_STICKY flag, there is also NEF_ONLINK, which specifies that the neighbor should be considered reachable on the given iface regardless of associated address ranges, and NEF_IFACE, which represents a pseudo-neighbor entry for a whole interface (and uses the IPA_NONE IP address).

When a neighbor event occurs (a neighbor gets disconnected or a sticky inactive neighbor becomes connected), the protocol hook neigh_notify() is called to advertise the change.


Function

neighbor * neigh_find (struct proto * p, ip_addr a, struct iface * iface, uint flags) -- find or create a neighbor entry

Arguments

struct proto * p

protocol which asks for the entry

ip_addr a

IP address of the node to be searched for

struct iface * iface

optionally bind the neighbor to this iface (may be NULL)

uint flags

NEF_STICKY for sticky entry, NEF_ONLINK for onlink entry

Description

Search the neighbor cache for a node with the given IP address. An iface can be specified for link-local addresses or for cases where the neighbor is expected on a given interface. If the entry is found, a pointer to it is returned. If no such entry exists and the node is directly connected on one of our active interfaces, a new entry is created and returned to the caller with protocol-dependent fields initialized to zero. If the node is not connected directly or a is not a valid unicast IP address, neigh_find() returns NULL.


Function

void neigh_dump (neighbor * n) -- dump specified neighbor entry.

Arguments

neighbor * n

the entry to dump

Description

This function dumps the contents of a given neighbor entry to debug output.


Function

void neigh_dump_all (void) -- dump all neighbor entries.

Description

This function dumps the contents of the neighbor cache to debug output.


Function

void neigh_update (neighbor * n, struct iface * iface)

Arguments

neighbor * n

neighbor to update

struct iface * iface

changed iface

Description

The function recalculates the state of the neighbor entry n assuming that only the interface iface may have changed its state or addresses. Then, appropriate actions are executed (the neighbor goes up, down, up-down, or is just notified).


Function

void neigh_if_up (struct iface * i)

Arguments

struct iface * i

interface in question

Description

Tell the neighbor cache that a new interface came up.

The neighbor cache wakes up all inactive sticky neighbors with addresses belonging to prefixes of the interface i.
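The wake-up step can be pictured as a prefix match over the list of inactive sticky entries. The sketch below is a simplified illustration, not the neigh_if_up() implementation: the toy_* names are hypothetical, addresses are IPv4 in host byte order, and an interface id of 0 stands for "inactive".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model: hypothetical names, IPv4 only. */

struct toy_prefix { uint32_t net; unsigned len; };

/* Does ip fall inside prefix p? */
static bool toy_in_prefix(uint32_t ip, struct toy_prefix p)
{
  assert(p.len <= 32);
  uint32_t mask = p.len ? (uint32_t) (~(uint32_t) 0 << (32 - p.len)) : 0;
  return (ip & mask) == (p.net & mask);
}

struct toy_sticky {
  uint32_t ip;   /* address the protocol is waiting for */
  int iface;     /* 0 while inactive, otherwise the interface id */
};

/* Interface iface came up with the given prefixes: activate every
 * inactive entry covered by one of them and return how many woke up. */
static unsigned toy_if_up(struct toy_sticky *list, unsigned n, int iface,
                          const struct toy_prefix *pfx, unsigned npfx)
{
  unsigned woken = 0;

  for (unsigned i = 0; i < n; i++) {
    if (list[i].iface)
      continue;                       /* already active */
    for (unsigned j = 0; j < npfx; j++)
      if (toy_in_prefix(list[i].ip, pfx[j])) {
        list[i].iface = iface;
        woken++;
        break;
      }
  }
  return woken;
}
```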


Function

void neigh_if_down (struct iface * i) -- notify neighbor cache about interface down event

Arguments

struct iface * i

the interface in question

Description

Notify the neighbor cache that an interface has ceased to exist.

It causes all neighbors connected to this interface to be updated or removed.


Function

void neigh_if_link (struct iface * i) -- notify neighbor cache about interface link change

Arguments

struct iface * i

the interface in question

Description

Notify the neighbor cache that an interface changed link state. All owners of neighbor entries connected to this interface are notified.


Function

void neigh_ifa_up (struct ifa * a)

Arguments

struct ifa * a

interface address in question

Description

Tell the neighbor cache that an address was added or removed.

The neighbor cache wakes up all inactive sticky neighbors with addresses belonging to prefixes of the interface belonging to ifa and causes all unreachable neighbors to be flushed.


Function

void neigh_prune (void) -- prune neighbor cache

Description

neigh_prune() examines all neighbor entries cached and removes those corresponding to inactive protocols. It's called whenever a protocol is shut down to get rid of all its heritage.


Function

void neigh_init (pool * if_pool) -- initialize the neighbor cache.

Arguments

pool * if_pool

resource pool to be used for neighbor entries.

Description

This function is called during BIRD startup to initialize the neighbor cache module.

2.10 Command line interface

This module takes care of the BIRD's command-line interface (CLI). The CLI exists to provide a way to control BIRD remotely and to inspect its status. It uses a very simple textual protocol over a stream connection provided by the platform dependent code (on UNIX systems, it's a UNIX domain socket).

Each session of the CLI consists of a sequence of requests and replies, slightly resembling the FTP and SMTP protocols. Requests are commands encoded as a single line of text. Replies are sequences of lines, each starting with a four-digit code followed by either a space (if it's the last line of the reply) or a minus sign (when the reply continues on the next line). The rest of the line contains a textual message whose semantics depend on the numeric code. If a reply line has the same code as the previous one and it's a continuation line, the whole prefix can be replaced by a single white space character.

Reply codes starting with 0 stand for `action successfully completed' messages, 1 means `table entry', 8 `runtime error' and 9 `syntax error'.
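A client reading this protocol has to split each reply line into code, continuation flag and message. The following is a minimal sketch of such a parser; the toy_* names are hypothetical, and treating only a literal space as the abbreviated prefix is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

struct toy_reply_line {
  int code;            /* four-digit reply code */
  bool last;           /* true if this line ends the reply */
  const char *text;    /* message following the prefix */
};

/* Parse one reply line. prev_code is the code of the preceding line and
 * is used for the abbreviated one-character prefix. Returns false on a
 * malformed line. */
static bool toy_parse_reply(const char *line, int prev_code, struct toy_reply_line *out)
{
  assert(line && out);

  if (line[0] == ' ') {               /* abbreviated continuation line */
    out->code = prev_code;
    out->last = false;
    out->text = line + 1;
    return true;
  }

  /* Stops at the first non-digit, so a short line is never overread */
  for (int i = 0; i < 4; i++)
    if (!isdigit((unsigned char) line[i]))
      return false;
  if (line[4] != ' ' && line[4] != '-')
    return false;

  out->code = (line[0] - '0') * 1000 + (line[1] - '0') * 100 +
              (line[2] - '0') * 10 + (line[3] - '0');
  out->last = (line[4] == ' ');
  out->text = line + 5;
  return true;
}

/* Class of a reply according to the first digit of its code */
static const char *toy_reply_class(int code)
{
  switch (code / 1000) {
    case 0: return "success";
    case 1: return "table entry";
    case 8: return "runtime error";
    case 9: return "syntax error";
    default: return "unknown";
  }
}
```

A reader loop would call toy_parse_reply() for every received line, passing the previously parsed code, and stop collecting once a line with last set arrives.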

Each CLI session is internally represented by a cli structure and a resource pool containing all resources associated with the connection, so that it can be easily freed whenever the connection gets closed, not depending on the current state of command processing.

The CLI commands are declared as a part of the configuration grammar by using the CF_CLI macro. When a command is received, it is processed by the same lexical analyzer and parser as used for the configuration, but it's switched to a special mode by prepending a fake token to the text, so that it uses only the CLI command rules. Then the parser invokes an execution routine corresponding to the command, which either constructs the whole reply and returns it back or (in case it expects the reply will be long) it prints a partial reply and asks the CLI module (using the cont hook) to call it again when the output is transferred to the user.

The this_cli variable points to a cli structure of the session being currently parsed, but it's of course available only in command handlers not entered using the cont hook.

TX buffer management works as follows: At cli.tx_buf there is a list of TX buffers (struct cli_out), cli.tx_write is the buffer currently used by the producer (cli_printf(), cli_alloc_out()) and cli.tx_pos is the buffer currently used by the consumer (cli_write(), in system dependent code). The producer uses the cli_out.wpos pointer as the current write position and the consumer uses the cli_out.outpos pointer as the current read position. When the producer produces something, it calls cli_write_trigger(). If there is not enough space in the current buffer, the producer allocates a new one. When the consumer processes everything in the buffer queue, it calls cli_written(), that frees all buffers (except the first one) and schedules cli.event.
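The producer and consumer roles can be modelled with a short chain of fixed-size buffers. This is a sketch under simplifying assumptions, not the code in the CLI module: the toy_* names are hypothetical, positions are indices rather than pointers, data may be split across buffers, and no events are scheduled.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BUF_SIZE 16

struct toy_out {
  struct toy_out *next;
  size_t wpos;               /* producer's write position */
  size_t outpos;             /* consumer's read position */
  char buf[TOY_BUF_SIZE];
};

struct toy_cli {
  struct toy_out *tx_buf;    /* head of the buffer list */
  struct toy_out *tx_write;  /* buffer the producer appends to */
  struct toy_out *tx_pos;    /* buffer the consumer reads from */
};

static struct toy_out *toy_out_new(void)
{
  struct toy_out *o = calloc(1, sizeof *o);
  assert(o);
  return o;
}

static void toy_cli_init(struct toy_cli *c)
{
  c->tx_buf = c->tx_write = c->tx_pos = toy_out_new();
}

/* Producer: append len bytes, chaining a new buffer when the current one is full */
static void toy_produce(struct toy_cli *c, const char *data, size_t len)
{
  while (len) {
    struct toy_out *o = c->tx_write;
    if (o->wpos == TOY_BUF_SIZE) {
      o->next = toy_out_new();
      o = c->tx_write = o->next;
    }
    size_t n = TOY_BUF_SIZE - o->wpos;
    if (n > len)
      n = len;
    memcpy(o->buf + o->wpos, data, n);
    o->wpos += n;
    data += n;
    len -= n;
  }
}

/* Consumer: copy out up to max bytes; returns the number of bytes copied */
static size_t toy_consume(struct toy_cli *c, char *dst, size_t max)
{
  size_t done = 0;
  while (done < max) {
    struct toy_out *o = c->tx_pos;
    if (o->outpos == o->wpos) {
      if (!o->next)
        break;                        /* queue drained */
      c->tx_pos = o->next;
      continue;
    }
    size_t n = o->wpos - o->outpos;
    if (n > max - done)
      n = max - done;
    memcpy(dst + done, o->buf + o->outpos, n);
    o->outpos += n;
    done += n;
  }
  return done;
}

/* Everything was sent: free all buffers but the first and rewind it */
static void toy_written(struct toy_cli *c)
{
  struct toy_out *o = c->tx_buf->next;
  while (o) {
    struct toy_out *next = o->next;
    free(o);
    o = next;
  }
  c->tx_buf->next = NULL;
  c->tx_buf->wpos = c->tx_buf->outpos = 0;
  c->tx_write = c->tx_pos = c->tx_buf;
}
```

Keeping the first buffer alive means a session that produces short replies never allocates again after start-up; extra buffers exist only while a long reply is in flight.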


Function

void cli_printf (cli * c, int code, char * msg, ...) -- send reply to a CLI connection

Arguments

cli * c

CLI connection

int code

numeric code of the reply, negative for continuation lines

char * msg

a printf()-like formatting string.

...

variable arguments

Description

This function sends a single line of reply to a given CLI connection. It works in all aspects like bsprintf() except that it automatically prepends the reply line prefix.

Please note that the connection may already be busy sending some data, in which case cli_printf() stores the output in a temporary buffer, so please avoid sending a large batch of replies without waiting for the buffers to be flushed.

If you want to write to the current CLI output, you can use the cli_msg() macro instead.
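The convention of passing a negative code for continuation lines can be illustrated with a stand-alone formatter. This is only a sketch: toy_reply_format() is a hypothetical helper built on the standard snprintf() family, it writes into a caller-supplied buffer instead of a CLI connection, and it never emits the abbreviated one-character prefix.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Format one reply line into buf. A negative code marks a continuation
 * line, as with cli_printf(); everything else is a simplification.
 * Returns the line length, or -1 if the line does not fit. */
static int toy_reply_format(char *buf, size_t size, int code, const char *fmt, ...)
{
  assert(buf && size > 0 && code > -10000 && code < 10000);

  int cont = (code < 0);
  int cd = cont ? -code : code;

  /* Prefix: four digits, then '-' (more lines follow) or ' ' (last line) */
  int n = snprintf(buf, size, "%04d%c", cd, cont ? '-' : ' ');
  if (n < 0 || (size_t) n >= size)
    return -1;

  va_list args;
  va_start(args, fmt);
  int m = vsnprintf(buf + n, size - n, fmt, args);
  va_end(args);
  if (m < 0 || (size_t) m >= size - n)
    return -1;

  return n + m;
}
```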


Function

void cli_init (void) -- initialize the CLI module

Description

This function is called during BIRD startup to initialize the internal data structures of the CLI module.

2.11 Object locks

The lock module provides a simple mechanism for avoiding conflicts between various protocols which would like to use a single physical resource (for example a network port). It would be easy to say that such collisions can occur only when the user specifies an invalid configuration and therefore he deserves to get what he has asked for, but unfortunately they can also arise legitimately when the daemon is reconfigured and there exists (although for a short time period only) an old protocol instance being shut down and a new one willing to start up on the same interface.

The solution is very simple: when any protocol wishes to use a network port or some other non-shareable resource, it asks the core to lock it and it doesn't use the resource until it's notified that it has acquired the lock.

Object locks are represented by object_lock structures which are in turn a kind of resource. Lockable resources are uniquely determined by resource type (OBJLOCK_UDP for a UDP port etc.), IP address (usually a broadcast or multicast address the port is bound to), port number, interface and optional instance ID.


Function

struct object_lock * olock_new (pool * p) -- create an object lock

Arguments

pool * p

resource pool to create the lock in.

Description

The olock_new() function creates a new resource of type object_lock and returns a pointer to it. After filling in the structure, the caller should call olock_acquire() to do the real locking.


Function

void olock_acquire (struct object_lock * l) -- acquire a lock

Arguments

struct object_lock * l

the lock to acquire

Description

This function attempts to acquire exclusive access to the non-shareable resource described by the lock l. It returns immediately, but as soon as the resource becomes available, it calls the hook() function set up by the caller.

When you want to release the resource, just rfree() the lock.
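The acquire, wait and hand-over cycle can be sketched as follows. This toy model is not the lock module itself: all toy_* names are hypothetical, locks live in a small global array instead of being resources in a pool, toy_release() stands in for rfree(), and the hook is called synchronously, which the real implementation is not guaranteed to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LOCKS 8

struct toy_lock {
  /* Key identifying the resource */
  int type; uint32_t addr; unsigned port; int iface; unsigned inst;
  void (*hook)(struct toy_lock *);   /* called once the lock is granted */
  void *data;                        /* owner's private pointer */
  bool active;                       /* currently holds the resource */
  bool waiting;                      /* queued behind another holder */
};

/* Registered locks in request order */
static struct toy_lock *toy_locks[TOY_MAX_LOCKS];
static unsigned toy_lock_count;

static bool toy_same_resource(const struct toy_lock *a, const struct toy_lock *b)
{
  return a->type == b->type && a->addr == b->addr && a->port == b->port &&
         a->iface == b->iface && a->inst == b->inst;
}

/* Request the resource: granted at once if free, queued otherwise */
static void toy_acquire(struct toy_lock *l)
{
  assert(l->hook && toy_lock_count < TOY_MAX_LOCKS);

  bool busy = false;
  for (unsigned i = 0; i < toy_lock_count; i++)
    if (toy_same_resource(toy_locks[i], l))
      busy = true;
  toy_locks[toy_lock_count++] = l;

  if (busy) {
    l->waiting = true;
    return;
  }
  l->active = true;
  l->hook(l);
}

/* Give the lock up and pass the resource to the oldest waiter, if any */
static void toy_release(struct toy_lock *l)
{
  unsigned i = 0;
  while (i < toy_lock_count && toy_locks[i] != l)
    i++;
  assert(i < toy_lock_count);
  for (; i + 1 < toy_lock_count; i++)
    toy_locks[i] = toy_locks[i + 1];
  toy_lock_count--;

  if (!l->active) {                  /* was only waiting */
    l->waiting = false;
    return;
  }
  l->active = false;

  for (i = 0; i < toy_lock_count; i++)
    if (toy_locks[i]->waiting && toy_same_resource(toy_locks[i], l)) {
      toy_locks[i]->waiting = false;
      toy_locks[i]->active = true;
      toy_locks[i]->hook(toy_locks[i]);
      return;
    }
}

/* Example hook: counts grants through the lock's data pointer */
static void toy_count_hook(struct toy_lock *l)
{
  (*(int *) l->data)++;
}
```

This mirrors the reconfiguration case: the old protocol instance holds the lock, the new instance asks for the same resource and waits, and its hook fires only once the old instance lets go.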


Function

void olock_init (void) -- initialize the object lock mechanism

Description

This function is called during BIRD startup. It initializes all the internal data structures of the lock module.

