The BIRD Internet Routing Daemon Project

2. Core

2.1 Forwarding Information Base

FIB is a data structure designed for storage of routes indexed by their network prefixes. It supports insertion, deletion, searching by prefix, `routing' (in CIDR sense, that is searching for a longest prefix matching a given IP address) and (which makes the structure very tricky to implement) asynchronous reading, that is enumerating the contents of a FIB while other modules add, modify or remove entries.

Internally, each FIB is represented as a collection of nodes of type fib_node indexed using a sophisticated hashing mechanism. We use two-stage hashing: first we calculate a 16-bit primary hash key that is independent of the hash table size, then we reduce the primary key modulo the table size to get the real hash key that determines the bucket containing the node. The list of nodes in each bucket is sorted by primary hash key, so as long as the total number of buckets is kept a power of two, re-hashing the structure preserves the relative order of the nodes.
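The order-preserving property can be illustrated with a small stand-alone sketch. It is not BIRD code: toy_bucket(), toy_split_keeps_order() and toy_demo_split() are hypothetical names, and the sketch only assumes what the paragraph above states (a 16-bit primary key, reduction modulo a power-of-two table size, buckets sorted by primary key).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Second stage: reduce a size-independent 16-bit primary key to a bucket
 * index in a table of (1 << order) buckets. Toy helper, not a BIRD function. */
static unsigned toy_bucket(uint16_t primary, unsigned order)
{
  assert(order <= 16);
  return primary % (1u << order);
}

/* Walk one bucket of the old table (sorted by primary key) and distribute its
 * nodes to the two buckets they map to after doubling the table. Nodes are
 * only appended in the order they are met; returns 1 if both new buckets
 * come out sorted, 0 otherwise. */
static int toy_split_keeps_order(const uint16_t *bucket, size_t n, unsigned order)
{
  uint16_t last[2] = { 0, 0 };
  int seen[2] = { 0, 0 };

  assert(order < 16);
  for (size_t i = 0; i < n; i++)
  {
    /* Bit number `order` of the key selects the lower or upper new bucket. */
    unsigned half = (toy_bucket(bucket[i], order + 1) >> order) & 1u;
    if (seen[half] && bucket[i] < last[half])
      return 0;
    last[half] = bucket[i];
    seen[half] = 1;
  }
  return 1;
}

/* Demo: five keys that share bucket 3 of a 4-bucket table (order 2). */
static int toy_demo_split(void)
{
  static const uint16_t bucket[] = { 3, 7, 11, 19, 35 };
  return toy_split_keeps_order(bucket, 5, 2);
}
```

Because each new bucket receives a subsequence of an already sorted list, no re-sorting is needed when the table grows.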

To get the asynchronous reading consistent over node deletions, we need to keep a list of readers for each node. When a node gets deleted, its readers are automatically moved to the next node in the table.
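A minimal sketch of this reader hand-over, using a plain singly linked list instead of the real hash table. All types and functions here (toy_node, toy_iter, toy_park(), toy_delete()) are hypothetical and only model the idea; they are not the fib_node or iterator definitions used by BIRD.

```c
#include <assert.h>
#include <stddef.h>

struct toy_iter;

struct toy_node {
  struct toy_node *next;      /* next node in reading order */
  struct toy_iter *readers;   /* suspended readers parked on this node */
  int value;
};

struct toy_iter {
  struct toy_iter *next;      /* next reader parked on the same node */
  struct toy_node *node;      /* node where reading resumes */
};

/* Park a suspended reader on a node. */
static void toy_park(struct toy_iter *it, struct toy_node *n)
{
  it->node = n;
  it->next = n->readers;
  n->readers = it;
}

/* Unlink node n from the list starting at *head and hand its readers over
 * to the following node (or leave them with node == NULL at the list end). */
static void toy_delete(struct toy_node **head, struct toy_node *n)
{
  struct toy_node **pp = head;
  while (*pp && *pp != n)
    pp = &(*pp)->next;
  assert(*pp == n);
  *pp = n->next;

  while (n->readers)
  {
    struct toy_iter *it = n->readers;
    n->readers = it->next;
    if (n->next)
      toy_park(it, n->next);
    else
      { it->node = NULL; it->next = NULL; }
  }
}

/* Demo: three nodes a -> b -> c, a reader parked on b, then b is deleted.
 * Returns the value of the node the reader would resume at. */
static int toy_demo_resume_value(void)
{
  struct toy_node c = { NULL, NULL, 3 };
  struct toy_node b = { &c, NULL, 2 };
  struct toy_node a = { &b, NULL, 1 };
  struct toy_node *head = &a;
  struct toy_iter it = { NULL, NULL };

  toy_park(&it, &b);
  toy_delete(&head, &b);
  return it.node ? it.node->value : -1;
}
```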

Basic FIB operations are performed by functions defined by this module. Enumeration of FIB contents is accomplished by using the FIB_WALK() macro, or FIB_ITERATE_START() if you want to do it asynchronously.

For simple iteration just place the body of the loop between FIB_WALK() and FIB_WALK_END(). You can't modify the FIB during the iteration (you can modify data in the node, but not add or remove nodes).

If you need more freedom, you can use the FIB_ITERATE_*() group of macros. First, you initialize an iterator with FIB_ITERATE_INIT(). Then you can put the loop body in between FIB_ITERATE_START() and FIB_ITERATE_END(). In addition, the iteration can be suspended by calling FIB_ITERATE_PUT(). This'll link the iterator inside the FIB. While suspended, you may modify the FIB, exit the current function, etc. To resume the iteration, enter the loop again. You can use FIB_ITERATE_UNLINK() to unlink the iterator (while iteration is suspended) in cases like premature end of FIB iteration.

Note that the iterator must not be destroyed while the iteration is suspended, because the FIB would then contain a pointer to invalid memory. Therefore, after each FIB_ITERATE_INIT() or FIB_ITERATE_PUT() there must be either FIB_ITERATE_START() or FIB_ITERATE_UNLINK() before the iterator is destroyed.

Function

void fib_init (struct fib * f, pool * p, uint addr_type, uint node_size, uint node_offset, uint hash_order, fib_init_fn init) -- initialize a new FIB

Arguments

struct fib * f: the FIB to be initialized (the structure itself being allocated by the caller)
pool * p: pool to allocate the nodes in
uint addr_type: -- undescribed --
uint node_size: node size to be used (each node consists of a standard header fib_node followed by user data)
uint node_offset: -- undescribed --
uint hash_order: initial hash order (a binary logarithm of hash table size), 0 to use default order (recommended)
fib_init_fn init: pointer to a function to be called to initialize a newly created node

Description

This function initializes a newly allocated FIB and prepares it for use.

Function

void * fib_find (struct fib * f, const net_addr * a) -- search for FIB node by prefix

Arguments

struct fib * f: FIB to search in
const net_addr * a: -- undescribed --

Description

Search for a FIB node corresponding to the given prefix, return a pointer to it or NULL if no such node exists.

Function

void * fib_get (struct fib * f, const net_addr * a) -- find or create a FIB node

Arguments

struct fib * f: FIB to work with
const net_addr * a: -- undescribed --

Description

Search for a FIB node corresponding to the given prefix and return a pointer to it. If no such node exists, create it.

Function

void * fib_route (struct fib * f, const net_addr * n) -- CIDR routing lookup

Arguments

struct fib * f: FIB to search in
const net_addr * n: network address

Description

Search for a FIB node with longest prefix matching the given network, that is a node which a CIDR router would use for routing that network.
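One common way to do a longest-prefix lookup on top of an exact-match table is to try the most specific prefix length first and shorten it until something is found. The sketch below shows this for IPv4 with a linear search standing in for the hash lookup. It is an assumption about the general technique, not a copy of fib_route(), and every name in it (toy_prefix, toy_find(), toy_route()) is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct toy_prefix { uint32_t addr; unsigned len; int id; };

/* Netmask for a prefix length 0..32. */
static uint32_t toy_mask(unsigned len)
{
  assert(len <= 32);
  return len ? (uint32_t) (~(uint32_t) 0 << (32 - len)) : 0;
}

/* Exact-match search, standing in for a hash lookup by prefix. */
static const struct toy_prefix *
toy_find(const struct toy_prefix *t, size_t n, uint32_t addr, unsigned len)
{
  for (size_t i = 0; i < n; i++)
    if (t[i].len == len && t[i].addr == addr)
      return &t[i];
  return NULL;
}

/* Longest-prefix match: try the most specific length first, then shorten. */
static const struct toy_prefix *
toy_route(const struct toy_prefix *t, size_t n, uint32_t ip)
{
  for (int len = 32; len >= 0; len--)
  {
    const struct toy_prefix *p =
      toy_find(t, n, ip & toy_mask((unsigned) len), (unsigned) len);
    if (p)
      return p;
  }
  return NULL;
}

/* Demo table; returns the id of the matching prefix or 0 if none matches. */
static int toy_demo_route(uint32_t ip)
{
  static const struct toy_prefix table[] = {
    { 0x0A000000, 8, 1 },    /* 10.0.0.0/8 */
    { 0x0A010000, 16, 2 },   /* 10.1.0.0/16 */
    { 0x00000000, 0, 3 },    /* default route */
  };
  const struct toy_prefix *p = toy_route(table, 3, ip);
  return p ? p->id : 0;
}
```

With this table, 10.1.2.3 resolves through 10.1.0.0/16, 10.2.3.4 through 10.0.0.0/8, and any other address through the default route.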

Function

void fib_delete (struct fib * f, void * E) -- delete a FIB node

Arguments

struct fib * f: FIB to delete from
void * E: entry to delete

Description

This function removes the given entry from the FIB, taking care of all the asynchronous readers by shifting them to the next node in the canonical reading order.

Function

void fib_free (struct fib * f) -- delete a FIB

Arguments

struct fib * f: FIB to be deleted

Description

This function deletes a FIB -- it frees all memory associated with it and all its entries.

Function

void fib_check (struct fib * f) -- audit a FIB

Arguments

struct fib * f: FIB to be checked

Description

This debugging function audits a FIB by checking its internal consistency. Use when you suspect somebody of corrupting innocent data structures.

2.2 Routing tables

Routing tables are probably the most important structures BIRD uses. They hold all the information about known networks, the associated routes and their attributes.

There are multiple routing tables (a primary one together with any number of secondary ones if requested by the configuration). Each table is basically a FIB containing entries describing the individual destination networks. For each network (represented by structure net), there is a one-way linked list of route entries (rte), the first entry on the list being the best one (i.e., the one we currently use for routing); the order of the other ones is undetermined.

The rte contains information about the route. There are net and src, which together form a key identifying the route in a routing table. There is a pointer to a rta structure (see the route attribute module for a precise explanation) holding the route attributes, which are primary data about the route. There are several technical fields used by routing table code (route id, REF_* flags). There is also the pflags field, holding protocol-specific flags. They are not used by routing table code, but by protocol-specific hooks. In contrast to route attributes, they are not primary data and their validity is also limited to the routing table.

There are several mechanisms that allow automatic update of routes in one routing table (dst) as a result of changes in another routing table (src). They handle issues of recursive next hop resolving, flowspec validation and RPKI validation.

The first such mechanism is handling of recursive next hops. A route in the dst table has an indirect next hop address, which is resolved through a route in the src table (which may also be the same table) to get an immediate next hop. This is implemented using structure hostcache attached to the src table, which contains hostentry structures for each tracked next hop address. These structures are linked from recursive routes in dst tables, possibly multiple routes sharing one hostentry (as many routes may have the same indirect next hop). There is also a trie in the hostcache, which matches all prefixes that may influence resolving of tracked next hops.

When a best route changes in the src table, the hostcache is notified using an auxiliary export request, which checks using the trie whether the change is relevant and, if it is, schedules asynchronous hostcache recomputation. The recomputation is done by rt_update_hostcache() (called as an event of the src table). It walks through all hostentries and resolves them (by rt_update_hostentry()). It also updates the trie. If a change in hostentry resolution was found, it schedules asynchronous nexthop recomputation of the associated dst table. That is done by rt_next_hop_update() (called from rt_event() of the dst table), which iterates over all routes in the dst table and re-examines their hostentries for changes. Note that in contrast to the hostcache update, the next hop update can be interrupted by the main loop. These two full-table walks (over the hostcache and the dst table) are necessary due to the absence of direct lookups (route -> affected nexthop, nexthop -> its route).

The second mechanism is for flowspec validation, where validity of flowspec routes depends on resolving their network prefixes in IP routing tables. This is similar to the recursive next hop mechanism, but simpler as there are no intermediate hostcache and hostentries (because flows are less likely to share a common net prefix than routes sharing a common next hop). Every dst table has its own export request in every src table. Each dst table has its own trie of prefixes that may influence validation of flowspec routes in it (flowspec_trie).

When a best route changes in the src table, the notification mechanism is invoked by the export request which checks its dst table's trie to see whether the change is relevant, and if so, an asynchronous re-validation of flowspec routes in the dst table is scheduled. That is also done by function rt_next_hop_update(), like nexthop recomputation above. It iterates over all flowspec routes and re-validates them. It also recalculates the trie.

Note that in contrast to the hostcache update, here the trie is recalculated during the rt_next_hop_update(), which may be interleaved with IP route updates. The trie is flushed at the beginning of recalculation, which means that such updates may use partial trie to see if they are relevant. But it works anyway! Either affected flowspec was already re-validated and added to the trie, then IP route change would match the trie and trigger a next round of re-validation, or it was not yet re-validated and added to the trie, but will be re-validated later in this round anyway.

The third mechanism is used for RPKI re-validation of IP routes and it is the simplest. It is also an auxiliary export request belonging to the appropriate channel, triggering its reload/refeed timer after a settle time.

Function

int net_roa_check (rtable * tp, const net_addr * n, u32 asn) -- check validity of route origination in a ROA table

Arguments

rtable * tp: -- undescribed --
const net_addr * n: network prefix to check
u32 asn: AS number of network prefix

Description

Implements RFC 6483 route validation for the given network prefix. The procedure is to find all candidate ROAs - ROAs whose prefixes cover the given network prefix. If there is no candidate ROA, return ROA_UNKNOWN. If there is a candidate ROA with matching ASN and maxlen field greater than or equal to the given prefix length, return ROA_VALID. Otherwise, return ROA_INVALID. If the caller cannot determine the origin AS, 0 could be used (in that case ROA_VALID cannot happen). Table tp must have type NET_ROA4 or NET_ROA6, network n must have type NET_IP4 or NET_IP6, respectively.
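The decision procedure described above can be sketched over a flat array of IPv4 ROAs. This is only an illustration of the three outcomes: the types, the TOY_ROA_* constants and the linear scan are assumptions made for the example and do not reflect how the ROA table is actually searched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_roa_result { TOY_ROA_UNKNOWN, TOY_ROA_VALID, TOY_ROA_INVALID };

struct toy_roa { uint32_t prefix; unsigned pxlen; unsigned maxlen; uint32_t asn; };

/* Does the ROA prefix cover the given network prefix? */
static int toy_covers(const struct toy_roa *r, uint32_t prefix, unsigned pxlen)
{
  assert(r->pxlen <= 32 && pxlen <= 32);
  if (r->pxlen > pxlen)
    return 0;
  uint32_t mask = r->pxlen ? (uint32_t) (~(uint32_t) 0 << (32 - r->pxlen)) : 0;
  return (prefix & mask) == r->prefix;
}

static enum toy_roa_result
toy_roa_check(const struct toy_roa *roas, size_t n,
              uint32_t prefix, unsigned pxlen, uint32_t asn)
{
  int candidates = 0;

  for (size_t i = 0; i < n; i++)
  {
    if (!toy_covers(&roas[i], prefix, pxlen))
      continue;
    candidates = 1;
    /* asn == 0 means "origin unknown", which can never be valid. */
    if (asn && roas[i].asn == asn && roas[i].maxlen >= pxlen)
      return TOY_ROA_VALID;
  }
  return candidates ? TOY_ROA_INVALID : TOY_ROA_UNKNOWN;
}

/* Demo: a single ROA for 10.0.0.0/8, maxlen 16, AS 65000. */
static enum toy_roa_result toy_demo_roa(uint32_t prefix, unsigned pxlen, uint32_t asn)
{
  static const struct toy_roa roas[] = { { 0x0A000000, 8, 16, 65000 } };
  return toy_roa_check(roas, 1, prefix, pxlen, asn);
}
```

In the demo, 10.1.0.0/16 from AS 65000 is valid, the same prefix from another AS or a /24 exceeding maxlen is invalid, and a prefix outside 10.0.0.0/8 is unknown.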

Function

enum aspa_result aspa_check (rtable * tab, const adata * path, bool force_upstream) -- check validity of AS Path in an ASPA table

Arguments

rtable * tab: ASPA table
const adata * path: AS Path to check
bool force_upstream: -- undescribed --

Description

Implements draft-ietf-sidrops-aspa-verification-16.

Function

void rte_free (struct rte_storage * e, struct rtable_private * tab) -- delete a rte (happens later)

Arguments

struct rte_storage * e: struct rte_storage to be deleted
struct rtable_private * tab: the table which the rte belongs to

Description

rte_free() deletes the given rte from the routing table it's linked to.

Function

void rt_refresh_begin (struct rt_import_request * req) -- start a refresh cycle

Arguments

struct rt_import_request * req: -- undescribed --

Description

This function starts a refresh cycle for given routing table and announce hook. The refresh cycle is a sequence where the protocol sends all its valid routes to the routing table (by rte_update()). After that, all protocol routes (more precisely routes with c as sender) not sent during the refresh cycle but still in the table from the past are pruned. This is implemented by marking all related routes as stale by REF_STALE flag in rt_refresh_begin(), then marking all related stale routes with REF_DISCARD flag in rt_refresh_end() and then removing such routes in the prune loop.
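The mark-and-sweep nature of the refresh cycle can be shown with a toy route array. TOY_STALE and TOY_DISCARD stand in for the REF_STALE and REF_DISCARD flags named above; the structures and function names are hypothetical, and the real code works per import request and prunes asynchronously.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_STALE   0x1u   /* stands in for REF_STALE */
#define TOY_DISCARD 0x2u   /* stands in for REF_DISCARD */

struct toy_route { int key; unsigned flags; int present; };

/* Begin: every route currently in the table becomes stale. */
static void toy_refresh_begin(struct toy_route *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (r[i].present)
      r[i].flags |= TOY_STALE;
}

/* A route re-announced during the cycle is no longer stale. */
static void toy_update(struct toy_route *r, size_t n, int key)
{
  for (size_t i = 0; i < n; i++)
    if (r[i].present && r[i].key == key)
      r[i].flags &= ~TOY_STALE;
}

/* End: whatever is still stale is marked for removal. */
static void toy_refresh_end(struct toy_route *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (r[i].present && (r[i].flags & TOY_STALE))
      r[i].flags |= TOY_DISCARD;
}

/* Prune: routes marked for removal leave the table. */
static void toy_prune(struct toy_route *r, size_t n)
{
  for (size_t i = 0; i < n; i++)
    if (r[i].present && (r[i].flags & TOY_DISCARD))
      r[i].present = 0;
}

/* Demo: routes 1, 2, 3 exist; only 1 and 3 are re-sent during the cycle.
 * Returns the sum of the keys that survive pruning. */
static int toy_demo_refresh(void)
{
  struct toy_route r[] = { { 1, 0, 1 }, { 2, 0, 1 }, { 3, 0, 1 } };
  int sum = 0;

  toy_refresh_begin(r, 3);
  toy_update(r, 3, 1);
  toy_update(r, 3, 3);
  toy_refresh_end(r, 3);
  toy_prune(r, 3);

  for (size_t i = 0; i < 3; i++)
    if (r[i].present)
      sum += r[i].key;
  return sum;
}
```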

Function

void rt_refresh_end (struct rt_import_request * req) -- end a refresh cycle

Arguments

struct rt_import_request * req: -- undescribed --

Description

This function ends a refresh cycle for given routing table and announce hook. See rt_refresh_begin() for description of refresh cycles.

Function

void rt_refresh_trace (struct rtable_private * tab, struct rt_import_hook * ih, const char * msg) -- log information about route refresh

Arguments

struct rtable_private * tab: table
struct rt_import_hook * ih: import hook doing the route refresh
const char * msg: what is happening

Description

This function consistently logs route refresh messages.

Function

void rte_dump (struct dump_request * dreq, struct rte_storage * e) -- dump a route

Arguments

struct dump_request * dreq: -- undescribed --
struct rte_storage * e: rte to be dumped

Description

This function dumps contents of a rte to debug output.

Function

void rt_dump (struct dump_request * dreq, rtable * tab) -- dump a routing table

Arguments

struct dump_request * dreq: -- undescribed --
rtable * tab: -- undescribed --

Description

This function dumps contents of a given routing table to debug output.

Function

void rt_dump_all (struct dump_request * dreq) -- dump all routing tables

Arguments

struct dump_request * dreq: -- undescribed --

Description

This function dumps contents of all routing tables to debug output.

Function

void rt_init (void) -- initialize routing tables

Description

This function is called during BIRD startup. It initializes the routing table module.

Function

void rt_prune_table (void * _tab) -- prune a routing table

Arguments

void * _tab: -- undescribed --

Description

The prune loop scans routing tables and removes routes belonging to flushing protocols, discarded routes and also stale network entries. It is called from rt_event(). The event is rescheduled if the current iteration does not finish the table. The pruning is directed by the prune state (prune_state), specifying whether the prune cycle is scheduled or running, and there is also a persistent pruning iterator (prune_fit).

The prune loop is used also for channel flushing. For this purpose, the channels to flush are marked before the iteration and notified after the iteration.

Function

void rt_unlock_trie (struct rtable_private * tab, const struct f_trie * trie) -- unlock a prefix trie of a routing table

Arguments

struct rtable_private * tab: routing table with prefix trie to be unlocked
const struct f_trie * trie: value returned by matching rt_lock_trie()

Description

Call this on a trie locked by rt_lock_trie() once the walk over the trie is done. It may free the trie and schedule the next trie pruning.

Function

void rt_lock_table_priv (struct rtable_private * r, const char * file, uint line) -- lock a routing table

Arguments

struct rtable_private * r: routing table to be locked
const char * file: -- undescribed --
uint line: -- undescribed --

Description

Lock a routing table, because it's in use by a protocol, preventing it from being freed when it gets undefined in a new configuration.

Function

void rt_unlock_table_priv (struct rtable_private * r, const char * file, uint line) -- unlock a routing table

Arguments

struct rtable_private * r: routing table to be unlocked
const char * file: -- undescribed --
uint line: -- undescribed --

Description

Unlock a routing table formerly locked by rt_lock_table(), that is decrease its use count and delete it if it's scheduled for deletion by configuration changes.

Function

void rt_commit (struct config * new, struct config * old) -- commit new routing table configuration

Arguments

struct config * new: new configuration
struct config * old: original configuration or NULL if it's boot time config

Description

Scan differences between old and new configuration and modify the routing tables according to these changes. If new defines a previously unknown table, create it; if it omits a table existing in old, schedule it for deletion (it gets deleted when all protocols disconnect from it by calling rt_unlock_table()); if it exists in both configurations, leave it unchanged.
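The interplay of table locks and deferred deletion can be modelled with a use count and a "deleted" mark. The sketch below is a simplified illustration with hypothetical names; in particular, freeing the table immediately when no lock is held is an assumption of the example, not a statement about the real implementation.

```c
#include <assert.h>

struct toy_table {
  unsigned use_count;   /* number of locks currently held */
  int deleted;          /* dropped from the configuration, waiting for unlock */
  int freed;            /* set once the table is really gone */
};

/* A protocol starts using the table. */
static void toy_lock(struct toy_table *t)
{
  t->use_count++;
}

/* A protocol stops using the table; the last user frees a deleted table. */
static void toy_unlock(struct toy_table *t)
{
  assert(t->use_count > 0);
  if (--t->use_count == 0 && t->deleted)
    t->freed = 1;
}

/* The table is missing from the new configuration: schedule its deletion.
 * Assumption of this sketch: an unused table is freed right away. */
static void toy_schedule_delete(struct toy_table *t)
{
  t->deleted = 1;
  if (t->use_count == 0)
    t->freed = 1;
}

/* Demo: two protocols hold the table while the configuration drops it.
 * Returns (freed after first unlock) * 10 + (freed after second unlock). */
static int toy_demo_table(void)
{
  struct toy_table t = { 0, 0, 0 };
  int first;

  toy_lock(&t);
  toy_lock(&t);
  toy_schedule_delete(&t);
  toy_unlock(&t);
  first = t.freed;
  toy_unlock(&t);
  return first * 10 + t.freed;
}
```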

2.3 Route attribute cache

Each route entry carries a set of route attributes. Several of them vary from route to route, but most attributes are usually common for a large number of routes. To conserve memory, we've decided to store only the varying ones directly in the rte and hold the rest in a special structure called rta which is shared among all the rte's with these attributes.

Each rta contains all the static attributes of the route (i.e., those which are always present) as structure members and a list of dynamic attributes represented by a linked list of ea_list structures, each of them consisting of an array of eattr's containing the individual attributes. An attribute can be specified more than once in the ea_list chain and in such case the first occurrence overrides the others. This semantics is used especially when someone (for example a filter) wishes to alter values of several dynamic attributes, but it wants to preserve the original attribute lists maintained by another module.
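The "first occurrence overrides the others" rule means a filter can put a small list in front of an existing chain instead of copying it. A minimal sketch with hypothetical toy types (the real ea_list and eattr structures carry more fields):

```c
#include <assert.h>
#include <stddef.h>

struct toy_attr { unsigned id; int value; };

struct toy_ea_list {
  const struct toy_ea_list *next;   /* older list further down the chain */
  size_t count;
  const struct toy_attr *attrs;
};

/* Return the first occurrence of an attribute in the chain, or NULL. */
static const struct toy_attr *toy_ea_find(const struct toy_ea_list *l, unsigned id)
{
  for (; l; l = l->next)
    for (size_t i = 0; i < l->count; i++)
      if (l->attrs[i].id == id)
        return &l->attrs[i];
  return NULL;
}

/* Demo: an overlay list changes attribute 2 without touching the base list.
 * Returns the visible value of the attribute or -1 if it is not present. */
static int toy_demo_find(unsigned id)
{
  static const struct toy_attr base_attrs[] = { { 1, 100 }, { 2, 200 } };
  static const struct toy_attr over_attrs[] = { { 2, 999 } };
  static const struct toy_ea_list base = { NULL, 2, base_attrs };
  static const struct toy_ea_list over = { &base, 1, over_attrs };
  const struct toy_attr *a = toy_ea_find(&over, id);
  return a ? a->value : -1;
}
```

Attribute 2 is seen with the overlay value 999, attribute 1 still comes from the base list, and the base list itself stays unmodified for its owner.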

Each eattr contains an attribute identifier (split to protocol ID and per-protocol attribute ID), protocol dependent flags, a type code (consisting of several bit fields describing attribute characteristics) and either an embedded 32-bit value or a pointer to an adata structure holding attribute contents.

There exist two variants of rta's -- cached and un-cached ones. Un-cached rta's can have arbitrarily complex structure of ea_list's and they can be modified by any module in the route processing chain. Cached rta's have their attribute lists normalized (that means at most one ea_list is present and its values are sorted in order to speed up searching), they are stored in a hash table to make fast lookup possible and they are provided with a use count to allow sharing.

Routing tables always contain only cached rta's.

Function

struct rte_src * rt_find_source_global (u32 id)

Arguments

u32 id: requested global ID

Route attribute cache

sources stored by their ID. Checking for non-existent or foreign source is unsafe.

Description

Returns the found source or dies. Result of this function is guaranteed to be a valid source as long as the caller owns it.

Function

struct nexthop_adata * nexthop_merge (struct nexthop_adata * xin, struct nexthop_adata * yin, int max, linpool * lp) -- merge nexthop lists

Arguments

struct nexthop_adata * xin: -- undescribed --
struct nexthop_adata * yin: -- undescribed --
int max: max number of nexthops
linpool * lp: linpool for allocating nexthops

Description

The nexthop_merge() function takes two nexthop lists x and y and merges them, eliminating possible duplicates. The input lists must be sorted and the result is sorted too. The number of nexthops in result is limited by max. New nodes are allocated from linpool lp.

The arguments rx and ry specify whether corresponding input lists may be consumed by the function (i.e. their nodes reused in the resulting list), in that case the caller should not access these lists after that. To eliminate issues with deallocation of these lists, the caller should use some form of bulk deallocation (e.g. stack or linpool) to free these nodes when the resulting list is no longer needed. When reusability is not set, the corresponding lists are not modified nor linked from the resulting list.
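The core of such a merge is the usual two-pointer walk over sorted inputs. The sketch below uses plain integer arrays in place of nexthop lists, always copies into a caller-provided output buffer (so it does not model reuse of input nodes), and assumes each input is sorted and free of internal duplicates. toy_merge() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Merge sorted arrays x and y into out, dropping duplicates and writing
 * at most max elements. Returns the number of elements written. */
static size_t toy_merge(const int *x, size_t nx, const int *y, size_t ny,
                        int *out, size_t max)
{
  size_t i = 0, j = 0, k = 0;

  while (k < max && (i < nx || j < ny))
  {
    int v;
    if (j >= ny || (i < nx && x[i] < y[j]))
      v = x[i++];                 /* only x left, or x is smaller */
    else if (i >= nx || y[j] < x[i])
      v = y[j++];                 /* only y left, or y is smaller */
    else
      { v = x[i++]; j++; }        /* equal: keep a single copy */
    out[k++] = v;
  }
  return k;
}

/* Demo: merge {1,3,5} and {3,4,6} with a limit of 4 results.
 * Returns the four results packed as decimal digits, or -1 on a wrong count. */
static int toy_demo_merge(void)
{
  static const int x[] = { 1, 3, 5 };
  static const int y[] = { 3, 4, 6 };
  int out[4];
  size_t k = toy_merge(x, 3, y, 3, out, 4);

  if (k != 4)
    return -1;
  return out[0] * 1000 + out[1] * 100 + out[2] * 10 + out[3];
}
```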

Function

eattr * ea_find_by_id (ea_list * e, unsigned id) -- find an extended attribute

Arguments

ea_list * e: attribute list to search in
unsigned id: attribute ID to search for

Description

Given an extended attribute list, ea_find() searches for a first occurrence of an attribute with specified ID, returning either a pointer to its eattr structure or NULL if no such attribute exists.

Function

eattr * ea_walk (struct ea_walk_state * s, uint id, uint max) -- walk through extended attributes

Arguments

struct ea_walk_state * s: walk state structure
uint id: start of attribute ID interval
uint max: length of attribute ID interval

Description

Given an extended attribute list, ea_walk() walks through the list looking for first occurrences of attributes with ID in specified interval from id to (id + max - 1), returning pointers to found eattr structures, storing its walk state in s for subsequent calls.

The function ea_walk() is supposed to be called in a loop, with the walk state structure s initially zeroed except for the initial extended attribute list filled in, returning one found attribute in each call or NULL when no other attribute exists. The extended attribute list or the arguments should not be modified between calls. The maximum value of max is 128.

Function

int ea_same (ea_list * x, ea_list * y) -- compare two ea_list's

Arguments

ea_list * x: attribute list
ea_list * y: attribute list

Description

ea_same() compares two normalized attribute lists x and y and returns 1 if they contain the same attributes, 0 otherwise.

Function

ea_list * ea_normalize (ea_list * e, u32 upto) -- create a normalized version of attributes

Arguments

ea_list * e: input attributes
u32 upto: bitmask of layers which should stay as an underlay

Description

This function squashes all updates done atop some ea_list and creates the final structure useful for storage or fast searching. The method is a bucket sort.

Returns the final ea_list allocated from the tmp_linpool. The adata is linked from the original places.

Function

void ea_show (struct cli * c, const eattr * e) -- print an eattr to CLI

Arguments

struct cli * c: destination CLI
const eattr * e: attribute to be printed

Description

This function takes an extended attribute represented by its eattr structure and prints it to the CLI according to the type information.

If the protocol defining the attribute provides its own get_attr() hook, it's consulted first.

Function

void ea_dump (struct dump_request * dreq, ea_list * e) -- dump an extended attribute

Arguments

struct dump_request * dreq: -- undescribed --
ea_list * e: attribute to be dumped

Description

ea_dump() dumps contents of the extended attribute given to the debug output.

Function

uint ea_hash (ea_list * e) -- calculate an ea_list hash key

Arguments

ea_list * e: attribute list

Description

ea_hash() takes an extended attribute list and calculates a hopefully uniformly distributed hash value from its contents.

Function

ea_list * ea_append (ea_list * to, ea_list * what) -- concatenate ea_list's

Arguments

ea_list * to: destination list (can be NULL)
ea_list * what: list to be appended (can be NULL)

Description

This function appends the ea_list what at the end of ea_list to and returns a pointer to the resulting list.

Function

ea_list * ea_lookup_slow (ea_list * o, u32 squash_upto, enum ea_stored oid) -- look up a rta in attribute cache

Arguments

ea_list * o: an un-cached rta
u32 squash_upto: -- undescribed --
enum ea_stored oid: -- undescribed --

Description

rta_lookup() gets an un-cached rta structure and returns its cached counterpart. It starts with examining the attribute cache to see whether there exists a matching entry. If such an entry exists, it's returned and its use count is incremented, else a new entry is created with use count set to 1.

The extended attribute lists attached to the rta are automatically converted to the normalized form.
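The lookup-or-create behaviour with use counts is a form of interning. The following sketch shows the idea with integer keys and a small fixed array; the linear scan stands in for the hash table, and all names (toy_cache, toy_lookup(), toy_release()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 8

struct toy_entry { int used; int key; unsigned uc; };
struct toy_cache { struct toy_entry slot[TOY_CACHE_SIZE]; };

/* Return the cached entry for key, creating it with use count 1 if needed. */
static struct toy_entry *toy_lookup(struct toy_cache *c, int key)
{
  struct toy_entry *spare = NULL;

  for (size_t i = 0; i < TOY_CACHE_SIZE; i++)
  {
    struct toy_entry *e = &c->slot[i];
    if (e->used && e->key == key)
    {
      e->uc++;                    /* existing entry: one more user */
      return e;
    }
    if (!e->used && !spare)
      spare = e;                  /* remember the first free slot */
  }

  assert(spare != NULL);          /* the toy cache has a fixed capacity */
  spare->used = 1;
  spare->key = key;
  spare->uc = 1;
  return spare;
}

/* Drop one reference; the entry leaves the cache with its last user. */
static void toy_release(struct toy_entry *e)
{
  assert(e->used && e->uc > 0);
  if (--e->uc == 0)
    e->used = 0;
}

/* Demo: two lookups of the same key share one entry. Returns 1 on success. */
static int toy_demo_cache(void)
{
  struct toy_cache c = { { { 0, 0, 0 } } };
  struct toy_entry *a = toy_lookup(&c, 42);
  struct toy_entry *b = toy_lookup(&c, 42);
  struct toy_entry *d = toy_lookup(&c, 7);
  int shared = (a == b) && (a != d) && a->uc == 2 && d->uc == 1;

  toy_release(b);
  return shared && a->used && a->uc == 1;
}
```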

Function

void ea_dump_all (struct dump_request * dreq) -- dump attribute cache

Arguments

struct dump_request * dreq: -- undescribed --

2.4 Routing protocols

Introduction

The routing protocols are the bird's heart and a fine amount of code is dedicated to their management and for providing support functions to them. (-: Actually, this is the reason why the directory with sources of the core code is called nest :-).

When talking about protocols, one needs to distinguish between protocols and protocol instances. A protocol exists exactly once, not depending on whether it's configured or not and it can have an arbitrary number of instances corresponding to its "incarnations" requested by the configuration file. Each instance is completely autonomous, has its own configuration, its own status, its own set of routes and its own set of interfaces it works on.

A protocol is represented by a protocol structure containing all the basic information (protocol name, default settings and pointers to most of the protocol hooks). All these structures are linked in the protocol_list list.

Each instance has its own proto structure describing all its properties: protocol type, configuration, a resource pool where all resources belonging to the instance live, various protocol attributes (take a look at the declaration of proto in protocol.h), protocol states (see below for what they mean), connections to routing tables, filters attached to the protocol and finally a set of pointers to the rest of protocol hooks (they are the same for all instances of the protocol, but in order to avoid extra indirections when calling the hooks from the fast path, they are stored directly in proto). The instance is always linked in both the global instance list (proto_list) and a per-status list (either active_proto_list for running protocols, initial_proto_list for protocols being initialized or flush_proto_list when the protocol is being shut down).

The protocol hooks are described in the next chapter. For more information about configuration of protocols, please refer to the configuration chapter and also to the description of the proto_commit function.

Protocol states

As startup and shutdown of each protocol are complex processes which can be affected by lots of external events (user's actions, reconfigurations, behavior of neighboring routers etc.), we have decided to supervise them by a pair of simple state machines -- the protocol state machine and a core state machine.

The protocol state machine corresponds to internal state of the protocol and the protocol can alter its state whenever it wants to. There are the following states:

PS_DOWN: The protocol is down and waits for being woken up by calling its start() hook.
PS_START: The protocol is waiting for connection with the rest of the network. It's active, it has resources allocated, but it still doesn't want any routes since it doesn't know what to do with them.
PS_UP: The protocol is up and running. It communicates with the core, delivers routes to tables and wants to hear announcement about route changes.
PS_STOP: The protocol has been shut down (either by being asked by the core code to do so or due to having encountered a protocol error).

Unless the protocol is in the PS_DOWN state, it can decide to change its state by calling the proto_notify_state function.

struct proto * p: protocol instance
struct channel_config * cf: channel configuration

Description

This function creates a channel between the protocol instance p and the routing table specified in the configuration cf, making the protocol hear all changes in the table and allowing the protocol to update routes in the table.

The channel is linked in the protocol channel list and when active also in the table channel list. Channels are allocated from the global resource pool (proto_pool) and they are automatically freed when the protocol is removed.

Function

void * proto_new (struct proto_config * cf) -- create a new protocol instance

Arguments

struct proto_config * cf: -- undescribed --

Description

When a new configuration has been read in, the core code starts initializing all the protocol instances configured by calling their init() hooks with the corresponding instance configuration. The initialization code of the protocol is expected to create a new instance according to the configuration by calling this function and then modifying the default settings to values wanted by the protocol.

Function

void * proto_config_new (struct protocol * pr, int class) -- create a new protocol configuration

Arguments

struct protocol * pr: protocol the configuration will belong to
int class: SYM_PROTO or SYM_TEMPLATE

Description

Whenever the configuration file says that a new instance of a routing protocol should be created, the parser calls proto_config_new() to create a configuration entry for this instance (a structure starting with the proto_config header containing all the generic items followed by protocol-specific ones). Also, the configuration entry gets added to the list of protocol instances kept in the configuration.

The function is also used to create protocol templates (when class SYM_TEMPLATE is specified). The only difference is that templates are not added to the list of protocol instances and therefore not initialized during protos_commit().

Function

void proto_copy_config (struct proto_config * dest, struct proto_config * src) -- copy a protocol configuration

Arguments

struct proto_config * dest: destination protocol configuration
struct proto_config * src: source protocol configuration

Description

Whenever a new instance of a routing protocol is created from the template, proto_copy_config() is called to copy the content of the source protocol configuration to the new protocol configuration. Name, class and a node in protos list of dest are kept intact. The copy_config() protocol hook is used to copy protocol-specific data.

Function

void protos_preconfig (struct config * c) -- pre-configuration processing

Arguments

struct config * c: new configuration

Description

This function calls the preconfig() hooks of all routing protocols available to prepare them for reading of the new configuration.

Function

void protos_commit (struct config * new, struct config * old, int type) -- commit new protocol configuration

Arguments

struct config * new: new configuration
struct config * old: old configuration or NULL if it's boot time config
int type: type of reconfiguration (RECONFIG_SOFT or RECONFIG_HARD)

Description

Scan differences between old and new configuration and adjust all protocol instances to conform to the new configuration.

When a protocol exists in the new configuration, but it doesn't in the original one, it's immediately started. When a collision with the other running protocol would arise, the new protocol will be temporarily stopped by the locking mechanism.

When a protocol exists in the old configuration, but it doesn't in the new one, it's shut down and deleted after the shutdown completes.

When a protocol exists in both configurations, the core decides whether it's possible to reconfigure it dynamically - it checks all the core properties of the protocol (changes in filters are ignored if type is RECONFIG_SOFT) and if they match, it asks the reconfigure() hook of the protocol to see if the protocol is able to switch to the new configuration. If it isn't possible, the protocol is shut down and a new instance is started with the new configuration after the shutdown is completed.
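The three cases above reduce to a small decision function. The sketch below only restates that logic; the enum, the function name and the boolean inputs are hypothetical and stand in for the much richer checks done by the core and by the reconfigure() hook.

```c
#include <assert.h>

enum toy_action {
  TOY_START,         /* only in the new configuration */
  TOY_SHUTDOWN,      /* only in the old configuration */
  TOY_RECONFIGURE,   /* in both, switched to the new configuration on the fly */
  TOY_RESTART        /* in both, but must be shut down and started again */
};

/* in_old, in_new:  whether the protocol exists in the old / new configuration
 * core_same:       whether the core properties match (filters may be ignored
 *                  by the caller for a soft reconfiguration)
 * reconfigure_ok:  what the protocol's reconfigure() hook answered */
static enum toy_action
toy_commit_action(int in_old, int in_new, int core_same, int reconfigure_ok)
{
  assert(in_old || in_new);

  if (!in_old)
    return TOY_START;
  if (!in_new)
    return TOY_SHUTDOWN;
  if (core_same && reconfigure_ok)
    return TOY_RECONFIGURE;
  return TOY_RESTART;
}
```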

2.5 Graceful restart recovery

Graceful restart of a router is a process in which the routing plane (e.g. BIRD) restarts but both the forwarding plane (e.g. kernel routing table) and routing neighbors keep proper routes, and therefore uninterrupted packet forwarding is maintained.

BIRD implements graceful restart recovery by deferring export of routes to protocols until routing tables are refilled with the expected content. After start, protocols generate routes as usual, but routes are not propagated to them, until protocols report that they generated all routes. After that, graceful restart recovery is finished and the export (and the initial feed) to protocols is enabled.

When the need for graceful restart recovery is detected during initialization, enabled protocols are marked with the gr_recovery flag before start. Such protocols then decide how to proceed with graceful restart; participation is voluntary. Protocols could lock the recovery for each channel by function channel_graceful_restart_lock() (state stored in gr_lock flag), which means that they want to postpone the end of the recovery until they converge and then unlock it. They also could set gr_wait before advancing to PS_UP, which means that the core should defer route export to that channel until the end of the recovery. This should be done by protocols that expect their neighbors to keep the proper routes (kernel table, BGP sessions with BGP graceful restart capability).

The graceful restart recovery is finished when either all graceful restart locks are unlocked or when graceful restart wait timer fires.
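
The termination rule above (all locks released, or the wait timer fires) can be pictured with a small self-contained model. This is only a sketch: struct gr_model and the gr_model_*() functions are hypothetical names invented for illustration and do not mirror BIRD's internal data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the recovery state; not BIRD's actual structures. */
struct gr_model {
  int  locks;    /* number of channels currently holding a lock */
  bool active;   /* true while the recovery is in progress */
};

void gr_model_start(struct gr_model *g)
{
  g->locks = 0;
  g->active = true;
}

/* End of recovery; in the real daemon this is where export gets enabled. */
void gr_model_done(struct gr_model *g)
{
  g->active = false;
}

/* A channel postpones the end of the recovery. */
void gr_model_lock(struct gr_model *g)
{
  assert(g->active);
  g->locks++;
}

/* Releasing the last lock finishes the recovery. */
void gr_model_unlock(struct gr_model *g)
{
  assert(g->locks > 0);
  if (--g->locks == 0 && g->active)
    gr_model_done(g);
}

/* The wait timer finishes the recovery even if locks are still held. */
void gr_model_timeout(struct gr_model *g)
{
  if (g->active)
    gr_model_done(g);
}
```

In this model a slow protocol cannot block the recovery forever, because gr_model_timeout() ends it regardless of the lock count.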

Function

void graceful_recovery_done (struct callback *_ UNUSED) -- finalize graceful restart

Arguments

struct callback *_ UNUSED: -- undescribed --

Description

When there are no locks on graceful restart, the function finalizes the graceful restart recovery. Protocols postponing route export until the end of the recovery are awakened and the export to them is enabled.

Function

void graceful_restart_recovery (void) -- request initial graceful restart recovery

Description

Called by the platform initialization code if the need for recovery after graceful restart is detected during boot. It has to be called before protos_commit().

Function

void graceful_restart_init (void) -- initialize graceful restart

Description

When graceful restart recovery was requested, the function starts an active phase of the recovery and initializes the graceful restart wait timer. The function has to be called after protos_commit().

Function

void channel_graceful_restart_lock (struct channel * c) -- lock graceful restart by channel

Arguments

struct channel * c: -- undescribed --

Description

This function allows a protocol to postpone the end of graceful restart recovery until it converges. The lock is removed when the protocol calls channel_graceful_restart_unlock() or when the channel is closed.

The function has to be called during the initial phase of graceful restart recovery and only for protocols that are part of graceful restart (i.e. their gr_recovery is set), which means it should be called from protocol start hooks.

Function

void channel_graceful_restart_unlock (struct channel * c) -- unlock graceful restart by channel

Arguments

struct channel * c: -- undescribed --

Description

This function unlocks a lock from channel_graceful_restart_lock(). It is also called automatically when the lock-holding protocol goes down.

Function

void protos_dump_all (struct dump_request * dreq) -- dump status of all protocols

Arguments

struct dump_request * dreq: -- undescribed --

Description

This function dumps status of all existing protocol instances to the debug output. It involves printing of general status information such as protocol states, its position on the protocol lists and also calling of a dump() hook of the protocol to print the internals.

Function

void proto_build (struct protocol * p) -- make a single protocol available

Arguments

struct protocol * p: the protocol

Arguments

struct proto * p: an instance
struct proto_config * c: new configuration

Description

The core calls the reconfigure() hook whenever it wants to ask the protocol for switching to a new configuration. If the reconfiguration is possible, the hook returns 1. Otherwise, it returns 0 and the core will shut down the instance and start a new one with the new configuration.

After the protocol confirms reconfiguration, it must no longer keep any references to the old configuration since the memory it's stored in can be re-used at any time.
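
A minimal sketch of what a reconfigure() hook might look like for an imaginary protocol follows. The types struct demo_config and struct demo_proto and their fields are hypothetical stand-ins for a protocol's own configuration and instance structures; only the return convention (1 to accept, 0 to request a restart) and the rule about dropping references to the old configuration come from the description above.

```c
#include <assert.h>

/* Hypothetical protocol configuration; not a real BIRD structure. */
struct demo_config {
  int port;         /* assumed to be unchangeable without a restart */
  int hello_time;   /* assumed to be changeable on the fly */
};

/* Hypothetical protocol instance. */
struct demo_proto {
  const struct demo_config *cf;   /* configuration currently in use */
  int hello_time;                 /* running copy of the timer value */
};

/* Returns 1 if the instance switched to the new configuration,
 * 0 if the core has to shut it down and start a new instance. */
int demo_reconfigure(struct demo_proto *p, const struct demo_config *new)
{
  if (new->port != p->cf->port)
    return 0;                     /* old state stays untouched */

  p->hello_time = new->hello_time;
  p->cf = new;                    /* keep no reference to the old config */
  return 1;
}
```

The instance is modified only after every check has passed, so a refused reconfiguration leaves it exactly as it was.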

Function

void dump (struct proto * p) -- dump protocol state

Arguments

struct proto * p: an instance

Description

This hook dumps the complete state of the instance to the debug output.

Function

int start (struct proto * p) -- request instance startup

Arguments

struct proto * p: protocol instance

If the type of route announcement is RA_OPTIMAL, it is an announcement of optimal route change, new stores the new optimal route and old stores the old optimal route.

If the type of route announcement is RA_ANY, it is an announcement of any route change, new stores the new route and old stores the old route from the same protocol.

p->accept_ra_types specifies which kind of route announcements protocol wants to receive.

Function

void neigh_notify (neighbor * neigh) -- notify instance about neighbor status change

Arguments

neighbor * neigh: a neighbor cache entry

Description

The neigh_notify() hook is called by the neighbor cache whenever a neighbor changes its state, that is it gets disconnected or a sticky neighbor gets connected.

Function

int preexport (struct proto * p, rte ** e, ea_list ** attrs, struct linpool * pool) -- pre-filtering decisions before route export

Arguments

struct proto * p: protocol instance the route is going to be exported to
rte ** e: the route in question
ea_list ** attrs: extended attributes of the route
struct linpool * pool: linear pool for allocation of all temporary data

Description

The preexport() hook is called as the first step of exporting a route from a routing table to the protocol instance. It can modify route attributes and force acceptance or rejection of the route before the user-specified filters are run. See rte_announce() for a complete description of the route distribution process.

net * n: network
rte * e: route

Description

This hook is called whenever a rte belonging to the instance is accepted for insertion to a routing table.

Please avoid using this function in new protocols.

Function

void rte_remove (net * n, rte * e) -- notify instance about route removal

Arguments

net * n: network
rte * e: route

Description

This hook is called whenever a rte belonging to the instance is removed from a routing table.

Please avoid using this function in new protocols.

2.7 Interfaces

The interface module keeps track of all network interfaces in the system and their addresses.

Each interface is represented by an iface structure which carries interface capability flags (IF_MULTIACCESS, IF_BROADCAST etc.), MTU, interface name and index and finally a linked list of network prefixes assigned to the interface, each one represented by struct ifa.

The interface module keeps a `soft-up' state for each iface which is a conjunction of link being up, the interface being of a `sane' type and at least one IP address assigned to it.
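
The `soft-up' condition is a plain logical AND of three facts, which can be written down as a short predicate. The structure below is a hypothetical simplification made up for this example; it is not BIRD's struct iface, and the field names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified interface record; not BIRD's struct iface. */
struct demo_iface {
  bool link_up;      /* link is up */
  bool sane_type;    /* interface is of a type usable for routing */
  int  addr_count;   /* number of IP addresses assigned */
};

/* An interface is soft-up only if all three conditions hold. */
bool demo_soft_up(const struct demo_iface *i)
{
  return i->link_up && i->sane_type && i->addr_count > 0;
}
```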

Function

void ifa_dump (struct dump_request * dreq, struct ifa * a) -- dump interface address

Arguments

struct dump_request * dreq: -- undescribed --
struct ifa * a: interface address descriptor

Description

This function dumps contents of an ifa to the debug output.

Function

void if_dump (struct dump_request * dreq, struct iface * i) -- dump interface

Arguments

struct dump_request * dreq: -- undescribed --
struct iface * i: interface to dump

Description

This function dumps all information associated with a given network interface to the debug output.

Function

void if_dump_all (struct dump_request * dreq) -- dump all interfaces

Arguments

struct dump_request * dreq: -- undescribed --

Description

This function dumps information about all known network interfaces to the debug output.

Function

void if_delete (struct iface * old) -- remove interface

Arguments

struct iface * old: interface

Description

This function is called by the low-level platform dependent code whenever it notices that an interface has disappeared. It is just a shorthand for if_update().

Function

struct iface * if_update (struct iface * new) -- update interface status

Arguments

struct iface * new: new interface status

Description

if_update() is called by the low-level platform dependent code whenever it notices an interface change.

There exist two types of interface updates -- synchronous and asynchronous ones. In the synchronous case, the low-level code calls if_start_update(), scans all interfaces reported by the OS, uses if_update() and ifa_update() to pass them to the core and then it finishes the update sequence by calling if_end_update(). When working asynchronously, the sysdep code calls if_update() and ifa_update() whenever it notices a change.

if_update() will automatically notify all other modules about the change.

Function

void iface_subscribe (struct iface_subscription * s) -- request interface updates

Arguments

struct iface_subscription * s: subscription structure

Description

When a new protocol starts, this function sends it a series of notifications about all existing interfaces.

Function

void iface_unsubscribe (struct iface_subscription * s) -- unsubscribe from interface updates

Arguments

struct iface_subscription * s: subscription structure

Function

struct iface * if_find_by_index_locked (unsigned idx) -- find interface by ifindex

Arguments

unsigned idx: ifindex

Description

This function finds an iface structure corresponding to an interface of the given index idx. Returns a pointer to the structure or NULL if no such structure exists.

Function

struct iface * if_find_by_name (const char * name) -- find interface by name

Arguments

const char * name: interface name

Description

This function finds an iface structure corresponding to an interface of the given name name. Returns a pointer to the structure or NULL if no such structure exists.

Function

struct ifa * ifa_update (struct ifa * a) -- update interface address

Arguments

struct ifa * a: new interface address

Description

This function adds address information to a network interface. It's called by the platform dependent code during the interface update process described under if_update().

Function

void ifa_delete (struct ifa * a) -- remove interface address

Arguments

struct ifa * a: interface address

Description

This function removes address information from a network interface. It's called by the platform dependent code during the interface update process described under if_update().

Function

void if_init (void) -- initialize interface module

Description

This function is called during BIRD startup to initialize all data structures of the interface module.

2.8 MPLS

The MPLS subsystem manages MPLS labels and handles their allocation to MPLS-aware routing protocols. These labels are then attached to IP or VPN routes representing label switched paths -- LSPs. MPLS labels are also used in special MPLS routes (which use labels as network address) that are exported to MPLS routing table in kernel. The MPLS subsystem consists of MPLS domains (struct mpls_domain), MPLS channels (struct mpls_channel) and FEC maps (struct mpls_fec_map).

The MPLS domain represents one MPLS label address space, implements the label allocator, and handles associated configuration and management. The domain is declared in the configuration (struct mpls_domain_config). There might be multiple MPLS domains representing separate label spaces, but in most cases one domain is enough. MPLS-aware protocols and routing tables are associated with a specific MPLS domain.

The MPLS domain has configurable label ranges (struct mpls_range); by default it has two ranges: static (16-1000) and dynamic (1000-10000). When a protocol wants to allocate labels, it first acquires a handle (struct mpls_handle) for a specific range using mpls_new_handle(), and then it allocates labels from that with mpls_new_label(). When no longer needed, labels are freed by mpls_free_label() and the handle is released by mpls_free_handle(). Note that all labels and handles must be freed manually.
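
The handle-then-label life cycle can be illustrated with a toy allocator. Everything here is hypothetical: the demo_*() names, the half-open interpretation of a range, the first-fit search and the rule that a handle with live labels refuses to be released are assumptions chosen for the example, not the signatures or behavior of mpls_new_handle() and friends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_LABEL 10000u
#define DEMO_NO_LABEL  0u      /* returned when the range is exhausted */

/* Toy range covering labels lo .. hi-1 (half-open by assumption). */
struct demo_range { unsigned lo, hi; };

/* Toy handle: remembers its range and how many labels it still owns. */
struct demo_handle { const struct demo_range *range; unsigned label_count; };

static bool demo_used[DEMO_MAX_LABEL];   /* one shared label space */

struct demo_handle demo_new_handle(const struct demo_range *r)
{
  struct demo_handle h = { r, 0 };
  return h;
}

/* First-fit allocation of a free label from the handle's range. */
unsigned demo_new_label(struct demo_handle *h)
{
  for (unsigned l = h->range->lo; l < h->range->hi && l < DEMO_MAX_LABEL; l++)
    if (!demo_used[l]) {
      demo_used[l] = true;
      h->label_count++;
      return l;
    }
  return DEMO_NO_LABEL;
}

void demo_free_label(struct demo_handle *h, unsigned label)
{
  assert(label < DEMO_MAX_LABEL && demo_used[label]);
  demo_used[label] = false;
  h->label_count--;
}

/* In this toy model a handle is released only after all its labels. */
bool demo_free_handle(struct demo_handle *h)
{
  if (h->label_count > 0)
    return false;
  h->range = NULL;
  return true;
}
```

A caller would pair every demo_new_label() with a demo_free_label() and finish with demo_free_handle(), mirroring the manual cleanup the text requires.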

Both the MPLS domain and the MPLS range are reference counted, so when deconfigured they can be freed only after all labels and ranges are freed. Users are expected to hold a reference to an MPLS domain for the whole time they use something from that domain (e.g. mpls_handle), but releasing a reference to a range while holding an associated handle is OK.

The MPLS channel is a subclass of a generic protocol channel. It has two distinct purposes - to handle per-protocol MPLS configuration (e.g. which MPLS domain is associated with the protocol, which label range is used by the protocol), and to announce MPLS routes to a routing table (as a regular protocol channel).

The FEC map is a helper structure that maps forwarding equivalent classes (FECs) to MPLS labels. It is an internal matter of a routing protocol how to assign meaning to allocated labels, announce LSP routes and associated MPLS routes (i.e. ILM entries). But the common behavior is implemented in the FEC map, which can be used by the protocols that work with IP-prefix-based FECs.

The FEC map keeps hash tables of FECs (struct mpls_fec) based on network prefix, next hop eattr and assigned label. It has three general labeling policies: static assignment (MPLS_POLICY_STATIC), per-prefix policy (MPLS_POLICY_PREFIX), and aggregating policy (MPLS_POLICY_AGGREGATE). In per-prefix policy, each distinct LSP is a separate FEC and uses a separate label, which is kept even if the next hop of the LSP changes. In aggregating policy, LSPs with the same next hop form one FEC and use one label, but when the next hop (or remote label) of such an LSP changes, the LSP must be moved to a different FEC and assigned a different label. There is also a special VRF policy (MPLS_POLICY_VRF) applicable for L3VPN protocols, which uses one label for all routes from a VRF, while replacing the original next hop with a lookup in the VRF.

The overall process works this way: When a protocol wants to announce an LSP route, it does so by announcing e.g. an IP route with the EA_MPLS_POLICY attribute. After the route is accepted by filters (which may also change the policy attribute or set a static label), mpls_handle_rte() is called from rte_update2(). It applies the selected labeling policy, finds an existing FEC or creates a new one (which includes allocating a new label and announcing the related MPLS route by mpls_announce_fec()), and attaches the FEC label to the LSP route. After that, the LSP route is stored in the routing table by rte_recalculate(). Changes in routing tables trigger the mpls_rte_insert() and mpls_rte_remove() hooks, which refcount FEC structures and possibly trigger removal of FECs and withdrawal of MPLS routes.

TODO: - special handling of reserved labels

2.9 Neighbor cache

Most routing protocols need to associate their internal state data with neighboring routers, check whether an address given as the next hop attribute of a route is really an address of a directly connected host and which interface it is connected through. Also, they often need to be notified when a neighbor ceases to exist or when their long awaited neighbor becomes connected. The neighbor cache is there to solve all these problems.

The neighbor cache maintains a collection of neighbor entries. Each entry represents one IP address corresponding to either our directly connected neighbor or our own end of the link (when the scope of the address is set to SCOPE_HOST) together with per-neighbor data belonging to a single protocol. A neighbor entry may be bound to a specific interface, which is required for link-local IP addresses and optional for global IP addresses.

Neighbor cache entries are stored in a hash table, which is indexed by triple (protocol, IP, requested-iface), so if both regular and iface-bound neighbors are requested, they are represented by two neighbor cache entries. Active entries are also linked in per-interface list (allowing quick processing of interface change events). Inactive entries exist only when the protocol has explicitly requested it via the NEF_STICKY flag because it wishes to be notified when the node will again become a neighbor. Such entries are instead linked in a special list, which is walked whenever an interface changes its state to up. Neighbor entry VRF association is implied by respective protocol.
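
To make the role of the (protocol, IP, requested-iface) triple concrete, here is a sketch of a lookup key with equality and hash functions. The structure, the IPv4-only address field and the hash mixing are all assumptions made for the example; BIRD's own hash function and entry layout may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup key; not BIRD's neighbor structure. */
struct demo_neigh_key {
  const void *proto;   /* owning protocol instance */
  uint32_t    ip;      /* IPv4 address only, for brevity */
  const void *iface;   /* requested interface, NULL if not bound */
};

/* Two keys match only if all three components match. */
bool demo_neigh_key_eq(const struct demo_neigh_key *a,
                       const struct demo_neigh_key *b)
{
  return a->proto == b->proto && a->ip == b->ip && a->iface == b->iface;
}

/* Simple multiplicative mix of the three components (illustrative only). */
uint32_t demo_neigh_hash(const struct demo_neigh_key *k)
{
  uint32_t h = k->ip;
  h = h * 31u + (uint32_t) (uintptr_t) k->proto;
  h = h * 31u + (uint32_t) (uintptr_t) k->iface;
  return h;
}
```

Because the requested interface is part of the key, a regular request (iface NULL) and an iface-bound request for the same address never compare equal, which is why they end up as two separate cache entries.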

Besides the already mentioned NEF_STICKY flag, there is also NEF_ONLINK, which specifies that neighbor should be considered reachable on given iface regardless of associated address ranges, and NEF_IFACE, which represents pseudo-neighbor entry for whole interface (and uses IPA_NONE IP address).

When a neighbor event occurs (a neighbor gets disconnected or a sticky inactive neighbor becomes connected), the protocol hook neigh_notify() is called to advertise the change.

Function

neighbor * neigh_find (struct proto * p, ip_addr a, struct iface * iface, uint flags) -- find or create a neighbor entry

Arguments

struct proto * p: protocol which asks for the entry
ip_addr a: IP address of the node to be searched for
struct iface * iface: optionally bound neighbor to this iface (may be NULL)
uint flags: NEF_STICKY for sticky entry, NEF_ONLINK for onlink entry

Description

Search the neighbor cache for a node with the given IP address. The iface can be specified for link-local addresses or for cases where the neighbor is expected on a given interface. If it is found, a pointer to the neighbor entry is returned. If no such entry exists and the node is directly connected on one of our active interfaces, a new entry is created and returned to the caller with protocol-dependent fields initialized to zero. If the node is not connected directly or a is not a valid unicast IP address, neigh_find() returns NULL.

Function

void neigh_dump (struct dump_request * dreq, neighbor * n) -- dump specified neighbor entry.

Arguments

struct dump_request * dreq: -- undescribed --
neighbor * n: the entry to dump

Description

This function dumps the contents of a given neighbor entry to debug output.

Function

void neigh_dump_all (struct dump_request * dreq) -- dump all neighbor entries.

Arguments

struct dump_request * dreq: -- undescribed --

Description

This function dumps the contents of the neighbor cache to debug output.

Function

void neigh_update (neighbor * n, struct iface * iface)

Arguments

neighbor * n: neighbor to update
struct iface * iface: changed iface

Description

The function recalculates the state of the neighbor entry n assuming that only the interface iface may have changed its state or addresses. Then, appropriate actions are executed (the neighbor goes up, down, up-down, or is just notified).

Function

void neigh_if_up (struct iface * i)

Arguments

struct iface * i: interface in question

Description

Tell the neighbor cache that a new interface became up.

The neighbor cache wakes up all inactive sticky neighbors with addresses belonging to prefixes of the interface i.

Function

void neigh_if_down (struct iface * i) -- notify neighbor cache about interface down event

Arguments

struct iface * i: the interface in question

Description

Notify the neighbor cache that an interface has ceased to exist.

It causes all neighbors connected to this interface to be updated or removed.

Function

void neigh_if_link (struct iface * i) -- notify neighbor cache about interface link change

Arguments

struct iface * i: the interface in question

Description

Notify the neighbor cache that an interface changed link state. All owners of neighbor entries connected to this interface are notified.

Function

void neigh_ifa_up (struct ifa * a)

Arguments

struct ifa * a: interface address in question

Description

Tell the neighbor cache that an address was added or removed.

The neighbor cache wakes up all inactive sticky neighbors with addresses belonging to prefixes of the interface belonging to ifa and causes all unreachable neighbors to be flushed.

Function

void neigh_init (pool * if_pool) -- initialize the neighbor cache.

Arguments

pool * if_pool: resource pool to be used for neighbor entries.

Description

This function is called during BIRD startup to initialize the neighbor cache module.

2.10 Command line interface

This module takes care of the BIRD's command-line interface (CLI). The CLI exists to provide a way to control BIRD remotely and to inspect its status. It uses a very simple textual protocol over a stream connection provided by the platform dependent code (on UNIX systems, it's a UNIX domain socket).

Each session of the CLI consists of a sequence of requests and replies, slightly resembling the FTP and SMTP protocols. Requests are commands encoded as a single line of text. Replies are sequences of lines, each starting with a four-digit code followed by either a space (if it's the last line of the reply) or a minus sign (when the reply is going to continue with the next line); the rest of the line contains a textual message whose semantics depend on the numeric code. If a reply line has the same code as the previous one and it's a continuation line, the whole prefix can be replaced by a single white space character.

Reply codes starting with 0 stand for `action successfully completed' messages, 1 means `table entry', 8 `runtime error' and 9 `syntax error'.
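
The reply-line format is simple enough to parse by hand. The following client-side sketch is an illustration only: struct demo_reply_line and both functions are hypothetical names, and treating only a space (not a tab) as the abbreviated prefix is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* One parsed reply line (hypothetical helper type). */
struct demo_reply_line {
  int         code;   /* four-digit reply code */
  bool        last;   /* true if this line ends the reply */
  const char *text;   /* message, pointing into the input line */
};

/* Parse one line; prev_code is the code of the preceding line and is
 * used for the abbreviated form. Returns false on malformed input. */
bool demo_parse_reply(const char *line, int prev_code,
                      struct demo_reply_line *out)
{
  if (line[0] == ' ') {              /* abbreviated continuation line */
    out->code = prev_code;
    out->last = false;
    out->text = line + 1;
    return true;
  }

  int code = 0;
  for (int i = 0; i < 4; i++) {      /* stops safely at a short line */
    if (!isdigit((unsigned char) line[i]))
      return false;
    code = code * 10 + (line[i] - '0');
  }

  if (line[4] != ' ' && line[4] != '-')
    return false;

  out->code = code;
  out->last = (line[4] == ' ');
  out->text = line + 5;
  return true;
}

/* Classify a code by its first digit, as listed above. */
const char *demo_reply_class(int code)
{
  switch (code / 1000) {
    case 0:  return "success";
    case 1:  return "table entry";
    case 8:  return "runtime error";
    case 9:  return "syntax error";
    default: return "other";
  }
}
```

A client would call demo_parse_reply() on each received line and stop reading the current reply once a line with last set to true arrives.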

Each CLI session is internally represented by a cli structure and a resource pool containing all resources associated with the connection, so that it can be easily freed whenever the connection gets closed, not depending on the current state of command processing.

The CLI commands are declared as a part of the configuration grammar by using the CF_CLI macro. When a command is received, it is processed by the same lexical analyzer and parser as used for the configuration, but it's switched to a special mode by prepending a fake token to the text, so that it uses only the CLI command rules. Then the parser invokes an execution routine corresponding to the command, which either constructs the whole reply and returns it back or (in case it expects the reply will be long) it prints a partial reply and asks the CLI module (using the cont hook) to call it again when the output is transferred to the user.

The this_cli variable points to a cli structure of the session being currently parsed, but it's of course available only in command handlers not entered using the cont hook.

TX buffer management works as follows: At cli.tx_buf there is a list of TX buffers (struct cli_out), cli.tx_write is the buffer currently used by the producer (cli_printf(), cli_alloc_out()) and cli.tx_pos is the buffer currently used by the consumer (cli_write(), in system dependent code). The producer uses the cli_out.wpos pointer as the current write position and the consumer uses the cli_out.outpos pointer as the current read position. When the producer produces something, it calls cli_write_trigger(). If there is not enough space in the current buffer, the producer allocates a new one. When the consumer has processed everything in the buffer queue, it calls cli_written(), which frees all buffers (except the first one) and schedules cli.event.

Function

void cli_vprintf (cli * c, int code, const char * msg, va_list args) -- send reply to a CLI connection

Arguments

cli * c: CLI connection
int code: numeric code of the reply, negative for continuation lines
const char * msg: a printf()-like formatting string.
va_list args: -- undescribed --

Description

This function sends a single line of reply to a given CLI connection. It works in all aspects like bsprintf() except that it automatically prepends the reply line prefix.

Please note that the connection can already be busy sending some data, in which case cli_printf() stores the output in a temporary buffer, so please avoid sending a large batch of replies without waiting for the buffers to be flushed.
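
The prefixing rule (four-digit code, then a minus sign for continuation lines or a space for the final line) can be sketched with standard C only. The function demo_format_reply() is a hypothetical illustration that uses snprintf() instead of bsprintf(); it is not how cli_vprintf() is implemented, and it omits the line terminator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format one reply line into buf. A negative code marks a continuation
 * line, as in the argument description above. Note that a continuation
 * line with code 0 cannot be expressed this way. Returns what
 * snprintf() returns. */
int demo_format_reply(char *buf, size_t size, int code, const char *msg)
{
  char sep = (code < 0) ? '-' : ' ';
  return snprintf(buf, size, "%04d%c%s", abs(code), sep, msg);
}
```

For example, a code of -1007 with the message "partial" yields the line "1007-partial", while a code of 13 with "done" yields "0013 done".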

struct object_lock * l: the lock to acquire

Description

This function attempts to acquire exclusive access to the non-shareable resource described by the lock l. It returns immediately, but as soon as the resource becomes available, it calls the hook() function set up by the caller.

When you want to release the resource, just rfree() the lock.

Function

void olock_init (void) -- initialize the object lock mechanism

Description

This function is called during BIRD startup. It initializes all the internal data structures of the lock module.
