5. Protocols
5.1 The Babel protocol
Babel is a loop-avoiding distance-vector routing protocol that is robust and efficient both in ordinary wired networks and in wireless mesh networks.
The Babel protocol keeps state for each neighbour in a babel_neighbor struct, tracking received Hello and I Heard You (IHU) messages. A babel_interface struct keeps hello and update times for each interface, and a separate hello seqno is maintained for each interface.
For each prefix, Babel keeps track of both the possible routes (with next hop and router IDs), as well as the feasibility distance for each prefix and router id. The prefix itself is tracked in a babel_entry struct, while the possible routes for the prefix are tracked as babel_route entries and the feasibility distance is maintained through babel_source structures.
The main route selection is done in babel_select_route(). This is called when an entry is updated by receiving updates from the network or when modified by internal timers. The function selects from feasible and reachable routes the one with the lowest metric to be announced to the core.
Supported standards:
- RFC 8966 - The Babel Routing Protocol
- RFC 8967 - MAC Authentication for Babel
- RFC 9079 - Source Specific Routing for Babel
- RFC 9229 - IPv4 Routes with IPv6 Next Hop for Babel
Function
void babel_announce_rte (struct babel_proto * p, struct babel_entry * e) -- announce selected route to the core
Arguments
- struct babel_proto * p
Babel protocol instance
- struct babel_entry * e
Babel route entry to announce
Description
This function announces a Babel entry to the core if it has a selected incoming path, and retracts it otherwise. If there is no selected route but the entry is valid and ours, the unreachable route is announced instead.
Function
void babel_select_route (struct babel_proto * p, struct babel_entry * e, struct babel_route * mod) -- select best route for given route entry
Arguments
- struct babel_proto * p
Babel protocol instance
- struct babel_entry * e
Babel entry to select the best route for
- struct babel_route * mod
Babel route that was modified or NULL if unspecified
Description
Select the best reachable and feasible route for a given prefix among the routes received from peers, and propagate it to the nest. This just selects the reachable and feasible route with the lowest metric, but keeps the old one selected in case of a tie.
If no feasible route is available for a prefix that previously had a route selected, a seqno request is sent to try to get a valid route. If the entry is valid and not owned by us, the unreachable route is announced to the nest (to blackhole packets going to it, as per section 2.8). It is later removed by babel_expire_routes(). Otherwise, the route is just removed from the nest.
Argument mod is used to optimize best route calculation. When specified, the function can assume that only the mod route was modified to avoid full best route selection and announcement when non-best route was modified in minor way. The caller is advised to not call babel_select_route() when no change is done (e.g. periodic route updates) to avoid unnecessary announcements of the same best route. The caller is not required to call the function in case of a retraction of a non-best route.
Note that the function does not activate triggered updates. That is done by babel_rt_notify() when the change is propagated back to Babel.
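The selection rule described above (lowest metric among reachable and feasible routes, with the old selection kept on a tie) can be sketched as follows. This is a simplified illustration, not the BIRD code: struct sel_route and sel_best() are hypothetical names, the feasibility check is reduced to a precomputed flag, and none of the seqno request or announcement logic is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEL_INFINITY 0xFFFF   /* Babel metric meaning "unreachable" */

/* Hypothetical, reduced stand-in for a babel_route. */
struct sel_route {
  uint16_t metric;   /* total metric of the route */
  int feasible;      /* nonzero if the feasibility condition holds */
};

/*
 * Return the feasible, reachable route with the lowest metric, or NULL if
 * there is none. The currently selected route (cur, may be NULL) is kept
 * when it ties with the best candidate.
 */
static struct sel_route *
sel_best(struct sel_route *routes, size_t n, struct sel_route *cur)
{
  struct sel_route *best = NULL;

  assert(routes != NULL || n == 0);

  /* Start from the current selection if it is still usable. */
  if (cur && cur->feasible && cur->metric < SEL_INFINITY)
    best = cur;

  for (size_t i = 0; i < n; i++)
  {
    struct sel_route *r = &routes[i];

    if (!r->feasible || r->metric >= SEL_INFINITY)
      continue;

    /* Strict comparison: a tie never displaces the earlier choice. */
    if (!best || r->metric < best->metric)
      best = r;
  }

  return best;
}
```

A NULL result corresponds to the case where no feasible route is left and the caller has to decide between announcing the unreachable route and removing the entry.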
Function
void babel_send_update_ (struct babel_iface * ifa, btime changed, struct fib * rtable) -- send route table updates
Arguments
- struct babel_iface * ifa
Interface to transmit on
- btime changed
Only send entries changed since this time
- struct fib * rtable
-- undescribed --
Description
This function produces update TLVs for all entries changed since the time indicated by the changed parameter and queues them for transmission on the selected interface. During the process, the feasibility distance for each transmitted entry is updated.
Function
void babel_handle_update (union babel_msg * m, struct babel_iface * ifa) -- handle incoming route updates
Arguments
- union babel_msg * m
Incoming update TLV
- struct babel_iface * ifa
Interface the update was received on
Description
This function is called as a handler for update TLVs and handles the updating and maintenance of route entries in Babel's internal routing cache. The handling follows the actions described in the Babel RFC, and at the end of each update handling, babel_select_route() is called on the affected entry to optionally update the selected routes and propagate them to the core.
Function
void babel_auth_reset_index (struct babel_iface * ifa) -- Reset authentication index on interface
Arguments
- struct babel_iface * ifa
Interface to reset
Description
This function resets the authentication index and packet counter for an interface, and should be called on interface configuration, or when the packet counter overflows.
Function
void babel_iface_timer (timer * t) -- Babel interface timer handler
Arguments
- timer * t
Timer
Description
This function is called by the per-interface timer and triggers sending of periodic Hellos and both triggered and periodic updates. Periodic Hellos and updates are simply handled by setting the next_{hello,regular} variables on the interface, and triggering an update (and resetting the variable) whenever 'now' exceeds that value.
For triggered updates, babel_trigger_iface_update() will set the want_triggered field on the interface to a timestamp value. If this is set (and the next_triggered time has passed; this is a rate limiting mechanism), babel_send_update() will be called with this timestamp as the second parameter. This causes updates to be sent consisting of only the routes that have changed since the time saved in want_triggered.
In most cases, when an update is triggered, the timestamp of the route being modified is set to the value of 'now' at the time of the trigger; the >= comparison for selecting which routes to send in the update makes sure that this route is included.
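The timer logic described above can be sketched as a single tick function. This is only an illustration under stated assumptions: struct tick_iface, tick_run() and tick_route_due() are hypothetical names, the interval fields are invented for the example, and the choice that a periodic full update clears a pending triggered update is an assumption of the sketch rather than something stated in the text.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t tick_time;   /* hypothetical stand-in for btime */

/* Hypothetical subset of the per-interface timing state. */
struct tick_iface {
  tick_time next_hello;      /* when the next periodic Hello is due */
  tick_time next_regular;    /* when the next periodic update is due */
  tick_time next_triggered;  /* earliest time a triggered update may go out */
  tick_time want_triggered;  /* 0 = none pending, else "changed since" stamp */
  tick_time hello_interval;
  tick_time update_interval;
  tick_time triggered_gap;   /* rate limit between triggered updates */
};

enum { TICK_HELLO = 1, TICK_FULL_UPDATE = 2, TICK_TRIGGERED_UPDATE = 4 };

/*
 * One timer tick. Returns a bit mask of the actions that are due and stores
 * in *changed the cut-off to pass to the update sender (0 = all entries).
 */
static int
tick_run(struct tick_iface *ifa, tick_time now, tick_time *changed)
{
  int actions = 0;

  assert(ifa && changed);
  *changed = 0;

  if (now >= ifa->next_hello)
  {
    actions |= TICK_HELLO;
    ifa->next_hello = now + ifa->hello_interval;
  }

  if (now >= ifa->next_regular)
  {
    /* Assumption: a full update also covers any pending triggered update. */
    actions |= TICK_FULL_UPDATE;
    ifa->next_regular = now + ifa->update_interval;
    ifa->want_triggered = 0;
  }
  else if (ifa->want_triggered && now >= ifa->next_triggered)
  {
    actions |= TICK_TRIGGERED_UPDATE;
    *changed = ifa->want_triggered;
    ifa->next_triggered = now + ifa->triggered_gap;
    ifa->want_triggered = 0;
  }

  return actions;
}

/* A route is sent when it changed at or after the cut-off (note the >=). */
static int
tick_route_due(tick_time route_updated, tick_time changed)
{
  return route_updated >= changed;
}
```

Because the comparison is inclusive, a route stamped with exactly the trigger time is still part of the resulting update.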
Function
void babel_timer (timer * t) -- global timer hook
Arguments
- timer * t
Timer
Description
This function is called by the global protocol instance timer and handles expiration of routes and neighbours as well as pruning of the seqno request cache.
Function
uint babel_write_queue (struct babel_iface * ifa, list * queue) -- Write a TLV queue to a transmission buffer
Arguments
- struct babel_iface * ifa
Interface holding the transmission buffer
- list * queue
TLV queue to write (containing internal-format TLVs)
Description
This function writes a packet to the interface transmission buffer with as many TLVs from the queue as will fit in the buffer. It returns the number of bytes written (NOT counting the packet header). The function is called by babel_send_queue() and babel_send_unicast() to construct packets for transmission, and uses per-TLV helper functions to convert the internal-format TLVs to their wire representations.
The TLVs in the queue are freed after they are written to the buffer.
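The "write as many TLVs as will fit" behaviour can be sketched as below. The sketch makes simplifying assumptions: struct wq_tlv and wq_write() are hypothetical, the queued TLVs already hold their wire bytes (the real function converts from the internal format through per-TLV helpers), and a written TLV is only unlinked where the real code would free it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical queued TLV: already-encoded bytes plus a list link. */
struct wq_tlv {
  struct wq_tlv *next;
  size_t len;
  unsigned char data[32];
};

/*
 * Copy TLVs from the head of *queue into buf while they fit, unlinking each
 * one that was written. Returns the number of body bytes written; TLVs that
 * did not fit stay queued for the next packet.
 */
static size_t
wq_write(struct wq_tlv **queue, unsigned char *buf, size_t max_len)
{
  size_t pos = 0;

  assert(queue && buf);

  while (*queue && (*queue)->len <= max_len - pos)
  {
    struct wq_tlv *t = *queue;

    memcpy(buf + pos, t->data, t->len);
    pos += t->len;

    *queue = t->next;   /* the real code frees the TLV at this point */
    t->next = NULL;
  }

  return pos;
}
```

A caller would keep invoking the function, sending one packet per call, until the queue is empty.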
Function
void babel_send_unicast (union babel_msg * msg, struct babel_iface * ifa, ip_addr dest) -- send a single TLV via unicast to a destination
Arguments
- union babel_msg * msg
TLV to send
- struct babel_iface * ifa
Interface to send via
- ip_addr dest
Destination of the TLV
Description
This function is used to send a single TLV via unicast to a designated receiver. This is used for replying to certain incoming requests, and for sending unicast requests to refresh routes before they expire.
Function
void babel_enqueue (union babel_msg * msg, struct babel_iface * ifa) -- enqueue a TLV for transmission on an interface
Arguments
- union babel_msg * msg
TLV to enqueue (in internal TLV format)
- struct babel_iface * ifa
Interface to enqueue to
Description
This function is called to enqueue a TLV for subsequent transmission on an interface. The transmission event is triggered whenever a TLV is enqueued; this ensures that TLVs will be transmitted in a timely manner, but that TLVs which are enqueued in rapid succession can be transmitted together in one packet.
Function
void babel_process_packet (struct babel_iface * ifa, struct babel_pkt_header * pkt, int len, ip_addr saddr, u16 sport, ip_addr daddr, u16 dport) -- process incoming data packet
Arguments
- struct babel_iface * ifa
Interface packet was received on
- struct babel_pkt_header * pkt
Pointer to the packet data
- int len
Length of received packet
- ip_addr saddr
Address of packet sender
- u16 sport
Packet source port
- ip_addr daddr
Destination address of packet
- u16 dport
Packet destination port
Description
This function is the main processing hook of incoming Babel packets. It checks that the packet header is well-formed, then processes the TLVs contained in the packet. This is done in two passes: First all TLVs are parsed into the internal TLV format. If a TLV parser fails, processing of the rest of the packet is aborted.
After the parsing step, the TLV handlers are called for each parsed TLV in order.
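The two-pass structure can be sketched as follows. The names (struct tp_parsed, tp_parse(), tp_handle_all(), tp_record_type()) are hypothetical, and the internal TLV format is reduced to a type, a length and a body pointer. The sketch assumes that TLVs parsed before a failure are still passed to the handlers; the text above only says that the rest of the packet is not processed. The wire layout used (one type byte, one length byte, then the body, with Pad1 as a single zero byte) follows RFC 8966.

```c
#include <assert.h>
#include <stddef.h>

#define TP_MAX_TLVS 16
#define TP_TLV_PAD1 0   /* single-byte TLV without a length field */

/* Hypothetical internal form of a parsed TLV. */
struct tp_parsed {
  unsigned char type;
  unsigned char len;
  const unsigned char *body;
};

/*
 * Pass 1: parse the packet body into out[]. Parsing stops at the first
 * malformed TLV. Returns the number of TLVs parsed.
 */
static size_t
tp_parse(const unsigned char *p, size_t len, struct tp_parsed *out)
{
  size_t n = 0, pos = 0;

  assert(p && out);

  while (pos < len && n < TP_MAX_TLVS)
  {
    if (p[pos] == TP_TLV_PAD1)
    {
      pos++;
      continue;
    }

    /* Truncated header or body: abort the rest of the packet. */
    if (len - pos < 2 || (size_t) p[pos + 1] > len - pos - 2)
      break;

    out[n].type = p[pos];
    out[n].len = p[pos + 1];
    out[n].body = p + pos + 2;
    n++;

    pos += 2 + (size_t) p[pos + 1];
  }

  return n;
}

/* Pass 2: call the handler for each parsed TLV, in order. */
static void
tp_handle_all(const struct tp_parsed *tlvs, size_t n,
              void (*handler)(const struct tp_parsed *, void *), void *ctx)
{
  for (size_t i = 0; i < n; i++)
    handler(&tlvs[i], ctx);
}

/* Example handler: record the type of each TLV it sees. */
struct tp_seen {
  unsigned char types[TP_MAX_TLVS];
  size_t n;
};

static void
tp_record_type(const struct tp_parsed *t, void *ctx)
{
  struct tp_seen *seen = (struct tp_seen *) ctx;
  seen->types[seen->n++] = t->type;
}
```

Separating the passes means that no handler runs, and no routing state changes, until parsing of the packet has finished.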
Function
int babel_auth_check (struct babel_iface * ifa, ip_addr saddr, u16 sport, ip_addr daddr, u16 dport, struct babel_pkt_header * pkt, byte * trailer, uint trailer_len) -- Check authentication for a packet
Arguments
- struct babel_iface * ifa
Interface holding the transmission buffer
- ip_addr saddr
Source address the packet was received from
- u16 sport
Source port the packet was received from
- ip_addr daddr
Destination address the packet was sent to
- u16 dport
Destination port the packet was sent to
- struct babel_pkt_header * pkt
Pointer to start of the packet data
- byte * trailer
Pointer to the packet trailer
- uint trailer_len
Length of the packet trailer
Description
This function performs any necessary authentication checks on a packet and returns 0 if the packet should be accepted (either because it has been successfully authenticated or because authentication is disabled or configured in permissive mode), or 1 if the packet should be dropped without further processing.
Function
int babel_auth_add_tlvs (struct babel_iface * ifa, struct babel_tlv * hdr, uint max_len) -- Add authentication-related TLVs to a packet
Arguments
- struct babel_iface * ifa
Interface holding the transmission buffer
- struct babel_tlv * hdr
-- undescribed --
- uint max_len
Maximum length available for adding new TLVs
Description
This function adds any new TLVs required by the authentication mode to a packet before it is shipped out. For MAC authentication, this is the packet counter TLV that must be included in every packet.
Function
int babel_auth_sign (struct babel_iface * ifa, ip_addr dest) -- Sign an outgoing packet before transmission
Arguments
- struct babel_iface * ifa
Interface holding the transmission buffer
- ip_addr dest
Destination address of the packet
Description
This function adds authentication signature(s) to the packet trailer for each of the configured authentication keys on the interface.
Function
void babel_auth_set_tx_overhead (struct babel_iface * ifa) -- Set interface TX overhead for authentication
Arguments
- struct babel_iface * ifa
Interface to configure
Description
This function sets the TX overhead for an interface based on its authentication configuration.
5.2 Bidirectional Forwarding Detection
The BFD protocol is implemented in three files: bfd.c containing the protocol logic and the protocol glue with BIRD core, packets.c handling BFD packet processing, RX, TX and protocol sockets. io.c then contains generic code for the event loop, threads and event sources (sockets, microsecond timers). This generic code will be merged to the main BIRD I/O code in the future.
The BFD implementation uses a separate thread with an internal event loop for handling the protocol logic, which requires high-res and low-latency timing, so it is not affected by the rest of BIRD, which has several low-granularity hooks in the main loop, uses second-based timers and cannot offer good latency. The core of BFD protocol (the code related to BFD sessions, interfaces and packets) runs in the BFD thread, while the rest (the code related to BFD requests, BFD neighbors and the protocol glue) runs in the main thread.
BFD sessions are represented by structure bfd_session that contains a state related to the session and two timers (TX timer for periodic packets and hold timer for session timeout). These sessions are allocated from session_slab and are accessible by two hash tables, session_hash_id (by session ID) and session_hash_ip (by IP addresses of neighbors and associated interfaces). Slab and both hashes are in the main protocol structure bfd_proto. The protocol logic related to BFD sessions is implemented in internal functions bfd_session_*(), which are expected to be called from the context of BFD thread, and external functions bfd_add_session(), bfd_remove_session() and bfd_reconfigure_session(), which form an interface to the BFD core for the rest and are expected to be called from the context of main thread.
Each BFD session has an associated BFD interface, represented by structure bfd_iface. A BFD interface contains a socket used for TX (the one for RX is shared in bfd_proto), an interface configuration and reference counter. Compared to interface structures of other protocols, these structures are not created and removed based on interface notification events, but according to the needs of BFD sessions. When a new session is created, it requests a proper BFD interface by function bfd_get_iface(), which either finds an existing one in iface_list (from bfd_proto) or allocates a new one. When a session is removed, an associated iface is discharged by bfd_free_iface().
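The find-or-allocate and reference-counting pattern described for BFD interfaces can be sketched as follows. This is a minimal illustration: struct ref_iface, ref_get_iface() and ref_free_iface() are hypothetical, the interface is keyed by an index only, and the socket and configuration are left out.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, reduced interface record keyed by an interface index. */
struct ref_iface {
  struct ref_iface *next;
  int ifindex;
  unsigned uc;   /* use count: number of sessions holding this record */
};

/* Find an existing record or allocate a new one, and take a reference. */
static struct ref_iface *
ref_get_iface(struct ref_iface **list, int ifindex)
{
  assert(list);

  for (struct ref_iface *i = *list; i; i = i->next)
    if (i->ifindex == ifindex)
    {
      i->uc++;
      return i;
    }

  struct ref_iface *n = (struct ref_iface *) calloc(1, sizeof(*n));
  if (!n)
    return NULL;

  n->ifindex = ifindex;
  n->uc = 1;
  n->next = *list;
  *list = n;
  return n;
}

/* Drop a reference; unlink and free the record when the last one is gone. */
static void
ref_free_iface(struct ref_iface **list, struct ref_iface *ifa)
{
  assert(list && ifa && ifa->uc > 0);

  if (--ifa->uc)
    return;

  for (struct ref_iface **pp = list; *pp; pp = &(*pp)->next)
    if (*pp == ifa)
    {
      *pp = ifa->next;
      break;
    }

  free(ifa);
}
```

With this scheme the lifetime of an interface record follows the sessions that use it rather than interface notification events.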
BFD requests are the external API for the other protocols. When a protocol wants a BFD session, it calls bfd_request_session(), which creates a structure bfd_request containing appropriate information and a notify hook. This structure is a resource associated with the caller's resource pool. When a BFD protocol is available, a BFD request is submitted to the protocol, an appropriate BFD session is found or created and the request is attached to the session. When a session changes state, all attached requests (and related protocols) are notified. Note that BFD requests do not depend on the BFD protocol running. When the BFD protocol is stopped or removed (or not available from the beginning), related BFD requests are stored in bfd_wait_list, where they wait for a new protocol.
BFD neighbors are just a way to statically configure BFD sessions without requests from other protocols. Structures bfd_neighbor are part of BFD configuration (like static routes in the static protocol). BFD neighbors are handled by the BFD protocol as if it were a BFD client -- when a BFD neighbor is ready, the protocol just creates a BFD request like any other protocol.
The protocol uses a new generic event loop (structure birdloop) from io.c, which supports sockets, timers and events like the main loop. A birdloop is associated with a thread (field thread) in which event hooks are executed. Most functions for setting event sources (like sk_start() or tm_start()) must be called from the context of that thread. Birdloop allows the main thread to temporarily acquire the context of that thread by calling birdloop_enter() and then birdloop_leave(), which also ensures mutual exclusion with all event hooks. Note that resources associated with a birdloop (like timers) should be attached to an independent resource pool, detached from the main resource tree.
There are two kinds of interaction between the BFD core (running in the BFD thread) and the rest of BFD (running in the main thread). The first kind are configuration calls from the main thread to the BFD thread (like bfd_add_session()). These calls are synchronous and use the birdloop_enter() mechanism for mutual exclusion. The second kind is a notification about session changes from the BFD thread to the main thread. This is done in an asynchronous way: sessions with pending notifications are linked (in the BFD thread) to notify_list in bfd_proto, and then bfd_notify_hook() in the main thread is activated using the standard event sending code. The hook then processes scheduled sessions and calls hooks from associated BFD requests. This notify_list (and state fields in structure bfd_session) is protected by a spinlock in bfd_proto and functions bfd_lock_sessions() / bfd_unlock_sessions().
There are a few data races (accessing p->p.debug from TRACE() from the BFD thread and accessing some private fields of bfd_session from bfd_show_sessions() from the main thread), but these are hopefully harmless.
TODO: document functions and access restrictions for fields in BFD structures.
Supported standards:
- RFC 5880 - main BFD standard
- RFC 5881 - BFD for IP links
- RFC 5882 - generic application of BFD
- RFC 5883 - BFD for multihop paths
5.3 Border Gateway Protocol
The BGP protocol is implemented in three parts: bgp.c which takes care of the connection and most of the interface with BIRD core, packets.c handling both incoming and outgoing BGP packets and attrs.c containing functions for manipulation with BGP attribute lists.
As opposed to the other existing routing daemons, BIRD has a sophisticated core architecture which is able to keep all the information needed by BGP in the primary routing table, therefore no complex data structures like a central BGP table are needed. This increases memory footprint of a BGP router with many connections, but not too much and, which is more important, it makes BGP much easier to implement.
Each instance of BGP (corresponding to a single BGP peer) is described by a bgp_proto structure to which are attached individual connections represented by bgp_connection (usually, there exists only one connection, but during BGP session setup, there can be more of them). The connections are handled according to the BGP state machine defined in the RFC with all the timers and all the parameters configurable.
In incoming direction, we listen on the connection's socket and each time we receive some input, we pass it to bgp_rx(). It decodes packet headers and the markers and passes complete packets to bgp_rx_packet() which distributes the packet according to its type.
In outgoing direction, we gather all the routing updates and sort them to buckets (bgp_bucket) according to their attributes (we keep a hash table for fast comparison of rta's and a fib which helps us to find if we already have another route for the same destination queued for sending, so that we can replace it with the new one immediately instead of sending both updates). There also exists a special bucket holding all the route withdrawals which cannot be queued anywhere else as they don't have any attributes. If we have any packet to send (due to either new routes or the connection tracking code wanting to send an Open, Keepalive or Notification message), we call bgp_schedule_packet() which sets the corresponding bit in a packet_to_send bit field in bgp_conn and as soon as the transmit socket buffer becomes empty, we call bgp_fire_tx(). It inspects the state of all the packet type bits and calls the corresponding bgp_create_xx() functions, eventually rescheduling the same packet type if we have more data of the same type to send.
The processing of attributes consists of two functions: bgp_decode_attrs() for checking of the attribute blocks and translating them to the language of BIRD's extended attributes and bgp_encode_attrs() which does the converse. Both functions are built around a bgp_attr_table array describing all important characteristics of all known attributes. Unknown transitive attributes are attached to the route as EAF_TYPE_OPAQUE byte streams.
BGP protocol implements graceful restart in both restarting (local restart) and receiving (neighbor restart) roles. The first is handled mostly by the graceful restart code in the nest, BGP protocol just handles capabilities, sets gr_wait and locks graceful restart until end-of-RIB mark is received. The second is implemented by internal restart of the BGP state to BS_IDLE and protocol state to PS_START, but keeping the protocol up from the core point of view and therefore maintaining received routes. Routing table refresh cycle (rt_refresh_begin(), rt_refresh_end()) is used for removing stale routes after reestablishment of BGP session during graceful restart.
Supported standards:
- RFC 4271 - Border Gateway Protocol 4 (BGP)
- RFC 1997 - BGP Communities Attribute
- RFC 2385 - Protection of BGP Sessions via TCP MD5 Signature
- RFC 2545 - Use of BGP Multiprotocol Extensions for IPv6
- RFC 2918 - Route Refresh Capability
- RFC 3107 - Carrying Label Information in BGP
- RFC 4360 - BGP Extended Communities Attribute
- RFC 4364 - BGP/MPLS IPv4 Virtual Private Networks
- RFC 4456 - BGP Route Reflection
- RFC 4486 - Subcodes for BGP Cease Notification Message
- RFC 4659 - BGP/MPLS IPv6 Virtual Private Networks
- RFC 4724 - Graceful Restart Mechanism for BGP
- RFC 4760 - Multiprotocol extensions for BGP
- RFC 4798 - Connecting IPv6 Islands over IPv4 MPLS
- RFC 5065 - AS confederations for BGP
- RFC 5082 - Generalized TTL Security Mechanism
- RFC 5492 - Capabilities Advertisement with BGP
- RFC 5575 - Dissemination of Flow Specification Rules
- RFC 5668 - 4-Octet AS Specific BGP Extended Community
- RFC 6286 - AS-Wide Unique BGP Identifier
- RFC 6608 - Subcodes for BGP Finite State Machine Error
- RFC 6793 - BGP Support for 4-Octet AS Numbers
- RFC 7311 - Accumulated IGP Metric Attribute for BGP
- RFC 7313 - Enhanced Route Refresh Capability for BGP
- RFC 7606 - Revised Error Handling for BGP UPDATE Messages
- RFC 7911 - Advertisement of Multiple Paths in BGP
- RFC 7947 - Internet Exchange BGP Route Server
- RFC 8092 - BGP Large Communities Attribute
- RFC 8203 - BGP Administrative Shutdown Communication
- RFC 8212 - Default EBGP Route Propagation Behavior without Policies
- RFC 8654 - Extended Message Support for BGP
- RFC 8950 - Advertising IPv4 NLRI with an IPv6 Next Hop
- RFC 9072 - Extended Optional Parameters Length for BGP OPEN Message
- RFC 9117 - Revised Validation Procedure for BGP Flow Specifications
- RFC 9234 - Route Leak Prevention and Detection Using Roles
- RFC 9687 - Send Hold Timer
- draft-uttaro-idr-bgp-persistence-04
- draft-walton-bgp-hostname-capability-02
Function
void bgp_close (struct bgp_proto * p) -- close a BGP instance
Arguments
- struct bgp_proto * p
BGP instance
Description
This function frees and deconfigures shared BGP resources.
Function
void bgp_open (struct bgp_proto * p) -- open a BGP instance
Arguments
- struct bgp_proto * p
BGP instance
Description
This function allocates and configures shared BGP resources, mainly listening sockets. It should be called as the last step during initialization (when the lock is acquired and the neighbor is ready). On error, the caller should change state to PS_DOWN and return immediately.
Function
void bgp_start_timer (struct bgp_proto * p, timer * t, uint value) -- start a BGP timer
Arguments
- struct bgp_proto * p
-- undescribed --
- timer * t
timer
- uint value
time (in seconds) to fire (0 to disable the timer)
Description
This function calls tm_start() on t with time value and the amount of randomization suggested by the BGP standard. Please use it for all BGP timers.
Function
void bgp_close_conn (struct bgp_conn * conn) -- close a BGP connection
Arguments
- struct bgp_conn * conn
connection to close
Description
This function takes a connection described by the bgp_conn structure, closes its socket and frees all resources associated with it.
Function
void bgp_update_startup_delay (struct bgp_proto * p) -- update a startup delay
Arguments
- struct bgp_proto * p
BGP instance
Description
This function updates the startup delay that is used to postpone the next BGP connect. It also handles disable_after_error and might stop the BGP instance when an error has happened and disable_after_error is on.
It should be called when a BGP protocol error has happened.
Function
void bgp_handle_graceful_restart (struct bgp_proto * p) -- handle detected BGP graceful restart
Arguments
- struct bgp_proto * p
BGP instance
Description
This function is called when a BGP graceful restart of the neighbor is detected (when the TCP connection fails or when a new TCP connection appears). The function activates processing of the restart - starts routing table refresh cycle and activates BGP restart timer. The protocol state goes back to PS_START, but changing BGP state back to BS_IDLE is left for the caller.
Function
void bgp_graceful_restart_done (struct bgp_channel * c) -- finish active BGP graceful restart
Arguments
- struct bgp_channel * c
BGP channel
Description
This function is called when the active BGP graceful restart of the neighbor should be finished for channel c - either successfully (the neighbor sends all paths and reports end-of-RIB for given AFI/SAFI on the new session) or unsuccessfully (the neighbor does not support BGP graceful restart on the new session). The function ends the routing table refresh cycle.
Function
void bgp_graceful_restart_timeout (timer * t) -- timeout of graceful restart 'restart timer'
Arguments
- timer * t
timer
Description
This function is a timeout hook for gr_timer, implementing the BGP restart time limit for reestablishment of the BGP session after the graceful restart. When fired, we just proceed with the usual protocol restart.
Function
void bgp_refresh_begin (struct bgp_channel * c) -- start incoming enhanced route refresh sequence
Arguments
- struct bgp_channel * c
BGP channel
Description
This function is called when an incoming enhanced route refresh sequence is started by the neighbor, demarcated by the BoRR packet. The function updates the load state and starts the routing table refresh cycle. Note that graceful restart also uses routing table refresh cycle, but RFC 7313 and load states ensure that these two sequences do not overlap.
Function
void bgp_refresh_end (struct bgp_channel * c) -- finish incoming enhanced route refresh sequence
Arguments
- struct bgp_channel * c
BGP channel
Description
This function is called when an incoming enhanced route refresh sequence is finished by the neighbor, demarcated by the EoRR packet. The function updates the load state and ends the routing table refresh cycle. Routes not received during the sequence are removed by the nest.
Function
void bgp_connect (struct bgp_proto * p) -- initiate an outgoing connection
Arguments
- struct bgp_proto * p
BGP instance
Description
The bgp_connect() function creates a new bgp_conn and initiates a TCP connection to the peer. The rest of connection setup is governed by the BGP state machine as described in the standard.
Function
struct bgp_proto * bgp_find_proto (sock * sk) -- find existing proto for incoming connection
Arguments
- sock * sk
TCP socket
Function
int bgp_incoming_connection (sock * sk, uint dummy UNUSED) -- handle an incoming connection
Arguments
- sock * sk
TCP socket
- uint dummy UNUSED
-- undescribed --
Description
This function serves as a socket hook for accepting new BGP connections. It searches for a BGP instance corresponding to the peer which has connected and if such an instance exists, it creates a bgp_conn structure, attaches it to the instance and either sends an Open message or (if there already is an active connection) it closes the new connection by sending a Notification message.
Function
void bgp_error (struct bgp_conn * c, uint code, uint subcode, byte * data, int len) -- report a protocol error
Arguments
- struct bgp_conn * c
connection
- uint code
error code (according to the RFC)
- uint subcode
error sub-code
- byte * data
data to be passed in the Notification message
- int len
length of the data
Description
bgp_error() sends a notification packet to tell the other side that a protocol error has occurred (including the data considered erroneous if possible) and closes the connection.
Function
void bgp_store_error (struct bgp_proto * p, struct bgp_conn * c, u8 class, u32 code) -- store last error for status report
Arguments
- struct bgp_proto * p
BGP instance
- struct bgp_conn * c
connection
- u8 class
error class (BE_xxx constants)
- u32 code
error code (class specific)
Description
bgp_store_error() decides whether the given error is interesting enough and stores that error to the last_error variables of p.
Function
int bgp_fire_tx (struct bgp_conn * conn) -- transmit packets
Arguments
- struct bgp_conn * conn
connection
Description
Whenever the transmit buffers of the underlying TCP connection are free and we have any packets queued for sending, the socket functions call bgp_fire_tx() which takes care of selecting the highest priority packet queued (Notification > Keepalive > Open > Update), assembling its header and body and sending it to the connection.
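The pending-packet bit field and the priority order (Notification > Keepalive > Open > Update) can be sketched as follows. struct tx_conn, its pending field, tx_schedule() and tx_next() are hypothetical names; the packet type codes are those assigned by RFC 4271. Unlike the real transmit hook, the sketch always clears the bit and never reschedules a type that has more data to send.

```c
#include <assert.h>

/* Packet type codes as assigned by RFC 4271. */
enum { TX_OPEN = 1, TX_UPDATE = 2, TX_NOTIFICATION = 3, TX_KEEPALIVE = 4 };

/* Hypothetical connection state: one pending bit per packet type. */
struct tx_conn {
  unsigned pending;
};

/* Mark a packet type as waiting for transmission. */
static void
tx_schedule(struct tx_conn *c, int type)
{
  assert(c && type >= TX_OPEN && type <= TX_KEEPALIVE);
  c->pending |= 1u << type;
}

/*
 * Pick the pending packet type with the highest priority and clear its bit.
 * Returns 0 when nothing is pending.
 */
static int
tx_next(struct tx_conn *c)
{
  static const int order[] = { TX_NOTIFICATION, TX_KEEPALIVE, TX_OPEN, TX_UPDATE };

  assert(c);

  for (unsigned i = 0; i < sizeof(order) / sizeof(order[0]); i++)
    if (c->pending & (1u << order[i]))
    {
      c->pending &= ~(1u << order[i]);
      return order[i];
    }

  return 0;
}
```

Scheduling the same type twice before it is sent sets the same bit, so it results in a single transmission opportunity for that type.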
Function
void bgp_schedule_packet (struct bgp_conn * conn, struct bgp_channel * c, int type) -- schedule a packet for transmission
Arguments
- struct bgp_conn * conn
connection
- struct bgp_channel * c
channel
- int type
packet type
Description
Schedule a packet of type type to be sent as soon as possible.
Function
const char * bgp_error_dsc (uint code, uint subcode) -- return BGP error description
Arguments
- uint code
BGP error code
- uint subcode
BGP error subcode
Description
bgp_error_dsc() returns an error description for BGP errors, which might be a static string or a given temporary buffer.
Function
void bgp_rx_packet (struct bgp_conn * conn, byte * pkt, uint len) -- handle a received packet
Arguments
- struct bgp_conn * conn
BGP connection
- byte * pkt
start of the packet
- uint len
packet size
Description
bgp_rx_packet() takes a newly received packet and calls the corresponding packet handler according to the packet type.
Function
int bgp_rx (sock * sk, uint size) -- handle received data
Arguments
- sock * sk
socket
- uint size
amount of data received
Description
bgp_rx() is called by the socket layer whenever new data arrive from the underlying TCP connection. It assembles the data fragments to packets, checks their headers and framing and passes complete packets to bgp_rx_packet().
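The framing step (assembling a byte stream into complete packets and checking headers) can be sketched as follows. frame_check() and frame_count() are hypothetical helpers, not the BIRD functions. The header layout (16-byte all-ones marker, 2-byte length, 1-byte type) and the 19 to 4096 byte length range come from RFC 4271; extended messages (RFC 8654) are not considered.

```c
#include <assert.h>
#include <stddef.h>

#define FRAME_HDR_LEN 19     /* 16-byte marker + 2-byte length + 1-byte type */
#define FRAME_MAX_LEN 4096   /* RFC 4271 limit, without extended messages */

/*
 * Examine the start of a receive buffer holding avail bytes.
 * Returns the length of one complete message, 0 if more data is needed,
 * or -1 if the header is malformed.
 */
static int
frame_check(const unsigned char *buf, size_t avail)
{
  assert(buf || avail == 0);

  if (avail < FRAME_HDR_LEN)
    return 0;

  /* The marker must consist of all ones. */
  for (int i = 0; i < 16; i++)
    if (buf[i] != 0xFF)
      return -1;

  /* The length is big-endian and covers the header itself. */
  size_t len = ((size_t) buf[16] << 8) | buf[17];
  if (len < FRAME_HDR_LEN || len > FRAME_MAX_LEN)
    return -1;

  return (avail >= len) ? (int) len : 0;
}

/*
 * Walk over all complete messages in the buffer. Returns their count, or -1
 * on a framing error, and stores the number of bytes consumed in *consumed;
 * the caller keeps the unconsumed tail for the next read.
 */
static int
frame_count(const unsigned char *buf, size_t avail, size_t *consumed)
{
  int n = 0;

  assert(buf && consumed);
  *consumed = 0;

  for (;;)
  {
    int len = frame_check(buf + *consumed, avail - *consumed);

    if (len < 0)
      return -1;
    if (len == 0)
      return n;

    /* A real receiver would pass buf + *consumed to the packet handler. */
    *consumed += (size_t) len;
    n++;
  }
}
```

A partial message at the end of the buffer is not an error; it simply stays in place until the rest of it arrives.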
Function
ea_list * bgp_export_attrs (struct bgp_export_state * s, ea_list * a) -- export BGP attributes
Arguments
- struct bgp_export_state * s
BGP export state
- ea_list * a
-- undescribed --
Description
The bgp_export_attrs() function takes a list of attributes and merges it to one newly allocated and sorted segment. Attributes are validated and normalized by type-specific export hooks and attribute flags are updated. Some attributes may be eliminated (e.g. unknown non-transitive attributes, or empty community sets).
Result
one sorted attribute list segment, or NULL if attributes are unsuitable.
Function
int bgp_encode_attrs (struct bgp_write_state * s, ea_list * attrs, byte * buf, byte * end) -- encode BGP attributes
Arguments
- struct bgp_write_state * s
BGP write state
- ea_list * attrs
a list of extended attributes
- byte * buf
buffer
- byte * end
buffer end
Description
The bgp_encode_attrs() function takes a list of extended attributes and converts it to its BGP representation (a part of an Update message). BGP write state may be fake when called from MRT protocol.
Result
Length of the attribute block generated or -1 if not enough space.
Function
ea_list * bgp_decode_attrs (struct bgp_parse_state * s, byte * data, uint len) -- check and decode BGP attributes
Arguments
- struct bgp_parse_state * s
BGP parse state
- byte * data
start of attribute block
- uint len
length of attribute block
Description
This function takes a BGP attribute block (a part of an Update message), checks its consistency and converts it to a list of BIRD route attributes represented by an (uncached) rta.
5.4 BGP Monitoring Protocol (BMP)
Supported standards:
- RFC 7854 - BMP standard
TODO:
- Support Peer Distinguisher ID in Per-Peer Header
- Support peer type as RD Instance in Peer Type field of Per-Peer Header. Currently, only the Global and Local Instance Peer types are supported
- Support corresponding FSM event code when sending PEER DOWN NOTIFICATION
- Support DE_CONFIGURED PEER DOWN REASON code in PEER DOWN NOTIFICATION message
- If the connection with the BMP collector is lost, we do not establish the connection again
- Set Peer Type by its global and local-scope IP address
The BMP session is managed by a simple state machine with three states: Idle (!started, !sk), Connect (!started, sk active), and Established (started). It has three events: connect successful (Connect -> Established), socket error (any -> Idle), and connect timeout (Idle/Connect -> Connect, resetting the TCP socket).
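The three states and three events described above can be sketched as a small transition function. This is an illustration only: the enum names and bmp_sketch_next() are hypothetical and do not come from the BIRD sources, where the state is implied by the started flag and the socket rather than stored in an enum.

```c
#include <assert.h>

/* Hypothetical names for illustration; not taken from the BIRD sources. */
enum bmp_sketch_state { BMP_S_IDLE, BMP_S_CONNECT, BMP_S_ESTABLISHED };
enum bmp_sketch_event { BMP_E_CONNECTED, BMP_E_SOCK_ERROR, BMP_E_CONNECT_TIMEOUT };

/* Return the state that follows `s` when event `e` arrives. */
enum bmp_sketch_state
bmp_sketch_next(enum bmp_sketch_state s, enum bmp_sketch_event e)
{
  switch (e)
  {
  case BMP_E_CONNECTED:
    /* Connect successful: Connect -> Established */
    return (s == BMP_S_CONNECT) ? BMP_S_ESTABLISHED : s;

  case BMP_E_SOCK_ERROR:
    /* Socket error: any state -> Idle */
    return BMP_S_IDLE;

  case BMP_E_CONNECT_TIMEOUT:
    /* Connect timeout: Idle/Connect -> Connect, with a fresh TCP socket */
    return (s == BMP_S_ESTABLISHED) ? s : BMP_S_CONNECT;
  }

  assert(0 && "unknown event");
  return s;
}
```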
Function
void bmp_put_per_peer_hdr (buffer * stream, const struct bmp_peer_hdr_info * peer) -- serializes Per-Peer Header
Arguments
- buffer * stream
-- undescribed --
- const struct bmp_peer_hdr_info * peer
-- undescribed --
Function
void bmp_startup (struct bmp_proto * p) -- enter established state
Arguments
- struct bmp_proto * p
BMP instance
Description
The bmp_startup() function is called when the BMP session is established. It sends initiation and peer up messages.
Function
void bmp_down (struct bmp_proto * p) -- leave established state
Arguments
- struct bmp_proto * p
BMP instance
Description
The bmp_down() function is called when the BMP session fails. The caller is responsible for changing protocol state.
Function
void bmp_connect (struct bmp_proto * p) -- initiate an outgoing connection
Arguments
- struct bmp_proto * p
BMP instance
Description
The bmp_connect() function creates the socket and initiates an outgoing TCP connection to the monitoring station. It is called to enter Connect state.
Function
int bmp_start (struct proto * P) -- initialize internal resources of BMP implementation.
Arguments
- struct proto * P
-- undescribed --
NOTE
It does not connect to BMP collector yet.
5.5 Open Shortest Path First (OSPF)
The OSPF protocol is quite complicated and its complex implementation is split into many files. In ospf.c, you will find mainly the interface for communication with the core (e.g., reconfiguration hooks, shutdown, initialisation and so on). File iface.c contains the interface state machine and functions for allocation and deallocation of OSPF's interface data structures. Source neighbor.c includes the neighbor state machine and functions for election of the Designated Router and Backup Designated Router. In packet.c, you will find various functions for sending and receiving generic OSPF packets. There are also routines for authentication and checksumming. In hello.c, there are routines for sending and receiving hello packets as well as functions for maintaining wait times and the inactivity timer. Files lsreq.c, lsack.c, dbdes.c contain functions for sending and receiving link-state requests, link-state acknowledgements and database descriptions respectively. In lsupd.c, there are functions for sending and receiving link-state updates and also the flooding algorithm. Source topology.c is the place where routines for searching LSAs in the link-state database, adding and deleting them reside; there are also functions for originating various types of LSAs (router LSA, net LSA, external LSA). File rt.c contains routines for calculating the routing table. lsalib.c is a set of various functions for working with the LSAs (endianity conversions, calculation of checksum etc.).
One instance of the protocol is able to hold LSA databases for multiple OSPF areas, to exchange routing information between multiple neighbors and to calculate the routing tables. The core structure is ospf_proto to which multiple ospf_area and ospf_iface structures are connected. ospf_proto is also connected to top_hash_graph which is a dynamic hashing structure that describes the link-state database. It allows fast search, addition and deletion. Each LSA is kept in two pieces: header and body. Both of them are kept in the endianity of the CPU.
In the OSPFv2 specification, it is implied that there is one IP prefix for each physical network/interface (unless it is a ptp link). But in modern systems, there might be more independent IP prefixes associated with an interface. To handle this situation, we have one ospf_iface for each active IP prefix (instead of one for each active iface); this behaves like a virtual interface for the purpose of OSPF. If we receive a packet, we associate it with the proper virtual interface mainly according to its source address.
OSPF keeps one socket per ospf_iface. This allows us (compared to a one-socket approach) to evade problems with a limit of multicast groups per socket and with sending multicast packets to the appropriate interface in a portable way. The socket is associated with the underlying physical iface and should not receive packets received on other ifaces (unfortunately, this is not true on BSD). Generally, one packet can be received by several sockets (for example, if there are several ospf_iface structures on one physical iface), therefore we explicitly filter received packets according to src/dst IP address and received iface.
Vlinks are implemented using a particularly degenerate form of ospf_iface, which has several exceptions: it does not have its own iface or socket (it copies these from the 'parent' ospf_iface) and it is present in the iface list even when down (it is not freed in ospf_iface_down()).
The heartbeat of OSPF is ospf_disp(). It is called at regular intervals (ospf_proto->tick). It is responsible for aging and flushing of LSAs in the database, updating topology information in LSAs and for routing table calculation.
To every ospf_iface, we connect one or more ospf_neighbor's -- a structure containing many timers and queues for building adjacency and for exchange of routing messages.
BIRD's OSPF implementation respects RFC 2328 in every detail, but some of the internal algorithms differ. The RFC recommends making a snapshot of the link-state database when a new adjacency is forming and sending the database description packets based on the information in this snapshot. The database can be quite large in some networks, so instead we walk through an slist structure, which allows us to continue even if the LSA we were working with is deleted. New LSAs are added at the tail of this slist.
We also do not keep a separate OSPF routing table, because the core helps us by being able to recognize when a route is updated to an identical one and it suppresses the update automatically. Due to this, we can flush all the routes we have recalculated and also those we have deleted to the core's routing table and the core will take care of the rest. This simplifies the process and conserves memory.
Supported standards:
- RFC 2328 - main OSPFv2 standard
- RFC 5340 - main OSPFv3 standard
- RFC 3101 - OSPFv2 NSSA areas
- RFC 3623 - OSPFv2 Graceful Restart
- RFC 4576 - OSPFv2 VPN loop prevention
- RFC 5187 - OSPFv3 Graceful Restart
- RFC 5250 - OSPFv2 Opaque LSAs
- RFC 5709 - OSPFv2 HMAC-SHA Cryptographic Authentication
- RFC 5838 - OSPFv3 Support of Address Families
- RFC 6549 - OSPFv2 Multi-Instance Extensions
- RFC 6987 - OSPF Stub Router Advertisement
- RFC 7166 - OSPFv3 Authentication Trailer
- RFC 7770 - OSPF Router Information LSA
Function
void ospf_disp (timer * timer) -- invokes routing table calculation, aging and also area_disp()
Arguments
- timer * timer
timer usually called every ospf_proto->tick seconds; timer->data points to ospf_proto
Function
int ospf_preexport (struct channel * C, rte * e) -- accept or reject new route from nest's routing table
Arguments
- struct channel * C
-- undescribed --
- rte * e
-- undescribed --
Description
It's quite simple. It does not accept our own routes and leaves the decision on import to the filters.
Function
int ospf_shutdown (struct proto * P) -- Finish of OSPF instance
Arguments
- struct proto * P
OSPF protocol instance
Description
The RFC does not define any action that should be taken before router shutdown. To make my neighbors react as fast as possible, I send them a hello packet with an empty neighbor list. They should start their neighbor state machine with event NEIGHBOR_1WAY.
Function
int ospf_reconfigure (struct proto * P, struct proto_config * CF) -- reconfiguration hook
Arguments
- struct proto * P
current instance of protocol (with old configuration)
- struct proto_config * CF
-- undescribed --
Description
This hook tries to be a little bit intelligent. An OSPF instance will survive a change of many parameters, such as the hello interval, a password, addition or deletion of some neighbor on a nonbroadcast network, the cost of an interface, etc.
Function
struct top_hash_entry * ospf_install_lsa (struct ospf_proto * p, struct ospf_lsa_header * lsa, u32 type, u32 domain, void * body) -- install new LSA into database
Arguments
- struct ospf_proto * p
OSPF protocol instance
- struct ospf_lsa_header * lsa
LSA header
- u32 type
type of LSA
- u32 domain
domain of LSA
- void * body
pointer to LSA body
Description
This function installs a new LSA received in an LS update into the LSA database. The old instance is replaced. Several actions are taken to detect whether a new routing table calculation is necessary. This is described in 13.2 of RFC 2328. This function is for received LSAs only; locally originated LSAs are installed by ospf_originate_lsa().
The LSA body in body is expected to be mb_allocated by the caller and its ownership is transferred to the LSA entry structure.
Function
void ospf_advance_lsa (struct ospf_proto * p, struct top_hash_entry * en, struct ospf_lsa_header * lsa, u32 type, u32 domain, void * body) -- handle received unexpected self-originated LSA
Arguments
- struct ospf_proto * p
OSPF protocol instance
- struct top_hash_entry * en
current LSA entry or NULL
- struct ospf_lsa_header * lsa
new LSA header
- u32 type
type of LSA
- u32 domain
domain of LSA
- void * body
pointer to LSA body
Description
This function handles received unexpected self-originated LSA (lsa, body) by either advancing sequence number of the local LSA instance (en) and propagating it, or installing the received LSA and immediately flushing it (if there is no local LSA; i.e., en is NULL or MaxAge).
The LSA body in body is expected to be mb_allocated by the caller and its ownership is transferred to the LSA entry structure or it is freed.
Function
struct top_hash_entry * ospf_originate_lsa (struct ospf_proto * p, struct ospf_new_lsa * lsa) -- originate new LSA
Arguments
- struct ospf_proto * p
OSPF protocol instance
- struct ospf_new_lsa * lsa
New LSA specification
Description
This function prepares a new LSA, installs it into the LSA database and floods it. If the new LSA cannot be originated now (because the old instance was originated within MinLSInterval, or because the LSA seqnum is currently wrapping), the origination is instead scheduled for later. If the new LSA is equivalent to the current LSA, the origination is skipped. In all cases, the corresponding LSA entry is returned. The new LSA is based on the LSA specification (lsa) and the LSA body from lsab buffer of p, which is emptied after the call. The opposite of this function is ospf_flush_lsa().
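The three possible outcomes (originate now, postpone, skip) can be sketched as a pure decision function. This is a simplified illustration, not the BIRD code: struct sketch_lsa, the enum and the order of the checks are assumptions, and the 5-second value is the MinLSInterval constant of RFC 2328.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MINLSINTERVAL 5   /* seconds, MinLSInterval of RFC 2328 */

enum sketch_orig_result { SKETCH_ORIGINATE, SKETCH_POSTPONE, SKETCH_SKIP };

/* Hypothetical, simplified view of the currently stored LSA instance */
struct sketch_lsa {
  const void *body;      /* current LSA body, NULL if none is installed */
  size_t length;         /* body length in bytes */
  long inst_time;        /* time of the last origination, in seconds */
  bool seqnum_wrapping;  /* old instance is being flushed due to seqnum wrap */
};

enum sketch_orig_result
sketch_originate_decision(const struct sketch_lsa *cur, const void *body,
                          size_t length, long now)
{
  assert(cur != NULL && body != NULL);

  /* The LSA seqnum is currently wrapping: try again later */
  if (cur->seqnum_wrapping)
    return SKETCH_POSTPONE;

  /* The new LSA is equivalent to the current one: nothing to do */
  if (cur->body && (cur->length == length) && !memcmp(cur->body, body, length))
    return SKETCH_SKIP;

  /* The old instance was originated within MinLSInterval: try again later */
  if (cur->body && ((now - cur->inst_time) < SKETCH_MINLSINTERVAL))
    return SKETCH_POSTPONE;

  return SKETCH_ORIGINATE;
}
```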
Function
void ospf_flush_lsa (struct ospf_proto * p, struct top_hash_entry * en) -- flush LSA from OSPF domain
Arguments
- struct ospf_proto * p
OSPF protocol instance
- struct top_hash_entry * en
LSA entry to flush
Description
This function flushes en from the OSPF domain by setting its age to LSA_MAXAGE and flooding it. That also triggers subsequent events in LSA lifecycle leading to removal of the LSA from the LSA database (e.g. the LSA content is freed when flushing is acknowledged by neighbors). The function does nothing if the LSA is already being flushed. LSA entries are not immediately removed when being flushed, the caller may assume that en still exists after the call. The function is the opposite of ospf_originate_lsa() and is supposed to do the right thing even in cases of postponed origination.
Function
void ospf_update_lsadb (struct ospf_proto * p) -- update LSA database
Arguments
- struct ospf_proto * p
OSPF protocol instance
Description
This function is periodically invoked from ospf_disp(). It does some periodic or postponed processing related to LSA entries. It originates postponed LSAs scheduled by ospf_originate_lsa() and continues the flushing processes started by ospf_flush_lsa(). It also periodically refreshes locally originated LSAs -- when the current instance is older than LSREFRESHTIME, a new instance is originated. Finally, it also ages stored LSAs and flushes ones that reached LSA_MAXAGE.
The RFC 2328 says that a router should periodically check checksums of all stored LSAs to detect hardware problems. This is not implemented.
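The refresh and aging rules can be sketched as a per-LSA decision. The helper and enum names below are hypothetical, the constants are the LSRefreshTime (30 minutes) and MaxAge (1 hour) values of RFC 2328, and the handling of the exact boundary values is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_LSREFRESHTIME 1800  /* seconds, LSRefreshTime of RFC 2328 */
#define SKETCH_LSA_MAXAGE    3600  /* seconds, MaxAge of RFC 2328 */

enum sketch_lsa_action { SKETCH_KEEP, SKETCH_REFRESH, SKETCH_FLUSH };

/* Hypothetical helper: pick the periodic action for one stored LSA. */
enum sketch_lsa_action
sketch_lsa_periodic(unsigned int age, bool self_originated)
{
  assert(age <= SKETCH_LSA_MAXAGE);

  /* Our own LSA got old: originate a new instance */
  if (self_originated && (age >= SKETCH_LSREFRESHTIME))
    return SKETCH_REFRESH;

  /* Any LSA that reached MaxAge is flushed */
  if (age >= SKETCH_LSA_MAXAGE)
    return SKETCH_FLUSH;

  return SKETCH_KEEP;
}
```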
Function
void ospf_originate_ext_lsa (struct ospf_proto * p, struct ospf_area * oa, ort * nf, u8 mode, u32 metric, u32 ebit, ip_addr fwaddr, u32 tag, int pbit, int dn) -- new route received from nest and filters
Arguments
- struct ospf_proto * p
OSPF protocol instance
- struct ospf_area * oa
ospf_area for which LSA is originated
- ort * nf
network prefix and mask
- u8 mode
the mode of the LSA (LSA_M_EXPORT or LSA_M_RTCALC)
- u32 metric
the metric of a route
- u32 ebit
E-bit for route metric (bool)
- ip_addr fwaddr
the forwarding address
- u32 tag
the route tag
- int pbit
P-bit for NSSA LSAs (bool), ignored for external LSAs
- int dn
-- undescribed --
Description
If I receive a message that a new route is installed, I try to originate an external LSA. If oa is an NSSA area, an NSSA-LSA is originated instead. oa should not be a stub area. mode does not specify whether the LSA is external or NSSA, but it specifies the source of origination - the export from ospf_rt_notify(), or the NSSA-EXT translation.
Function
struct top_graph * ospf_top_new (struct ospf_proto * p, pool * pool) -- allocate new topology database
Arguments
- struct ospf_proto * p
OSPF protocol instance
- pool * pool
pool for allocation
Description
This dynamically hashed structure is used for keeping LSAs. Mainly it is used for the LSA database of the OSPF protocol, but also for LSA retransmission and request lists of OSPF neighbors.
Function
void ospf_neigh_chstate (struct ospf_neighbor * n, u8 state) -- handles changes related to new or old state of neighbor
Arguments
- struct ospf_neighbor * n
OSPF neighbor
- u8 state
new state
Description
Many actions have to be taken according to a change of state of a neighbor. It starts rxmt timers, calls the interface state machine, etc.
Function
void ospf_neigh_sm (struct ospf_neighbor * n, int event) -- ospf neighbor state machine
Arguments
- struct ospf_neighbor * n
neighbor
- int event
actual event
Description
This part implements the neighbor state machine as described in 10.3 of RFC 2328. The only difference is that state NEIGHBOR_ATTEMPT is not used. We discover neighbors on nonbroadcast networks in the same way as on broadcast networks. The only difference is in sending hello packets. These are sent to IPs listed in ospf_iface->nbma_list.
Function
void ospf_dr_election (struct ospf_iface * ifa) -- (Backup) Designated Router election
Arguments
- struct ospf_iface * ifa
actual interface
Description
When the wait timer fires, it is time to elect (Backup) Designated Router. Structure describing me is added to this list so every electing router has the same list. Backup Designated Router is elected before Designated Router. This process is described in 9.4 of RFC 2328. The function is supposed to be called only from ospf_iface_sm() as a part of the interface state machine.
Function
void ospf_iface_chstate (struct ospf_iface * ifa, u8 state) -- handle changes of interface state
Arguments
- struct ospf_iface * ifa
OSPF interface
- u8 state
new state
Description
Many actions must be taken according to interface state changes. Network LSAs must be originated or flushed, new multicast sockets listening for messages to ALLDROUTERS have to be opened, etc.
Function
void ospf_iface_sm (struct ospf_iface * ifa, int event) -- OSPF interface state machine
Arguments
- struct ospf_iface * ifa
OSPF interface
- int event
event coming to the state machine
Description
This fully respects 9.3 of RFC 2328 except we have slightly different handling of DOWN and LOOP state. We remove interfaces that are DOWN. DOWN state is used when an interface is waiting for a lock. LOOP state is used when an interface does not have a link.
Function
int ospf_rx_hook (sock * sk, uint len)
Arguments
- sock * sk
socket on which we received the packet
- uint len
length of the packet
Description
This is the entry point for messages from neighbors. Many checks (like authentication, checksums, size) are done before the packet is passed to non-generic functions.
Function
int lsa_validate (struct ospf_lsa_header * lsa, u32 lsa_type, int ospf2, void * body) -- check whether given LSA is valid
Arguments
- struct ospf_lsa_header * lsa
LSA header
- u32 lsa_type
internal LSA type (LSA_T_xxx)
- int ospf2
true for OSPFv2, false for OSPFv3
- void * body
pointer to LSA body
Description
Checks internal structure of given LSA body (minimal length, consistency). Returns true if valid.
Function
void ospf_send_dbdes (struct ospf_proto * p, struct ospf_neighbor * n) -- transmit database description packet
Arguments
- struct ospf_proto * p
OSPF protocol instance
- struct ospf_neighbor * n
neighbor
Description
Sending of a database description packet is described in 10.8 of RFC 2328. Reception of each packet is acknowledged in the sequence number of another. When I send a packet to a neighbor I keep a copy in a buffer. If the neighbor does not reply, I don't create a new packet but just send the content of the buffer.
Function
void ospf_rt_spf (struct ospf_proto * p) -- calculate internal routes
Arguments
- struct ospf_proto * p
OSPF protocol instance
Description
Calculation of internal paths in an area is described in 16.1 of RFC 2328. It is based on Dijkstra's shortest path tree algorithm. This function is invoked from ospf_disp().
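For readers unfamiliar with the algorithm, here is a minimal sketch of Dijkstra's shortest path tree computation on a small cost matrix. It only illustrates the technique: ospf_rt_spf() works on the link-state database rather than on a matrix, and all names below (sketch_spf(), SKETCH_NODES, SKETCH_INF) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define SKETCH_NODES 4
#define SKETCH_INF   UINT_MAX

/*
 * cost[u][v] is the link cost from u to v, 0 meaning "no link".
 * On return, dist[v] is the cost of the shortest path from root to v
 * (SKETCH_INF if unreachable) and parent[v] is the previous node on it.
 */
void
sketch_spf(const unsigned int cost[SKETCH_NODES][SKETCH_NODES], int root,
           unsigned int dist[SKETCH_NODES], int parent[SKETCH_NODES])
{
  bool done[SKETCH_NODES] = { false };

  assert(root >= 0 && root < SKETCH_NODES);

  for (int v = 0; v < SKETCH_NODES; v++)
  {
    dist[v] = SKETCH_INF;
    parent[v] = -1;
  }
  dist[root] = 0;

  for (;;)
  {
    /* Pick the closest reachable node that is not yet on the tree */
    int u = -1;
    for (int v = 0; v < SKETCH_NODES; v++)
      if (!done[v] && (dist[v] != SKETCH_INF) && ((u < 0) || (dist[v] < dist[u])))
        u = v;

    if (u < 0)
      break;

    done[u] = true;

    /* Relax all links leaving u */
    for (int v = 0; v < SKETCH_NODES; v++)
      if (cost[u][v] && !done[v] && (dist[u] + cost[u][v] < dist[v]))
      {
        dist[v] = dist[u] + cost[u][v];
        parent[v] = u;
      }
  }
}
```

The linear scan for the closest node keeps the sketch short; a priority queue would be the usual choice for large databases.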
5.6 Pipe
The Pipe protocol is very simple. It just connects to two routing tables using proto_add_announce_hook() and whenever it receives a rt_notify() about a change in one of the tables, it converts it to a rte_update() in the other one.
To avoid pipe loops, Pipe keeps a 'being updated' flag in each routing table.
A pipe has two announce hooks, the first connected to the main table, the second connected to the peer table. When a new route is announced on the main table, it gets checked by an export filter in ahook 1, and, after that, it is announced to the peer table via rte_update(), where an import filter in ahook 2 is called. When a new route is announced in the peer table, an export filter in ahook 2 and an import filter in ahook 1 are used. Obviously, there is no need to filter the same route twice, so both import filters are set to accept, while user-configured 'import' and 'export' filters are used as export filters in ahooks 2 and 1. Route limits are handled similarly, but on the import side of ahooks.
5.7 Router Advertisements
The RAdv protocol is implemented in two files: radv.c containing the interface with BIRD core and the protocol logic and packets.c handling low level protocol stuff (RX, TX and packet formats). The protocol does not export any routes.
The RAdv is structured in the usual way - for each handled interface there is a structure radv_iface that contains a state related to that interface together with its resources (a socket, a timer). There is also a prepared RA stored in a TX buffer of the socket associated with an iface. These iface structures are created and removed according to iface events from BIRD core handled by radv_if_notify() callback.
The main logic of RAdv consists of two functions: radv_iface_notify(), which processes asynchronous events (specified by RA_EV_* codes), and radv_timer(), which triggers sending RAs and computes the next timeout.
The RAdv protocol could receive routes (through radv_preexport() and radv_rt_notify()), but only the configured trigger route is tracked (in the active variable). When a radv protocol is reconfigured, the connected routing table is examined (in radv_check_active()) to obtain the proper active value in case the specified trigger prefix was changed.
Supported standards:
- RFC 4861 - main RA standard
- RFC 4191 - Default Router Preferences and More-Specific Routes
- RFC 6106 - DNS extensions (RDNSS, DNSSL)
5.8 Routing Information Protocol (RIP)
The RIP protocol is implemented in two files: rip.c containing the protocol logic, route management and the protocol glue with BIRD core, and packets.c handling RIP packet processing, RX, TX and protocol sockets.
Each instance of RIP is described by a structure rip_proto, which contains an internal RIP routing table, a list of protocol interfaces and the main timer responsible for RIP routing table cleanup.
The RIP internal routing table contains incoming and outgoing routes. For each network (represented by structure rip_entry) there is one outgoing route stored directly in rip_entry and a one-way linked list of incoming routes (structures rip_rte). The list contains incoming routes from different RIP neighbors, but only routes with the lowest metric are stored (i.e., all stored incoming routes have the same metric).
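The "only routes with the lowest metric are stored" rule can be sketched with a plain singly linked list. This is a simplified illustration and not the code of rip_update_rte(): struct sketch_rte, struct sketch_entry and both functions are hypothetical stand-ins, and neighbors are identified by a plain integer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for rip_rte and rip_entry */
struct sketch_rte {
  struct sketch_rte *next;
  int from;                   /* identifies the announcing neighbor */
  int metric;
};

struct sketch_entry {
  struct sketch_rte *routes;  /* all stored routes share the lowest metric */
};

/* Free the whole list of incoming routes */
void
sketch_drop_all(struct sketch_entry *en)
{
  while (en->routes)
  {
    struct sketch_rte *rt = en->routes;
    en->routes = rt->next;
    free(rt);
  }
}

/* Returns 1 if the new route was stored, 0 if it was ignored, -1 on error. */
int
sketch_update(struct sketch_entry *en, int from, int metric)
{
  struct sketch_rte **rp, *rt;

  assert(en != NULL && metric >= 1);

  /* Forget any older route from the same neighbor */
  for (rp = &en->routes; (rt = *rp) != NULL; rp = &rt->next)
    if (rt->from == from)
    {
      *rp = rt->next;
      free(rt);
      break;
    }

  /* A route worse than the stored ones is not kept */
  if (en->routes && (metric > en->routes->metric))
    return 0;

  /* A strictly better route replaces everything stored so far */
  if (en->routes && (metric < en->routes->metric))
    sketch_drop_all(en);

  rt = malloc(sizeof(*rt));
  if (!rt)
    return -1;

  rt->from = from;
  rt->metric = metric;
  rt->next = en->routes;
  en->routes = rt;
  return 1;
}
```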
Note that RIP itself does not select the outgoing route; that is done by the core routing table. When a new incoming route is received, it is propagated to the RIP table by rip_update_rte() and possibly stored in the list of incoming routes. Then the change may be propagated to the core by rip_announce_rte(). The core selects the best route and propagates it to RIP by rip_rt_notify(), which updates the outgoing route part of rip_entry and possibly triggers route propagation by rip_trigger_update().
RIP interfaces are represented by structures rip_iface. A RIP interface contains a per-interface socket, a list of associated neighbors, interface configuration, and state information related to scheduled interface events and running update sessions. RIP interfaces are added and removed based on core interface notifications.
There are two RIP interface events - regular updates and triggered updates. Both are managed from the RIP interface timer (rip_iface_timer()). Regular updates are called at fixed interval and propagate the whole routing table, while triggered updates are scheduled by rip_trigger_update() due to some routing table change and propagate only the routes modified since the time they were scheduled. There are also unicast-destined requested updates, but these are sent directly as a reaction to received RIP request message. The update session is started by rip_send_table(). There may be at most one active update session per interface, as the associated state (including the fib iterator) is stored directly in rip_iface structure.
RIP neighbors are represented by structures rip_neighbor. Compared to neighbor handling in other routing protocols, RIP does not have explicit neighbor discovery and adjacency maintenance, which makes the rip_neighbor related code a bit peculiar. RIP neighbors are interlinked with core neighbor structures (neighbor) and use core neighbor notifications to ensure that RIP neighbors are timely removed. RIP neighbors are added based on received route notifications and removed based on core neighbor and RIP interface events.
RIP neighbors are linked by RIP routes and use a counter to track the number of associated routes, but when these RIP routes time out, the associated RIP neighbor is still alive (with a zero counter). When a RIP neighbor is removed but still has some associated routes, it is not freed, just changed to a detached state (core neighbors and RIP ifaces are unlinked); then, during the main timer cleanup phase, the associated routes are removed and the rip_neighbor structure is finally freed.
Supported standards:
- RFC 1058 - RIPv1
- RFC 2453 - RIPv2
- RFC 2080 - RIPng
- RFC 2091 - Triggered RIP for demand circuits
- RFC 4822 - RIP cryptographic authentication
Function
void rip_announce_rte (struct rip_proto * p, struct rip_entry * en) -- announce route from RIP routing table to the core
Arguments
- struct rip_proto * p
RIP instance
- struct rip_entry * en
related network
Description
The function takes the list of incoming routes from en, prepares an appropriate rte for the core and propagates it by rte_update().
Function
void rip_update_rte (struct rip_proto * p, net_addr * n, struct rip_rte * new) -- enter a route update to RIP routing table
Arguments
- struct rip_proto * p
RIP instance
- net_addr * n
-- undescribed --
- struct rip_rte * new
a rip_rte representing the new route
Description
The function is called by the RIP packet processing code whenever it receives a reachable route. The appropriate routing table entry is found and the list of incoming routes is updated. Eventually, the change is also propagated to the core by rip_announce_rte(). Note that for unreachable routes, rip_withdraw_rte() should be called instead of rip_update_rte().
Function
void rip_withdraw_rte (struct rip_proto * p, net_addr * n, struct rip_neighbor * from) -- enter a route withdraw to RIP routing table
Arguments
- struct rip_proto * p
RIP instance
- net_addr * n
-- undescribed --
- struct rip_neighbor * from
a rip_neighbor propagating the withdraw
Description
The function is called by the RIP packet processing code whenever it receives an unreachable route. The incoming route for the given network from neighbor from is removed. Eventually, the change is also propagated by rip_announce_rte().
Function
void rip_timer (timer * t) -- RIP main timer hook
Arguments
- timer * t
timer
Description
The RIP main timer is responsible for routing table maintenance. Invalid or expired routes (rip_rte) are removed and garbage collection of stale routing table entries (rip_entry) is done. Changes are propagated to core tables, route reload is also done here. Note that garbage collection uses a maximal GC time, while interfaces maintain an illusion of per-interface GC times in rip_send_response().
Keeping incoming routes and the selected outgoing route are two independent functions, therefore after garbage collection some entries now considered invalid (RIP_ENTRY_DUMMY) still may have non-empty list of incoming routes, while some valid entries (representing an outgoing route) may have that list empty.
The main timer is not scheduled periodically but it uses the time of the current next event and the minimal interval of any possible event to compute the time of the next run.
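One way to read the last paragraph is the following sketch: the timer must run no later than one minimal event interval from now, and sooner if an already known event expires earlier. The helper name, the microsecond time unit and the min_delay floor are assumptions made for illustration, not details taken from rip_timer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper; all times are in microseconds. */
int64_t
sketch_next_delay(int64_t now, int64_t min_interval,
                  const int64_t *expires, size_t n, int64_t min_delay)
{
  assert(min_interval > 0 && min_delay >= 0);

  /* Events created later are due no sooner than one minimal interval from now */
  int64_t next = now + min_interval;

  /* An already known event may be due sooner */
  for (size_t i = 0; i < n; i++)
    if (expires[i] < next)
      next = expires[i];

  /* Never reschedule with a delay below the floor */
  int64_t delay = next - now;
  return (delay > min_delay) ? delay : min_delay;
}
```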
Function
void rip_iface_timer (timer * t) -- RIP interface timer hook
Arguments
- timer * t
timer
Description
RIP interface timers are responsible for scheduling both regular and triggered updates. Fixed, delay-independent period is used for regular updates, while minimal separating interval is enforced for triggered updates. The function also ensures that a new update is not started when the old one is still running.
Function
void rip_send_table (struct rip_proto * p, struct rip_iface * ifa, ip_addr addr, btime changed) -- start an update session on an interface
Arguments
- struct rip_proto * p
RIP instance
- struct rip_iface * ifa
RIP interface
- ip_addr addr
destination IP address
- btime changed
time limit for triggered updates
Description
The function activates an update session and starts sending routing update packets (using rip_send_response()). The session may be finished during the call or may continue in rip_tx_hook() until all appropriate routes are transmitted. Note that there may be at most one active update session per interface; the function will terminate the old active session before activating the new one.
Function
void rip_rxmt_timeout (timer * t) -- RIP retransmission timer hook
Arguments
- timer * t
timer
Description
In Demand Circuit mode, update packets must be acknowledged to ensure reliability. If they are not acknowledged, we need to retransmit them.
5.9 RPKI To Router (RPKI-RTR)
The RPKI-RTR protocol is implemented in several files: rpki.c containing the routes handling, protocol logic, timer events, cache connection, reconfiguration, configuration and protocol glue with BIRD core, packets.c containing the RPKI packets handling and finally all transport files: transport.c, tcp_transport.c and ssh_transport.c.
The transport.c file is a middle layer and interface for each specific transport. A transport is a way to wrap communication with a cache server. An unprotected TCP transport and an encrypted SSHv2 transport are supported. The SSH transport requires the LibSSH library, which is loaded dynamically using the dlopen() function. SSH support is integrated in sysdep/unix/io.c. Each transport must implement an initialization function, an open function and a socket identification function. That's all.
This implementation is based on the RTRlib (http://rpki.realmv6.org/). BIRD takes over the files packets.c, rtr.c (inside rpki.c), transport.c, tcp_transport.c and ssh_transport.c from RTRlib.
An RPKI-RTR connection is described by a structure rpki_cache. The main logic is located in the rpki_cache_change_state() function, which implements a state machine. The standard starting state flow looks like Down -> Connecting -> Sync-Start -> Sync-Running -> Established, and then the last three states are periodically repeated.
The Connecting state establishes the transport connection. The state lasts from a call to rpki_cache_change_state(CONNECTING) to a call to rpki_connected_hook().
The Sync-Start state starts with sending a Reset Query or a Serial Query and then waits for a Cache Response. The state lasts from rpki_connected_hook() to rpki_handle_cache_response_pdu().
During Sync-Running, BIRD receives data with IPv4/IPv6 Prefixes from the cache server. The state starts from rpki_handle_cache_response_pdu() and ends in rpki_handle_end_of_data_pdu().
The Established state means that BIRD has synced all data with the cache server. It schedules a Refresh timer event that invokes Sync-Start, schedules an Expire timer event and stops the Retry timer event.
The Transport Error state means that we have some trouble with the network connection: we cannot connect to a cache server, or we have waited too long for an expected PDU (Cache Response or End of Data). It closes the current connection and schedules a Retry timer event.
A Fatal Protocol Error occurs e.g. when a bad Session ID is received. We restart the protocol, so all ROAs are flushed immediately.
The RPKI-RTR protocol (RFC 6810 bis) defines configurable refresh, retry and expire intervals. The connection is maintained by timer events that are scheduled by the rpki_schedule_next_refresh(), rpki_schedule_next_retry() and rpki_schedule_next_expire() functions.
A Refresh timer event performs a sync of an Established connection, so it shifts the state to Sync-Start. If the connection is still in the Sync-Start state at the beginning of the second call of a refresh event, then we did not receive a Cache Response from the cache server and we invoke the Transport Error state.
A Retry timer event attempts to connect to the cache server. It is activated after the Transport Error state and terminated by reaching the Established state. If the cache connection is still connecting to the cache server at the beginning of an event call, then the Retry timer event invokes the Transport Error state.
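The state flow and the Refresh and Retry rules described so far can be summarized by a small transition function. This is only a sketch of the text above, with hypothetical enum and function names; the real transitions are made by rpki_cache_change_state() and the hooks mentioned in the text, and they have side effects (timers, sockets) that are not modelled here.

```c
#include <assert.h>

/* Hypothetical names for illustration; not the enum used by BIRD. */
enum sk_state {
  SK_DOWN, SK_CONNECTING, SK_SYNC_START, SK_SYNC_RUNNING, SK_ESTABLISHED,
  SK_ERROR_TRANSPORT, SK_ERROR_FATAL
};

enum sk_event {
  SK_EV_START, SK_EV_CONNECTED, SK_EV_CACHE_RESPONSE, SK_EV_END_OF_DATA,
  SK_EV_REFRESH, SK_EV_RETRY, SK_EV_TRANSPORT_ERROR, SK_EV_FATAL_ERROR
};

/* Return the state that follows `s` when event `e` arrives. */
enum sk_state
sk_cache_next(enum sk_state s, enum sk_event e)
{
  switch (e)
  {
  case SK_EV_START:          return (s == SK_DOWN) ? SK_CONNECTING : s;
  case SK_EV_CONNECTED:      return (s == SK_CONNECTING) ? SK_SYNC_START : s;
  case SK_EV_CACHE_RESPONSE: return (s == SK_SYNC_START) ? SK_SYNC_RUNNING : s;
  case SK_EV_END_OF_DATA:    return (s == SK_SYNC_RUNNING) ? SK_ESTABLISHED : s;

  case SK_EV_REFRESH:
    if (s == SK_ESTABLISHED)
      return SK_SYNC_START;
    /* Still no Cache Response since the previous refresh */
    if (s == SK_SYNC_START)
      return SK_ERROR_TRANSPORT;
    return s;

  case SK_EV_RETRY:
    if (s == SK_ERROR_TRANSPORT)
      return SK_CONNECTING;
    /* The previous connection attempt has not finished yet */
    if (s == SK_CONNECTING)
      return SK_ERROR_TRANSPORT;
    return s;

  case SK_EV_TRANSPORT_ERROR: return SK_ERROR_TRANSPORT;
  case SK_EV_FATAL_ERROR:     return SK_ERROR_FATAL;
  }

  assert(0 && "unknown event");
  return s;
}
```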
An Expire timer event checks expiration of ROAs. If the last successful sync happened longer ago than the expire interval, then the Expire timer event invokes a protocol restart, thereby removing all ROAs learned from that cache server, and continues trying to connect to the cache server. The Expire event is activated by the initial successful loading of ROAs, i.e. by receiving an End of Data PDU.
Reconfiguration of a cache connection works without restarting when only interval values are changed.
Supported standards:
- RFC 6810 - main RPKI-RTR standard
- RFC 6810 bis - explicit timing parameters and protocol version number negotiation
Function
const char * rpki_cache_state_to_str (enum rpki_cache_state state) -- give a text representation of cache state
Arguments
- enum rpki_cache_state state
A cache state
Description
The function converts a logical cache state into a string.
Function
void rpki_start_cache (struct rpki_cache * cache) -- connect to a cache server
Arguments
- struct rpki_cache * cache
RPKI connection instance
Description
This function is a high level method to kick up a connection to a cache server.
Function
void rpki_force_restart_proto (struct rpki_proto * p) -- force shutdown and start protocol again
Arguments
- struct rpki_proto * p
RPKI protocol instance
Description
This function shuts the protocol down and frees all protocol resources as well. After calling this function, there should be no further operations on protocol data, because they may already have been freed.
Function
void rpki_cache_change_state (struct rpki_cache * cache, const enum rpki_cache_state new_state) -- check and change cache state
Arguments
- struct rpki_cache * cache
RPKI cache instance
- const enum rpki_cache_state new_state
suggested new state
Description
This function performs transitions between internal states. It is the core of the state management logic of the RPKI protocol. A transition into the state the cache is already in is not allowed.
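The rule that a cache cannot transit into its current state can be sketched as follows. The types and the function sketch_change_state() are hypothetical, and returning a boolean is an assumption made for illustration; the documented function returns void and performs additional work on each transition.

```c
#include <stdbool.h>

/* Hypothetical, reduced set of states used only in this sketch. */
enum sketch_cache_state {
  SKC_SHUTDOWN,
  SKC_CONNECTING,
  SKC_ESTABLISHED,
  SKC_ERROR_TRANSPORT
};

struct sketch_cache {
  enum sketch_cache_state state;  /* current state */
  unsigned transitions;           /* number of performed transitions */
};

/* Change the state; reject a transition into the current state. */
bool
sketch_change_state(struct sketch_cache *c, enum sketch_cache_state new_state)
{
  if (c->state == new_state)
    return false;                 /* same-state transition is not allowed */

  c->state = new_state;
  c->transitions++;
  return true;
}
```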
Function
void rpki_refresh_hook (timer * tm) -- control the scheduling of data downloads from the cache server
Arguments
- timer * tm
refresh timer with cache connection instance in data
Description
This function is called periodically while the cache connection is in the ESTABLISHED or a SYNC* state. The first refresh is scheduled after receiving an End of Data PDU, and the timer keeps running until some ERROR* state occurs.
Function
void rpki_retry_hook (timer * tm) -- control the scheduling of connection retries to the cache server
Arguments
- timer * tm
retry timer with cache connection instance in data
Description
This function is called periodically while the cache connection is in an ERROR* state. The first retry is scheduled after any ERROR* state occurs, and the retries end when the ESTABLISHED state is reached again.
Function
void rpki_expire_hook (timer * tm) -- control the expiration of ROA entries
Arguments
- timer * tm
expire timer with cache connection instance in data
Description
This function is scheduled after receiving an End of Data PDU. The waiting interval is calculated dynamically from the time of the last update. If the expiration time is reached, the protocol is restarted.
Function
const char * rpki_check_refresh_interval (uint seconds) -- check validity of refresh interval value
Arguments
- uint seconds
suggested value
Description
This function validates the value and returns NULL if it is valid. If the check fails, it returns an error message.
Function
const char * rpki_check_retry_interval (uint seconds) -- check validity of retry interval value
Arguments
- uint seconds
suggested value
Description
This function validates the value and returns NULL if it is valid. If the check fails, it returns an error message.
Function
const char * rpki_check_expire_interval (uint seconds) -- check validity of expire interval value
Arguments
- uint seconds
suggested value
Description
This function validates the value and returns NULL if it is valid. If the check fails, it returns an error message.
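The three interval checks share one pattern: return NULL for an acceptable value and a message otherwise. The sketch below uses the bounds published in RFC 8210, section 6 (the successor of RFC 6810); whether BIRD enforces exactly these limits is an assumption, and the sketch_ function names and message texts are hypothetical.

```c
#include <stddef.h>

/* Refresh interval: 1 second to 86400 seconds (RFC 8210, section 6). */
const char *
sketch_check_refresh_interval(unsigned seconds)
{
  if (seconds < 1)
    return "Minimum allowed refresh interval is 1 second";
  if (seconds > 86400)
    return "Maximum allowed refresh interval is 86400 seconds";
  return NULL;                    /* value is acceptable */
}

/* Retry interval: 1 second to 7200 seconds (RFC 8210, section 6). */
const char *
sketch_check_retry_interval(unsigned seconds)
{
  if (seconds < 1)
    return "Minimum allowed retry interval is 1 second";
  if (seconds > 7200)
    return "Maximum allowed retry interval is 7200 seconds";
  return NULL;
}

/* Expire interval: 600 seconds to 172800 seconds (RFC 8210, section 6). */
const char *
sketch_check_expire_interval(unsigned seconds)
{
  if (seconds < 600)
    return "Minimum allowed expire interval is 600 seconds";
  if (seconds > 172800)
    return "Maximum allowed expire interval is 172800 seconds";
  return NULL;
}
```

Returning the message instead of a boolean lets the caller, for example a configuration parser, report the reason for the rejection directly.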
Function
const char * rpki_get_cache_ident (struct rpki_cache * cache) -- give a text representation of cache server name
Arguments
- struct rpki_cache * cache
RPKI connection instance
Description
This function converts the cache connection into a string.
Function
int rpki_reconfigure_cache (struct rpki_proto *p UNUSED, struct rpki_cache * cache, struct rpki_config * new, struct rpki_config * old) -- a cache reconfiguration
Arguments
- struct rpki_proto *p UNUSED
RPKI protocol instance (unused)
- struct rpki_cache * cache
a cache connection
- struct rpki_config * new
new RPKI configuration
- struct rpki_config * old
old RPKI configuration
Description
This function reconfigures a single existing cache server connection with the new configuration. Generally, a change of time intervals can be applied without restarting, while all other changes require a restart of the protocol. Returns NEED_TO_RESTART or SUCCESSFUL_RECONF.
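The decision between the two return values can be sketched as a comparison of the old and new settings. Everything in this block is hypothetical: the structures, their fields, the numeric values of the two constants and the example host name are assumptions made for illustration and do not mirror struct rpki_config.

```c
#include <string.h>

/* Hypothetical stand-ins for NEED_TO_RESTART and SUCCESSFUL_RECONF. */
enum { SKETCH_NEED_TO_RESTART = 0, SKETCH_SUCCESSFUL_RECONF = 1 };

/* Hypothetical, reduced configuration of one cache connection. */
struct sketch_rpki_config {
  const char *hostname;
  unsigned port;
  unsigned refresh_interval, retry_interval, expire_interval;
};

/* Hypothetical interval values currently used by the running connection. */
struct sketch_timers {
  unsigned refresh, retry, expire;
};

int
sketch_reconfigure_cache(struct sketch_timers *t,
                         const struct sketch_rpki_config *new_cf,
                         const struct sketch_rpki_config *old_cf)
{
  /* Anything other than intervals changed: the connection must restart. */
  if (strcmp(new_cf->hostname, old_cf->hostname) != 0 ||
      new_cf->port != old_cf->port)
    return SKETCH_NEED_TO_RESTART;

  /* Only intervals may differ: take them over without a restart. */
  t->refresh = new_cf->refresh_interval;
  t->retry = new_cf->retry_interval;
  t->expire = new_cf->expire_interval;
  return SKETCH_SUCCESSFUL_RECONF;
}
```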
Function
int rpki_reconfigure (struct proto * P, struct proto_config * CF) -- a protocol reconfiguration hook
Arguments
- struct proto * P
a protocol instance
- struct proto_config * CF
a new protocol configuration
Description
This function reconfigures the whole protocol. It sets the new protocol configuration into the protocol structure. Returns NEED_TO_RESTART or SUCCESSFUL_RECONF.
Function
void rpki_check_config (struct rpki_config * cf) -- check and complete configuration of RPKI protocol
Arguments
- struct rpki_config * cf
RPKI configuration
Description
This function is called at the end of parsing the RPKI protocol configuration.
Function
struct pdu_header * rpki_pdu_back_to_network_byte_order (struct pdu_header * out, const struct pdu_header * in) -- convert host-byte order PDU back to network-byte order
Arguments
- struct pdu_header * out
allocated memory for writing a converted PDU of size in->len
- const struct pdu_header * in
host-byte order PDU
Assumed
A == ntoh(ntoh(A))
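As an illustration of the conversion, the sketch below handles only the common 8-byte header defined in RFC 6810, section 5, using the POSIX htons() and htonl() functions. The structure sketch_pdu_header and the function sketch_header_to_network() are hypothetical; the documented function converts a whole PDU of size in->len, including its type-specific fields.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Common RPKI-RTR PDU header (hypothetical mirror of struct pdu_header). */
struct sketch_pdu_header {
  uint8_t  ver;        /* protocol version */
  uint8_t  type;       /* PDU type */
  uint16_t reserved;   /* session ID, error code or zero */
  uint32_t len;        /* length of the whole PDU in bytes */
};

/* Convert a host byte order header into network byte order. */
struct sketch_pdu_header *
sketch_header_to_network(struct sketch_pdu_header *out,
                         const struct sketch_pdu_header *in)
{
  out->ver = in->ver;                    /* single bytes need no swapping */
  out->type = in->type;
  out->reserved = htons(in->reserved);
  out->len = htonl(in->len);
  return out;
}
```

Because a byte swap is its own inverse, applying ntohs() and ntohl() to the converted fields yields the original values again, which is the property the assumption above relies on.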
Function
int rpki_check_receive_packet (struct rpki_cache * cache, const struct pdu_header * pdu) -- make a basic validation of received RPKI PDU header
Arguments
- struct rpki_cache * cache
cache connection instance
- const struct pdu_header * pdu
RPKI PDU in network byte order
Description
This function checks the protocol version, PDU type and size. If everything is all right, the function returns RPKI_SUCCESS; otherwise it sends an Error PDU and returns RPKI_ERROR.
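A basic header check of this kind can be sketched as follows. The structure, the constants and sketch_check_header() are hypothetical, the numeric return values are assumptions, and the sketch only returns an error code where the documented function would also send an Error PDU. The type limit of 10 corresponds to the Error Report PDU, the highest type defined in RFC 6810.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical stand-ins for RPKI_SUCCESS and RPKI_ERROR. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERROR = -1 };

#define SKETCH_HEADER_LEN   8    /* size of the common PDU header */
#define SKETCH_PDU_TYPE_MAX 10   /* Error Report PDU */

/* Common RPKI-RTR PDU header (hypothetical mirror of struct pdu_header). */
struct sketch_pdu_header {
  uint8_t  ver;
  uint8_t  type;
  uint16_t reserved;
  uint32_t len;
};

/* Validate a received header that is still in network byte order. */
int
sketch_check_header(const struct sketch_pdu_header *pdu, uint8_t session_version)
{
  if (pdu->ver != session_version)
    return SKETCH_ERROR;                  /* unexpected protocol version */

  if (pdu->type > SKETCH_PDU_TYPE_MAX)
    return SKETCH_ERROR;                  /* unknown PDU type */

  if (ntohl(pdu->len) < SKETCH_HEADER_LEN)
    return SKETCH_ERROR;                  /* shorter than the header itself */

  return SKETCH_SUCCESS;
}
```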
Function
net_addr_union * rpki_prefix_pdu_2_net_addr (const struct pdu_header * pdu, net_addr_union * n) -- convert IPv4/IPv6 Prefix PDU into net_addr_union
Arguments
- const struct pdu_header * pdu
host byte order IPv4/IPv6 Prefix PDU
- net_addr_union * n
allocated net_addr_union for saving the ROA
Description
This function reads ROA data from an IPv4/IPv6 Prefix PDU and writes it into a net_addr_roa4 or net_addr_roa6 data structure.
Function
void rpki_rx_packet (struct rpki_cache * cache, struct pdu_header * pdu) -- process a received RPKI PDU
Arguments
- struct rpki_cache * cache
RPKI connection instance
- struct pdu_header * pdu
an RPKI PDU in network byte order
Function
int rpki_send_error_pdu_ (struct rpki_cache * cache, const enum pdu_error_type error_code, u32 err_pdu_len, const struct pdu_header * erroneous_pdu, const char * fmt, ... ...) -- send RPKI Error PDU
Arguments
- struct rpki_cache * cache
RPKI connection instance
- const enum pdu_error_type error_code
PDU Error type
- u32 err_pdu_len
length of erroneous_pdu
- const struct pdu_header * erroneous_pdu
optional network byte-order PDU that caused the error, or NULL
- const char * fmt
optional description text of error or NULL
- ... ...
variable arguments
Description
This function prepares Error PDU and sends it to a cache server.
Function
ip_addr rpki_hostname_autoresolv (const char * host, const char ** err_msg) -- auto-resolve an IP address from a hostname
Arguments
- const char * host
domain name of host, e.g. "rpki-validator.realmv6.org"
- const char ** err_msg
error message returned in case of errors
Description
This function resolves an IP address from a hostname. Returns an ip_addr structure with the IP address, or IPA_NONE.
Function
int rpki_tr_open (struct rpki_tr_sock * tr) -- prepare and open a socket connection
Arguments
- struct rpki_tr_sock * tr
initialized transport socket
Description
Prepare and open a socket connection specified by tr, which must be initialized beforehand. This function ends by calling the sk_open() function. Returns RPKI_TR_SUCCESS or RPKI_TR_ERROR.
Function
void rpki_tr_close (struct rpki_tr_sock * tr) -- close socket and prepare it for possible next open
Arguments
- struct rpki_tr_sock * tr
successfully opened transport socket
Description
Close socket and free resources.
Function
const char * rpki_tr_ident (struct rpki_tr_sock * tr) -- Returns a string identifier for the rpki transport socket
Arguments
- struct rpki_tr_sock * tr
successfully opened transport socket
Description
Returns a null-terminated string identifier for the socket endpoint, e.g. "<host>:<port>". Memory is allocated inside the tr structure.
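Building such an identifier once and keeping it inside the transport structure can be sketched with standard C only. The structure sketch_tr_sock, its fields and the use of malloc() are assumptions made for illustration; BIRD uses its own memory management, and the example host name in the usage below is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, reduced transport socket. */
struct sketch_tr_sock {
  const char *hostname;
  unsigned port;
  char *ident;          /* cached "<host>:<port>" string, owned by the struct */
};

/* Return the cached identifier, creating it on first use. */
const char *
sketch_tr_ident(struct sketch_tr_sock *tr)
{
  if (tr->ident)
    return tr->ident;

  /* First pass only measures the required length. */
  int len = snprintf(NULL, 0, "%s:%u", tr->hostname, tr->port);
  if (len < 0)
    return NULL;

  tr->ident = malloc((size_t) len + 1);
  if (!tr->ident)
    return NULL;

  snprintf(tr->ident, (size_t) len + 1, "%s:%u", tr->hostname, tr->port);
  return tr->ident;
}
```

With hostname set to "rpki.example.net" and port set to 323, the function yields "rpki.example.net:323", and repeated calls return the same pointer. The owner of the structure is responsible for releasing ident.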
Function
void rpki_tr_tcp_init (struct rpki_tr_sock * tr) -- initializes the RPKI transport structure for a TCP connection
Arguments
- struct rpki_tr_sock * tr
allocated RPKI transport structure
Function
void rpki_tr_ssh_init (struct rpki_tr_sock * tr) -- initializes the RPKI transport structure for an SSH connection
Arguments
- struct rpki_tr_sock * tr
allocated RPKI transport structure
5.10 Static
The Static protocol is implemented in a straightforward way. It keeps a list of static routes. Routes of dest RTD_UNICAST have an associated sticky node in the neighbor cache, so that they are notified about gaining or losing the neighbor and about interface-related events (e.g. link down). They may also have a BFD request if associated with a BFD session. When a route is notified, static_decide() is used to see whether the activeness of the route has changed. In that case, the route is marked as dirty and scheduled to be announced or withdrawn, which is done asynchronously from an event hook. Routes of other types (e.g. black holes) are announced all the time.
Multipath routes are a bit tricky. To represent additional next hops, dummy static_route nodes are used, which are chained using the mp_next field and linked to the master node by the mp_head field. Each next hop has a separate neighbor entry and an activeness state, but the master node is used for most purposes. Note that most functions DO NOT accept dummy nodes as arguments.
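The chaining of next hops can be pictured with a reduced structure. This is a sketch only: sketch_static_route carries just the two link fields named above plus an activeness flag, both helper functions are hypothetical, and letting the master node point to itself through mp_head is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced static route node. */
struct sketch_static_route {
  struct sketch_static_route *mp_head;  /* master node of the chain */
  struct sketch_static_route *mp_next;  /* next node in the chain, or NULL */
  bool active;                          /* activeness of this next hop */
};

/* Append a dummy node for an additional next hop to the master's chain. */
void
sketch_add_nexthop(struct sketch_static_route *master,
                   struct sketch_static_route *dummy)
{
  struct sketch_static_route *r = master;

  while (r->mp_next)
    r = r->mp_next;               /* find the current end of the chain */

  r->mp_next = dummy;
  dummy->mp_head = master;
  dummy->mp_next = NULL;
}

/* Count active next hops; takes the master node, never a dummy node. */
unsigned
sketch_count_active(const struct sketch_static_route *master)
{
  unsigned n = 0;

  for (const struct sketch_static_route *r = master; r; r = r->mp_next)
    if (r->active)
      n++;

  return n;
}
```

Keeping per-next-hop state in the chained nodes while passing only the master node to functions matches the note above that most functions do not accept dummy nodes.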
The only other thing worth mentioning is that when asked for reconfiguration, Static not only compares the two configurations, but also calculates the difference between the lists of static routes: it inserts the newly added routes, removes the obsolete ones and reannounces the changed ones.
5.11 Direct
The Direct protocol works by converting all ifa_notify() events it receives to rte_update() calls for the corresponding network.