## 4. Filters

## 4.1 Filters

You can find sources of the filter language in `filter/`

directory. File `filter/config.Y`

contains filter grammar and basically translates
the source from user into a tree of *f_inst* structures. These trees are
later interpreted using code in `filter/filter.c`

.

A filter is represented by a tree of *f_inst* structures, later translated
into lists called *f_line*. All the instructions are defined and documented
in `filter/f-inst.c`

definition file.

Filters use a *f_val* structure for their data. Each *f_val*
contains type and value (types are constants prefixed with *T_*).
Look into `filter/data.h`

for more information and appropriate calls.

### Function

*enum filter_return*
**interpret**
(*struct filter_state ** **fs**, *const struct f_line ** **line**, *struct f_val ** **val**)

### Arguments

*struct filter_state ****fs**filter state

*const struct f_line ****line**-- undescribed --

*struct f_val ****val**-- undescribed --

### Description

Interpret given tree of filter instructions. This is core function of filter system and does all the hard work.

### Each instruction has 4 fields

code (which is instruction code), aux (which is extension to instruction code, typically type), arg1 and arg2 - arguments. Depending on instruction, arguments are either integers, or pointers to instruction trees. Common instructions like +, that have two expressions as arguments use TWOARGS macro to get both of them evaluated.

### Function

*enum filter_return*
**f_run**
(*const struct filter ** **filter**, *struct rte *** **rte**, *struct linpool ** **tmp_pool**, *int* **flags**) -- run a filter for a route

### Arguments

*const struct filter ****filter**filter to run

*struct rte *****rte**route being filtered, may be modified

*struct linpool ****tmp_pool**all filter allocations go from this pool

*int***flags**flags

### Description

If filter needs to modify the route, there are several
posibilities. **rte** might be read-only (with REF_COW flag), in that
case rw copy is obtained by **rte_cow()** and **rte** is replaced. If
**rte** is originally rw, it may be directly modified (and it is never
copied).

The returned rte may reuse the (possibly cached, cloned) rta, or
(if rta was modified) contains a modified uncached rta, which
uses parts allocated from **tmp_pool** and parts shared from original
rta. There is one exception - if **rte** is rw but contains a cached
rta and that is modified, rta in returned rte is also cached.

Ownership of cached rtas is consistent with rte, i.e. if a new rte is returned, it has its own clone of cached rta (and cached rta of read-only source rte is intact), if rte is modified in place, old cached rta is possibly freed.

### Function

*enum filter_return*
**f_eval_rte**
(*const struct f_line ** **expr**, *struct rte *** **rte**, *struct linpool ** **tmp_pool**) -- run a filter line for an uncached route

### Arguments

*const struct f_line ****expr**filter line to run

*struct rte *****rte**route being filtered, may be modified

*struct linpool ****tmp_pool**all filter allocations go from this pool

### Description

This specific filter entry point runs the given filter line (which must not have any arguments) on the given route.

The route MUST NOT have REF_COW set and its attributes MUST NOT
be cached by **rta_lookup()**.

### Function

*int*
**filter_same**
(*const struct filter ** **new**, *const struct filter ** **old**) -- compare two filters

### Arguments

*const struct filter ****new**first filter to be compared

*const struct filter ****old**second filter to be compared

### Description

Returns 1 in case filters are same, otherwise 0. If there are underlying bugs, it will rather say 0 on same filters than say 1 on different.

### Function

*void*
**filter_commit**
(*struct config ** **new**, *struct config ** **old**) -- do filter comparisons on all the named functions and filters

### Arguments

*struct config ****new**-- undescribed --

*struct config ****old**-- undescribed --

### Function

*struct f_tree **
**build_tree**
(*struct f_tree ** **from**)

### Arguments

*struct f_tree ****from**degenerated tree (linked by

**tree**->left) to be transformed into form suitable for**find_tree()**

### Description

Transforms degenerated tree into balanced tree.

### Function

*int*
**same_tree**
(*const struct f_tree ** **t1**, *const struct f_tree ** **t2**)

### Arguments

*const struct f_tree ****t1**first tree to be compared

*const struct f_tree ****t2**second one

### Description

Compares two trees and returns 1 if they are same

## 4.2 Trie for prefix sets

We use a (compressed) trie to represent prefix sets. Every node in the trie
represents one prefix (*addr*/*plen*) and *plen* also indicates the index of
bits in the address that are used to branch at the node. Note that such
prefix is not necessary a member of the prefix set, it is just a canonical
prefix associated with a node. Prefix lengths of nodes are aligned to
multiples of *TRIE_STEP* (4) and there is 16-way branching in each
node. Therefore, we say that a node is associated with a range of prefix
lengths (*plen* .. *plen* + TRIE_STEP - 1).

The prefix set is not just a set of prefixes, it is defined by a set of
prefix patterns. Each prefix pattern consists of *ppaddr*/*pplen* and two
integers: *low* and *high*. The tested prefix *paddr*/*plen* matches that pattern
if the first MIN(*plen*, *pplen*) bits of *paddr* and *ppaddr* are the same and
*low* <= *plen* <= *high*.

There are two ways to represent accepted prefixes for a node. First, there is
a bitmask *local*, which represents independently all 15 prefixes that extend
the canonical prefix of the node and are within a range of prefix lengths
associated with the node. E.g., for node 10.0.0.0/8 they are 10.0.0.0/8,
10.0.0.0/9, 10.128.0.0/9, .. 10.224.0.0/11. This order (first by length, then
lexicographically) is used for indexing the bitmask *local*, starting at
position 1. I.e., index is 2^(plen - base) + offset within the same length,
see function **trie_local_mask6()** for details.

Second, we use a bitmask *accept* to represent accepted prefix lengths at a
node. The bit is set means that all prefixes of given length that are either
subprefixes or superprefixes of the canonical prefix are accepted. As there
are 33 prefix lengths (0..32 for IPv4), but there is just one prefix of zero
length in the whole trie so we have *zero* flag in *f_trie* (indicating whether
the trie accepts prefix 0.0.0.0/0) as a special case, and *accept* bitmask
represents accepted prefix lengths from 1 to 32.

One complication is handling of prefix patterns with unaligned prefix length.
When such pattern is to be added, we add a primary node above (with rounded
down prefix length *nlen*) and a set of secondary nodes below (with rounded up
prefix lengths *slen*). Accepted prefix lengths of the original prefix pattern
are then represented in different places based on their lengths. For prefixes
shorter than *nlen*, it is *accept* bitmask of the primary node, for prefixes
between *nlen* and *slen* - 1 it is *local* bitmask of the primary node, and for
prefixes longer of equal *slen* it is *accept* bitmasks of secondary nodes.

There are two cases in prefix matching - a match when the length of the
prefix is smaller that the length of the prefix pattern, (*plen* < *pplen*) and
otherwise. The second case is simple - we just walk through the trie and look
at every visited node whether that prefix accepts our prefix length (*plen*).
The first case is tricky - we do not want to examine every descendant of a
final node, so (when we create the trie) we have to propagate that
information from nodes to their ascendants.

There are two kinds of propagations - propagation from child's *accept*
bitmask to parent's *accept* bitmask, and propagation from child's *accept*
bitmask to parent's *local* bitmask. The first kind is simple - as all
superprefixes of a parent are also all superprefixes of appropriate length of
a child, then we can just add (by bitwise or) a child *accept* mask masked by
parent prefix length mask to the parent *accept* mask. This handles prefixes
shorter than node *plen*.

The second kind of propagation is necessary to handle superprefixes of a
child that are represented by parent *local* mask - that are in the range of
prefix lengths associated with the parent. For each accepted (by child
*accept* mask) prefix length from that range, we need to set appropriate bit
in *local* mask. See function **trie_amask_to_local()** for details.

There are four cases when we walk through a trie:

- we are in NULL
- we are out of path (prefixes are inconsistent)
- we are in the wanted (final) node (node length == *plen*)
- we are beyond the end of path (node length > *plen*)
- we are still on path and keep walking (node length < *plen*)

The walking code in **trie_match_net()** is structured according to these cases.

Iteration over prefixes in a trie can be done using **TRIE_WALK()** macro, or
directly using **trie_walk_init()** and **trie_walk_next()** functions. The second
approach allows suspending the iteration and continuing in it later.
Prefixes are enumerated in the usual lexicographic order and may be
restricted to a subset of the trie (all subnets of a specified prefix).

Note that the trie walk does not reliably enumerate `implicit' prefixes
defined by *low* and *high* fields in prefix patterns, it is supposed to be
used on tries constructed from `explicit' prefixes (*low* == *plen* == *high*
in call to **trie_add_prefix()**).

The trie walk has three basic state variables stored in the struct
*f_trie_walk_state* -- the current node in *stack*[stack_pos], *accept_length*
for iteration over inter-node prefixes (non-branching prefixes on compressed
path between the current node and its parent node, stored in the bitmap
*accept* of the current node) and *local_pos* for iteration over intra-node
prefixes (stored in the bitmap *local*).

The trie also supports longest-prefix-match query by **trie_match_longest_ip4()**
and it can be extended to iteration over all covering prefixes for a given
prefix (from longest to shortest) using **TRIE_WALK_TO_ROOT_IP4()** macro. There
are also IPv6 versions (for practical reasons, these functions and macros are
separate for IPv4 and IPv6). There is the same limitation to enumeration of
`implicit' prefixes like with the previous **TRIE_WALK()** macro.

### Function

*struct f_trie **
**f_new_trie**
(*linpool ** **lp**, *uint* **data_size**) -- allocates and returns a new empty trie

### Arguments

*linpool ****lp**linear pool to allocate items from

*uint***data_size**user data attached to node

### Function

*void **
**trie_add_prefix**
(*struct f_trie ** **t**, *const net_addr ** **net**, *uint* **l**, *uint* **h**)

### Arguments

*struct f_trie ****t**trie to add to

*const net_addr ****net**IP network prefix

*uint***l**prefix lower bound

*uint***h**prefix upper bound

### Description

Adds prefix (prefix pattern) **n** to trie **t**. **l** and **h** are lower
and upper bounds on accepted prefix lengths, both inclusive.
0 <= l, h <= 32 (128 for IPv6).

Returns a pointer to the allocated node. The function can return a pointer to
an existing node if **px** and **plen** are the same. If px/plen == 0/0 (or ::/0),
a pointer to the root node is returned. Returns NULL when called with
mismatched IPv4/IPv6 net type.

### Function

*int*
**trie_match_net**
(*const struct f_trie ** **t**, *const net_addr ** **n**)

### Arguments

*const struct f_trie ****t**trie

*const net_addr ****n**net address

### Description

Tries to find a matching net in the trie such that
prefix **n** matches that prefix pattern. Returns 1 if there
is such prefix pattern in the trie.

### Function

*int*
**trie_match_longest_ip4**
(*const struct f_trie ** **t**, *const net_addr_ip4 ** **net**, *net_addr_ip4 ** **dst**, *ip4_addr ** **found0**)

### Arguments

*const struct f_trie ****t**trie

*const net_addr_ip4 ****net**net address

*net_addr_ip4 ****dst**return value

*ip4_addr ****found0**optional returned bitmask of found nodes

### Description

Perform longest prefix match for the address **net** and return the resulting
prefix in the buffer **dst**. The bitmask **found0** is used to report lengths of
prefixes on the path from the root to the resulting prefix. E.g., if there is
also a /20 shorter matching prefix, then 20-th bit is set in **found0**. This
can be used to enumerate all matching prefixes for the network **net** using
function **trie_match_next_longest_ip4()** or macro **TRIE_WALK_TO_ROOT_IP4()**.

This function assumes IPv4 trie, there is also an IPv6 variant. The **net**
argument is typed as net_addr_ip4, but would accept any IPv4-based net_addr,
like **net4_prefix()**. Anyway, returned **dst** is always net_addr_ip4.

### Result

1 if a matching prefix was found, 0 if not.

### Function

*int*
**trie_match_longest_ip6**
(*const struct f_trie ** **t**, *const net_addr_ip6 ** **net**, *net_addr_ip6 ** **dst**, *ip6_addr ** **found0**)

### Arguments

*const struct f_trie ****t**trie

*const net_addr_ip6 ****net**net address

*net_addr_ip6 ****dst**return value

*ip6_addr ****found0**optional returned bitmask of found nodes

### Description

Perform longest prefix match for the address **net** and return the resulting
prefix in the buffer **dst**. The bitmask **found0** is used to report lengths of
prefixes on the path from the root to the resulting prefix. E.g., if there is
also a /20 shorter matching prefix, then 20-th bit is set in **found0**. This
can be used to enumerate all matching prefixes for the network **net** using
function **trie_match_next_longest_ip6()** or macro **TRIE_WALK_TO_ROOT_IP6()**.

This function assumes IPv6 trie, there is also an IPv4 variant. The **net**
argument is typed as net_addr_ip6, but would accept any IPv6-based net_addr,
like **net6_prefix()**. Anyway, returned **dst** is always net_addr_ip6.

### Result

1 if a matching prefix was found, 0 if not.

### Function

*void*
**trie_walk_init**
(*struct f_trie_walk_state ** **s**, *const struct f_trie ** **t**, *const net_addr ** **net**)

### Arguments

*struct f_trie_walk_state ****s**walk state

*const struct f_trie ****t**trie

*const net_addr ****net**optional subnet for walk

### Description

Initialize walk state for subsequent walk through nodes of the trie **t** by
**trie_walk_next()**. The argument **net** allows to restrict walk to given subnet,
otherwise full walk over all nodes is used. This is done by finding node at
or below **net** and starting position in it.

### Function

*int*
**trie_walk_next**
(*struct f_trie_walk_state ** **s**, *net_addr ** **net**)

### Arguments

*struct f_trie_walk_state ****s**walk state

*net_addr ****net**return value

### Description

Find the next prefix in the trie walk and return it in the buffer **net**.
Prefixes are walked in the usual lexicographic order and may be restricted
to a subset of the trie during walk setup by **trie_walk_init()**. Note that the
trie walk does not iterate reliably over 'implicit' prefixes defined by *low*
and *high* fields in prefix patterns, it is supposed to be used on tries
constructed from 'explicit' prefixes (*low* == *plen* == *high* in call to
**trie_add_prefix()**).

### Result

1 if the next prefix was found, 0 for the end of walk.

### Function

*int*
**trie_same**
(*const struct f_trie ** **t1**, *const struct f_trie ** **t2**)

### Arguments

*const struct f_trie ****t1**first trie to be compared

*const struct f_trie ****t2**second one

### Description

Compares two tries and returns 1 if they are same

### Function

*void*
**trie_format**
(*const struct f_trie ** **t**, *buffer ** **buf**)

### Arguments

*const struct f_trie ****t**trie to be formatted

*buffer ****buf**destination buffer

### Description

Prints the trie to the supplied buffer.

Next Previous Contents