evpn rebase to HEAD

Pim van Pelt pim at ipng.ch
Thu Feb 19 23:11:05 CET 2026


Hoi,

Thanks for your time, Ondrej, and apologies, Maria, for mistyping your 
name. Mrs IPng Networks is called Marina, so that kind of just rolls off 
the keyboard sometimes :)

On 19.02.2026 18:04, Ondrej Zajicek wrote:
> On Wed, Feb 18, 2026 at 09:59:05PM +0100, Pim van Pelt wrote:
>> Hoi,
>>
>> Thanks for taking a look, Marina and Ondrej, I appreciate it!
>>
>> On 18.02.2026 17:50, Ondrej Zajicek wrote:
>>> As others noted, the relevant branch is 'oz-evpn', the older 'evpn'
>>> branch fell victim to my needlessly strict adherence to "do not rebase
>>> public branch" rule. The patches in 'oz-evpn' are not only rebased on
>>> newer BIRD version, but also have fixes squashed in them, and there is
>>> newer development. I just pushed there rebase to 2.18. Please look at
>>> this branch first. Also note there are some minor changes to EVPN protocol
>>> configuration syntax.
>> I have ported my vppevpn protocol implementation to be based on oz-evpn, and
>> the system is functional here also. Yaay!
>>
>> I only had one small issue. In oz-evpn, the 'evpn' protocol will stay in
>> 'startup' until the vxlan0 interface becomes ready. However, in my usecase,
>> vxlan is not performed by the kernel, but by VPP, so there is no 'vxlan0'
>> interface. I need only 'vni' and 'router address' (and the remote VTEP) to
>> construct the dataplane configuration. To allow the evpn protocol to
>> transition to PS_UP, I decided to fire an event that announces the IMET if
>> router_addr and VNI are set, and skips waiting for the interface.
> Hmm, you have NULL interface in the encap->tunnel_dev? Or some fake interface
> created by if_get_by_name()? Or some dummy/irrelevant interface (loopback)?
I do specify an 'encapsulation vxlan { tunnel device "vxlan0";};'. It 
satisfies Bird2 by having an interface, it just doesn't exist in the 
kernel. In branch 'evpn' this was fine, in branch 'oz-evpn' this needs 
me to cheat a bit because we're waiting on the device to be oper-up and 
enslaved to the bridge. If I skip that part, everything works fine 
without any kernel interaction. See below in [1] for my cheat.

> The interface is here not just to get/check router_addr and VNI, but
> primarily to construct next hops for routes in bridge table:
>
> evpn_receive_mac() / evpn_receive_imet():
>
>    .nh.iface = encap->tunnel_dev,
>
> These are necessary not just for kernel dataplane (to specify tunnel
> implementing iface), but also formally just to have non-NULL nh.iface,
> which we generally assumed in BIRD for RTD_UNICAST nexthops. So how
> these routes looks in your setup?
Once I convince bird to not wait for the encap->tunnel_dev oper-up and 
its bridge master, the 'evpn' protocol starts, and next hop looks quite 
normal.

From 'evpntab':
evpn imet 8298:200 0 192.168.10.2  [vpp0_2 2026-02-19 from 2001:678:d78:200::2] * (100) [i]
         Type: BGP univ
         BGP.origin: IGP
         BGP.as_path:
         BGP.next_hop: 192.168.10.2
         BGP.local_pref: 100
         BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
         BGP.pmsi_tunnel: ingress-replication 192.168.10.2 mpls 20040

evpn mac 8298:200 0 fe:54:00:f0:11:23 * unicast [vpp0_2 2026-02-19 from 2001:678:d78:200::2] * (100/5) [i]
         via 192.168.10.10 on e0 mpls 20040
         Type: BGP univ
         BGP.origin: IGP
         BGP.as_path:
         BGP.next_hop: 192.168.10.2
         BGP.local_pref: 100
         BGP.ext_community: (rt, 8298, 20040) (generic, 0x30c0000, 0x8)
         BGP.mpls_label_stack: 20040

Equivalent routes from 'etab':
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 2026-02-19] * (80)
         via 192.168.10.2 on vxlan0 mpls 20040
         Type: EVPN univ
         mpls_label: 20040

fe:54:00:f0:11:23 vlan 200 mpls 20040 unicast [evpn2 2026-02-19] * (80)
         via 192.168.10.2 on vxlan0 mpls 20040
         Type: EVPN univ
         mpls_label: 20040

> Note that the nexthops of VXLAN-tunneled routes in bridge table are just
> makeshift now, esp. usage of nh.gw for encap-dst-ip and nh->label[0]
> encap-vni, these should get their own attributes (once we will redesign
> nexthops to have proper attributes).
The information I need for my use case is the next hop '192.168.10.2' 
and mpls_label '20040' from etab, plus the IMET from evpntab (because in 
P2MP there will be multiple IMETs and etab will only carry one of them). 
I have also implemented 'vid' (200, as you see above), but it carries no 
meaning for VPP because the bridge-domain can be separately configured 
to allow untagged, single-tagged or double-tagged frames on the PE 
interfaces. If new attributes (like the vxlan nexthop or vxlan vni you 
suggest below) were to appear, it would be easy for me to switch to 
using them instead.
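
A minimal sketch of that mapping, not the actual vppevpn code: it 
assumes a flattened view of an 'etab' route in which the gateway holds 
the remote VTEP and the first label holds the VNI. All struct and 
function names are hypothetical. Skipping the all-zero MAC entry is an 
assumption based on taking the IMETs from evpntab instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened view of one 'etab' route (not BIRD's rte). */
struct eth_route_view {
  uint8_t mac[6];
  uint16_t vid;        /* present in etab, but not needed by the dataplane */
  const char *gw;      /* next hop gateway: the remote VTEP, as text here */
  uint32_t label;      /* first label: carries the VNI for now */
};

/* Hypothetical dataplane request: one remote MAC behind one VTEP. */
struct remote_mac {
  uint8_t mac[6];
  uint32_t bd_id;
  const char *vtep;
  uint32_t vni;
};

/* Fill 'out' from an etab route; returns false if the route is not a
 * usable unicast MAC entry. */
bool
eth_route_to_remote_mac(const struct eth_route_view *r, uint32_t bd_id,
                        struct remote_mac *out)
{
  static const uint8_t zero_mac[6] = { 0 };

  assert(r != NULL && out != NULL);

  /* Assumption: the all-zero MAC is the flood entry; IMETs are read
   * from evpntab instead, so it is skipped here. */
  if (memcmp(r->mac, zero_mac, sizeof zero_mac) == 0)
    return false;

  /* Without a gateway there is no VTEP to send to. */
  if (r->gw == NULL)
    return false;

  memcpy(out->mac, r->mac, sizeof out->mac);
  out->bd_id = bd_id;
  out->vtep = r->gw;
  out->vni = r->label;
  return true;
}
```

If dedicated attributes for the VXLAN next hop and VNI appear later, 
only the two assignments to 'vtep' and 'vni' would need to change.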

> I am often uncertain how much BIRD representation of routes should match
> Linux API representation of routes (esp. for idiosyncratic details like
> here when Linux API assumes nominal tunnel interfaces in next hop
> interfaces for lightweight tunnels), but i usually defer to try to keep
> it consistent to limit impedance mismatch here. But it may cause
> problems when other backends with different conventions are used, like
> in your case.
I think assuming a Linux 'bridge' with its tunneling functionality by 
default is perfectly fine, although I'd prefer it if it does not become 
the /only/ valid way:
1) I'm not sure if that works well on other platforms (e.g. FreeBSD, 
Windows, macOS)
2) or embedded platforms (e.g. Broadcom or Marvell chips).
3) or VPP :-)

Requiring a Linux bridge and a kernel interface prohibits non-Linux 
eVPN scenarios. May I suggest that these things stay optional: they can 
be the default, but it should be possible to turn them off, for example 
by configuring a dummy interface dummy0, setting a config toggle 
'nowait' to skip waiting for it to be oper-up/enslaved, and not 
requiring the 'bridge' protocol?
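
A minimal sketch of how such a toggle could gate the startup decision. 
This is not code from either branch: the struct, its fields and the 
'nowait' flag are hypothetical stand-ins for whatever the protocol 
would actually track.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VNI_UNDEF UINT32_MAX   /* stand-in for an unset VNI */

/* Hypothetical, simplified view of what the evpn protocol knows at startup. */
struct evpn_startup_view {
  bool have_router_addr;     /* 'router address' is configured */
  uint32_t vni;              /* configured VNI, or VNI_UNDEF */
  bool tunnel_dev_up;        /* tunnel device is oper-up */
  bool tunnel_dev_enslaved;  /* tunnel device has a bridge master */
  bool nowait;               /* hypothetical toggle: do not wait for the device */
};

/* Returns true when the protocol may leave the startup state. */
bool
evpn_can_go_up(const struct evpn_startup_view *v)
{
  assert(v != NULL);

  /* Without router address and VNI nothing useful can be announced. */
  if (!v->have_router_addr || v->vni == VNI_UNDEF)
    return false;

  /* Non-kernel dataplane: signalling alone is enough. */
  if (v->nowait)
    return true;

  /* Default behaviour: wait for the device and its bridge master. */
  return v->tunnel_dev_up && v->tunnel_dev_enslaved;
}
```

With 'nowait' off, the predicate keeps the current oz-evpn behaviour, 
so the Linux bridge case stays the default.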

> Btw, i planned to explicitly configure bridge device for EVPN protocol
> (as it is now implicitly through tunnel_dev->master). The idea is that as
> VRF device (in Linux) defines L3 VRF, bridge device defines MAC-VRF. And
> as L3 protocols are associated with specific L3 VRF, L2 protocols should
> be associated with specific MAC-VRF.
It would be good if the 'evpn' protocol can continue to be used 
standalone, and in particular is not conflated with 'bridge'. In my 
view, one should be able to inspect evpntab and etab to construct other 
integrations without the need to consult kernel devices. At the moment 
'evpn' (entirely) and 'oz-evpn' (somewhat less so) are elegant precisely 
because they do complete signalling and populate evpntab and etab using 
only an 'evpn' and a 'bgp' protocol together with the 'evpn table' and 
'eth table'. That allows me to create a custom 'vppevpn' protocol that 
subscribes to those tables. See the attached config file 
(bird-example.conf) for an idea of where I'm headed.

> Do you have (kernel-level) bridge
> device in your setup? (i do not mean using BIRD bridge protocol).
VPP does not use any kernel bridge or vxlan device; it operates entirely 
as a userspace dataplane. In my case, Bird programs the VPP dataplane 
directly. The main flow of a four-router eVPN mesh looks like this; 
imagine each of these log lines is the result of an API call to VPP 
directly over a unix domain socket:

Feb 19 22:25:50 vpp0-3 bird[1214613]: Enabling protocol bd200
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created bridge-domain bd=200 with tag='bird_bd200'
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel sw_if_index=12 src=[192.168.10.3] dst=[192.168.10.0] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=12 to bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=52:54:00:f0:10:10 vid=200 to bd=200 via vtep=[192.168.10.0] vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel sw_if_index=16 src=[192.168.10.3] dst=[192.168.10.1] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=16 to bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=52:54:00:f0:10:11 vid=200 to bd=200 via vtep=[192.168.10.1] vni=20040 sw_if_index=16
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: created vxlan-tunnel sw_if_index=11 src=[192.168.10.3] dst=[192.168.10.2] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added sw_if_index=11 to bd=200 shg=1
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=52:54:00:f0:10:12 vid=200 to bd=200 via vtep=[192.168.10.2] vni=20040 sw_if_index=11
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=fe:54:00:f0:11:23 vid=200 to bd=200 via vtep=[192.168.10.2] vni=20040 sw_if_index=11
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=fe:54:00:f0:11:03 vid=200 to bd=200 via vtep=[192.168.10.0] vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=00:00:02:00:00:01 vid=200 to bd=200 via vtep=[192.168.10.0] vni=20040 sw_if_index=12
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added remote mac=fe:54:00:f0:11:13 vid=200 to bd=200 via vtep=[192.168.10.1] vni=20040 sw_if_index=16
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200 vtep=[192.168.10.2] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200 vtep=[192.168.10.1] vni=20040
Feb 19 22:25:50 vpp0-3 bird[1214613]: bd200: added imet bd=200 vid=200 vtep=[192.168.10.0] vni=20040
Feb 19 22:25:51 vpp0-3 bird[1214613]: bd200: learned mac=52:54:00:f0:10:13 vid=200 on bd=200
Feb 19 22:25:51 vpp0-3 bird[1214613]: bd200: learned mac=fe:54:00:f0:11:33 vid=200 on bd=200
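
The log shows one vxlan-tunnel per remote VTEP, with every later MAC 
behind that VTEP reusing the same sw_if_index. A minimal sketch of that 
get-or-create pattern follows. It is not the vppevpn code and makes no 
VPP API calls: the table, its size limit and the local index counter 
are hypothetical stand-ins (in VPP the index is handed out by the 
dataplane).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_TUNNELS 16
#define NO_TUNNEL   UINT32_MAX

/* One tunnel per (remote VTEP, VNI) pair. */
struct tunnel {
  char vtep[46];          /* remote VTEP address as text */
  uint32_t vni;
  uint32_t sw_if_index;
};

struct tunnel_table {
  struct tunnel t[MAX_TUNNELS];
  unsigned count;
  uint32_t next_index;    /* stand-in for the index the dataplane returns */
};

/* Return the sw_if_index for (vtep, vni), creating the tunnel entry on
 * first use. Returns NO_TUNNEL if the table is full or vtep is too long. */
uint32_t
tunnel_get_or_create(struct tunnel_table *tt, const char *vtep, uint32_t vni)
{
  assert(tt != NULL && vtep != NULL);

  /* Reuse an existing tunnel to the same VTEP and VNI. */
  for (unsigned i = 0; i < tt->count; i++)
    if (tt->t[i].vni == vni && strcmp(tt->t[i].vtep, vtep) == 0)
      return tt->t[i].sw_if_index;

  if (tt->count == MAX_TUNNELS || strlen(vtep) >= sizeof tt->t[0].vtep)
    return NO_TUNNEL;

  /* First MAC or IMET for this VTEP: record a new tunnel. */
  struct tunnel *n = &tt->t[tt->count++];
  strcpy(n->vtep, vtep);
  n->vni = vni;
  n->sw_if_index = tt->next_index++;
  return n->sw_if_index;
}
```

Each 'added remote mac' line above would then correspond to one lookup 
in this table followed by a single FDB entry on the returned interface.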

I am happy to share the 'vppevpn' protocol with others also, as an 
example '3P integration'. I do not expect it to be upstreamed into 
Bird2, unless there are community requests for it.
Ondrej, do let me know if you'd like to take a sneak peek at my code 
(it's in a private repo for now, as it's not ready for wider review yet, 
but it is mostly functional).

>>>> (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
>>>> When the BGP Next Hop is changed by an export filter, we lose the MPLS
>>>> labelstack. There is no way to add MPLS labelstack in filters (at least,
>>>> that I could find), so we cannot use 'next hop address X' to determine the
>>>> Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
>>>> but rather a PMSI attribute with the 'router address' already.
>>> Resetting MPLS label when changing next hop is intentional, as MPLS labels are
>>> (in general) specific to receiving routers.
>>>
>>> There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
>>> attribute that could be accessed in filters.
>>>
>>> I am not sure what is your use case here to change it with filters, can
>>> you describe it more? What about setting 'router address' in EVPN proto?
>> With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
>> 1) copy that to the PMSI attribute: good
>> 2) not do anything for MAC announcements; they will have BGP.next_hop set to
>> the session address.
>>
>> if the previous patch in (2) is accepted, then 'router address' will be used
>> as BGP.next_hop, which will avoid the need to change it with filters with
>> (3).
> Oh, i see. You are right, this should work automatically for both IMET / PMSI
> and MAC.
>
> I do not like using regular/immediate next hops here in EVPN table, as
> it does not fit well semantically and requires formal device. But seems
> to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
> by EVPN protocol, similarly how BGP_PMSI_TUNNEL is attached. Will do that.
> Any comments?
If you were to attach a specific attribute like vxlan_nexthop or 
vxlan_vni to the etab table entry, I would simply read that and use it 
instead of the bgp nexthop. That's what happens already today for IMET, 
as it has the BGP.pmsi_tunnel attribute with the needed 
ingress-replication 2001:678:d78:200::2 mpls 10040 information. How do 
other vendors (say Arista, Cisco, Nokia, FRRouting) handle the Type-2 
nexthop? My understanding is they use BGP next hop for that (in other 
words, the same as how Bird does it today).

> Note that immediate next hops in EVPN table for routes received through
> BGP are here just as an artefact of BGP_NEXT_HOP resolvability check,
> they should not be here too.
Not sure I understand what you mean - don't we have this problem also 
for kernel based vxlan? If we create a vxlan0 interface in a bridge, and 
set a fdb entry onto it, we also need to know which vxlan nexthop to 
use. The way I read 'evpn' and 'oz-evpn', we use the BGP nexthop for 
that purpose. However, if what you're saying is that you'd want to 
remove the BGP Next Hop and instead have an EVPN VxLAN Next Hop 
attribute populate the 'etab' gateway field, that would work just as 
well for me. I do wonder why you'd go to the trouble of obfuscating the 
BGP Next Hop. Don't other vendors use the same thing (send the vxlan 
packet to the address learned via the BGP Next Hop in Type-2 
announcements)?

>> If neither patch is applied, the following config:
>>
>> protocol evpn {
>>    ...
>>    encapsulation vxlan { router address 192.0.2.1; };
>> }
>> protocol bgp {
>>    evpn { import all; export all; };
>>    local 2001:db8::1 as 65512;
>>    neighbor 2001:db8::2 as 65512;
>> }
>>
>> will yield IMET pointing at 192.0.2.1 but MAC pointing at 2001:db8::1. If I
>> want MAC pointing at 192.0.2.1 also, I would either need (2, my preference)
>> or a filter with (3).
>> If there exists a device out there which has different addressing for IMET
>> and MAC (note: I don't know of any, but perhaps they exist), then (3) would
>> come in handy.
> While i agree that it should work automatically by just setting router
> address in protocol evpn, i think that this setup that should work even
> without patches:
>
>   protocol evpn {
>     ...
>     encapsulation vxlan { router address 192.0.2.1; };
>   }
>   protocol bgp {
>     evpn { import all; export all; next hop address 192.0.2.1; };
>     local 2001:db8::1 as 65512;
>     neighbor 2001:db8::2 as 65512;
>   }
I don't think this works for MAC. For IMET it works because that has a 
custom PMSI BGP attribute which is set to encap0->router_addr. Setting 
the next hop in this way will clear the mpls labelstack, so we'd end up 
with:
fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
         via 192.0.2.1 on vxlan0 mpls 0
and we'd lose the VNI.
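
A toy model of that effect, assuming only what is described in this 
thread: changing the next hop resets the label, and the label is where 
the VNI currently lives. The struct and function are hypothetical and 
are not BIRD's next hop code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified next hop: gateway plus a single label. */
struct nexthop_view {
  const char *gw;    /* next hop address, as text here */
  uint32_t label;    /* carries the VNI; 0 after a reset, as in 'mpls 0' */
};

/* Model of "setting the next hop clears the label stack". */
void
nexthop_set_gw(struct nexthop_view *nh, const char *gw)
{
  assert(nh != NULL);

  nh->gw = gw;
  nh->label = 0;     /* the label is dropped, and the VNI goes with it */
}
```

Starting from a next hop with label 20040, a call to nexthop_set_gw() 
leaves the new gateway in place but the label at 0, which matches the 
'mpls 0' route shown above.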

groet,
Pim

[1] skipping the wait for tunnel_dev to become operational:
@@ -1059,11 +1070,37 @@ evpn_start(struct proto *P)
      P->mpls_map->vrf_iface = P->vrf;
    */

+  /* If router address and VNI are fully configured, no need to wait for
+   * the tunnel device to come up (e.g., when VPP manages VXLAN tunnels).
+   * Schedule an immediate event to transition to PS_UP. */
+  struct evpn_encap *encap0 = evpn_get_encap(p);
+  if (!ipa_zero(encap0->router_addr) && (p->vni != U32_UNDEF))
+  {
+    event *e = ev_new_init(p->p.pool, evpn_no_iface_startup, p);
+    ev_schedule(e);
+  }
+
    /* Wait for VXLAN interfaces to be up */

    return PS_START;
  }

+static void
+evpn_no_iface_startup(void *data)
+{
+  struct evpn_proto *p = data;
+
+  if (p->p.proto_state != PS_START)
+    return;
+
+  proto_notify_state(&p->p, PS_UP);
+
+  evpn_announce_imet(p, EVPN_ROOT_VLAN(p), 1);
+
+  WALK_LIST_(struct evpn_vlan, v, p->vlans)
+    evpn_announce_imet(p, v, 1);
+}

-- 
Pim van Pelt <pim at ipng.ch>
PBVP1-RIPE https://ipng.ch/

-------------- next part --------------
## Manual configuration for vpp1-0

eth table etab;
eth table etab100;
eth table etab200;
evpn table evpntab;

protocol static {
  eth { table etab; };
  route eth 00:00:01:00:00:01 vlan 100 prohibit;
  route eth 00:00:02:00:00:01 vlan 200 prohibit;
}

protocol evpn {
  debug all;
  eth { table etab; };
  evpn { import all; export all; };
  rd 8298:100;
  import target (rt, 8298, 10040);
  export target (rt, 8298, 10040);
  encapsulation vxlan {
    tunnel device "vxlan0";
    router address 2001:678:d78:200::;
  };
  vni 10040;
  vid 100;
}; 

protocol evpn {
  debug all;
  eth { table etab; };
  evpn { import all; export all; };
  rd 8298:200;
  import target (rt, 8298, 20040);
  export target (rt, 8298, 20040);
  encapsulation vxlan {
    tunnel device "vxlan0";
    router address 192.168.10.0;
  };
  vni 20040;
  vid 200;
}; 

filter bgp_evpn_out {
#  if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::; }
  accept;
}

template bgp T_BGP_EVPN {
  evpn { import all; export filter bgp_evpn_out; };
  local 2001:678:d78:200:: as 65512;
}

protocol bgp vpp0_1 from T_BGP_EVPN { neighbor 2001:678:d78:200::1 as 65512; }
protocol bgp vpp0_2 from T_BGP_EVPN { neighbor 2001:678:d78:200::2 as 65512; }
protocol bgp vpp0_3 from T_BGP_EVPN { neighbor 2001:678:d78:200::3 as 65512; }

protocol vppevpn bd100 {
  debug all;
  eth { table etab; import all; export all; };

  vxlan ipv6 src 2001:678:d78:200::;
  vxlan ipv4 src 192.168.10.0;
  vxlan src port 4789;
  vxlan dst port 4789;
  bridge domain 100;
  scan time 5;
  vid 100;
  vni 10040;
};

protocol vppevpn bd200 {
  debug all;
  eth { table etab; import all; export all; };

  vxlan ipv6 src 2001:678:d78:200::;
  vxlan ipv4 src 192.168.10.0;
  vxlan src port 4789;
  vxlan dst port 4789;
  bridge domain 200;
  bridge mac age 10;
  scan time 5;
  vid 200;
  vni 20040;
};

