evpn rebase to HEAD

Pim van Pelt pim at ipng.ch
Fri Feb 20 13:24:07 CET 2026


Hoi,

On 20.02.2026 01:31, Ondrej Zajicek wrote:
> That is the fake interface from if_get_by_name(). Using them in route
> nexthops is 'fine' on the level that it does not crash due to NULL
> dereference, but they were never supposed to be used this way, they are
> just placeholders for configuration.
>
> Note that these fake interfaces are a horrible hack in BIRD code, as
> properly there should be two distinct structures: iface_config and
> iface, the former representing interface referenced in config file, and
> the latter representing real kernel interfaces found by 'device' protocol.
> But we use the same structure for both cases.
Understood. Once iface_config and iface are split, I can make use of 
either construct (the iface_config one makes more sense). Neither the 
interface name nor the kernel device is necessary in my implementation.
> I wonder if your setup would work, if you instead of using this fake interface
> use some real placeholder interface, say loopback:
>
> 'encapsulation vxlan { tunnel device "lo"; };'
It works fine. As an aside, reconfiguring causes a restart of the evpn 
protocol, which trips an assertion and crashes. The crash also happens 
on 'birdc disable evpn1':
Feb 20 12:12:29 vpp0-3 bird[1455113]: Restarting protocol evpn1
Feb 20 12:12:29 vpp0-3 bird[1455113]: Assertion 'pub->queue && pub->topic' failed at lib/pubsub.c:161
Feb 20 12:12:29 vpp0-3 systemd[1]: bird-dataplane.service: Main process exited, code=killed, status=11/SEGV

Either way, Bird comes back up and works just fine using tunnel_dev set 
to "lo". It reminds me that I already use this trick, as MAC addresses 
learned from VPP's bridge-domain do not have any corresponding Linux or 
Bird interface, so I inject them into etab using "lo" as well.

> The 'cheat' has to be modified (it should wait for the interface,
> but ignore the fact that the interface is not a tunnel, i.e.
> skip/ignore evpn_validate_iface_attrs()).
I like that. Perhaps a keyword in the config can signal that this is OK, 
like 'tunnel device "evpn0-dummy" virtual;' or just 'tunnel device "lo" 
virtual;'
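As a rough illustration of the proposal (not BIRD code): a standalone C sketch in which the named device must still exist, but the tunnel-type check that evpn_validate_iface_attrs() normally performs is skipped when a hypothetical `virtual` flag is set. The structs, their fields and tunnel_iface_ok() are made up for this sketch and do not reflect BIRD's real iface handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins, not BIRD's real structures. */
struct tunnel_cfg {
  const char *device;    /* 'tunnel device "lo";' */
  bool virtual_dev;      /* set by the proposed 'virtual' keyword */
};

struct iface_info {
  const char *name;
  bool present;          /* interface was found by the 'device' protocol */
  bool is_vxlan;         /* what a tunnel-attribute check would verify */
};

/* Returns true if the EVPN instance may start using this interface. */
static bool tunnel_iface_ok(const struct tunnel_cfg *cfg,
                            const struct iface_info *ifa)
{
  assert(cfg != NULL && ifa != NULL);

  /* Always wait for the named interface to really exist. */
  if (!ifa->present || strcmp(cfg->device, ifa->name) != 0)
    return false;

  /* Skip the tunnel-attribute check only when marked 'virtual'. */
  return cfg->virtual_dev || ifa->is_vxlan;
}
```

With `virtual`, "lo" is accepted as soon as it exists; without it, the same interface is rejected because it is not a VXLAN tunnel.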

> Note that you should read IMET from etab too. The EVPN protocol translates
> all IMETs from evpntab to etab, otherwise even our kernel-based setup
> would not work -- 'bridge' protocol that configures kernel bridge also
> reads just etab.
I do not have multiple IMETs in etab, only one:
root@vpp0-0:/etc/bird# birdc show route table evpntab | grep imet
evpn imet 8298:100 0 2001:678:d78:200::3  [vpp0_3 12:12:38.484 from 2001:678:d78:200::3] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::2  [vpp0_2 11:18:21.821 from 2001:678:d78:200::2] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::1  [vpp0_1 11:18:21.253 from 2001:678:d78:200::1] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200:: unicast [evpn1 11:18:07.285] * (120)

root@vpp0-0:/etc/bird# birdc show route table etab | grep 00:00:00:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 12:12:38.484] * (80)

Perhaps I'm holding it wrong (see bird-example.conf). It would actually 
be super if I could rely /only/ on etab, as tracking both etab and 
evpntab was a fair amount of extra code.
> I agree, and this split of work between the 'evpn' protocol and the 'bridge' protocol
> (with separate 'evpn table' and 'eth table') is going to stay.
Thank you! That's great news for me.

>> I am happy to share the 'vppevpn' protocol with others too, as an example of
>> '3P integration'. I do not expect it to be upstreamed into Bird2, unless
>> there are community requests for it.
>> Ondrej, do let me know if you'd like to take a sneak peek at my code (it's
>> in a private repo for now, as it's not ready for wider review yet, but it is
>> mostly functional).
> Having better integration with VPP (or some other userspace dataplane)
> is something we are interested in, in general, but I would not look at it
> before I finish some other tasks (including merging EVPN), as I am rather
> overwhelmed.
I can volunteer my time to write a vpp protocol (for the IPv4, IPv6 and 
MPLS FIBs, and interfaces). I'll contact you separately about that; it 
sounds like a worthwhile project, and I've kind of always wanted to do it.

>>>>>> (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
>>>>>> When the BGP Next Hop is changed by an export filter, we lose the MPLS
>>>>>> labelstack. There is no way to add MPLS labelstack in filters (at least,
>>>>>> that I could find), so we cannot use 'next hop address X' to determine the
>>>>>> Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
>>>>>> but rather a PMSI attribute that already carries the 'router address'.
>>>>> Resetting MPLS label when changing next hop is intentional, as MPLS labels are
>>>>> (in general) specific to receiving routers.
>>>>>
>>>>> There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
>>>>> attribute that could be accessed in filters.
>>>>>
>>>>> I am not sure what is your use case here to change it with filters, can
>>>>> you describe it more? What about setting 'router address' in EVPN proto?
>>>> With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
>>>> 1) copy that to the PMSI attribute: good
>>>> 2) not do anything for MAC announcements; they will have BGP.next_hop set to
>>>> the session address.
>>>>
>>>> If the previous patch in (2) is accepted, then 'router address' will be used
>>>> as BGP.next_hop, which avoids the need to change it with filters as in
>>>> (3).
>>> Oh, I see. You are right, this should work automatically for both IMET / PMSI
>>> and MAC.
>>>
>>> I do not like using regular/immediate next hops here in the EVPN table, as
>>> it does not fit well semantically and requires a formal device. But it seems
>>> to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
>>> by the EVPN protocol, similar to how BGP_PMSI_TUNNEL is attached. Will do that.
>>> Any comments?
>> If you were to attach a specific attribute like vxlan_nexthop or vxlan_vni
>> to the etab table entry, I would simply read that and use it instead of the
>> bgp nexthop. That's what happens already today for IMET, as it has the
>> BGP.pmsi_tunnel attribute with the needed ingress-replication
>> 2001:678:d78:200::2 mpls 10040 information. How do other vendors (say
>> Arista, Cisco, Nokia, FRRouting) handle the Type-2 nexthop? My understanding
>> is they use BGP next hop for that (in other words, the same as how Bird does
>> it today).
> I think there is some confusion here. I am talking about evpntab
> entries, not about etab entries. And about your patch that sets router
> IP into their immediate next hop (nh.gw).
I see, then maybe I can try a different approach. I thought the patch 
makes Bird behave the same as Nokia SRLinux [1], which also sets the 
router IP (the local VTEP) as the next hop. But what you're saying is that 
I should not set the /immediate/ next hop, but rather leave that alone and 
set the /BGP Next Hop/? As a reminder, though, I need to be able to set an 
IPv4 BGP Next Hop on an IPv6 session for some RTs only, not all. See one 
more thought on that below.

>> Not sure I understand what you mean - don't we have this problem also for
>> kernel based vxlan? If we create a vxlan0 interface in a bridge, and set a
>> fdb entry onto it, we also need to know which vxlan nexthop to use. The way
>> I read 'evpn' and 'oz-evpn', we use the BGP nexthop for that purpose.
>> However, if what you're saying is you'd want to remove the BGP Next Hop and
>> instead have an EVPN VxLAN Next Hop attribute to populate the 'etab' gateway
>> field that would work just as well for me. I kind of wonder why you'd go to
>> the trouble of obfuscating the BGP Next Hop. Don't other vendors use the same
>> thing (send vxlan packet to the address learned via the BGP Next Hop in
>> Type-2 announcements) ?
> I just mean that immediate next hop fields for evpntab routes received
> through BGP are irrelevant, while the BGP next hop attribute is the
> important one. When the 'evpn' protocol takes a route from evpntab and converts
> it to an etab entry, it examines the BGP next hop, not the immediate next hop.
OK I think I understand now.
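A minimal model of the conversion Ondrej describes, assuming simplified hypothetical records rather than BIRD's rte/rta: the immediate gateway of a BGP-learned evpntab route is ignored, and the VXLAN endpoint of the resulting etab entry is taken from the BGP next hop, with the MPLS label carrying the VNI. The names evpn_route, eth_entry and evpn_to_eth() are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified records, not BIRD's rte/rta. */
struct evpn_route {
  const char *immediate_gw;   /* nh.gw: irrelevant for BGP-learned routes */
  const char *bgp_next_hop;   /* BGP_NEXT_HOP attribute: the remote VTEP */
  uint32_t label;             /* MPLS label, carries the VNI for VXLAN */
};

struct eth_entry {
  const char *vtep;           /* where VXLAN packets are sent */
  uint32_t vni;
};

/* Convert an evpntab route into an etab entry. */
static struct eth_entry evpn_to_eth(const struct evpn_route *r)
{
  assert(r != NULL && r->bgp_next_hop != NULL);
  struct eth_entry e = { .vtep = r->bgp_next_hop, .vni = r->label };
  return e;   /* r->immediate_gw is deliberately not consulted */
}
```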

>>> While I agree that it should work automatically by just setting router
>>> address in protocol evpn, I think that this setup should work even
>>> without patches:
>>>
>>>    protocol evpn {
>>>      ...
>>>      encapsulation vxlan { router address 192.0.2.1; };
>>>    }
>>>    protocol bgp {
>>>      evpn { import all; export all; next hop address 192.0.2.1; };
>>>      local 2001:db8::1 as 65512;
>>>      neighbor 2001:db8::2 as 65512;
>>>    }
>> I don't think this works for MAC. For IMET it works because it has a
>> custom PMSI BGP attribute, which is set to encap0->router_addr. Setting the
>> next hop in this way will clear the MPLS labelstack. So we'd end up with:
>> fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
>>          via 192.0.2.1 on vxlan0 mpls 0
>> and we'd lose the VNI.
> I think it will not clear the MPLS labelstack. This is not setting next
> hop in filters. The difference between
>
>    evpn { import all; export all; next hop address 192.0.2.1; };
>
> and
>
>    evpn { import all; export all; };
>
> in BGP protocol export is only where the BGP next hop value is taken
> from (explicitly configured one or source address from BGP session), but
> route processing is the same. See bgp_update_next_hop_ip(), the
> !bgp_use_next_hop(s, a) and !bgp_use_gateway(s) case.
I tried this, and you are correct that 'next hop address' works and 
leaves the MPLS labelstack alone:
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] * (80)
         via 192.168.10.0 on lo mpls 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] * (80)
         via 192.168.10.0 on lo mpls 20040

Now let's suppose I have two evpn protocols, one with an IPv4 router 
address and one with an IPv6 router address. In this scenario, I can't 
use 'next hop address', because it forces both to use the same address 
(and therefore the same address family).

It yields a bad state:
1) as before, the IPv4-only evpn (VNI 20040) works;
2) but now the evpn with an IPv6 router address sends IMET with IPv6 
and MAC with IPv4:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] * (80)
         via 2001:678:d78:200:: on lo mpls 10040
         Type: EVPN univ
         mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] * (80)
         via 192.168.10.0 on lo mpls 10040
         Type: EVPN univ
         mpls_label: 10040
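The limitation can be modeled in a few lines, assuming hypothetical types that only loosely follow the description of bgp_update_next_hop_ip() above: the next hop is one value per session (the configured 'next hop address', else the session's local address), while the label passes through untouched. Both EVPN instances therefore get the same next hop, and hence the same address family.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of BGP export, not BIRD's bgp_update_next_hop_ip(). */
struct bgp_session_cfg {
  const char *next_hop_address;   /* 'next hop address X;' or NULL */
  const char *local_address;      /* 'local X as ...;' */
};

struct bgp_update {
  const char *next_hop;
  uint32_t label;                 /* VNI-carrying label, passed through */
};

static struct bgp_update export_evpn(const struct bgp_session_cfg *c,
                                     uint32_t label)
{
  assert(c != NULL && c->local_address != NULL);
  struct bgp_update u;
  /* One value per session: it cannot differ per EVPN instance or RT. */
  u.next_hop = c->next_hop_address ? c->next_hop_address : c->local_address;
  u.label = label;
  return u;
}
```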

An obvious solution is to use a filter, like this one:
filter bgp_evpn_out {
   if (rt, 8298, 10040) ~ bgp_ext_community then { bgp_next_hop = 192.168.10.3; }
   if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::3; }
   accept;
}

template bgp T_BGP_EVPN {
   evpn { import all; export filter bgp_evpn_out; };
   local 2001:678:d78:200::3 as 65512;
}

But now the filter does destroy the MPLS labelstack, although the 
mpls_label attribute remains:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] * (80)
         via 2001:678:d78:200:: on lo mpls 10040
         Type: EVPN univ
         mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] * (80)
*        via 2001:678:d78:200:: on lo mpls 0*
         Type: EVPN univ
         mpls_label: 10040

00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] * (80)
         via 192.168.10.0 on lo mpls 20040
         Type: EVPN univ
         mpls_label: 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] * (80)
*        via 192.168.10.0 on lo mpls 0*
         Type: EVPN univ
         mpls_label: 20040

My conclusion was that I need to be able to apply filters without 
destroying the MPLS labels. If I now understand correctly, I can remove 
the nh.gw/nh.iface from evpn_announce_mac() and evpn_announce_imet(), but 
keep the change in bgp_update_next_hop_ip():

@@ -1314,19 +1310,6 @@ bgp_update_next_hop_ip(struct bgp_export_state *s, eattr *a, ea_list **to)
      }
    }

+  /* For L2VPN (EVPN): ensure MPLS label stack is set even if next hop was filter-overridden */
+  if (s->mpls && bgp_channel_is_l2vpn(s->channel) && !bgp_find_attr(*to, BA_MPLS_LABEL_STACK))
+  {
+    rta *ra = s->route->attrs;
+    if (ra->nh.labels)
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, ra->nh.label, ra->nh.labels * 4);
+    else
+    {
+      u32 label = ea_get_int(ra->eattrs, EA_MPLS_LABEL, BGP_MPLS_NULL);
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, &label, 4);
+    }
+  }

This allows the above filter to work while preserving the labelstack:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] * (80)
         via 2001:678:d78:200:: on lo mpls 10040
         Type: EVPN univ
         mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] * (80)
*        via 2001:678:d78:200:: on lo mpls 10040*
         Type: EVPN univ
         mpls_label: 10040

00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] * (80)
         via 192.168.10.0 on lo mpls 20040
         Type: EVPN univ
         mpls_label: 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] * (80)
*        via 192.168.10.0 on lo mpls 20040*
         Type: EVPN univ
         mpls_label: 20040
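The label choice made by the patch can be restated as a standalone model, covering only the branch that picks the stack (not the L2VPN-channel and "attribute not yet set" guards). The route struct is hypothetical; FALLBACK_LABEL stands in for BGP_MPLS_NULL, assumed here to be the MPLS implicit-null label 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_LABELS 4
#define FALLBACK_LABEL 3u   /* implicit null; stand-in for BGP_MPLS_NULL */

/* Hypothetical simplified route, not BIRD's rta. */
struct route {
  uint32_t nh_label[MAX_LABELS];   /* like ra->nh.label */
  size_t nh_labels;                /* like ra->nh.labels */
  int has_label_attr;              /* EA_MPLS_LABEL present? */
  uint32_t label_attr;
};

/* Fill out[] with the label stack to advertise and return its length:
 * prefer next-hop labels, else the label attribute, else the fallback. */
static size_t pick_label_stack(const struct route *r, uint32_t out[MAX_LABELS])
{
  assert(r != NULL && r->nh_labels <= MAX_LABELS);
  if (r->nh_labels > 0) {
    memcpy(out, r->nh_label, r->nh_labels * sizeof out[0]);
    return r->nh_labels;
  }
  out[0] = r->has_label_attr ? r->label_attr : FALLBACK_LABEL;
  return 1;
}
```

The second case is the one the filter scenario relies on: once the next hop is overridden, the VNI survives through the label attribute.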

Of course, open to better solutions :)

groet,
Pim

[1] A:pim@asw121# show network-instance default protocols bgp routes evpn route-type 2 detail | more
Route Distinguisher: 65500:264
Tag-ID             : 0
MAC address        : 64:9D:99:D0:70:4D
IP Address         : 10.26.0.1
neighbor           : 198.19.16.0
path-id            : 0
Received paths     : 1
   Path 1: <Best,Valid,Used,>
     ESI               : 00:00:00:00:00:00:00:00:00:00
     Label             : 264
     Route source      : neighbor 198.19.16.0 (last modified 68d14h37m6s ago)
     Route preference  : No MED, LocalPref is 100
     Atomic Aggr       : false
     BGP next-hop      : 198.19.18.0
     AS Path           :  i
     Communities       : [target:65500:264, bgp-tunnel-encap:VXLAN]
     RR Attributes     : No Originator-ID, Cluster-List is []
     Aggregation       : None
     Unknown Attr      : None
     Invalid Reason    : None
     Tie Break Reason  : none
     Route Flap Damping: None

-- 
Pim van Pelt <pim at ipng.ch>
PBVP1-RIPE https://ipng.ch/

-------------- next part --------------
## Manual configuration for vpp1-0

eth table etab;
eth table etab100;
eth table etab200;
evpn table evpntab;

protocol static {
  eth { table etab; };
  route eth 00:00:01:00:00:01 vlan 100 prohibit;
  route eth 00:00:02:00:00:01 vlan 200 prohibit;
}

protocol evpn {
  debug all;
  eth { table etab; };
  evpn { import all; export all; };
  rd 8298:100;
  import target (rt, 8298, 10040);
  export target (rt, 8298, 10040);
  encapsulation vxlan {
    tunnel device "vxlan0";
    router address 2001:678:d78:200::;
  };
  vni 10040;
  vid 100;
}; 

protocol evpn {
  debug all;
  eth { table etab; };
  evpn { import all; export all; };
  rd 8298:200;
  import target (rt, 8298, 20040);
  export target (rt, 8298, 20040);
  encapsulation vxlan {
    tunnel device "vxlan0";
    router address 192.168.10.0;
  };
  vni 20040;
  vid 200;
}; 

filter bgp_evpn_out {
#  if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::; }
  accept;
}

template bgp T_BGP_EVPN {
  evpn { import all; export filter bgp_evpn_out; };
  local 2001:678:d78:200:: as 65512;
}

protocol bgp vpp0_1 from T_BGP_EVPN { neighbor 2001:678:d78:200::1 as 65512; }
protocol bgp vpp0_2 from T_BGP_EVPN { neighbor 2001:678:d78:200::2 as 65512; }
protocol bgp vpp0_3 from T_BGP_EVPN { neighbor 2001:678:d78:200::3 as 65512; }

protocol vppevpn bd100 {
  debug all;
  eth { table etab; import all; export all; };

  vxlan ipv6 src 2001:678:d78:200::;
  vxlan ipv4 src 192.168.10.0;
  vxlan src port 4789;
  vxlan dst port 4789;
  bridge domain 100;
  scan time 5;
  vid 100;
  vni 10040;
};

protocol vppevpn bd200 {
  debug all;
  eth { table etab; import all; export all; };

  vxlan ipv6 src 2001:678:d78:200::;
  vxlan ipv4 src 192.168.10.0;
  vxlan src port 4789;
  vxlan dst port 4789;
  bridge domain 200;
  bridge mac age 10;
  scan time 5;
  vid 200;
  vni 20040;
};

