evpn rebase to HEAD
Pim van Pelt
pim at ipng.ch
Fri Feb 20 13:24:07 CET 2026
Hoi,
On 20.02.2026 01:31, Ondrej Zajicek wrote:
> That is the fake interface from if_get_by_name(). Using them in route
> nexthops is 'fine' on the level that it does not crash due to NULL
> dereference, but they were never supposed to be used this way, they are
> just placeholders for configuration.
>
> Note that these fake interfaces are horrible hack in BIRD code, as
> properly there should be two distinct structures: iface_config and
> iface, the former representing interface referenced in config file, and
> the latter representing real kernel interfaces found by 'device' protocol.
> But we use the same structure for both cases.
Understood - once iface_config and iface are split, I can make use of
either construct (the iface_config one makes more sense). Neither the
interface name nor the kernel device is necessary in my implementation.
> I wonder if your setup would work, if you instead of using this fake interface
> use some real placeholder interface, say loopback:
>
> 'encapsulation vxlan { tunnel device "lo"; };'
It works fine. As an aside, reconfiguring causes a restart of the evpn
protocol, which trips an assertion and crashes. The crash also happens
on 'birdc disable evpn1'.
Feb 20 12:12:29 vpp0-3 bird[1455113]: Restarting protocol evpn1
Feb 20 12:12:29 vpp0-3 bird[1455113]: Assertion 'pub->queue && pub->topic' failed at lib/pubsub.c:161
Feb 20 12:12:29 vpp0-3 systemd[1]: bird-dataplane.service: Main process exited, code=killed, status=11/SEGV
Either way, Bird comes back up and works just fine using tunnel_dev set
to "lo". It reminds me that I already use this trick, as MAC addresses
learned from VPP's bridge-domain do not have any corresponding Linux or
Bird interface, so I inject them into etab using "lo" as well.
> The 'cheat' has to be modified: it should wait for the interface,
> but will ignore the fact that the interface is not a tunnel (i.e.
> skip/ignore evpn_validate_iface_attrs()).
I like that. Perhaps a keyword in the config can signal that this is OK,
like 'tunnel device "evpn0-dummy" virtual;' or just 'tunnel device "lo"
virtual;'
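To make the idea concrete, here is a minimal sketch in C of how a split between iface_config and iface could combine with such a 'virtual' flag. Everything in it (the struct fields, iface_resolve(), tunnel_dev_usable()) is hypothetical and only illustrates the shape of the idea; none of it is Bird's actual API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: an interface as referenced in the config file. */
struct iface_config {
  const char *name;   /* name from the config, e.g. "lo" */
  int virtual_ok;     /* hypothetical 'virtual' keyword: skip tunnel checks */
};

/* Hypothetical: a real kernel interface found by the 'device' protocol. */
struct iface {
  const char *name;
  unsigned index;     /* kernel ifindex */
  int is_vxlan;       /* whether the kernel reports a VXLAN device */
};

/* Find the real interface for a config reference; NULL if it is not there yet. */
const struct iface *
iface_resolve(const struct iface_config *cf, const struct iface *ifs, size_t n)
{
  assert(cf && cf->name);

  for (size_t i = 0; i < n; i++)
    if (strcmp(ifs[i].name, cf->name) == 0)
      return &ifs[i];

  return NULL;
}

/* Decide whether the protocol may start using the resolved interface. */
int
tunnel_dev_usable(const struct iface_config *cf, const struct iface *ifa)
{
  assert(cf);

  if (!ifa)
    return 0;         /* keep waiting for the interface to appear */

  /* With 'virtual' set, the tunnel attribute check is skipped entirely. */
  return cf->virtual_ok || ifa->is_vxlan;
}
```

With this shape, 'tunnel device "lo" virtual;' would still wait for "lo" to exist, but would not require it to be a VXLAN device, while a plain 'tunnel device "lo";' would be rejected.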
> Note that you should read IMET from etab too. EVPN protocol translate
> all IMETs from evpntab to etab, otherwise even our kernel-based setup
> would not work -- 'bridge' protocol that configures kernel bridge also
> reads just etab.
I do not have multiple IMETs in etab, only one:
root@vpp0-0:/etc/bird# birdc show route table evpntab | grep imet
evpn imet 8298:100 0 2001:678:d78:200::3 [vpp0_3 12:12:38.484 from 2001:678:d78:200::3] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::2 [vpp0_2 11:18:21.821 from 2001:678:d78:200::2] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200::1 [vpp0_1 11:18:21.253 from 2001:678:d78:200::1] * (100) [i]
evpn imet 8298:100 0 2001:678:d78:200:: unicast [evpn1 11:18:07.285] * (120)
root@vpp0-0:/etc/bird# birdc show route table etab | grep 00:00:00:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 12:12:38.484] * (80)
Perhaps I'm holding it wrong (see bird-example.conf). It would actually
be super if I could rely /only/ on etab, as tracking both etab and
evpntab was a fair amount of extra code.
> I agree and this split of work between 'evpn' protocol and 'bridge' protocol
> (with separate 'evpn table' and 'eth table') are going to stay.
Thank you! That's great news for me.
>> I am happy to share the 'vppevpn' protocol with others also, as an example
>> '3P integration'. I do not expect it to be upstreamed into Bird2, unless
>> there are community requests for it.
>> Ondrej, do let me know if you'd like to take a sneak peek at my code (it's
>> in a private repo for now, as it's not ready for wider review yet, but it is
>> mostly functional).
> Having better integration with VPP (or some other userspace dataplane)
> is something we are interested in general, but i would not look at it
> before i finish some other tasks (including merging EVPN) as i am rather
> overwhelmed.
I can volunteer my time to write a vpp protocol (for ip4, ip6, mpls FIB
and interfaces). I'll contact you separately for that, it sounds like a
worthwhile project and I've kind of always wanted to do it.
>>>>>> (3) Setting BGP Next Hop clears MPLS Labelstack, filters cannot set this.
>>>>>> When the BGP Next Hop is changed by an export filter, we lose the MPLS
>>>>>> labelstack. There is no way to add MPLS labelstack in filters (at least,
>>>>>> that I could find), so we cannot use 'next hop address X' to determine the
>>>>>> Type-2 MAC VxLAN endpoint. Note: IMET updates do not use the BGP Next Hop,
>>>>>> but rather a PMSI attribute with the 'router address' already.
>>>>> Resetting MPLS label when changing next hop is intentional, as MPLS labels are
>>>>> (in general) specific to receiving routers.
>>>>>
>>>>> There is gw_mpls (and undocumented/semantically broken gw_mpls_stack)
>>>>> attribute that could be accessed in filters.
>>>>>
>>>>> I am not sure what is your use case here to change it with filters, can
>>>>> you describe it more? What about setting 'router address' in EVPN proto?
>>>> With the oz-evpn branch as-is, setting 'router address' in evpn proto will:
>>>> 1) copy that to the PMSI attribute: good
>>>> 2) not do anything for MAC announcements; they will have BGP.next_hop set to
>>>> the session address.
>>>>
>>>> if the previous patch in (2) is accepted, then 'router address' will be used
>>>> as BGP.next_hop, which will avoid the need to change it with filters with
>>>> (3).
>>> Oh, i see. You are right, this should work automatically for both IMET / PMSI
>>> and MAC.
>>>
>>> I do not like using regular/immediate next hops here in EVPN table, as
>>> it does not fit well semantically and requires formal device. But seems
>>> to me that a reasonable alternative would be to just attach BGP_NEXT_HOP
>>> by EVPN protocol, similarly how BGP_PMSI_TUNNEL is attached. Will do that.
>>> Any comments?
>> If you were to attach a specific attribute like vxlan_nexthop or vxlan_vni
>> to the etab table entry, I would simply read that and use it instead of the
>> bgp nexthop. That's what happens already today for IMET, as it has the
>> BGP.pmsi_tunnel attribute with the needed ingress-replication
>> 2001:678:d78:200::2 mpls 10040 information. How do other vendors (say
>> Arista, Cisco, Nokia, FRRouting) handle the Type-2 nexthop? My understanding
>> is they use BGP next hop for that (in other words, the same as how Bird does
>> it today).
> I think there is some confusion here. I am talking about evpntab
> entries, not about etab entries. And about your patch that sets router
> IP into their immediate next hop (nh.gw).
I see - then maybe I can try a different approach. The patch, I thought,
makes Bird behave the same as Nokia SRLinux [1], which also sets the
router IP (the local VTEP) as nexthop. But what you're saying is that I
should not set the /immediate/ nexthop, but rather leave that alone and
set the /BGP Next Hop/? As a reminder, though, I need to be able to set
an IPv4 BGP Next Hop on an IPv6 session for only some RTs, not all. See
one more thought on that below.
>> Not sure I understand what you mean - don't we have this problem also for
>> kernel based vxlan? If we create a vxlan0 interface in a bridge, and set a
>> fdb entry onto it, we also need to know which vxlan nexthop to use. The way
>> I read 'evpn' and 'oz-evpn', we use the BGP nexthop for that purpose.
>> However, if what you're saying is you'd want to remove the BGP Next Hop and
>> instead have an EVPN VxLAN Next Hop attribute to populate the 'etab' gateway
>> field that would work just as well for me. I kind of wonder why you'd go to
>> the trouble obfuscating the BGP Next Hop. Don't other vendors use the same
>> thing (send vxlan packet to the address learned via the BGP Next Hop in
>> Type-2 announcements) ?
> I just mean that immediate next hop fields for evpntab routes received
> through BGP are irrelevant, while the BGP next hop attribute is the
> important one. When 'evpn' protocol takes a route from evpntab and convert
> it to etab entry, it examines BGP next hop, not immediate next hop.
OK I think I understand now.
>>> While i agree that it should work automatically by just setting router
>>> address in protocol evpn, i think that this setup should work even
>>> without patches:
>>>
>>> protocol evpn {
>>> ...
>>> encapsulation vxlan { router address 192.0.2.1; };
>>> }
>>> protocol bgp {
>>> evpn { import all; export all; next hop address 192.0.2.1; };
>>> local 2001:db8::1 as 65512;
>>> neighbor 2001:db8::2 as 65512;
>>> }
>> I don't think this works for MAC; for IMET it works because that has a
>> custom PMSI BGP attribute (which is set to encap0->router_addr). Setting the
>> next hop in this way will clear the mpls labelstack. So we'd end up with:
>> fe:54:00:f0:11:02 vlan 100 mpls 0 unicast [evpn1 2026-02-19] * (80)
>> via 192.0.2.1 on vxlan0 mpls 0
>> and we'd lose the VNI.
> I think it will not clear the MPLS labelstack. This is not setting next
> hop in filters. The difference between
>
> evpn { import all; export all; next hop address 192.0.2.1; };
>
> and
>
> evpn { import all; export all; };
>
> in BGP protocol export is only where the BGP next hop value is taken
> from (explicitly configured one or source address from BGP session), but
> route processing is the same. See bgp_update_next_hop_ip(), the
> !bgp_use_next_hop(s, a) and !bgp_use_gateway(s) case.
I tried this, and you are correct that 'next hop address' works and
leaves the MPLS labelstack alone:
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] * (80)
via 192.168.10.0 on lo mpls 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 12:54:44.880] * (80)
via 192.168.10.0 on lo mpls 20040
Now let's suppose I have two evpn protocols, one with an IPv4 router
address and one with an IPv6 router address. In this scenario, I can't
use 'next hop address' because it'll force both to use that address family.
It yields a bad state:
1) as before, the IPv4-only evpn (VNI 20040) works
2) but now the evpn with an IPv6 router address sends IMET with IPv6
and MAC with IPv4
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] * (80)
via 2001:678:d78:200:: on lo mpls 10040
Type: EVPN univ
mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:07:07.455] * (80)
via 192.168.10.0 on lo mpls 10040
Type: EVPN univ
mpls_label: 10040
An obvious solution is to use a filter, like this one:
filter bgp_evpn_out {
  if (rt, 8298, 10040) ~ bgp_ext_community then { bgp_next_hop = 192.168.10.3; }
  if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::3; }
  accept;
}
template bgp T_BGP_EVPN {
  evpn { import all; export filter bgp_evpn_out; };
  local 2001:678:d78:200::3 as 65512;
}
But now the filter does destroy the MPLS labelstack, although the
mpls_label attribute remains:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] * (80)
via 2001:678:d78:200:: on lo mpls 10040
Type: EVPN univ
mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:00:01.031] * (80)
* via 2001:678:d78:200:: on lo mpls 0*
Type: EVPN univ
mpls_label: 10040
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] * (80)
via 192.168.10.0 on lo mpls 20040
Type: EVPN univ
mpls_label: 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:00:01.031] * (80)
* via 192.168.10.0 on lo mpls 0*
Type: EVPN univ
mpls_label: 20040
My conclusion was: I need to be able to apply filters without destroying
the MPLS labels. If I now understand correctly, I can remove the
nh.gw/nh.iface from evpn_announce_mac() and evpn_announce_imet(), but
keep the change in bgp_update_next_hop_ip():
@@ -1314,19 +1310,6 @@ bgp_update_next_hop_ip(struct bgp_export_state *s, eattr *a, ea_list **to)
     }
   }
+  /* For L2VPN (EVPN): ensure MPLS label stack is set even if next hop was filter-overridden */
+  if (s->mpls && bgp_channel_is_l2vpn(s->channel) && !bgp_find_attr(*to, BA_MPLS_LABEL_STACK))
+  {
+    rta *ra = s->route->attrs;
+    if (ra->nh.labels)
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, ra->nh.label, ra->nh.labels * 4);
+    else
+    {
+      u32 label = ea_get_int(ra->eattrs, EA_MPLS_LABEL, BGP_MPLS_NULL);
+      bgp_set_attr_data(to, s->pool, BA_MPLS_LABEL_STACK, 0, &label, 4);
+    }
+  }
This allows the above filter to work while preserving the labelstack:
00:00:00:00:00:00 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] * (80)
via 2001:678:d78:200:: on lo mpls 10040
Type: EVPN univ
mpls_label: 10040
fe:54:00:f0:11:02 vlan 100 mpls 10040 unicast [evpn1 13:15:42.571] * (80)
* via 2001:678:d78:200:: on lo mpls 10040*
Type: EVPN univ
mpls_label: 10040
00:00:00:00:00:00 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] * (80)
via 192.168.10.0 on lo mpls 20040
Type: EVPN univ
mpls_label: 20040
fe:54:00:f0:11:03 vlan 200 mpls 20040 unicast [evpn2 13:15:42.571] * (80)
* via 192.168.10.0 on lo mpls 20040*
Type: EVPN univ
mpls_label: 20040
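For illustration, the fallback order used in the patch can be modelled outside of Bird as a small standalone C function. This is only a sketch: struct model_route, model_export_labels() and MODEL_LABEL_NONE are hypothetical stand-ins (for the rta next hop labels, the attribute-setting code, and BGP_MPLS_NULL respectively), and the check for an already present label stack attribute is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for "no label"; the value is chosen for this sketch only. */
#define MODEL_LABEL_NONE 3

/* Hypothetical, simplified view of the route fields the patch looks at. */
struct model_route {
  uint32_t nh_label[4];   /* labels on the immediate next hop */
  size_t nh_labels;       /* number of valid entries in nh_label */
  int has_mpls_label;     /* whether the mpls_label attribute is present */
  uint32_t mpls_label;    /* value of the mpls_label attribute */
};

/* Write the label stack to export into out[] and return its length. */
size_t
model_export_labels(const struct model_route *r, uint32_t *out, size_t out_len)
{
  assert(r && out && out_len >= 1);

  /* First choice: the labels still attached to the next hop. */
  if (r->nh_labels > 0)
  {
    size_t n = r->nh_labels < out_len ? r->nh_labels : out_len;
    for (size_t i = 0; i < n; i++)
      out[i] = r->nh_label[i];
    return n;
  }

  /* Fallback: the mpls_label attribute, which survives the filter. */
  out[0] = r->has_mpls_label ? r->mpls_label : MODEL_LABEL_NONE;
  return 1;
}
```

A route whose next hop labels were cleared by the filter but which still carries mpls_label 20040 yields a one-entry stack containing 20040, which corresponds to the "mpls 20040" shown in the output above.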
Of course, open to better solutions :)
groet,
Pim
[1] A:pim@asw121# show network-instance default protocols bgp routes evpn route-type 2 detail | more
Route Distinguisher: 65500:264
Tag-ID : 0
MAC address : 64:9D:99:D0:70:4D
IP Address : 10.26.0.1
neighbor : 198.19.16.0
path-id : 0
Received paths : 1
Path 1: <Best,Valid,Used,>
ESI : 00:00:00:00:00:00:00:00:00:00
Label : 264
Route source : neighbor 198.19.16.0 (last modified 68d14h37m6s ago)
Route preference : No MED, LocalPref is 100
Atomic Aggr : false
BGP next-hop : 198.19.18.0
AS Path : i
Communities : [target:65500:264, bgp-tunnel-encap:VXLAN]
RR Attributes : No Originator-ID, Cluster-List is []
Aggregation : None
Unknown Attr : None
Invalid Reason : None
Tie Break Reason : none
Route Flap Damping: None
--
Pim van Pelt <pim at ipng.ch>
PBVP1-RIPE https://ipng.ch/
-------------- next part --------------
## Manual configuration for vpp1-0
eth table etab;
eth table etab100;
eth table etab200;
evpn table evpntab;
protocol static {
eth { table etab; };
route eth 00:00:01:00:00:01 vlan 100 prohibit;
route eth 00:00:02:00:00:01 vlan 200 prohibit;
}
protocol evpn {
debug all;
eth { table etab; };
evpn { import all; export all; };
rd 8298:100;
import target (rt, 8298, 10040);
export target (rt, 8298, 10040);
encapsulation vxlan {
tunnel device "vxlan0";
router address 2001:678:d78:200::;
};
vni 10040;
vid 100;
};
protocol evpn {
debug all;
eth { table etab; };
evpn { import all; export all; };
rd 8298:200;
import target (rt, 8298, 20040);
export target (rt, 8298, 20040);
encapsulation vxlan {
tunnel device "vxlan0";
router address 192.168.10.0;
};
vni 20040;
vid 200;
};
filter bgp_evpn_out {
# if (rt, 8298, 20040) ~ bgp_ext_community then { bgp_next_hop = 2001:678:d78:200::; }
accept;
}
template bgp T_BGP_EVPN {
evpn { import all; export filter bgp_evpn_out; };
local 2001:678:d78:200:: as 65512;
}
protocol bgp vpp0_1 from T_BGP_EVPN { neighbor 2001:678:d78:200::1 as 65512; }
protocol bgp vpp0_2 from T_BGP_EVPN { neighbor 2001:678:d78:200::2 as 65512; }
protocol bgp vpp0_3 from T_BGP_EVPN { neighbor 2001:678:d78:200::3 as 65512; }
protocol vppevpn bd100 {
debug all;
eth { table etab; import all; export all; };
vxlan ipv6 src 2001:678:d78:200::;
vxlan ipv4 src 192.168.10.0;
vxlan src port 4789;
vxlan dst port 4789;
bridge domain 100;
scan time 5;
vid 100;
vni 10040;
};
protocol vppevpn bd200 {
debug all;
eth { table etab; import all; export all; };
vxlan ipv6 src 2001:678:d78:200::;
vxlan ipv4 src 192.168.10.0;
vxlan src port 4789;
vxlan dst port 4789;
bridge domain 200;
bridge mac age 10;
scan time 5;
vid 200;
vni 20040;
};