Bird 2.0.7 not accepting BGP connections in a VRF

William bird at is.unlawful.id.au
Thu Aug 26 04:49:11 CEST 2021


Hi All,
I have a Debian 11 host (5.10.0-8-amd64 kernel) with Bird 2.0.7.  There 
is a VRF set up called CORE.  Originally the VRF and routing table IDs 
were high (>100000000), and everything seemed fine except that, even 
though Bird was listening for BGP connections in the VRF, nothing was 
actually being received by the daemon.  I have since dropped the VRF ID 
and routing table down to 32, but no change.
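
For context, a VRF of this shape is typically built with iproute2 along 
the following lines.  This is only a sketch: the names CORE and bond0.64 
and the table ID 32 come from the description above, the helper 
vrf_setup_cmds is hypothetical, and it prints the commands rather than 
running them, so the actual setup on this host may differ.

```shell
#!/bin/sh
# Sketch only: print (do not execute) the iproute2 commands that would
# create a VRF device bound to a routing table and enslave an interface.
# vrf_setup_cmds is a hypothetical helper, not part of any tool.
vrf_setup_cmds() {
    vrf="$1"      # VRF device name, e.g. CORE
    table="$2"    # routing table ID, e.g. 32
    iface="$3"    # member interface, e.g. bond0.64
    echo "ip link add $vrf type vrf table $table"
    echo "ip link set dev $vrf up"
    echo "ip link set dev $iface master $vrf"
}

# Values taken from the post; review the output before running it as root.
vrf_setup_cmds CORE 32 bond0.64
```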

bond0.64 is in the VRF, with 198.18.64.254/30 as its IP.  The neighbour 
device has 198.18.64.253, and they can ping each other happily.  No 
firewall is configured.

Bird is listening on the VRF interface (alongside sshd in the MGMT VRF, 
which works fine):
# ss -tuplne --cgroup --tipcinfo
Netid           State            Recv-Q            Send-Q                
             Local Address:Port                       Peer Address:Port   
         Process
tcp             LISTEN           0                 8                     
        198.18.64.254%CORE:179                             0.0.0.0:*      
          users:(("bird",pid=3807,fd=6)) uid:106 ino:62370 sk:102a 
cgroup:/system.slice/bird.service <-> cgroup:/system.slice/bird.service
tcp             LISTEN           0                 128                   
              0.0.0.0%MGMT:22                              0.0.0.0:*      
          users:(("sshd",pid=824,fd=3)) ino:12780 sk:1025 
cgroup:/system.slice/ssh.service/vrf/MGMT <-> 
cgroup:/system.slice/ssh.service/vrf/MGMT
tcp             LISTEN           0                 128                   
                 [::]%MGMT:22                                 [::]:*      
          users:(("sshd",pid=824,fd=4)) ino:12790 sk:1026 
cgroup:/system.slice/ssh.service/vrf/MGMT v6only:1 <-> 
cgroup:/system.slice/ssh.service/vrf/MGMT
#

Normally Bird is not started inside the VRF, but even when it is, I 
get no change.  I have tried changing the interface that the BGP 
protocol is listening on, no change there.

Here is the template that the BGP session is initiated from:

template bgp CORE_RTR_BGP4_TMPL {
   # Will use IP of interface neighbour is connected to as IP is not set.
   # Override with "source address <ip>" in protocol definition or
   # 'interface "string"' for IPv6.  See Bird docs, section 6.3.4
   local 198.18.64.254 as CORE_my_asn;
#  vrf "CORE";
   ipv4 {
     import filter CORE_bgp_import4;
     export filter CORE_bgp_export4;
     table CORE_Routes4;
     next hop self;
   };
   vrf CORE_vrf;
#  interface "bond0.64";
#  interface "CORE";
#  direct;
   strict bind on;
}

and here is the protocol def itself:
protocol bgp CORE_LabRTR2_BGP4 from CORE_RTR_BGP4_TMPL {
   description "IPv4 BGP to LabRTR2";
   neighbor 198.18.64.253 as CORE_rcore_asn;
#  disabled;
   passive;
}

I have tried all variants of the interface setting, and launching Bird 
both inside and outside the VRF, with no change.
If I connect locally inside the VRF I can reach the daemon:

# ip vrf exec CORE telnet -b 198.18.64.254 198.18.64.254 179
Trying 198.18.64.254...
Connected to 198.18.64.254.
Escape character is '^]'.
Connection closed by foreign host.
#

The same command run from the other node (with the bind IP changed, of 
course) gets no connection.

A packet dump shows the TCP SYNs arriving, but they are never answered:
# tcpdump -enni bond0.64
tcpdump: verbose output suppressed, use -v[v]... for full protocol 
decode
listening on bond0.64, link-type EN10MB (Ethernet), snapshot length 
262144 bytes
12:41:46.465300 02:c0:7e:18:79:d3 > 02:c0:7e:4a:57:e6, ethertype IPv4 
(0x0800), length 74: 198.18.64.253.34525 > 198.18.64.254.179: Flags [S], 
seq 2959174086, win 64240, options [mss 1460,sackOK,TS val 658554227 ecr 
0,nop,wscale 7], length 0
12:41:47.495032 02:c0:7e:18:79:d3 > 02:c0:7e:4a:57:e6, ethertype IPv4 
(0x0800), length 74: 198.18.64.253.34525 > 198.18.64.254.179: Flags [S], 
seq 2959174086, win 64240, options [mss 1460,sackOK,TS val 658555257 ecr 
0,nop,wscale 7], length 0
12:41:49.511037 02:c0:7e:18:79:d3 > 02:c0:7e:4a:57:e6, ethertype IPv4 
(0x0800), length 74: 198.18.64.253.34525 > 198.18.64.254.179: Flags [S], 
seq 2959174086, win 64240, options [mss 1460,sackOK,TS val 658557273 ecr 
0,nop,wscale 7], length 0
12:41:53.639028 02:c0:7e:18:79:d3 > 02:c0:7e:4a:57:e6, ethertype IPv4 
(0x0800), length 74: 198.18.64.253.34525 > 198.18.64.254.179: Flags [S], 
seq 2959174086, win 64240, options [mss 1460,sackOK,TS val 658561401 ecr 
0,nop,wscale 7], length 0
^C
4 packets captured
4 packets received by filter
0 packets dropped by kernel
#

I tried re-arranging the VRF (l3mdev) routing rules and then adding more 
specific rules, with no change:

# ip rule
64:	from all oif bond0.64 lookup CORE
64:	from all iif bond0.64 lookup CORE
1000:	from all lookup [l3mdev-table]
32765:	from all lookup local
32766:	from all lookup main
32767:	from all lookup default

VRF routing table:
# ip r s t CORE
broadcast 198.18.64.252 dev bond0.64 proto kernel scope link src 
198.18.64.254
198.18.64.252/30 dev bond0.64 proto kernel scope link src 198.18.64.254
198.18.64.252/30 dev bond0.64 proto bird scope link metric 32
local 198.18.64.254 dev bond0.64 proto kernel scope host src 
198.18.64.254
broadcast 198.18.64.255 dev bond0.64 proto kernel scope link src 
198.18.64.254
broadcast 198.19.64.0 dev CORE_IP proto kernel scope link src 
198.19.64.1
198.19.64.0/24 dev CORE_IP proto kernel scope link src 198.19.64.1
198.19.64.0/24 dev CORE_IP proto bird scope link metric 32
local 198.19.64.1 dev CORE_IP proto kernel scope host src 198.19.64.1
broadcast 198.19.64.255 dev CORE_IP proto kernel scope link src 
198.19.64.1
#

CORE_IP is a dummy interface in the VRF that holds an extra subnet to 
advertise to the other node.

I've seriously run out of ideas.  I started an instance of sshd in the 
CORE VRF too and I'm getting the same thing, but I can't figure out why.  
It might be related, it might not be.  I don't think I've missed 
anything.

As a last resort I even attached strace to the bird process.  Nothing 
showed up as being received for the remote connection attempts, whereas 
local connections did show up.

I have also set rp_filter=0 on related interfaces.
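
One way to double-check that setting is to read the values back from 
/proc.  The sketch below assumes the entries of interest are "all", 
bond0.64 and CORE; rp_filter_path is a hypothetical helper.  Keep in 
mind that the kernel applies the higher of the "all" and per-interface 
values, so conf/all matters as well.

```shell
#!/bin/sh
# Sketch only: print the current rp_filter value for a few interfaces.
# rp_filter_path is a hypothetical helper that builds the /proc path;
# using the path avoids the sysctl quirk with dots in interface names.
rp_filter_path() {
    printf '/proc/sys/net/ipv4/conf/%s/rp_filter\n' "$1"
}

# Interface names are assumptions taken from the post.
for iface in all bond0.64 CORE; do
    path=$(rp_filter_path "$iface")
    if [ -r "$path" ]; then
        printf '%s = %s\n' "$path" "$(cat "$path")"
    else
        printf '%s (not present on this host)\n' "$path"
    fi
done
```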

Has anyone got any other thoughts/experiences/hacks?  I am stumped as to 
why sshd works in the MGMT VRF but not in the CORE VRF, though that 
might be a separate problem (or it might not be).

Thanks.

Regards,
William

