More BFD issues

David Petera david.petera at nic.cz
Wed Sep 11 11:19:35 CEST 2024


Hello,

On 8/10/24 12:33, Nico Schottelius via Bird-users wrote:
> Good morning bird'ers,
>
> we have a somewhat strange error with regard to BFD: on two sessions we
> continuously get the following error message:
>
> server142:
>
> --------------------------------------------------------------------------------
> 2024-08-10 09:50:25.533 <ERR> bfd1: Socket error: Destination address required
> 2024-08-10 09:50:25.738 <RMT> bfd1: Bad packet from 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eafc - wrong TTL (254)
> 2024-08-10 09:50:25.840 <RMT> bfd1: Bad packet from 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eb04 - wrong TTL (254)
> 2024-08-10 09:50:26.287 <ERR> bfd1: Socket error: Destination address required
> 2024-08-10 09:50:26.512 <RMT> bfd1: Bad packet from 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eafc - wrong TTL (254)
> 2024-08-10 09:50:26.639 <RMT> bfd1: Bad packet from 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eb04 - wrong TTL (254)
> --------------------------------------------------------------------------------
>
> The two devices with the incorrect TTL are openwrt devices. All routers
> are running BIRD version 2.15.1.
>
> Now things are getting even more interesting, but let me first show the
> rough topology:
>
> --------------------------------------------------------------------------------
>
> s141             ------|------s123
> (alpine linux)         |     (alpine linux)
> s142              -- ibgp ---s122
> (alpine linux)         |     (alpine linux)
>                         |
> vigir28          ----------- vigir29
> (openwrt)                    (openwrt)
>
> All connections are layer2, direct, vigirs only connect to servers, not
> to each other.
> --------------------------------------------------------------------------------
>
> So now come the interesting facts:
>
> - s141 has bfd up with s122, s123, s142, vigir28
> - s142 has bfd up with s122, s123, s141, no vigir
> - s122 has bfd up with s123, s141, s142, no vigir
> - s123 has bfd up with s122, s141, s142, vigir29
> - vigir28 has bfd up with s141
> - vigir29 has bfd up with s123
>
> Each and every device can ping the other one, so I am strangely confused
> as to what is going on.
>
> Additionally, probably correctly, the bgp sessions fail to initiate
> and/or are down:
>
> --------------------------------------------------------------------------------
> s122:
>
> ibgp_s123 BGP        ---        up     2024-07-10    Established
> ibgp_s141 BGP        ---        up     2024-08-04    Established
> ibgp_s142 BGP        ---        up     2024-08-08    Established
> ibgp_vigir28 BGP        ---        start  10:17:53.412  Idle          BGP Error: Hold timer expired
> ibgp_vigir29 BGP        ---        start  10:23:57.781  Idle          BGP Error: Hold timer expired
>
> s123: (bfd & bgp fluctuate for vigir28)
> ibgp_s122 BGP        ---        up     2024-07-10    Established
> ibgp_s141 BGP        ---        up     2024-08-04    Established
> ibgp_s142 BGP        ---        up     2024-08-08    Established
> ibgp_vigir28 BGP        ---        up     10:21:16.449  Established
> ibgp_vigir29 BGP        ---        up     2024-08-08    Established
>
> s141:
> ibgp_s122 BGP        ---        up     2024-08-04    Established
> ibgp_s123 BGP        ---        up     2024-08-04    Established
> ibgp_s142 BGP        ---        up     2024-08-08    Established
> ibgp_vigir28 BGP        ---        start  10:20:42.819  OpenSent      Socket: Connection closed
> ibgp_vigir29 BGP        ---        start  10:25:10.338  OpenSent      BGP Error: Hold timer expired
>
> s142:
> ibgp_s122 BGP        ---        up     2024-08-08    Established
> ibgp_s123 BGP        ---        up     2024-08-08    Established
> ibgp_s141 BGP        ---        up     2024-08-08    Established
> ibgp_vigir28 BGP        ---        start  10:27:20.079  OpenSent      BGP Error: Hold timer expired
> ibgp_vigir29 BGP        ---        start  10:26:21.088  OpenSent      BGP Error: Hold timer expired
>
> vigir28:
> bgp1       BGP        ---        start  10:26:00.453  OpenConfirm   Received: Hold timer expired
> bgp2       BGP        ---        up     10:21:30.592  Established
> bgp3       BGP        ---        start  10:25:09.416  OpenConfirm   BGP Error: Hold timer expired
> bgp4       BGP        ---        start  10:25:07.000  OpenConfirm   Socket: Host is unreachable
>
> vigir29:
> bgp1       BGP        ---        start  10:24:21.241  OpenConfirm   Socket: Host is unreachable
> bgp2       BGP        ---        up     2024-08-08    Established
> bgp3       BGP        ---        start  10:25:15.541  OpenConfirm   Socket: Host is unreachable
> bgp4       BGP        ---        start  10:28:48.584  Idle          Received: Hold timer expired
> --------------------------------------------------------------------------------
>
>
> Some configuration samples:
>
> --------------------------------------------------------------------------------
> vigir28:
>
> log syslog all;
> router id 0.0.1.28;
>
> protocol device { }
> protocol bfd { }
>
> # Just announce, no kernel interaction
> protocol static static6 {
>          ipv6;
>          route 2a0a:e5c0:10:10::/96 unreachable;
> }
> # for getting iBGP routes
> protocol babel {
>          interface "br-lan", "wan" {
>                  type wired;
>                  authentication mac;
>                  password "...";
>          };
>          ipv6 { export where (source = RTS_DEVICE) || (source = RTS_BABEL); };
> }
> protocol kernel kernel_v6 {
>          ipv6 { export where source ~ [ RTS_BABEL ]; };
> }
> protocol bgp {
>          local as 213081;
>          neighbor 2a0a:e5c0:10:1::122 as 213081;
>          direct;
>          bfd on;
>
>          ipv6 {
>                  import none;
>                  export where source ~ [ RTS_STATIC ];
>          };
> }
>
> (repeat bgp session for each ibgp peer)
> --------------------------------------------------------------------------------
>
> And s141:
>
> --------------------------------------------------------------------------------
> log stderr all;
>
> protocol device { }
>
> # Using BFD virtually everywhere, enable it globally
> protocol bfd { }
> protocol babel {
>          interface "eth*" {
>                  type wired;
>                  authentication mac;
>                  password "...";
>          };
>
>          # This matches the default of babeld: redistribute all addresses
>          # configured on local interfaces, plus re-distribute all routes received
>          # from other babel peers.
>
>          ipv4 {
>                  export where (source = RTS_DEVICE) || (source = RTS_BABEL);
>          };
>          ipv6 {
>                  export where (source = RTS_DEVICE) || (source = RTS_BABEL);
>          };
> }
> protocol bgp ibgp_vigir28 {
>      local as myas;
>      neighbor 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eafc as myas;
>      direct;
>      bfd on;
>
>      ipv6 {
>        import all;
>        export filter static_and_bgp;
>
>        gateway recursive;
>      };
>
>      ipv4 {
>        import all;
>        export filter static_and_bgp;
>
>        gateway recursive;
>        extended next hop on;
>      };
> }
>
> (repeat bgp session for each ibgp peer)
> --------------------------------------------------------------------------------
>
> s122 + s123 are virtually identical, as well as s141+s142, their
> configurations are generated.
>
> Any help in this direction would be appreciated. My next try will
> probably be to disable bfd on all sessions to see if the bgp sessions
> then stay up.
>
> Best regards,
>
> Nico
>

The BFD error 'bfd1: Bad packet from ... - wrong TTL (254)' can happen
when the associated BGP session has the 'direct' option set in a setup
where multihop BGP should be used. Single-hop BFD expects to receive
packets with TTL 255, so a value of 254 means that some node in the
middle decremented the TTL, while BFD expects no intermediate node.
This is probably also the reason why the BGP sessions are not up.

Is it possible that the direct connections are established through
WireGuard (or something similar) in a point-to-multipoint manner?
If so, I would try switching the BGP sessions to multihop mode.
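
A minimal sketch of what that change could look like on s141, based on
the ibgp_vigir28 block quoted above. Only 'direct' is replaced with
'multihop'; everything else is copied from the quoted config. It is
untested against this setup and assumes that the default
'protocol bfd { }' accepts multihop sessions:

```
protocol bgp ibgp_vigir28 {
     local as myas;
     neighbor 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eafc as myas;
     multihop;    # was 'direct;': the peer no longer has to be exactly one hop away
     bfd on;      # assumption: the BFD session then also runs in multihop mode

     ipv6 {
       import all;
       export filter static_and_bgp;

       gateway recursive;
     };

     ipv4 {
       import all;
       export filter static_and_bgp;

       gateway recursive;
       extended next hop on;
     };
}
```

The same change would be needed in the matching 'protocol bgp' block on
the peer, because single-hop and multihop BFD use different UDP ports
(3784 and 4784) and both ends have to agree on the mode.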

I have not managed to reproduce the 'bfd1: Socket error: Destination
address required' error in our simulated setup and would need more
information about the topology (and the config of the WireGuard tunnels,
if they are used).

Hope this helps,
David

-- 
– David Petera (he/him) | BIRD Tech Support | CZ.NIC, z.s.p.o.