More BFD issues
David Petera
david.petera at nic.cz
Wed Sep 11 11:19:35 CEST 2024
Hello,
On 8/10/24 12:33, Nico Schottelius via Bird-users wrote:
> Good morning bird'ers,
>
> we have a bit of a strange error in regards to bfd, on two sessions we
> get continuously the following error message:
>
> server142:
>
> --------------------------------------------------------------------------------
> 2024-08-10 09:50:25.533 <ERR> bfd1: Socket error: Destination address required
> 2024-08-10 09:50:25.738 <RMT> bfd1: Bad packet from 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eafc - wrong TTL (254)
> 2024-08-10 09:50:25.840 <RMT> bfd1: Bad packet from 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eb04 - wrong TTL (254)
> 2024-08-10 09:50:26.287 <ERR> bfd1: Socket error: Destination address required
> 2024-08-10 09:50:26.512 <RMT> bfd1: Bad packet from 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eafc - wrong TTL (254)
> 2024-08-10 09:50:26.639 <RMT> bfd1: Bad packet from 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eb04 - wrong TTL (254)
> --------------------------------------------------------------------------------
>
> The two devices with the incorrect TTL are openwrt devices. All routers
> are running BIRD version 2.15.1.
>
> Now things are getting even more interesting, but let me first show the
> rough topology:
>
> --------------------------------------------------------------------------------
>
> s141 ------|------s123
> (alpine linux) | (alpine linux)
> s142 -- ibgp ---s122
> (alpine linux) | (alpine linux)
> |
> vigir28 ----------- vigir29
> (openwrt) (openwrt)
>
> All connections are layer2, direct, vigirs only connect to servers, not
> to each other.
> --------------------------------------------------------------------------------
>
> So now comes the interesting facts:
>
> - s141 has bfd up with s122, s123, s142, vigir28
> - s142 has bfd up with s122, s123, s141, no vigir
> - s122 has bfd up with s123, s141, s142, no vigir
> - s123 has bfd up with s122, s141, s142, vigir29
> - vigir28 has bfd s141
> - vigir29 has bfd s123
>
> Each and every device can ping the other one, so I am strangely confused
> as to what is going on.
>
> Additionally, probably correctly, the bgp sessions fail to initiate
> and/or are down:
>
> --------------------------------------------------------------------------------
> s122:
>
> ibgp_s123 BGP --- up 2024-07-10 Established
> ibgp_s141 BGP --- up 2024-08-04 Established
> ibgp_s142 BGP --- up 2024-08-08 Established
> ibgp_vigir28 BGP --- start 10:17:53.412 Idle BGP Error: Hold timer expired
> ibgp_vigir29 BGP --- start 10:23:57.781 Idle BGP Error: Hold timer expired
>
> s123: (bfd & bgp fluctuate for vigir28)
> ibgp_s122 BGP --- up 2024-07-10 Established
> ibgp_s141 BGP --- up 2024-08-04 Established
> ibgp_s142 BGP --- up 2024-08-08 Established
> ibgp_vigir28 BGP --- up 10:21:16.449 Established
> ibgp_vigir29 BGP --- up 2024-08-08 Established
>
> s141:
> ibgp_s122 BGP --- up 2024-08-04 Established
> ibgp_s123 BGP --- up 2024-08-04 Established
> ibgp_s142 BGP --- up 2024-08-08 Established
> ibgp_vigir28 BGP --- start 10:20:42.819 OpenSent Socket: Connection closed
> ibgp_vigir29 BGP --- start 10:25:10.338 OpenSent BGP Error: Hold timer expired
>
> s142:
> ibgp_s122 BGP --- up 2024-08-08 Established
> ibgp_s123 BGP --- up 2024-08-08 Established
> ibgp_s141 BGP --- up 2024-08-08 Established
> ibgp_vigir28 BGP --- start 10:27:20.079 OpenSent BGP Error: Hold timer expired
> ibgp_vigir29 BGP --- start 10:26:21.088 OpenSent BGP Error: Hold timer expired
>
> vigir28:
> bgp1 BGP --- start 10:26:00.453 OpenConfirm Received: Hold timer expired
> bgp2 BGP --- up 10:21:30.592 Established
> bgp3 BGP --- start 10:25:09.416 OpenConfirm BGP Error: Hold timer expired
> bgp4 BGP --- start 10:25:07.000 OpenConfirm Socket: Host is unreachable
>
> vigir29:
> bgp1 BGP --- start 10:24:21.241 OpenConfirm Socket: Host is unreachable
> bgp2 BGP --- up 2024-08-08 Established
> bgp3 BGP --- start 10:25:15.541 OpenConfirm Socket: Host is unreachable
> bgp4 BGP --- start 10:28:48.584 Idle Received: Hold timer expired
> --------------------------------------------------------------------------------
>
>
> Some configuration samples:
>
> --------------------------------------------------------------------------------
> vigir28:
>
> log syslog all;
> router id 0.0.1.28;
>
> protocol device { }
> protocol bfd { }
>
> # Just announce, no kernel interaction
> protocol static static6 {
> ipv6;
> route 2a0a:e5c0:10:10::/96 unreachable;
> }
> # for getting iBGP routes
> protocol babel {
> interface "br-lan", "wan" { type wired; authentication mac; password "...";
> };
> ipv6 { export where (source = RTS_DEVICE) || (source = RTS_BABEL); };
> }
> protocol kernel kernel_v6 {
> ipv6 { export where source ~ [ RTS_BABEL ]; };
> }
> protocol bgp {
> local as 213081;
> neighbor 2a0a:e5c0:10:1::122 as 213081;
> direct;
> bfd on;
>
> ipv6 {
> import none;
> export where source ~ [ RTS_STATIC ];
> };
> }
>
> (repeat bgp session for each ibgp peer)
> --------------------------------------------------------------------------------
>
> And s141:
>
> --------------------------------------------------------------------------------
> log stderr all;
>
> protocol device { }
>
> # Using BFD virtually everywhere, enable it globally
> protocol bfd { }
> protocol babel {
> interface "eth*" {
> type wired;
> authentication mac;
> password "...";
> };
>
> # This matches the default of babeld: redistribute all addresses
> # configured on local interfaces, plus re-distribute all routes received
> # from other babel peers.
>
> ipv4 {
> export where (source = RTS_DEVICE) || (source = RTS_BABEL);
> };
> ipv6 {
> export where (source = RTS_DEVICE) || (source = RTS_BABEL);
> };
> }
> protocol bgp ibgp_vigir28 {
> local as myas;
> neighbor 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eafc as myas;
> direct;
> bfd on;
>
> ipv6 {
> import all;
> export filter static_and_bgp;
>
> gateway recursive;
> };
>
> ipv4 {
> import all;
> export filter static_and_bgp;
>
> gateway recursive;
> extended next hop on;
> };
> }
>
> (repeat bgp session for each ibgp peer)
> --------------------------------------------------------------------------------
>
> s122 + s123 are virtually identical, as well as s141+s142, their
> configurations are generated.
>
> Any help in this direction would be appreciated. My next try will
> probably be to disable bfd on all sessions to see if the bgp sessions
> then stay up.
>
> Best regards,
>
> Nico
>
The BFD error 'bfd1: Bad packet from ... - wrong TTL (254)' can occur
when the associated BGP session has the 'direct' option set in a setup
where multihop BGP should be used. Some node in the middle decrements the
TTL of the BFD packets, but BFD expects no node in between: a directly
connected peer's packets arrive with TTL 255, so 254 indicates one
forwarding hop. This is probably also why the BGP sessions do not come up.

Is it possible that the direct connections are established through
WireGuard (or something similar) in a point-to-multipoint manner?
If so, I would try changing the BGP sessions to multihop mode.
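
A minimal, untested sketch of that change, based on the s141
configuration quoted above: 'direct' is replaced by 'multihop', and
everything else is taken unchanged from the original post. For iBGP,
multihop is BIRD's default, so simply dropping 'direct' should have the
same effect. The session on the peer (here vigir28) would presumably
need the same change.

--------------------------------------------------------------------------------
protocol bgp ibgp_vigir28 {
	local as myas;
	neighbor 2a0a:e5c0:10:1:fa5e:3cff:fe2d:eafc as myas;
	multihop;	# was: direct;
	bfd on;

	# ipv6 and ipv4 channels as in the original configuration
}
--------------------------------------------------------------------------------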

I have not managed to reproduce the 'bfd1: Socket error: Destination
address required' error in our simulated setup and would need more
information about the topology (and the configuration of the WireGuard
tunnels, if they are used).

Hope this helps,
David
--
– David Petera (he/him) | BIRD Tech Support | CZ.NIC, z.s.p.o.