How to fix BGP 'Invalid NEXT_HOP attribute' errors?
Ondrej Zajicek
santiago at crfreenet.org
Thu Apr 30 01:23:33 CEST 2020
On Mon, Apr 27, 2020 at 04:42:46PM +0200, Erik Zscheile wrote:
> Hi,
>
> I get relatively much "log spam" about one specific BGP session which logs dozens of 'Invalid NEXT_HOP attribute' messages.
> I haven't found much information online, but have already tried to find out which code path in BIRD produces the message.
> (The peers all have a BIRD-2.0.7 installation).
Hi
First, you need to check whether it is an rx or a tx error; it can happen in both directions, and there are multiple code paths.
The one you found (below) is on the tx path, but there are also bgp_apply_next_hop() and bgp_decode_next_hop_ip() on the rx path.
We should improve the error message to be more explicit about the reason for the bad next hop.
I am not sure whether your description of which code paths are hit is based on debugging or just on code examination.
> The following code paths in 'proto/bgp/packets.c:bgp_update_next_hop_ip' are hit:
>
> --snip--
> /* Forbid zero next hop */
> if (ipa_zero(nh[0]) && ((len != 32) || ipa_zero(nh[1])))
> WITHDRAW(BAD_NEXT_HOP);
>
> /* Forbid next hop equal to neighbor IP */
> if (ipa_equal(peer, nh[0]) || ((len == 32) && ipa_equal(peer, nh[1])))
> WITHDRAW(BAD_NEXT_HOP);
> --end snip--
>
> Thus, I already concluded that one source of the error is a 'zeroed' next hop. But I don't really understand what the second check does and why.
> I found out that sometimes (after some printf-debugging), 'peer' is equal to 'nh[0]', but the second code path isn't always hit when that is the case, but only sometimes.
> The error only seems to happen if the BGP update message contains a next-hop with the same AF as the 'peer' has. If the peer sends the same BGP message, but via IPv6,
> and same IPv4 next_hop (which is equal to the peer ll address), the message is (somewhat inconsistent with the other behavior) accepted.
> The error does only happen if the previous if block 'if (!a || !bgp_use_next_hop(s, a))' is not hit.
> If it is hit, the next hop is updated to point at the local ll address and the message is accepted.
If the condition (!a || !bgp_use_next_hop(s, a)) is not hit, then BIRD
assumes that the original next hop should be used, but then it notices
that the original next hop has the same IP as the peer, which is not
allowed.
Such a situation should not normally happen, because a route is not
propagated back to the BGP session from which it was received. But it may
happen if there are multiple routers on the same network, i.e. R1 sends a
route to R2 (with the next hop pointing to R1), R2 sends it to R3 (keeping
the next hop) and then R3 sends it back to R1 (keeping the next hop). So it
depends on your BGP topology.
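
To make the R1/R2/R3 case concrete, here is a minimal sketch of the two
quoted tx-side checks applied to that scenario. It is not BIRD's code: the
names addr_t, ADDR, nh_result, check_tx_next_hop() and
r3_to_r1_keeping_next_hop() are made up for illustration, addresses are
modelled as plain 32-bit integers instead of ip_addr, and the example
addresses are arbitrary. As in the quoted snippet, len == 32 is taken to
mean that a second address is present in nh[1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a toy stand-in for BIRD's ip_addr; 0 plays the role of a zero address. */
typedef uint32_t addr_t;

#define ADDR(a, b, c, d) \
  (((addr_t)(a) << 24) | ((addr_t)(b) << 16) | ((addr_t)(c) << 8) | (addr_t)(d))

/* Hypothetical result codes; BIRD itself withdraws the route with BAD_NEXT_HOP. */
enum nh_result { NH_OK, NH_BAD_ZERO, NH_BAD_EQUALS_PEER };

/*
 * Same logic as the two quoted checks:
 *  1. reject a zero next hop (unless a second, non-zero address is present),
 *  2. reject a next hop equal to the address of the peer we are sending to.
 */
static enum nh_result
check_tx_next_hop(addr_t peer, const addr_t nh[2], unsigned len)
{
  assert(nh != NULL);

  if (nh[0] == 0 && (len != 32 || nh[1] == 0))
    return NH_BAD_ZERO;

  if (peer == nh[0] || (len == 32 && peer == nh[1]))
    return NH_BAD_EQUALS_PEER;

  return NH_OK;
}

/*
 * Hypothetical scenario: R1 originated the route with next hop R1, R2 and R3
 * kept that next hop, and R3 now tries to send the route to its peer R1.
 */
static enum nh_result
r3_to_r1_keeping_next_hop(void)
{
  const addr_t r1 = ADDR(192, 0, 2, 1);
  const addr_t nh[2] = { r1, 0 };  /* next hop unchanged since R1 */

  return check_tx_next_hop(r1, nh, 16);
}
```

In this model r3_to_r1_keeping_next_hop() ends in the second check, which
is the case described above: the next hop R3 would send is the address of
the very peer it is sending to.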
--
Elen sila lumenn' omentielvo
Ondrej 'Santiago' Zajicek (email: santiago at crfreenet.org)
OpenPGP encrypted e-mails preferred (KeyID 0x11DEADC3, wwwkeys.pgp.net)
"To err is human -- to blame it on a computer is even more so."