generate default route and export to kernel if remote peer is up
Grant Taylor
gtaylor at tnetconsulting.net
Sat Sep 8 18:11:58 CEST 2018
On 09/08/2018 04:03 AM, Nikola Mitev wrote:
> Unfortunately no. I am creating the second peering now, the one which is
> live is through a Hurricane Electric 6in4 tunnel - it is a free service
> and I am not sure how much I can ask of them.
Okay. I have found it's often worthwhile to politely ask. Sometimes
people will do what you ask, or tell you how to accomplish what you want
with services that they do offer.
> My only concern here is adding the entire BGP routing table to the
> kernel table - would that be safe to do and easy enough to work with?
> Is there a better way that I'm missing?
It's my understanding that the BGP table(s) (RIB(s)) is (are) inside of
BIRD and not the actual kernel.
BIRD will then take some routes from the multiple RIB(s) that it's
processing and put them into the kernel's routing table (FIB(?)). As
such, it's perfectly safe to have multiple BGP connections (RIBs) in
BIRD. The only potential gotcha is memory. (I have no idea how much
memory is needed, it's just the only potential limitation that I can
think of.)
> You are right - as it happens I am already redesigning that part.
> Multiple default gateways distributed with DHCP seemed like an simple
> solution but it doesn't work for me - not sure what needs to happen for
> a host to actually make use of the secondary gateway.
I think relying on the hosts to do some sort of load balancing, or on
some hosts having affinity to one gateway and others to the other
gateway, is going to be very prone to failure.
That's where GLBP comes into play. GLBP enables the routers to appear
as a large virtual router that utilizes multiple routers. I think
there's some sort of loose proportion of what traffic goes to what
router, possibly based on tuneable settings. All clients think they
have the single gateway. I think the "magic" is done at the MAC to IP
layer. I.e. some clients think the router's MAC is aa:aa:aa:aa:aa:aa
and others think it's bb:bb:bb:bb:bb:bb. GLBP also allows one router to
act as both if the other router is offline.
Conversely, HSRP and VRRP both function with a single /active/ device.
I don't know the current state of GLBP on non-Cisco equipment. But
that's the route that I would try to go.
> It will have to be VRRP since the routers are both PC Engines APU boards
> running Debian.
I don't know if VRRP is your only option or not. Do research on first
hop redundancy protocols (FHRP) and what's supported under Linux.
> A negative answer is still a good answer :) Should be able to script
> some pinging/BGP connection state tracking solution. Just wanted to be
> sure I'm not reinventing the wheel as it will no doubt take some time
> to test & get it right.
Agreed.
If your luck is anything like mine, the day to day steady state
operation will be easy to achieve. The problem will come with partial /
total failures of something, or a weird combination of partials on
either side.
I would still suggest some sort of dynamic routing protocol between R1
and R2. That way they both know what the other can get to, including a
default gateway if they have one.
This will allow each router to remove its local default gateway in the
event that the connection to the directly attached ISP is inaccessible,
and the other router to learn about said problem. This is important
because if the other router also has a problem (say Backhoe Bob took out
data to both ISPs), you don't want them to ping pong traffic between
each other, each thinking that the other has a route to the internet.
If I were to try to script something like this today, I'd do it with a
few timers. The first tracks how long ago the last outgoing traffic was
sent and the second how long ago the last incoming traffic was received.
As long as the second (incoming) timer is lower than the first (outgoing)
timer, I think it's safe to say the connection to the ISP's router is
functional.
In the event that the second (incoming) timer is higher than the first
(outgoing) timer, I'd start a third (dead gateway) timer. If the third
(dead gateway) timer ever reaches zero, then I'd know that there is a
problem with the local ISP and I'd withdraw the local default gateway.
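A minimal Python sketch of the timer logic described above, reading the
first two timers as "time since the last packet" and the third as a
countdown. The class name, method names and the 10 second grace period
are illustrative assumptions, not anything from BIRD or the kernel. The
countdown here starts on the first poll that notices unanswered outgoing
traffic.

```python
import time


class GatewayMonitor:
    """Track last outgoing/incoming traffic and a dead-gateway countdown."""

    def __init__(self, dead_after=10.0, clock=time.monotonic):
        self.dead_after = dead_after  # hypothetical grace period, in seconds
        self.clock = clock            # injectable so the logic can be tested
        self.last_out = None          # when traffic was last sent to the ISP
        self.last_in = None           # when traffic was last received from it
        self.suspect_since = None     # when the dead-gateway countdown began

    def saw_outgoing(self):
        self.last_out = self.clock()

    def saw_incoming(self):
        self.last_in = self.clock()
        self.suspect_since = None     # any incoming traffic cancels the countdown

    def gateway_alive(self):
        """Return False once outgoing traffic has gone unanswered too long."""
        if self.last_out is None:
            return True               # nothing sent yet, nothing to judge
        answered = self.last_in is not None and self.last_in >= self.last_out
        if answered:
            self.suspect_since = None
            return True
        now = self.clock()
        if self.suspect_since is None:
            self.suspect_since = now  # start the third (dead gateway) timer
        return (now - self.suspect_since) < self.dead_after
```

Something would call saw_outgoing() / saw_incoming() as packets are
observed and poll gateway_alive() periodically; when it turns False the
local default gateway would be withdrawn.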
Fortunately, BIRD has the ability to monitor kernel routing table
changes (like withdrawing a local default gateway) and update things
accordingly. This also means that the other router will learn the
status change.
I think you would then have a choice, either withdraw the local router
from the FHRP -or- stay in and use the other router (and its default
gateway) as the route out.
This is also why it's important for both routers to have an idea of each
other's state (at least for the default gateway). You want both routers
to be able to return a no route to host message quickly if there is no
functional default gateway.
> In my case it's a 6in4 tunnel which should go down if the remote goes but
> I am yet to find out in what ways may the remote fail. Seems perfectly
> possible the tunnel remains up but the BGP session breaks etc. The BGP
> session breaking doesn't mean outbound routing is broken but is likely
> to cause some asymmetric routing as the replies start coming back through
> the other ISP.
You can likely simulate a failure by adding a {bad,discard,reject} route
for the IPv4 address of the remote 6in4 tunnel endpoint, thus breaking
the tunnel in a controlled manner for testing. You could do similar for
the remote BGP endpoint too, for similar reasons.
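As a hedged illustration of the controlled-failure test above: on Linux,
iproute2's blackhole, unreachable and prohibit route types are roughly
the counterparts of discard / reject routes. The helper below (its name
is hypothetical) only builds the ip(8) command line for a host route to
the given endpoint; it does not run anything, and the addresses in the
usage note are documentation placeholders.

```python
import ipaddress


def break_route_cmd(endpoint, kind="blackhole", action="add"):
    """Build an `ip route` argv that null-routes a single host address."""
    if kind not in ("blackhole", "unreachable", "prohibit"):
        raise ValueError(f"unsupported route type: {kind}")
    if action not in ("add", "del"):
        raise ValueError(f"unsupported action: {action}")
    addr = ipaddress.ip_address(endpoint)   # validates the address
    family = "-6" if addr.version == 6 else "-4"
    # Host route: /32 for IPv4, /128 for IPv6.
    prefix = f"{addr}/{addr.max_prefixlen}"
    return ["ip", family, "route", action, kind, prefix]
```

For example, break_route_cmd("192.0.2.1") builds the command that breaks
the path to a tunnel endpoint at 192.0.2.1, and the same call with
action="del" builds the command that restores it. The resulting list
could be handed to subprocess.run() as root.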
I don't recall if you're using your own provider independent globally
routed IPs or not. If you're not, you will be in a strange situation
where the IPs that were going out provider 1 will likely not work to
come back from provider 2. In some ways, NAT does make this a little
bit better as it does provide a clean delineation of where IPs are used.
It also helps avoid the issue of provider aggregate IPs.
> If you mean BGP session traffic on TCP/179 that could work, otherwise
> my link might not be busy enough at times.
I wasn't thinking BGP traffic per se. I was thinking any traffic
coming in from the ISP. This is also why you need the separate timers
for incoming and outgoing traffic, to find the delta between them. (The
third timer is to detect if the errant state is going on for too long.)
> It's easy on a linux box. I'm thinking track the BGP session with e.g.
> 'ss -npt state established | grep :179 |wc -l' say every second and then
> doing a ping every 5s or so. It will need up/down wait timers tuned.
Such does work, particularly for humans. IMHO that doesn't work as well
for automation to monitor things.
I would likely configure a couple of IPTables rules to send (select)
traffic to a user space process via NETLINK (there might be a different
/ better method now). That way you don't need to rely on scraping
kernel status tables or the overhead of sniffing traffic. The user
space process would manage the counters and dynamically add / remove the
configured default gateway to / from the kernel routing table.
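A rough sketch of the last step, assuming the user space process already
knows whether the gateway is currently considered alive. The function
name, the IPv6 default and the gateway / device values in the usage note
are assumptions for illustration. It only acts on state changes, so the
kernel table (and therefore BIRD) sees one route change per transition
rather than one per poll.

```python
def default_route_cmd(alive, was_alive, gateway, dev, family="-6"):
    """Return the ip(8) argv for a state change, or None if nothing changed."""
    if alive == was_alive:
        return None                     # steady state: leave the table alone
    # "replace" installs the route whether or not one is already present;
    # "del" withdraws it so routing daemons watching the table notice.
    action = "replace" if alive else "del"
    return ["ip", family, "route", action, "default",
            "via", gateway, "dev", dev]
```

For example, default_route_cmd(False, True, "fe80::1", "he-ipv6") builds
the withdrawal command when the gateway has just been declared dead.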
> I'll ask and see how far I get :)
Fair enough.
I would ask for the following three categories of routes:
· default gateway
· provider routes
· provider customer routes
I highly doubt that you would get, much less want to process, a full
default free zone feed from one, much less two providers.
> Thanks for your reply.
You're welcome.
Please keep me (us?) in the loop. I'm curious to learn how things turn out.
--
Grant. . . .
unix || die