Anyone using BFD in a multihoming configuration?

eronlloyd · November 24, 2021, 3:28pm

We currently have a multihoming issue with our two upstream providers: if either provider's circuit fails somewhere behind the router our border is connected to, that provider doesn’t release our routes, so the other side can’t fully take over. We generally have to manually take down the circuit between our border and the affected upstream to make them terminate the session and release our routes.

I’m still learning the fundamentals of BGP, but from what I currently understand, this is a common issue that traditional configurations don’t address, plus it’s generally not a good idea to mess with the default BGP timers. We’ve previously spoken with both providers and they can’t/won’t offer any solutions. Here are some helpful links I’ve found thus far:

BFD seems to be the solution. We have two border routers, each with a single upstream circuit, which feed transport to our one core router. By running BFD on the border routers, we should be able to have the affected border router quickly shut down the BGP session and release routing to the other side without manual intervention.
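To put rough numbers on "quickly", here is a minimal Python sketch comparing BFD failure detection time with a BGP hold timer. The detection-time formula follows RFC 5880 for asynchronous mode (the remote detect multiplier times the greater of the local required-min-rx and the remote desired-min-tx). The specific values (300 ms intervals, multiplier of 3, 180 s hold time) are illustrative assumptions, not taken from anyone's configuration in this thread, and the function name is made up for the example.

```python
def bfd_detection_time_ms(local_required_min_rx_ms: int,
                          remote_desired_min_tx_ms: int,
                          remote_detect_mult: int) -> int:
    """Asynchronous-mode detection time as described in RFC 5880.

    The remote system transmits at the slower (larger) of what it wants to
    send and what we are willing to receive; we declare the session down
    after `remote_detect_mult` consecutive packets are missed.
    """
    negotiated_interval_ms = max(local_required_min_rx_ms,
                                 remote_desired_min_tx_ms)
    return remote_detect_mult * negotiated_interval_ms


# Assumed, illustrative values; check the defaults on your own platform.
BGP_HOLD_TIME_S = 180        # hold time commonly seen as a vendor default
BFD_INTERVAL_MS = 300        # required-min-rx and desired-min-tx
BFD_DETECT_MULT = 3

bfd_ms = bfd_detection_time_ms(BFD_INTERVAL_MS, BFD_INTERVAL_MS, BFD_DETECT_MULT)
print(f"BFD detection time: {bfd_ms} ms")
print(f"BGP hold timer:     {BGP_HOLD_TIME_S * 1000} ms")
```

Note that this only models how fast a BFD session notices that its own peer is unreachable. It says nothing about failures further upstream that leave the directly connected peer responding.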

I’d like to build a test lab to begin experimenting with this. Does anyone have a basic configuration that would work with this topology?

Thanks!

p252 · November 24, 2021, 4:35pm

Are your “upstream providers” ISPs (Internet) or private links (like MPLS)? BFD generally requires a peer neighbor relationship, so the “upstream provider” would also need to support BFD and configure their side to peer with your CPE. ISPs usually do not support this, as there is a very good reason for not having quick link failure detection on links attached to the Internet; this is why BGP timers are so high. A flapping link could cause a cascade of Internet routers needing to constantly recalculate routes. Imagine thousands of those happening all across the Internet. Private link providers (like MPLS) usually support BFD, and it is a very good idea to use it. I have personally only used BFD with Cisco and Palo Alto, but the concept is the same.

eronlloyd · November 25, 2021, 3:07am

These are ISPs, yes. I agree on the BGP timers risk in this case, which is why I hoped BFD within our own routing cluster might offer a solution. I’ll keep digging, as I would think this is a common configuration challenge.

dtoux · November 26, 2021, 3:55pm

We have a distributed network with multiple remote nodes connected via WireGuard VPN, and we run OSPF over it. We use BFD with OSPF on those VPN links. It works reasonably well with IPv4/OSPF: failovers are so fast that we see no frames lost during video calls. It doesn’t work with IPv6 (i.e. OSPFv3 over WireGuard), but we have an open ticket for VyOS and we hope that it will be addressed soon.

We tried it with iBGP in April/May 2021 with 1.3.0 nightly builds, and there were a few issues with BFD and multihop, so it didn’t work for us. We have since stopped using iBGP, so I don’t know the current status or whether the issues have been fixed (we opened a couple of tickets about it back then).