We currently have a multihoming issue with our two upstream providers: if either provider has a failure somewhere behind the router our border is connected to, that provider doesn’t release our routes, so the other side can’t fully take over. We generally have to manually take down the circuit between our border and the affected upstream to get them to terminate the session and release our routes.
I’m still learning the fundamentals of BGP, but from what I currently understand, this is a common issue that traditional configurations don’t address, and it’s generally not a good idea to mess with the default BGP timers. We’ve previously spoken with both providers and they can’t/won’t offer any solutions. Here are some helpful links I’ve found thus far:
- https://howdoesinternetwork.com/2018/bfd
- https://archive.nanog.org/meetings/nanog45/presentations/Monday/Scholl_BFD_N45.pdf
- http://brbccie.blogspot.com/2014/06/everything-bfd.html
- https://www.reddit.com/r/networking/comments/91a5o1/bgp_convergence_time/
- https://www.ipspace.net/kb/BGPHighAvailability/30-Controlling-BGP-Convergence.html
- https://www.theroutingtable.com/rapid-bgp-fall/
BFD seems to be the solution. Our two border routers each have a single upstream circuit, and both feed transport to our one core router. By running BFD on the border routers, we should be able to have the affected border router quickly shut down its BGP session and release routing to the other side without manual intervention.
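To put rough numbers on why BFD looks attractive, here is a quick Python sketch comparing worst-case failure detection times. The values are assumptions for illustration, not anything from our providers: a 180-second BGP hold time (a common vendor default; RFC 4271 suggests 90) and BFD at 300 ms intervals with a multiplier of 3. The function names are made up, and the BFD formula is the simplified asynchronous-mode calculation from RFC 5880.

```python
def bgp_detection_seconds(hold_time_s: float = 180.0) -> float:
    """Worst case for plain BGP: the session drops only when the hold timer expires.

    180 s is an assumed default; check what your platform actually negotiates.
    """
    return hold_time_s


def bfd_detection_seconds(local_min_rx_ms: int,
                          remote_min_tx_ms: int,
                          remote_multiplier: int) -> float:
    """Simplified BFD asynchronous-mode detection time (RFC 5880, section 6.8.4).

    The remote side sends at the slower of what it wants to transmit and what
    we are willing to receive; we declare the session down after
    `remote_multiplier` of those intervals pass with no packet.
    """
    agreed_interval_ms = max(local_min_rx_ms, remote_min_tx_ms)
    return agreed_interval_ms * remote_multiplier / 1000.0


if __name__ == "__main__":
    # Assumed example values: 300 ms intervals on both sides, multiplier 3.
    bgp = bgp_detection_seconds()
    bfd = bfd_detection_seconds(300, 300, 3)
    print(f"BGP hold timer only: up to {bgp:.1f} s")
    print(f"BFD 300 ms x 3:      about {bfd:.1f} s")
```

If these assumptions hold, detection drops from minutes to under a second, though the actual intervals would have to be whatever both we and the upstream agree to support.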
I’d like to build a test lab to begin experimenting with this. Does anyone have a basic configuration that would work with this topology?
Thanks!