Can someone clear something up for me? There are 2 apparent locations where you can set BFD.
#1 set protocols bfd peer … source address …
#2 set protocols bgp … neighbor … bfd
I have a VyOS router running 1.3.2 with BFD set only in location #1, and it shows a BFD session with the other end of the BGP link, which is an EdgeOS router in this case. Is that all I need, or do I also need to set bfd on the BGP neighbor?
If you want to shorten BGP convergence time (i.e. detect a down neighbor quickly), you also have to configure #2. Otherwise BGP will fall back on its default timers for peer-down detection.
So I gave that a try tonight, but it doesn’t seem to be working correctly. With BFD set only as #1, ‘show bfd peers brief’ shows the active peers with “up” status.
If I add #2 also, I get a duplicate set of BFD peers that are not up. See the following screenshot, this router has 3 BFD peers.
vyos@R1:~$ show bfd peers brief
Session count: 1
SessionId     LocalAddress    PeerAddress    Status
=========     ============    ===========    ======
3951150886    0.0.0.0         10.10.10.2     up
vyos@R1:~$ show bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.1.16, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 20 KiB of memory
Neighbor      V    AS      MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
10.10.10.2    4    65000   7        7        0       0    0     00:04:24  0             0       N/A
Total number of neighbors 1
vyos@R1:~$
BTW, could you pls attach your full configs? That’s the first thing to do actually if you want your question answered
Another idea - source address in BFD peer config. I don’t think it’s necessary (can’t check on 1.3.2 myself as I’m using rolling releases, tested on 1.4-rolling-202308220020).
We’re on 1.3.2, I didn’t think it was that old. We pulled 1.3.3 but had some issues compiling it so stuck with 1.3.2 for now. These are tower routers in production, not a casual thing to just upgrade them all remotely.
Why use multihop if your BFD peers are directly connected? And I’d try to remove source address from BFD peers also (see my snippet - it’s pretty straightforward and copy pasted from documentation basically).
Multihop is used because theoretically it should work on 1 or many hops, you don’t have to change the config.
I just tried removing the multihop part and it completely drops that peer out of the ‘show bfd peers brief’ output. If I put multihop back then it brings it right back up again.
Removed the source address, no change, it’s down when multihop is gone, up when it’s there.
Looking at how others do it, the below is from Arista when it comes to BFD:
For BFD to function as a failure detection mechanism, it must be enabled for each participating protocol.
So, translating the above to VyOS, I’m not sure what the global BFD setting would do (other than just give some statistics that “yup - the neighbor replies”).
My guess is that you define the peers in the global setting (perhaps with ability to have custom timers if wanted per neighbor) and then enable BFD for each neighbor (if we talk about BGP) where you want it to exist?
On the other hand that sounds a bit off since the below example is how you do it on an Arista box:
Thanks, I do agree looking at how other platforms do it is usually a helpful thing. EdgeOS seems to be happy just setting it up on the BGP neighbor config
set protocols bgp 1234 neighbor 10.10.18.3 fall-over bfd
And before we dropped OSPF it worked as well in the set protocols OSPF config.
I think there’s a blind spot in the documentation here. As far as I can tell from my tests, when you configure protocols bfd you can tune timers (or you can use a profile, as @roedie suggested). After that, when you enable bfd on a BGP neighbor, VyOS tries to match a static BFD peer (configured under protocols bfd) to that particular BGP neighbor.
In FRR there are 2 types of BFD configuration, static and dynamic. I believe those translate through to VyOS as BFD being configured inside BGP/OSPF/etc. and on its own, i.e. at protocols → bfd.
Which makes sense to me, because only my BGP-configured BFD comes up and creates a pair: the other end is an EdgeOS router expecting a dynamic BFD peer. So this pretty much tells me, 99%, that all we need is to configure BFD at protocols → bgp/ospf and it will bring up a proper session. protocols → bfd should be left alone unless you are building a specific situation.
set protocol bfd... allows you to manually configure the details. set protocol bgp 1234 neighbor 1.2.3.4 bfd... enables dynamic BFD where it tries to figure out what the other side wants and matches it.
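To make the two locations concrete, here is a minimal sketch in VyOS 1.3-style syntax. The addresses 10.10.10.1 and 10.10.10.2 and AS 65000 are placeholders, not values from the original poster’s routers. The multihop and source address options are included only because they came up in this thread and may not be needed for a directly connected peer. The # lines are annotations for the reader.

```
# Location #1: standalone BFD peer under "protocols bfd" (the manual/static form).
set protocols bfd peer 10.10.10.2 source address 10.10.10.1
set protocols bfd peer 10.10.10.2 multihop

# Location #2: BFD enabled on the BGP neighbor itself.
set protocols bgp 65000 neighbor 10.10.10.2 bfd
```

Whether you need #1 at all, or only #2, is exactly what is being debated here, so treat this as an illustration of the syntax rather than a recommended config.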
BFD Modes
BFD functions in asynchronous or demand mode, and also offers an echo function. EOS supports asynchronous mode and the echo function.
- Asynchronous Mode
- Demand Mode
Asynchronous Mode
In asynchronous mode, BFD control packets are exchanged by neighboring systems at regular intervals. If a specified number of sequential packets are not received, BFD declares the session to be down.
Demand Mode
In demand mode, once the BFD session is established, the participating systems can request that BFD packets not be sent, then request an exchange of packets only when needed to verify connectivity. EOS does not support demand mode.
Echo Function
When the echo function is in use, echo packets are looped back through the hardware forwarding path of the neighbor system without involving the CPU. Failure is detected by an interruption in the stream of echoed packets. The minimum reception rate for BFD control packets from the neighbor is also changed automatically when the echo function is operational, because liveness detection is supplied by the echo packets.
While BFD control messages are transmitted to port 3784, BFD echo messages use UDP port 3785 for both source and destination.
I assume the “dynamic” mode in VyOS is what Arista refers to as “Demand Mode”?
Personally I think the dynamic mode is a bit sketchy when it comes to BFD.
To me, using BFD is a way to speed up detecting whether a specific neighbor really is down (especially in a multihop scenario) without having to wait for some BGP timeout to kick in (which you normally don’t want to set too low because of the load it would put on the mgmt-cpu). Which means that asynchronous, aka static, mode is the preferred one in most cases.
That is, I know ahead of time that within, let’s say, 3x250=750ms BFD will detect that this path is no longer working and trigger BGP to remove that neighbor (way faster than waiting for a BGP timeout to trigger).
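The 750ms figure is just multiplier times interval: 3 x 250ms. A rough sketch of peer timers that would give that number, assuming the interval options under protocols bfd peer, a placeholder peer address of 10.10.10.2, and a far end that accepts 250ms rather than negotiating something slower:

```
# Send and expect BFD control packets every 250 ms (values are in milliseconds).
set protocols bfd peer 10.10.10.2 interval transmit 250
set protocols bfd peer 10.10.10.2 interval receive 250
# Declare the session down after 3 consecutive missed packets: 3 x 250 ms = 750 ms.
set protocols bfd peer 10.10.10.2 interval multiplier 3
```

The actual detection time depends on what the two ends negotiate, so the real number can come out higher if the peer asks for a longer interval.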