Need redundant links: How does VyOS decide which link to take and which is dead?

General question: If I have two default routes over two interfaces with the same metric, how does VyOS decide which one to use? More specifically, how does it decide when one is broken? (An interface going down is the trivial case, but what about when the interface stays up and the link behind it is dead?)

To make this a bit more concrete, here is my setup:

My VyOS router (“CenterGate1”) is connected redundantly to two gateways (BorderGate1, BorderGate2) via WireGuard and runs OSPF. Both gateways announce via BGP, and they are also connected to each other via a “wg-perimeter” link in case one link goes down.

In-bound traffic works as expected:

  1. Traffic from Europe is likely to come in via BorderGate2 and goes directly to CenterGate1. Traffic from the US is likely to come in via BorderGate1 and goes directly to CenterGate1
  2. If either wg-bgate1 or wg-bgate2 is broken, traffic is first routed through wg-perimeter but still reaches CenterGate1

Now the problem is the out-bound and return traffic: On VyOS I have two default routes, one over wg-bgate1 and one over wg-bgate2. But it seems VyOS always prefers just one, say wg-bgate1. Now if I stop bird on the BorderGate behind wg-bgate1, all return traffic is dropped, even though I want it to be routed over wg-bgate2 in that case.

In particular, wg-bgate1 and wg-bgate2 should be redundant links (carrying the default gateway) that create a resilient connection with respect to:

  • When either link is explicitly down
  • When either link is up but dead (i.e., tunnel endpoint not reachable)
  • When bird on a BorderGate is broken/unresponsive
  • When one of the BorderGates is down
  • When the internet uplink on either BorderGate is broken

How can I configure VyOS accordingly?

Also, would it be preferable to not use OSPF in the first place but iBGP instead? If yes, why?

So I’m not experienced enough to help explain a fix but I understand the cause here. When you have identical routes with equal metrics, they become ECMP routes. ECMP is useful for certain things but if you want some control over geographical routing then it’s probably not ideal. FRR, the routing engine VyOS uses, has some information on it here: Zebra — FRR latest documentation
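To illustrate why equal-cost routes can look like "always one link" for a given connection, here is a rough sketch of per-flow next-hop selection. It is not the hash the Linux kernel or FRR actually uses; the function name `pick_next_hop` and the flow values are made up for illustration. The point is only that a flow maps deterministically to one next hop, and that a next hop which is dead but still installed keeps receiving its share of flows.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Pick one next hop for a flow by hashing its fields (illustrative only).

    This is NOT the real kernel hash; it only shows that the same flow
    always lands on the same next hop while the set of next hops is unchanged.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Interface names are from the thread; the flow tuple is a made-up example:
# (source address, destination address, IP protocol, source port, destination port)
next_hops = ["wg-bgate1", "wg-bgate2"]
flow = ("10.0.0.5", "203.0.113.9", 6, 51514, 443)
print(pick_next_hop(flow, next_hops))
```

Nothing in this selection checks whether the chosen next hop is reachable, which is why something else has to withdraw the route when a link dies.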

I am not sure I understand correctly (or how this relates to ECMP), but the two default routes are static routes:

protocols {
    static {
        table 33 {
            route {
                next-hop {
                    distance 20
                next-hop {
                    distance 20

I guess I read more into the geographic part than needed, my bad.

I started typing an answer along a different line of thinking but then realized something: WG tunnels result in directly connected routes for the link. If the peer is down, the interface is still up and the connected route is still in place. I just tested this with a WG tunnel I have for a roadwarrior setup, by adding the peer as a second next-hop on my default static route alongside my regular next-hop. VyOS still tried to route to the peer even though the peer was not connected. Disabling the WG interface removes the connected route and therefore that static default route too.
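Since the interface state says nothing about the peer, one way to detect an "up but dead" tunnel is to look at the last handshake time that `wg show <interface> latest-handshakes` prints (one line per peer: public key, then a Unix timestamp, with 0 meaning no handshake yet). Below is a minimal sketch under some assumptions: the 180 second threshold is a guess, the helper names are hypothetical, and handshake age is only meaningful if the tunnel carries regular traffic or has a persistent keepalive set.

```python
import subprocess
import time

# Assumed threshold, not a WireGuard constant; tune it to your keepalive interval.
STALE_AFTER_SECONDS = 180


def parse_latest_handshakes(output):
    """Parse 'wg show <iface> latest-handshakes' output into {public_key: unix_timestamp}."""
    handshakes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            handshakes[parts[0]] = int(parts[1])
    return handshakes


def stale_peers(handshakes, now, max_age=STALE_AFTER_SECONDS):
    """Return peers that never completed a handshake (0) or whose last one is older than max_age."""
    return sorted(
        key for key, ts in handshakes.items() if ts == 0 or now - ts > max_age
    )


def read_handshakes(interface):
    """Run the wg tool for one interface; usually needs root."""
    result = subprocess.run(
        ["wg", "show", interface, "latest-handshakes"],
        capture_output=True, text=True, check=True,
    )
    return parse_latest_handshakes(result.stdout)
```

On the router this could be used as `stale_peers(read_handshakes("wg-bgate1"), now=time.time())`; a non-empty result would be the signal to stop using that link.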

I would’ve suggested the WAN Load Balancing feature since it has link monitoring but it’s incompatible with dynamic routing according to a note on the doc page.

I think a couple options would be a link monitoring script or providing default via dynamic routing instead of manual statics. I think OSPF has some capability to inject default routes and BGP definitely does.
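For the link monitoring script option, a minimal sketch of the decision logic could look like the following. Everything here is an assumption rather than a VyOS feature: the `LinkMonitor` class, the thresholds, and the choice of ICMP as the probe are illustrative. The probe uses the common iputils `ping` flags `-c` (count), `-W` (timeout in seconds) and `-I` (outgoing interface). Requiring several consecutive failures before declaring a link down avoids flapping on a single lost packet.

```python
import subprocess


def probe(interface, target, timeout_s=1):
    """Send one ICMP echo out of a specific interface; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", interface, target],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


class LinkMonitor:
    """Track one link: down after N failed probes in a row, up again after M successes in a row."""

    def __init__(self, fail_threshold=3, ok_threshold=2):
        self.fail_threshold = fail_threshold
        self.ok_threshold = ok_threshold
        self.up = True
        self._fails = 0
        self._oks = 0

    def record(self, success):
        """Feed one probe result and return the link state after it."""
        if success:
            self._oks += 1
            self._fails = 0
            if not self.up and self._oks >= self.ok_threshold:
                self.up = True
        else:
            self._fails += 1
            self._oks = 0
            if self.up and self._fails >= self.fail_threshold:
                self.up = False
        return self.up
```

A loop would call something like `monitor.record(probe("wg-bgate1", target))` every few seconds and add or remove the matching static next-hop when the state changes; how that change is applied on VyOS is left out here. Probing a target beyond the BorderGate, rather than the tunnel endpoint itself, would also catch a broken internet uplink on that gateway.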

Aaah, I think I see what you are saying. Instead of manually entering the default routes, let bird on BorderGate1 and BorderGate2 send the default route via OSPF … is this the correct understanding?

This might be drifting a little off the VyOS topic, but do you know how this would work? If I just add the default route, it would be propagated to all peers, and I definitely only want it propagated to CenterGate1…

Also, in this setup, would it be good to consider iBGP instead of OSPF? Or does it not matter (in terms of functionality)?


This is the part where I’m lacking experience, unfortunately. I think you could probably configure the metrics so that it’s lower than the actual default of the BorderGate. Maybe try in a lab environment first?

Can’t speak to either unfortunately, my dynamic routing experience is limited to a server cluster mesh network with OSPF for inter-node communication.