WAN load balancing failure - static route stays?

Scenario: I have two interfaces, eth0 (cable modem) and wwan0 (cellular modem). When the cable modem is operational it should be used, failing over to the cellular modem only when necessary.

This mostly works fine, except when the cable modem fails without the link going down. In that case, the DHCP-learned default route stays in place even though WAN load balancing has detected the failure.

Any ideas?

Config:

interfaces {
    ethernet eth0 {
        address dhcp
        hw-id 00:0c:29:aa:aa:1e
    }
    ethernet eth1 {
        address 10.254.254.1/24
        hw-id 00:0c:29:aa:aa:28
    }
    loopback lo {
    }
}
load-balancing {
    wan {
        flush-connections
        interface-health eth0 {
            failure-count 1
            nexthop dhcp
            success-count 1
            test 1 {
                resp-time 1
                target 8.8.8.8
                ttl-limit 1
            }
        }
        interface-health wwan0 {
            failure-count 1
            nexthop dhcp
            success-count 1
            test 1 {
                resp-time 1
                target 1.1.1.1
                ttl-limit 1
            }
        }
        rule 1 {
            failover
            inbound-interface eth1
            interface eth0 {
                weight 250
            }
            interface wwan0 {
                weight 1
            }
            protocol all
        }
    }
}
protocols {
    static {
        interface-route 0.0.0.0/0 {
            next-hop-interface wwan0 {
                distance 240
            }
        }
        route 0.0.0.0/0 {
            dhcp-interface eth0
        }
    }
}
service {
    dhcp-server {
        shared-network-name lan {
            name-server 1.1.1.1
            subnet 10.254.254.0/24 {
                default-router 10.254.254.1
                name-server 1.1.1.1
                range 0 {
                    start 10.254.254.10
                    stop 10.254.254.250
                }
            }
        }
    }
    ssh {
    }
}
system {
    config-management {
        commit-revisions 100
    }
    conntrack {
        modules {
            ftp
            h323
            nfs
            pptp
            sip
            sqlnet
            tftp
        }
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name vyos
    login {
        user vyos {
            authentication {
                encrypted-password ...
                plaintext-password ""
            }
        }
    }
    ntp {
        server time1.vyos.net {
        }
        server time2.vyos.net {
        }
        server time3.vyos.net {
        }
    }
    syslog {
        global {
            facility all {
                level info
            }
            facility protocols {
                level debug
            }
        }
    }
}

When both interfaces are up:

vyos@vyos:~$ show wan-load-balance 
Interface:  eth0
  Status:  active
  Last Status Change:  Fri Sep  2 00:58:28 2022
  +Test:  ping  Target: 8.8.8.8
    Last Interface Success:  0s 
    Last Interface Failure:  1m11s      
    # Interface Failure(s):  0

Interface:  wwan0
  Status:  active
  Last Status Change:  Fri Sep  2 00:58:28 2022
  +Test:  ping  Target: 1.1.1.1
    Last Interface Success:  0s 
    Last Interface Failure:  1m11s      
    # Interface Failure(s):  0

vyos@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

S   0.0.0.0/0 [240/0] is directly connected, wwan0, weight 1, 00:00:13
S>* 0.0.0.0/0 [1/0] via 192.168.17.1, eth0, weight 1, 00:01:17
S   0.0.0.0/0 [210/0] via 192.168.17.1, eth0, weight 1, 00:01:17
S>* 8.8.8.8/32 [1/0] via 192.168.17.1, eth0, weight 1, 00:01:17
C>* 10.254.254.0/24 is directly connected, eth1, 00:01:43
C>* 167.20.XX.XX/30 is directly connected, wwan0, 00:01:21
C>* 192.168.17.0/24 is directly connected, eth0, 00:01:17

vyos@vyos:~$ show wan-load-balance status
Chain WANLOADBALANCE_PRE (1 references)
 pkts bytes target     prot opt in     out     source               destination         
  118 15149 ISP_eth0   all  --  eth1   *       0.0.0.0/0            0.0.0.0/0            state NEW
  440 75908 CONNMARK   all  --  eth1   *       0.0.0.0/0            0.0.0.0/0            CONNMARK restore

When eth0 is hard down:

vyos@vyos:~$ show wan-load-balance 
Interface:  eth0
  Status:  failed
  Last Status Change:  Fri Sep  2 01:01:21 2022
  -Test:  ping	Target: 8.8.8.8
    Last Interface Success:  56s	
    Last Interface Failure:  0s	
    # Interface Failure(s):  5

Interface:  wwan0
  Status:  active
  Last Status Change:  Fri Sep  2 00:58:28 2022
  +Test:  ping	Target: 1.1.1.1
    Last Interface Success:  0s	
    Last Interface Failure:  3m43s	
    # Interface Failure(s):  0

vyos@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

S>* 0.0.0.0/0 [240/0] is directly connected, wwan0, weight 1, 00:02:47
C>* 10.254.254.0/24 is directly connected, eth1, 00:04:17
C>* 167.20.XX.XX/30 is directly connected, wwan0, 00:03:55

vyos@vyos:~$ show wan-load-balance status
Chain WANLOADBALANCE_PRE (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   92 13032 ISP_wwan0  all  --  eth1   *       0.0.0.0/0            0.0.0.0/0            state NEW
   48 12168 CONNMARK   all  --  eth1   *       0.0.0.0/0            0.0.0.0/0            CONNMARK restore

When eth0 is up but cannot reach the target:

vyos@vyos:~$ show wan-load-balance 
Interface:  eth0
  Status:  failed
  Last Status Change:  Fri Sep  2 01:04:39 2022
  -Test:  ping	Target: 8.8.8.8
    Last Interface Success:  20s	
    Last Interface Failure:  0s	
    # Interface Failure(s):  2

Interface:  wwan0
  Status:  active
  Last Status Change:  Fri Sep  2 00:58:28 2022
  +Test:  ping	Target: 1.1.1.1
    Last Interface Success:  0s	
    Last Interface Failure:  6m26s	
    # Interface Failure(s):  0

vyos@vyos:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

S   0.0.0.0/0 [210/0] via 192.168.17.1, eth0, weight 1, 00:00:21
S>* 0.0.0.0/0 [1/0] via 192.168.17.1, eth0, weight 1, 00:00:21
S   0.0.0.0/0 [240/0] is directly connected, wwan0, weight 1, 00:05:25
S>* 8.8.8.8/32 [1/0] via 192.168.17.1, eth0, weight 1, 00:00:21
C>* 10.254.254.0/24 is directly connected, eth1, 00:06:55
C>* 167.20.XX.XX/30 is directly connected, wwan0, 00:06:33
C>* 192.168.17.0/24 is directly connected, eth0, 00:00:21

vyos@vyos:~$ show wan-load-balance status
Chain WANLOADBALANCE_PRE (1 references)
 pkts bytes target     prot opt in     out     source               destination         
   54 14863 ISP_wwan0  all  --  eth1   *       0.0.0.0/0            0.0.0.0/0            state NEW
    4   468 CONNMARK   all  --  eth1   *       0.0.0.0/0            0.0.0.0/0            CONNMARK restore

Load balancing doesn’t use the default (main) routing table to process traffic; it uses its own tables.
So it seems you need failover rather than load balancing: something that monitors a target and replaces the default route in the main routing table.
Try checking traffic from a host behind VyOS.
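To see those separate tables in practice: WAN load balancing marks new connections, and `ip rule` sends each mark to its own routing table. Below is a minimal sketch, assuming a Linux host with iproute2 (`ip rule show`, `ip route show table N`), that lists every fwmark rule and reports whether the table it points to has a default route. The function names `fwmark_tables` and `check_tables` are hypothetical, not part of VyOS.

```shell
#!/bin/bash
# Sketch: check that every fwmark policy rule points at a table that
# actually has a default route. Source this file, then run check_tables.

# Read `ip rule` output on stdin; print "<mark> <table>" for fwmark rules.
fwmark_tables() {
    awk '{
        mark = ""; table = ""
        for (i = 1; i < NF; i++) {
            if ($i == "fwmark") mark = $(i + 1)
            if ($i == "lookup") table = $(i + 1)
        }
        if (mark != "" && table != "") print mark, table
    }'
}

# Report, per marked table, whether a default route is present.
check_tables() {
    ip rule show | fwmark_tables | while read -r mark table; do
        if ip route show table "$table" 2>/dev/null | grep -q '^default'; then
            echo "fwmark $mark -> table $table: default route present"
        else
            echo "fwmark $mark -> table $table: NO default route"
        fi
    done
}
```

Source the file (`. ./check-tables.sh`) and run `check_tables`. If a marked table has no matching route, the kernel moves on to the next rule (eventually `main`), so marked traffic most likely leaves via whatever the main default route is, while still being SNATed for its marked interface.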

Thanks, I thought that might be the case. However, traffic from clients on eth1 still does not work when eth0 has failed.

Hi @level410.
What VyOS version are you using?
When both connections are up, please share the output of:

show wan-load-balance 
sudo ip rule
show ip route table all
sudo nft list table ip mangle
sudo nft list table ip nat

Then, when the main connection is down, collect the same data.
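If it helps, here is a minimal sketch of a collector that runs the requested commands and writes plain text to a file, which avoids the control-character debris a terminal recorder such as `script` can leave behind. Assumptions: VyOS 1.3, where op-mode `show` commands can be called from a script through `/opt/vyatta/bin/vyatta-op-cmd-wrapper`, and a user with sudo rights. The file name `lb-collect.sh` and the labels are hypothetical.

```shell
#!/bin/bash
# Sketch: collect the WAN load-balancing diagnostics into one text file.
# Assumption: VyOS 1.3, where op-mode commands can be run from a script via
# /opt/vyatta/bin/vyatta-op-cmd-wrapper. Usage: bash lb-collect.sh <label>
OP=/opt/vyatta/bin/vyatta-op-cmd-wrapper

# Print a header line, then the command's output (stderr included).
run() {
    printf '### %s\n' "$*"
    "$@" 2>&1
    printf '\n'
}

collect() {
    run "$OP" show wan-load-balance
    run sudo ip rule
    run "$OP" show ip route table all
    run sudo nft list table ip mangle
    run sudo nft list table ip nat
}

# Only collect when executed directly with a label argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
    collect > "lb-$1.txt"
fi
```

Run it once per state, for example `bash lb-collect.sh both-up` and later `bash lb-collect.sh eth0-soft-down`, then compare the resulting `lb-*.txt` files.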

Hi Nicolas,

This is my own build based on 1.3 stable. I should mention that I added network-manager to this build. I hope it is not interfering, because I have set network-manager to ignore all interfaces except the cellular one. I needed it because VyOS does not assign the cellular IP address to the interface once it is obtained in raw-ip mode (here).

Anyway, here is the output. There are some artifacts from the script capture, but you can probably ignore them. I hope you have some ideas, and thank you for the help.

Both interfaces online

vyos@vyos:~$ show wan-load-balance 

Interface:  eth0
  Status:  active
  Last Status Change:  Fri Sep  2 15:33:58 2022
  +Test:  ping  Target: 8.8.8.8
    Last Interface Success:  0s
    Last Interface Failure:  1m16s
    # Interface Failure(s):  0

Interface:  wwan0
  Status:  active
  Last Status Change:  Fri Sep  2 15:32:15 2022
  +Test:  ping  Target: 1.1.1.1
    Last Interface Success:  0s
    Last Interface Failure:  3m7s
    # Interface Failure(s):  0

vyos@vyos:~$ sudo ip rule
0:	from all lookup local
32764:	from all fwmark 0xca lookup 202
32765:	from all fwmark 0xc9 lookup 201
32766:	from all lookup main
32767:	from all lookup default

vyos@vyos:~$ show ip route table all

Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

VRF default table 201:
K>* 0.0.0.0/0 [0/0] via 192.168.17.1, eth0, 00:01:24

VRF default table 254:
S   0.0.0.0/0 [210/0] via 192.168.17.1, eth0, weight 1, 00:01:25
S>* 0.0.0.0/0 [1/0] via 192.168.17.1, eth0, weight 1, 00:01:28
S   0.0.0.0/0 [240/0] is directly connected, wwan0, weight 1, 00:03:21
S>* 8.8.8.8/32 [1/0] via 192.168.17.1, eth0, weight 1, 00:01:28
C>* 10.254.254.0/24 is directly connected, eth1, 00:03:43
C>* 167.20.XX.XX/30 is directly connected, wwan0, 00:03:21
C>* 192.168.17.0/24 is directly connected, eth0, 00:01:28


vyos@vyos:~$ sudo nft list table ip mangle
table ip mangle {
	chain PREROUTING {
		type filter hook prerouting priority mangle; policy accept;
		counter packets 10554 bytes 9528044 jump WANLOADBALANCE_PRE
	}

	chain INPUT {
		type filter hook input priority mangle; policy accept;
	}

	chain FORWARD {
		type filter hook forward priority mangle; policy accept;
	}

	chain OUTPUT {
		type route hook output priority mangle; policy accept;
	}

	chain POSTROUTING {
		type filter hook postrouting priority mangle; policy accept;
	}

	chain WANLOADBALANCE_PRE {
		iifname "eth1" ct state new counter packets 49 bytes 2654 jump ISP_eth0
		iifname "eth1" counter packets 1236 bytes 317867 meta mark set ct mark
	}

	chain ISP_eth0 {
		counter packets 49 bytes 2654 ct mark set 0xc9 
		counter packets 49 bytes 2654 meta mark set 0xc9 
		counter packets 49 bytes 2654 accept
	}

	chain ISP_wwan0 {
		counter packets 34 bytes 2235 ct mark set 0xca 
		counter packets 34 bytes 2235 meta mark set 0xca 
		counter packets 34 bytes 2235 accept
	}
}

vyos@vyos:~$ sudo nft list table ip nat
table ip nat {
	chain PREROUTING {
		type nat hook prerouting priority dstnat; policy accept;
		counter packets 192 bytes 16367 jump VYATTA_PRE_DNAT_HOOK
	}

	chain INPUT {
		type nat hook input priority 100; policy accept;
	}

	chain POSTROUTING {
		type nat hook postrouting priority srcnat; policy accept;
		counter packets 83 bytes 5115 jump VYATTA_PRE_SNAT_HOOK
	}

	chain OUTPUT {
		type nat hook output priority -100; policy accept;
	}

	chain VYATTA_PRE_DNAT_HOOK {
		counter packets 192 bytes 16367 return
	}

	chain VYATTA_PRE_SNAT_HOOK {
		counter packets 83 bytes 5115 jump WANLOADBALANCE
		counter packets 8 bytes 854 return
	}

	chain WANLOADBALANCE {
		ct mark 0xca counter packets 26 bytes 1607 snat to 167.20.XX.XX
		ct mark 0xc9 counter packets 49 bytes 2654 snat to 192.168.17.127
	}
}

eth0 soft down (link up, upstream connectivity lost)

vyos@vyos:~$ show wan-load-balance 

Interface:  eth0
  Status:  failed
  Last Status Change:  Fri Sep  2 15:36:05 2022
  -Test:  ping  Target: 8.8.8.8
    Last Interface Success:  10s
    Last Interface Failure:  0s
    # Interface Failure(s):  1

Interface:  wwan0
  Status:  active
  Last Status Change:  Fri Sep  2 15:32:15 2022
  +Test:  ping  Target: 1.1.1.1
    Last Interface Success:  0s
    Last Interface Failure:  4m3s
    # Interface Failure(s):  0

vyos@vyos:~$ sudo ip rule
0:	from all lookup local
32764:	from all fwmark 0xca lookup 202
32765:	from all fwmark 0xc9 lookup 201
32766:	from all lookup main
32767:	from all lookup default
vyos@vyos:~$ show ip route table all

Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

VRF default table 201:
K>* 0.0.0.0/0 [0/0] via 192.168.17.1, eth0, 00:02:26

VRF default table 254:
S   0.0.0.0/0 [210/0] via 192.168.17.1, eth0, weight 1, 00:00:26
S>* 0.0.0.0/0 [1/0] via 192.168.17.1, eth0, weight 1, 00:00:26
S   0.0.0.0/0 [240/0] is directly connected, wwan0, weight 1, 00:04:23
S>* 8.8.8.8/32 [1/0] via 192.168.17.1, eth0, weight 1, 00:00:26
C>* 10.254.254.0/24 is directly connected, eth1, 00:04:45
C>* 167.20.XX.XX/30 is directly connected, wwan0, 00:04:23
C>* 192.168.17.0/24 is directly connected, eth0, 00:00:26


vyos@vyos:~$ sudo nft list table ip mangle
table ip mangle {
	chain PREROUTING {
		type filter hook prerouting priority mangle; policy accept;
		counter packets 12030 bytes 11161012 jump WANLOADBALANCE_PRE
	}

	chain INPUT {
		type filter hook input priority mangle; policy accept;
	}

	chain FORWARD {
		type filter hook forward priority mangle; policy accept;
	}

	chain OUTPUT {
		type route hook output priority mangle; policy accept;
	}

	chain POSTROUTING {
		type filter hook postrouting priority mangle; policy accept;
	}

	chain WANLOADBALANCE_PRE {
		iifname "eth1" ct state new counter packets 26 bytes 1483 jump ISP_wwan0
		iifname "eth1" counter packets 0 bytes 0 meta mark set ct mark
	}

	chain ISP_eth0 {
		counter packets 56 bytes 3047 ct mark set 0xc9 
		counter packets 56 bytes 3047 meta mark set 0xc9 
		counter packets 56 bytes 3047 accept
	}

	chain ISP_wwan0 {
		counter packets 60 bytes 3718 ct mark set 0xca 
		counter packets 60 bytes 3718 meta mark set 0xca 
		counter packets 60 bytes 3718 accept
	}
}

vyos@vyos:~$ sudo nft list table ip nat
table ip nat {
	chain PREROUTING {
		type nat hook prerouting priority dstnat; policy accept;
		counter packets 225 bytes 18902 jump VYATTA_PRE_DNAT_HOOK
	}

	chain INPUT {
		type nat hook input priority 100; policy accept;
	}

	chain POSTROUTING {
		type nat hook postrouting priority srcnat; policy accept;
		counter packets 97 bytes 5905 jump VYATTA_PRE_SNAT_HOOK
	}

	chain OUTPUT {
		type nat hook output priority -100; policy accept;
	}

	chain VYATTA_PRE_DNAT_HOOK {
		counter packets 225 bytes 18902 return
	}

	chain VYATTA_PRE_SNAT_HOOK {
		counter packets 97 bytes 5905 jump WANLOADBALANCE
		counter packets 8 bytes 854 return
	}

	chain WANLOADBALANCE {
		ct mark 0xca counter packets 35 bytes 2108 snat to 167.20.XX.XX
		ct mark 0xc9 counter packets 54 bytes 2943 snat to 192.168.17.127
	}
}

eth0 hard down

vyos@vyos:~$ show wan-load-balance 

Interface:  eth0
  Status:  failed
  Last Status Change:  Fri Sep  2 15:36:05 2022
  -Test:  ping  Target: 8.8.8.8
    Last Interface Success:  1m6s
    Last Interface Failure:  0s
    # Interface Failure(s):  6

Interface:  wwan0
  Status:  active
  Last Status Change:  Fri Sep  2 15:32:15 2022
  +Test:  ping  Target: 1.1.1.1
    Last Interface Success:  0s
    Last Interface Failure:  4m59s
    # Interface Failure(s):  0

vyos@vyos:~$ sudo ip rule
0:	from all lookup local
32764:	from all fwmark 0xca lookup 202
32765:	from all fwmark 0xc9 lookup 201
32766:	from all lookup main
32767:	from all lookup default

vyos@vyos:~$ show ip route table all

Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup

VRF default table 254:
S>* 0.0.0.0/0 [240/0] is directly connected, wwan0, weight 1, 00:05:16
C>* 10.254.254.0/24 is directly connected, eth1, 00:05:38
C>* 167.20.XX.XX/30 is directly connected, wwan0, 00:05:16


vyos@vyos:~$ sudo nft list table ip mangle
table ip mangle {
	chain PREROUTING {
		type filter hook prerouting priority mangle; policy accept;
		counter packets 17185 bytes 15856253 jump WANLOADBALANCE_PRE
	}

	chain INPUT {
		type filter hook input priority mangle; policy accept;
	}

	chain FORWARD {
		type filter hook forward priority mangle; policy accept;
	}

	chain OUTPUT {
		type route hook output priority mangle; policy accept;
	}

	chain POSTROUTING {
		type filter hook postrouting priority mangle; policy accept;
	}

	chain WANLOADBALANCE_PRE {
		iifname "eth1" ct state new counter packets 78 bytes 5423 jump ISP_wwan0
		iifname "eth1" counter packets 1818 bytes 102243 meta mark set ct mark
	}

	chain ISP_eth0 {
		counter packets 56 bytes 3047 ct mark set 0xc9 
		counter packets 56 bytes 3047 meta mark set 0xc9 
		counter packets 56 bytes 3047 accept
	}

	chain ISP_wwan0 {
		counter packets 112 bytes 7658 ct mark set 0xca 
		counter packets 112 bytes 7658 meta mark set 0xca 
		counter packets 112 bytes 7658 accept
	}
}
vyos@vyos:~$ sudo nft list table ip nat
table ip nat {
	chain PREROUTING {
		type nat hook prerouting priority dstnat; policy accept;
		counter packets 251 bytes 20574 jump VYATTA_PRE_DNAT_HOOK
	}

	chain INPUT {
		type nat hook input priority 100; policy accept;
	}

	chain POSTROUTING {
		type nat hook postrouting priority srcnat; policy accept;
		counter packets 114 bytes 7093 jump VYATTA_PRE_SNAT_HOOK
	}

	chain OUTPUT {
		type nat hook output priority -100; policy accept;
	}

	chain VYATTA_PRE_DNAT_HOOK {
		counter packets 251 bytes 20574 return
	}

	chain VYATTA_PRE_SNAT_HOOK {
		counter packets 114 bytes 7093 jump WANLOADBALANCE
		counter packets 8 bytes 854 return
	}

	chain WANLOADBALANCE {
		ct mark 0xca counter packets 52 bytes 3296 snat to 167.20.XX.XX
		ct mark 0xc9 counter packets 54 bytes 2943 snat to 192.168.17.127
	}
}

Route table 202 is missing from your configuration, most probably because it uses the wwan interface and VyOS has no configuration for it.
You may try:

sudo ip route add default via A.B.C.D dev wwan0 table 202       # where A.B.C.D is the next hop on the wwan0 interface

Or, using the interface without a next-hop IP address (since that is how it appears in your main routing table):

sudo ip route add default dev wwan0 table 202
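A route added by hand will most likely not survive a reboot, and the kernel drops routes that use `dev wwan0` when that interface goes down, so it may need re-adding whenever the modem reconnects. Since network-manager already manages wwan0 on this build, one option is a dispatcher script. Below is a minimal sketch, assuming NetworkManager's dispatcher (which runs executables in `/etc/NetworkManager/dispatcher.d/` with the interface name as `$1` and the action as `$2`) and that table 202 stays tied to wwan0. The file name and the `DRY_RUN` switch are hypothetical.

```shell
#!/bin/bash
# Sketch of a NetworkManager dispatcher script, saved (hypothetical name) as
# /etc/NetworkManager/dispatcher.d/90-wwan0-table202, owned by root, mode 755.
# NetworkManager passes the interface name as $1 and the action as $2.
WWAN_IF=wwan0
TABLE=202   # from "fwmark 0xca lookup 202" in `ip rule` on this router

# Execute a command, or just print it when DRY_RUN is set (for testing).
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

handle() {
    [ "$1" = "$WWAN_IF" ] || return 0
    case "$2" in
        up|dhcp4-change)
            # "replace" adds the route, or overwrites it if it already exists.
            run ip route replace default dev "$WWAN_IF" table "$TABLE"
            ;;
    esac
}

handle "$@"
```

`ip route replace` is used instead of `ip route add` so that a repeated `up` event does not fail with "File exists".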

It worked! I’m not sure I totally understand where table 202 comes from, but I’ll do some reading. Thank you!
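For anyone wondering the same thing: in the captures above, the table number matches the connection mark written in decimal. `ISP_eth0` sets mark `0xc9`, which `ip rule` sends to table 201 (the table holding the eth0 default route), while `ISP_wwan0` sets `0xca`, which rule 32764 sends to table 202, the table that was empty. The conversion itself, as a tiny shell sketch (`mark_to_table` is a hypothetical helper):

```shell
#!/bin/bash
# Convert a connection mark (as set in the nft ISP_* chains) to decimal.
mark_to_table() {
    printf '%d\n' "$1"
}

for mark in 0xc9 0xca; do
    echo "mark $mark -> table $(mark_to_table "$mark")"
done
```

It prints `mark 0xc9 -> table 201` and `mark 0xca -> table 202`.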
