load-balancing feature not working properly

Hi,

I’ve been trying to set up the WAN failover feature on VyOS version 1.0.4, but every time I try, it does not work as expected. This is my configuration:

For gateways I have:

route 0.0.0.0/0 {
    next-hop x.x.x.206 {
    }
    next-hop x.x.x.57 {
    }
}
route 8.8.8.8/32 {
    next-hop x.x.x.57 {
    }
}
route 208.67.222.222/32 {
    next-hop x.x.x.206 {
    }
}

For WAN load balancing:

wan {
    flush-connections
    interface-health eth2 {
        failure-count 3
        nexthop x.x.x.57
        success-count 1
        test 1 {
            resp-time 2
            target 8.8.8.8
            ttl-limit 1
            type ping
        }
    }
    interface-health eth3 {
        failure-count 3
        nexthop x.x.x.206
        success-count 1
        test 1 {
            resp-time 2
            target 208.67.222.222
            ttl-limit 1
            type ping
        }
    }
    rule 100 {
        destination {
            address 172.16.0.0/12
        }
        exclude
        inbound-interface eth0
    }
    rule 101 {
        destination {
            address 172.16.0.0/12
        }
        exclude
        inbound-interface eth1
    }
    rule 1000 {
        destination {
            address 0.0.0.0/0
        }
        failover
        inbound-interface eth0
        interface eth2 {
            weight 10
        }
        interface eth3 {
            weight 1
        }
        protocol all
    }
}

For routing, this is what I get when issuing the command sh ip route:
Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
I - ISIS, B - BGP, > - selected route, * - FIB route

S>* 0.0.0.0/0 [1/0] via x.x.x.206
                    via x.x.x.57, eth2
S>* 8.8.8.8/32 [1/0] via x.x.x.57, eth2
S>* 208.67.222.222/32 [1/0] via x.x.x.206, eth3

Now when I type the command sh wan-load-balance:
Interface: eth2
Status: active
Last Status Change: Wed Feb 18 13:19:44 2015
+Test: ping Target: 8.8.8.8
Last Interface Success: 0s
Last Interface Failure: 22m6s
# Interface Failure(s): 0

Interface: eth3
Status: active
Last Status Change: Wed Feb 18 13:19:44 2015
+Test: ping Target: 208.67.222.222
Last Interface Success: 0s
Last Interface Failure: 2m37s
# Interface Failure(s): 0

When I try to trigger failover by soft-disabling the eth2 interface (the link stays up), the routing table is not updated: it still shows eth2 as active and no traffic is sent through eth3. If the eth2 interface is actually down, however, the traffic goes to eth3. VyOS is installed on a virtual machine.

Also, when I issue the command sh wan-load-balance connection, it shows nothing:

conntrack v1.2.1 (conntrack-tools): 221 flow entries have been shown.
Type State Src Dst Packets Bytes

With the command sh wan-load-balance status, it shows:
Chain WANLOADBALANCE_PRE (1 references)
pkts bytes target prot opt in out source destination
436 1057K ACCEPT all -- eth0 * 0.0.0.0/0 172.16.0.0/12
162 17244 ACCEPT all -- eth1 * 0.0.0.0/0 172.16.0.0/12
172 46182 ISP_eth2 all -- eth0 * 0.0.0.0/0 0.0.0.0/0 state NEW
0 0 CONNMARK all -- eth0 * 0.0.0.0/0 0.0.0.0/0 CONNMARK restore

What could be preventing the WAN failover feature from working properly?

Hi,

You probably need to set a different metric on the second next-hop of the default route. (A better way to test the load-balancing feature is to disable the interface completely, i.e. set it administratively down.) As it stands, when the interface goes down your default route is not set properly.

Try adding “distance” to the next-hop:

for Gateways I have:

route 0.0.0.0/0 {
    next-hop x.x.x.206 {
    }
    next-hop x.x.x.57 {
        distance 10
    }
}

P.S. When you disable the interface and load balancing shows status active for the failover interface, check the routing table with show ip route.

I added:

route 0.0.0.0/0 {
    next-hop x.x.x.206 {
        distance 10
    }
    next-hop x.x.x.57 {
    }
}

My main Internet connection is x.x.x.57, and the slower backup connection is x.x.x.206.
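
Entered as configuration-mode commands, that change would look roughly like this. It is only a sketch using the same masked addresses as above; the next-hop without an explicit distance keeps the default of 1, which matches the [1/0] shown in the routing table.

```
configure
# primary default route via the main connection (default distance 1)
set protocols static route 0.0.0.0/0 next-hop x.x.x.57
# backup default route via the slow connection, with a higher distance
set protocols static route 0.0.0.0/0 next-hop x.x.x.206 distance 10
commit
```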

and the interface is disconnected
When I issue the command show ip route, it shows:

S 0.0.0.0/0 [10/0] via x.x.x.206
S>* 0.0.0.0/0 [1/0] via x.x.x.57, eth2
S>* 8.8.8.8/32 [1/0] via x.x.x.57, eth2

S>* 208.67.222.222/32 [1/0] via x.x.x.206, eth3

Then, when I disable the eth2 (main Internet) interface and issue the command show ip route, it shows:

S>* 0.0.0.0/0 [10/0] via x.x.x.206, eth3
S 0.0.0.0/0 [1/0] via x.x.x.57 inactive
S>* 8.8.8.8/32 [1/0] via x.x.x.57 (recursive via x.x.x.206)

S>* 208.67.222.222/32 [1/0] via x.x.x.206, eth3

The sh wan-load-balance command shows:

Interface: eth2
Status: failed
Last Status Change: Tue Mar 3 08:32:22 2015
-Test: ping Target: 8.8.8.8
Last Interface Success: 34s
Last Interface Failure: 1s
# Interface Failure(s): 3

Interface: eth3
Status: active
Last Status Change: Thu Feb 26 18:27:50 2015
+Test: ping Target: 208.67.222.222
Last Interface Success: 1s
Last Interface Failure: 2h39m22s
# Interface Failure(s): 0

Traffic fails over to interface eth3 as expected.

But if I cut the Internet traffic on the eth2 interface while the virtual link stays up and then run the same commands, I see the following:

show ip route:

S 0.0.0.0/0 [10/0] via x.x.x.206
S>* 0.0.0.0/0 [1/0] via x.x.x.57, eth2
S>* 8.8.8.8/32 [1/0] via x.x.x.57, eth2

S>* 208.67.222.222/32 [1/0] via x.x.x.206, eth3

sh wan-load-balance:
Interface: eth2
Status: failed
Last Status Change: Tue Mar 3 08:50:22 2015
-Test: ping Target: 8.8.8.8
Last Interface Success: 59s
Last Interface Failure: 2s
# Interface Failure(s): 3

Interface: eth3
Status: active
Last Status Change: Thu Feb 26 18:27:50 2015
+Test: ping Target: 208.67.222.222
Last Interface Success: 1s
Last Interface Failure: 2h39m22s
# Interface Failure(s): 0

So far, what I see is that the failover mechanism does not route traffic to the eth3 interface when eth2 is unable to reach the Internet. My question is: is there any way to change the routing when Internet traffic is interrupted on eth2, perhaps via scripting or via VyOS commands?

The order in which you add your static routes is crucial, and if you look at your current config, you will notice that eth3 is read first, and eth2 second. That may be why eth3 was working, but eth2 was not. The system is supposed to use one interface to reach one target (eth2 is for 8.8.8.8) and another interface for a second target (eth3 is for 208.67.222.222). If the system is not able to reach a target via the specified interface, then that interface will be discarded in the load-balancing setup.

Now that you added weights (eth2 10, eth3 1) to your default route, you are telling the system to go ahead and use eth2 as your only interface for reaching those targets, so when it goes down, you are left with no way of reaching those targets, as the other interface will be removed from the routes because it was never used to reach the targets in the first place.

But all of that is beside the point… just do ‘del protocols static route’ and get rid of your default route.
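
As a hedged sketch of that step: deleting the whole static route node without naming a prefix would presumably also remove the /32 host routes to the two ping targets, so the commands below assume the intent is to drop only the default route and keep the target routes in place.

```
configure
# remove only the default route; the 8.8.8.8/32 and 208.67.222.222/32 routes stay
delete protocols static route 0.0.0.0/0
commit
# check the result from configuration mode
run show ip route
```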

I did remove the static route 0.0.0.0/0, but the load-balancing setup did not install any route. The routing table just showed:

S>* 8.8.8.8/32 [1/0] via x.x.x.57, eth2

S>* 208.67.222.222/32 [1/0] via x.x.x.206, eth3

I did as suggested, but there was still no traffic. The only way to make the two WAN interfaces work is by removing the physical link (VMware link), but in that case it just uses the static route 0.0.0.0/0, with weights of 1 for eth2 and 10 for eth3.

I still think the failover mechanism needs something to be set when the interface is in a virtualized environment.

The pings on both interfaces work and the load balancer recognizes the failure, but when the ping is unsuccessful the routing table does not have a static route to 0.0.0.0/0 on either interface.

Any suggestions are welcome.