WAN Load Balancing not removing default route

ehayon · July 31, 2015, 10:33pm

I’m running VyOS 1.1.5. I have WAN load balancing configured, but when a WAN link fails, the now “dead” default route apparently is not removed from the routing table. This results in 50% packet loss on outbound connections, because the load balancing is not actually taking effect.

Any ideas?

kaloyan · February 3, 2016, 10:01am

Hello ehayon,

I am facing exactly the same issue. Unfortunately, I haven’t found a resolution so far.
I am using VyOS 1.1.6 (Helium).

Here’s my config:

test@VYOS-1-1-6$ show configuration commands | match wan
set load-balancing wan interface-health eth0.10 failure-count '3'
set load-balancing wan interface-health eth0.10 nexthop '192.168.1.225'
set load-balancing wan interface-health eth0.10 test 10 resp-time '2'
set load-balancing wan interface-health eth0.10 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0.10 test 10 type 'ping'
set load-balancing wan interface-health eth0.15 failure-count '3'
set load-balancing wan interface-health eth0.15 nexthop '172.12.1.177'
set load-balancing wan interface-health eth0.15 test 10 resp-time '2'
set load-balancing wan interface-health eth0.15 test 10 target '8.8.8.8'
set load-balancing wan interface-health eth0.15 test 10 type 'ping'
set load-balancing wan rule 128 'failover'
set load-balancing wan rule 128 inbound-interface 'eth1'
set load-balancing wan rule 128 interface eth0.10 weight '100'
set load-balancing wan rule 128 interface eth0.15 weight '150'
set load-balancing wan rule 128 protocol 'all'

test@VYOS-1-1-6$ show configuration commands | match 'static route'
set protocols static route 0.0.0.0/0 next-hop 192.168.1.225
set protocols static route 0.0.0.0/0 next-hop 172.12.1.177

Interface eth1 faces the local network, and interfaces eth0.10 and eth0.15 connect to the two internet providers.

If either provider goes down, the result is 50% packet loss, which makes the whole configuration useless.
Let me know if you have found a solution to this one.
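Why roughly 50% and not 100%? A plausible explanation: the two static default routes form one multipath route in the main table, and traffic that is not steered elsewhere gets spread across both next hops. The Python sketch below is a simplified model under that assumption. The SHA-256 hash is a stand-in chosen for illustration, not the kernel's actual selection algorithm, and the LAN flows are hypothetical.

```python
import hashlib

def pick_next_hop(flow, next_hops):
    """Map a flow tuple to one next hop with a stable hash (stand-in for kernel multipath)."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return next_hops[digest[0] % len(next_hops)]

def loss_fraction(flows, next_hops, dead):
    """Fraction of flows whose chosen next hop is in the `dead` set."""
    lost = sum(1 for f in flows if pick_next_hop(f, next_hops) in dead)
    return lost / len(flows)

# Both default routes from the static config above, as they sit in the main table.
NEXT_HOPS = ["192.168.1.225", "172.12.1.177"]

# Hypothetical LAN flows: (source IP, source port, destination IP, destination port).
FLOWS = [(f"10.0.0.{i % 250 + 1}", 1024 + i, "8.8.4.4", 443) for i in range(1000)]

if __name__ == "__main__":
    # The provider behind 172.12.1.177 is down, but its default route was never withdrawn.
    frac = loss_fraction(FLOWS, NEXT_HOPS, dead={"172.12.1.177"})
    print(f"flows sent to the dead next hop: {frac:.0%}")
```

Under this model, removing the dead next hop from the list drops the loss to zero, which is what the health checks are supposed to achieve.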

I used this one as a reference:

davidusd · February 18, 2016, 9:17am

Hi,

Add this command:

You can read about this problem in the attached file, in the chapter ‘Failover’.

Cheers,

kaloyan · February 24, 2016, 10:58am

Hello David,

I tried this as well, but no change.

Best regards,
Kaloyan

davidusd · February 25, 2016, 2:25am

Hi kaloyan,

This is my config and topology. I think you should use the next hop as the ping target.

load-balancing {
    wan {
        flush-connections
        interface-health eth1 {
            failure-count 5
            nexthop 1.1.1.1
            success-count 1
            test 10 {
                resp-time 5
                target 1.1.1.1
                ttl-limit 1
                type ping
            }
        }
        interface-health eth2 {
            failure-count 4
            nexthop 2.2.2.1
            success-count 1
            test 10 {
                resp-time 5
                target 2.2.2.1
                ttl-limit 1
                type ping
            }
        }
        rule 10 {
            failover
            inbound-interface eth0
            interface eth1 {
                weight 20
            }
            interface eth2 {
                weight 10
            }
            protocol all
        }
    }
}

protocols {
    static {
        route 0.0.0.0/0 {
            next-hop 1.1.1.1 {
            }
            next-hop 2.2.2.1 {
            }
        }
    }
}
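The health-check settings in this config interact as follows, as far as the VyOS documentation describes them: a link is marked unavailable after `failure-count` failed tests and added back after `success-count` successful ones, and in failover mode the healthy interface with the highest weight carries the traffic. The Python sketch below models only that behaviour, assuming the counts refer to consecutive test results; it is an illustration, not VyOS's own code. The class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Health:
    """Simplified per-interface health state (a model, not the VyOS implementation)."""
    failure_count: int  # consecutive failed tests before the link is marked down
    success_count: int  # consecutive passed tests before the link is marked up again
    up: bool = True
    fails: int = 0
    passes: int = 0

    def record(self, ok: bool) -> None:
        """Feed one ping result into the counters and update the link state."""
        if ok:
            self.fails = 0
            self.passes += 1
            if not self.up and self.passes >= self.success_count:
                self.up = True
        else:
            self.passes = 0
            self.fails += 1
            if self.up and self.fails >= self.failure_count:
                self.up = False

def failover_primary(weights, health):
    """Failover mode: the healthy interface with the highest weight is the primary."""
    healthy = [name for name in weights if health[name].up]
    return max(healthy, key=weights.get) if healthy else None

if __name__ == "__main__":
    # Values taken from davidusd's config above.
    weights = {"eth1": 20, "eth2": 10}
    health = {"eth1": Health(failure_count=5, success_count=1),
              "eth2": Health(failure_count=4, success_count=1)}
    for _ in range(5):  # the ping to 1.1.1.1 fails five times in a row
        health["eth1"].record(False)
    print(failover_primary(weights, health))
```

With these values, eth1 stays primary through four failed tests, traffic moves to eth2 on the fifth, and a single successful test (`success-count 1`) brings eth1 back.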

Thanks,

kaloyan · May 14, 2016, 9:08am

Hello David,

Thank you for your help. Unfortunately, it does not work on my side. I applied exactly the same config, and when I test it by disconnecting one of the providers, I still get 50% packet loss.

I will continue investigating this and will let you know if I fix it.

Hello David,

I am starting to think that my setup does not work because of the VLANs I use. I will check this now.

jclopes · May 15, 2016, 1:43pm

Hi,

Change the target IP on eth0.15. The IPs used for testing must be different!

16again · May 15, 2016, 2:42pm

I don’t have the definitive answer, but this might give some insight:
- Remote test hosts do NOT need to be different (EdgeOS, using its 2WAN wizard, also pings the same host on both WAN interfaces).
- Better not use the next hop as the ping test. In many cases, this might even be the ISP router on your own site! The provider’s DNS server would be a better choice (…if it is pingable).
- It is no problem if the “dead” route remains in the main routing table, since your packets shouldn’t be using that table! The load balancing creates an additional routing table, where routes do disappear when the remote test host isn’t pingable. Do check that table for correctness (and check whether packets from the LAN are forced to use that table).
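The last point can be pictured as a two-table lookup. The Python sketch below is a simplified model of the behaviour described above, not VyOS internals: the main table keeps both static default routes, a separate load-balancer table (here the hypothetical `lb_table`) holds only next hops whose health test passes, and only traffic arriving on an interface covered by a load-balancing rule is steered to that table. The addresses come from kaloyan’s config; the function names are hypothetical.

```python
from typing import Dict, List, Set

# Main table: both static default routes stay here even when one link is dead.
MAIN_TABLE: Dict[str, List[str]] = {"default": ["192.168.1.225", "172.12.1.177"]}

def lb_table(link_up: Dict[str, bool]) -> Dict[str, List[str]]:
    """Hypothetical load-balancer table: only next hops that pass their health test."""
    return {"default": [nh for nh, up in link_up.items() if up]}

def candidate_next_hops(in_iface: str, lb_ifaces: Set[str],
                        link_up: Dict[str, bool]) -> List[str]:
    """Next hops a packet may use, depending on which table it is steered to."""
    if in_iface in lb_ifaces:  # a policy rule sends this traffic to the LB table
        return lb_table(link_up)["default"]
    return MAIN_TABLE["default"]  # unmatched traffic falls through to the main table

if __name__ == "__main__":
    link_up = {"192.168.1.225": True, "172.12.1.177": False}
    print(candidate_next_hops("eth1", {"eth1"}, link_up))  # steered: dead hop excluded
    print(candidate_next_hops("eth1", set(), link_up))     # not steered: dead hop still present
```

In this model, LAN traffic that fails to match the rule (for example, if `inbound-interface` does not match where the packets actually arrive) still sees both next hops, which would be consistent with the 50% loss reported above.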

jclopes · May 15, 2016, 2:57pm

Can you paste your complete configuration?