Load balancing issues

Hey,

I have used VyOS for a little over a month now. I mostly like it, but load balancing specifically has been nothing but terrible (and I really do mean absolutely terrible!). I don’t see a lot of discussion about it, and I am trying to figure out whether the VyOS team has plans to fix it in the near term.

How are we supposed to understand what is going on from your description?
At first glance, it looks like you don’t know how to use it.

Sorry, my bad. I have raised issues with the load balancing feature and described them in detail. I’ll do it again.

  • It clears connections that were not set up over the WAN that just went down. Example: a WireGuard connection to remote peers is set up over WAN1, and WAN1 disappears. It should clear only the WireGuard connection to remote peers that was set up over WAN1. In reality, it clears that connection and also some connections inside the WireGuard tunnel!

  • With PPPoE interfaces, it takes a long time to detect a failure and fail over to WAN2. I have noticed this when there is an issue upstream at my ISP, so the PPPoE tunnel stays up but cannot ping anything.

  • On WAN1 recovery, it wipes out connections that were set up over WAN2 while WAN1 was unavailable. This causes additional interruptions, where I would much prefer fewer. What it could do in this situation is leave the connections on WAN2 active and set up new connections over WAN1. That reduces interruptions and is more graceful than wiping out all WAN connections every time the primary WAN fails and recovers.

  • I cannot specify destination or source groups, or local, in the config.

My load balancing config looks like this today.

 wan {
     disable-source-nat
     enable-local-traffic
     flush-connections
     interface-health eth1 {
         failure-count 3
         nexthop 100.0.0.1
         test 1 {
             resp-time 1
             target 1.0.0.1
             ttl-limit 64
             type ping
         }
     }
     interface-health pppoe0 {
         failure-count 3
         nexthop 100.64.0.1
         test 1 {
             resp-time 1
             target 1.0.0.1
             ttl-limit 64
             type ping
         }
     }
     rule 1 {
         description "Exclude LAN traffic"
         destination {
             address 10.0.0.0/8
         }
         exclude
         inbound-interface br0+
         protocol all
         source {
             address 0.0.0.0/0
         }
     }
     rule 2 {
         description "Exclude traffic to 5g modem"
         destination {
             address 192.168.8.0/24
         }
         exclude
         inbound-interface br0+
         protocol all
         source {
             address 0.0.0.0/0
         }
     }
     rule 3 {
         description "Exclude Fiber Modem traffic"
         destination {
             address 192.168.1.0/24
         }
         exclude
         inbound-interface br0+
         protocol all
         source {
             address 0.0.0.0/0
         }
     }
     rule 4 {
         description "Exclude outgoing DNS traffic"
         destination {
             port 53
         }
         exclude
         inbound-interface br0+
         protocol udp
         source {
             address 0.0.0.0/0
         }
     }
     rule 5 {
         description "Exclude outgoing DNS traffic"
         destination {
             port 53
         }
         exclude
         inbound-interface br0+
         protocol tcp
         source {
             address 0.0.0.0/0
         }
     }
     rule 6 {
         description "Exclude WAN4 traffic"
         destination {
             address 0.0.0.0/0
         }
         exclude
         inbound-interface br0+
         source {
             address 10.0.50.21
         }
     }
     rule 7 {
         description "Exclude HZ traffic"
         destination {
             address 23.227.38.0/24
         }
         exclude
         inbound-interface br0+
     }
     rule 8 {
         description "Exclude WAN4 traffic"
         destination {
             address 0.0.0.0/0
         }
         exclude
         inbound-interface br0+
         source {
             address 10.0.50.14
         }
     }
     rule 9 {
         description WAN1_ONLY
         exclude
         inbound-interface br0.150
         protocol all
         source {
             address 10.0.150.0/24
         }
     }
     rule 10 {
         description WAN2_ONLY
         exclude
         inbound-interface br0.160
         protocol all
         source {
             address 10.0.160.0/24
         }
     }
     rule 20 {
         description WAN1_WAN2_FAILOVER
         failover
         inbound-interface br0+
         interface eth1 {
             weight 1
         }
         interface pppoe0 {
             weight 2
         }
         protocol all
         source {
             address 0.0.0.0/0
         }
     }
     sticky-connections {
         inbound
     }
 }

Connection flushing is hardcoded in the codebase: https://github.com/search?q=repo%3Avyos%2Fvyatta-wanloadbalance%20flush&type=code
Don’t use the same target for both health checks; it is a bad idea.
I wonder if you can try “protocol failover route” + custom SNAT rules
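
On the health-check targets: a minimal sketch of the two interface-health blocks from the config above, changed so that each interface pings a different host. The second target (9.9.9.9) is only an assumed example, not something from the original config; any reliable host other than 1.0.0.1 would do. Everything else is copied unchanged.

     interface-health eth1 {
         failure-count 3
         nexthop 100.0.0.1
         test 1 {
             resp-time 1
             target 1.0.0.1
             ttl-limit 64
             type ping
         }
     }
     interface-health pppoe0 {
         failure-count 3
         nexthop 100.64.0.1
         test 1 {
             resp-time 1
             /* assumed example target, differs from eth1's 1.0.0.1 */
             target 9.9.9.9
             ttl-limit 64
             type ping
         }
     }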

I’ll change it.

I wonder if you can try “protocol failover route” + custom SNAT rules

I had looked into it when I was setting this up. If I recall correctly, failover routes do not work with PPPoE interfaces.

There is a feature request to extend failover routes to support dynamically assigned routes, but you’re correct, it’s not supported yet:

https://vyos.dev/T5647

If you get a new IP dynamically each time for PPPoE, it needs to be improved as @JeffWDH suggested.

I’m expecting to fully replace load balancing with failover routes in the future :wink:
At least for failover cases.
