High-availability virtual-server not load balancing traffic

This is with VyOS 1.5-rolling-202501110007. I’m trying to use high-availability virtual-server to balance HTTP traffic across multiple backends, but it ends up sending 100% of the connections to a single backend unless I manually make changes via ipvsadm.

Here’s the config:

virtual-server http {
     address 50.106.9.46
     algorithm weighted-round-robin
     delay-loop 1
     port 8080
     protocol tcp
     real-server 172.16.0.1 {
         port 80
     }
     real-server 172.16.0.2 {
         port 80
     }
     real-server 172.16.1.1 {
         port 80
     }
     real-server 172.16.1.2 {
         port 80
     }
     real-server 172.31.255.1 {
         connection-timeout 5
         port 80
     }
     real-server 172.31.255.2 {
         connection-timeout 5
         port 80
     }
 }
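
In set-command form (trimmed to the first two backends to keep it short; the paths just mirror the tree above), that should be roughly:

set high-availability virtual-server http address 50.106.9.46
set high-availability virtual-server http algorithm weighted-round-robin
set high-availability virtual-server http delay-loop 1
set high-availability virtual-server http port 8080
set high-availability virtual-server http protocol tcp
set high-availability virtual-server http real-server 172.16.0.1 port 80
set high-availability virtual-server http real-server 172.16.0.2 port 80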

Running a load test from an external server with 256 open connections results in all of the traffic going to the same backend:

Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  50.106.9.46:http-alt wrr
  -> 172.16.0.1:http              Masq    1      0          0
  -> 172.16.0.2:http              Masq    1      0          0
  -> 172.16.1.1:http              Masq    1      256        0
  -> 172.16.1.2:http              Masq    1      0          0
  -> scottstuff.net:http          Masq    1      0          0
  -> scottstuff.net:http          Masq    1      0          0

Exactly which backend gets the traffic varies from time to time, but it’s almost always a single backend. Changing the LB algorithm in VyOS doesn’t seem to matter.
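
The load generator shouldn’t matter much; anything that holds a lot of concurrent connections shows it. As an illustration only (wrk is just an example tool, not necessarily what I used), something like this opens 256 connections against the virtual server:

# wrk -t 8 -c 256 -d 30s http://50.106.9.46:8080/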

However, manually changing the algorithm via ipvsadm and then changing it back makes future traffic balance correctly:

# ipvsadm -E -t 50.106.9.46:8080 -s rr
# ipvsadm -E -t 50.106.9.46:8080 -s wrr
... run test ...
TCP  50.106.9.46:http-alt wrr
  -> 172.16.0.1:http              Masq    1      42         0
  -> 172.16.0.2:http              Masq    1      43         0
  -> 172.16.1.1:http              Masq    1      43         256
  -> 172.16.1.2:http              Masq    1      42         0
  -> scottstuff.net:http          Masq    1      43         0
  -> scottstuff.net:http          Masq    1      43         0
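
(For anyone reproducing this: the tables above are plain ipvsadm list output. Watching the counters live during a test is just something like the following; -n skips the reverse-DNS lookups that turn the last two backends into scottstuff.net above.)

# watch -n 1 ipvsadm -L -n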

I saw the same behavior with a nightly build from July; I just upgraded and still see the same basic problem with the newest nightly.

Note that the slightly mismatched connection-timeout settings are left over from adding additional backends while trying to work around this problem; they don’t seem to make any difference to the system’s behavior.

@ScottLaird I’m not personally familiar with the high-availability virtual-server features, but it looks like you have a fairly comprehensive bug report there. You may wish to go ahead and file a bug report with any additional details and reproduction steps at Phabricator, per the Report a Bug docs.

I don’t see a difference before/after.
Can we see it in the output somehow?
It shows wrr in both cases, so it’s probably a bug in keepalived.

There shouldn’t be a difference; that’s the odd bit. I’m just changing the algorithm and then immediately changing it back. It could be a keepalived bug, a kernel bug, or possibly something odd in the way that VyOS invokes them. I haven’t looked at any of the code yet.

In any case, though, I’m able to reproduce it easily.

Okay, it looks like VyOS is pretty clearly just calling systemctl reload-or-restart keepalived.service in high-availability.py.
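
So the obvious things to check next are what keepalived was actually told and whether the reload path ran at all. Something like this should show both (the rendered config path is from memory and may differ between releases):

# cat /run/keepalived/keepalived.conf
# journalctl -u keepalived.service --no-pager | tail -n 50

The first is the virtual_server config VyOS generated; the second should make it obvious whether keepalived reloaded or did a full restart.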

It looks like I have keepalived 1:2.2.8-1, which isn’t actually listed as a package version for Bookworm; in any case, none of the release notes for keepalived 2.2.8 through 2.3.2 have anything in them that immediately leaps out at me as applying here.
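
For anyone comparing versions, either of these shows what’s actually installed:

# keepalived --version
# dpkg -s keepalived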

The fact that this can be fixed via ipvsadm makes me think it might actually be a kernel problem.
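
If it is on the kernel side, the first things worth capturing for a bug report are probably the kernel version and which IPVS scheduler modules are loaded while the problem is happening, e.g.:

# uname -r
# lsmod | grep ip_vs

ip_vs_rr and ip_vs_wrr should both show up in that list once those schedulers have been used.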

@ScottLaird, could you file a bug report at https://vyos.dev/ with simple steps to reproduce?

Just waiting for my account to be activated so I can create the bug.


Filed as T7059: High-availability virtual-server not load balancing traffic between backend servers.

