Regression in local traffic WAN load balancing between 1.5-stream-2025-Q1 and 1.5-stream-2025-Q2

I have a VyOS router connected to my LAN (eth0) and to two upstream internet providers (eth1 and eth2). It runs pdns-recursor, which is used by the machines on my LAN to resolve DNS.

I recently upgraded from 1.5-stream-2025-Q1 to 1.5-stream-2025-Q2 and I immediately started to see DNS resolution issues on the machines on the LAN. After some troubleshooting, I noticed that I had DNS requests going out through the wrong interface (or with the wrong address). More specifically, I would see packets going out through eth1, but using eth2’s address, and vice versa. Of course, these packets would be ignored by the upstream providers, resulting in DNS timeouts.

The problem is very easy to reproduce by running the following command directly on VyOS:

for N in $(seq 6); do dig enix.fr @4.2.2.$N; done

On 1.5-stream-2025-Q1 all 6 requests get a response immediately.

On 1.5-stream-2025-Q2, requests randomly time out.
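
To put a number on "randomly", a small wrapper around the same dig loop can count how many queries get no reply. This is only a sketch: the function name count_dig_failures and the DIG variable are hypothetical helpers (not part of VyOS), and it relies on dig exiting with a non-zero status when no server replies.

```shell
#!/bin/sh
# Hypothetical helper: query each resolver once and count the failures.
# DIG can be overridden (e.g. with a stub); it defaults to the real dig.
DIG=${DIG:-dig}

count_dig_failures() {
    name=$1
    shift
    failures=0
    for server in "$@"; do
        # +tries=1 +time=2: a single attempt with a 2-second timeout, so a
        # lost reply counts as a failure instead of being hidden by a retry.
        if ! "$DIG" +tries=1 +time=2 "$name" "@$server" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}

# Example call (prints the number of resolvers that did not answer):
#   count_dig_failures enix.fr 4.2.2.1 4.2.2.2 4.2.2.3 4.2.2.4 4.2.2.5 4.2.2.6
```

Note that dig still exits with 0 when a server answers with an error such as SERVFAIL, so this only counts queries that got no reply at all, which is the symptom described here.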

Interestingly, I ran into this issue when I first deployed VyOS (on 1.5-stream-2025-Q1), and if I remember correctly, I solved it by adding set load-balancing wan enable-local-traffic and by adding a couple of NAT rules. (I don’t know if this was the right decision, so don’t hesitate to point out if there is a better way to do this!)

The relevant configuration snippet is:

load-balancing {
    wan {
        enable-local-traffic
        interface-health eth1 {
            nexthop dhcp
        }
        interface-health eth2 {
            nexthop dhcp
        }
        rule 1 {
            inbound-interface eth0
            interface eth1 {
            }
            interface eth2 {
            }
        }
    }
}
nat {
    source {
        rule 100 {
            outbound-interface {
                name eth1
            }
            source {
                address 10.0.0.0/24
            }
            translation {
                address masquerade
            }
        }
        rule 102 {
            outbound-interface {
                name eth1
            }
            source {
                address 192.168.2.0/24
            }
            translation {
                address masquerade
            }
        }
        rule 200 {
            outbound-interface {
                name eth2
            }
            source {
                address 10.0.0.0/24
            }
            translation {
                address masquerade
            }
        }
        rule 201 {
            outbound-interface {
                name eth2
            }
            source {
                address 192.168.1.0/24
            }
            translation {
                address masquerade
            }
        }
    }
}

The problem appeared immediately after I upgraded to 1.5-stream-2025-Q2 and disappeared as soon as I rolled back to 1.5-stream-2025-Q1.

I don’t really need to upgrade to 1.5-stream-2025-Q2 so this is not a big problem for me; but since it seems to be a regression, I thought I’d open a bug report.

1.5-stream-2025-Q2 moved from the legacy Vyatta WLB solution to the new VyOS WLB solution. This is mentioned in this blog post: VyOS Stream 1.5-2025-Q2 is available for download. What you’re seeing is likely a bug in how the new WLB solution generates its SNAT rules. Can you submit a bug report at https://vyos.dev/ detailing what you’re seeing?

With that said, since you’re already defining your own SNAT rules, you may be able to solve this by disabling SNAT for WLB:

set load-balancing wan disable-source-nat

You can read about that here: WAN load balancing — VyOS 1.5.x (circinus) documentation

That may cause unexpected results, though, since the WLB chains are run after the SNAT chains. I haven’t tested the new WLB solution, so this is all from a quick glance at the source code.
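
If you do try it, a cautious way to apply the change is with commit-confirm, so that the router rolls the change back on its own if you lose connectivity. A minimal sketch, assuming a 5-minute confirmation window (the value is arbitrary); depending on the version and settings, the rollback may be done by rebooting the router, so check that before relying on it:

```
configure
set load-balancing wan disable-source-nat
compare
commit-confirm 5
```

Then re-run the dig loop from the first post. If all requests get a response, make the change permanent:

```
confirm
save
```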