Failover loadbalance with PPPOE

Sorry, I was not paying attention and missed that you are using PPPoE, which most likely means you should enable TCP MSS clamping, or else TCP will fail to work with most sites. (Google with Chrome might work, as they use HTTP over QUIC, but others will fail.)

Basically, if you run tcpdump on the outgoing interface you’ll see that most TCP connections stall and retransmit constantly before they eventually time out and reset.


And now the bad news… TCP MSS clamping and WAN load-balancing don’t mix well together. In fact I stumbled into this issue at first and had to do quite a bit of hacking to make it right.

I’ve described the whole problem and solution in the following thread.


After you read the above (and apply it) you’ll also have to configure TCP MSS clamping as below:

policy {
    route pppoe-mangle-in {
        rule 1 {
            protocol tcp
            set {
                tcp-mss 1452
            }
            tcp {
                flags SYN,!RST
            }
        }
    }
    route pppoe-mangle-out {
        rule 1 {
            destination {
                group {
                    network-group !lan
                }
            }
            protocol tcp
            set {
                tcp-mss 1452
            }
            tcp {
                flags SYN,!RST
            }
        }
    }
...
}

Then apply it to your pppoe interfaces like so:

        pppoe 0 {
            ....
            policy {
                route pppoe-mangle-in
            }

Then also to all your “lan” interfaces like so:

        eth0 {
            ....
            policy {
                route pppoe-mangle-out
            }

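Also notice that the rules reference a network-group called lan, which should contain all your local networks. You’ll also have to lower the MSS value of 1452 if the PPPoE overhead in your case is larger: 1452 is the usual PPPoE MTU of 1492 (1500 minus 8 bytes of PPPoE overhead) minus 40 bytes of IPv4 and TCP headers.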
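
For reference, here is a minimal sketch of how the lan group used in pppoe-mangle-out might be defined. The two prefixes are placeholders I made up for illustration, so replace them with whatever networks you actually have behind the router:

```
firewall {
    group {
        network-group lan {
            network 192.168.0.0/24
            network 192.168.1.0/24
        }
    }
}
```

The negated match (network-group !lan) in the out rule then only clamps traffic that is leaving your local networks, so LAN-to-LAN TCP connections keep their full MSS.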


However, if you are doing this in a larger deployment, and especially if you are using VMs, I would strongly suggest that you use two separate VyOS routers: one for WAN balancing, and the other for NAT, DHCP, TCP MSS clamping, etc.

This will keep your setup “sane”, especially since WAN load-balancing doesn’t play nicely with almost any other feature…