My goal is to enable configuration sync between two VyOS routers in a residential WAN and document the best process for others.
In a typical residential WAN scenario, there is a router (“VyOS”) with a WAN connection to a cable modem, fiber ONT or similar (“ONT”), receiving an IP address via DHCP client on its WAN port.
In order to enable HA and config sync, what should be put into the “TODO” spot?
In an ideal scenario, config sync would sync the DHCP lease from the active router’s DHCP client to the replica, while the replica’s WAN stays inactive. How can this be achieved? Is there an alternative that would ensure the WAN IP address on both master and replica is the same at the time of failover?
I used to do this at home; my ISP used PPPoE at the time.
It’s hard to do, because you have to leave the WAN interface on the secondary VyOS node shut down until you can confirm the primary is 100% offline. This is because your ISP will usually only allow one session at a time.
I did this with VRRP and failover scripts. When the secondary detected it was master, it un-shut its WAN interface, allowing PPPoE to establish, and when it detected the master was alive again, it shut down the WAN interface. I found this tended to fail on me if mastership flapped quickly a few times, though: I’d end up with both WAN interfaces still active.
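One way to make that kind of failover script less sensitive to flapping is a hold-down: only bring the WAN up after the node has been master continuously for some time, and shut it immediately on any other state. Below is a minimal Python sketch of just that decision logic. It is not what I actually ran. The class name `WanGate`, the `hold_seconds` value and the state strings are all assumptions for illustration, and the code that actually enables or disables the interface is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WanGate:
    """Decide whether the secondary's WAN interface should be up."""

    hold_seconds: float = 10.0            # hypothetical hold-down, tune to taste
    master_since: Optional[float] = None  # when the current MASTER period began
    wan_enabled: bool = False

    def on_vrrp_state(self, state: str, now: float) -> bool:
        """Record a VRRP transition; anything but MASTER shuts the WAN at once."""
        if state == "MASTER":
            if self.master_since is None:
                self.master_since = now
        else:
            self.master_since = None
            self.wan_enabled = False
        return self.wan_enabled

    def tick(self, now: float) -> bool:
        """Call periodically; enable the WAN only after a stable MASTER period."""
        if (self.master_since is not None
                and now - self.master_since >= self.hold_seconds):
            self.wan_enabled = True
        return self.wan_enabled


if __name__ == "__main__":
    gate = WanGate(hold_seconds=10)
    gate.on_vrrp_state("MASTER", now=0)
    print(gate.tick(now=5))    # still inside the hold-down
    print(gate.tick(now=12))   # MASTER held long enough
```

A caller would feed this from the VRRP transition script plus a timer, and act on the returned boolean. It only dampens flapping; it does not by itself prove the primary is really offline.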
I used conntrack-sync to ensure that sessions kept working, but I don’t know if it actually worked, seeing as the primary had a working PPPoE interface while the secondary didn’t. I did get pretty hitless failover, but I think that was due to TCP loose connection tracking, not actual session sync.
I didn’t do anything to keep the configs in sync automatically; I did that manually.
In the end I removed it all and went back to a single machine. I found I’d made it more complex and more prone to failure than was worth the hassle. It did work, but as mentioned it tended to fail about once a month in new and interesting ways, or the secondary wouldn’t have disabled its WAN interface and I’d be hammering my ISP with PPPoE attempts that were never going to succeed, etc.
The “TODO” could be another VyOS router that does 1:1 NAT to a VRRP WAN IP on the two downstream VyOS routers.
Why would you do this? Maybe for testing or lab purposes. It certainly doesn’t make sense in production, unless the HA gateways are doing something that makes them more prone to failure.
That’s why I said you’d only do this for lab or testing purposes. Maybe you don’t have more than one IP on the WAN, so a switch won’t work. Especially if it’s a home lab, you might need a NAT router in front of the HA pair in order to be able to test VRRP on the WAN interfaces.
That’s because VRRP has to either multicast its advertisements over a static subnet or use unicast peers, and there is no way to set a VRRP address to use DHCP, even if it could work.
But you could do VRRP on the LAN and leave the WAN as DHCP on each gateway. You would not be able to keep connection state across a failover, since the WAN address changes with the gateway, but the Internet would still work once connections are re-established.
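For what it’s worth, here is a rough sketch of that last layout, written as a small Python helper that prints the `set` commands for each gateway. The interface names (`eth0` for WAN, `eth1` for LAN), the group name, the VRID, the priorities and the virtual address are all made up for illustration, and the command syntax is how I understand recent VyOS releases, so check it against the documentation for your version before pasting anything.

```python
from typing import List


def lan_vrrp_commands(priority: int,
                      lan_if: str = "eth1",         # hypothetical LAN interface
                      wan_if: str = "eth0",         # hypothetical WAN interface
                      vip: str = "192.168.1.1/24",  # hypothetical LAN gateway VIP
                      vrid: int = 10,
                      group: str = "LAN") -> List[str]:
    """Render VyOS-style commands: DHCP on the WAN, VRRP only on the LAN."""
    base = f"set high-availability vrrp group {group}"
    return [
        f"set interfaces ethernet {wan_if} address dhcp",
        f"{base} vrid {vrid}",
        f"{base} interface {lan_if}",
        f"{base} address {vip}",
        f"{base} priority {priority}",
    ]


if __name__ == "__main__":
    # The higher priority wins mastership; both nodes share the same VIP.
    for name, prio in (("primary", 200), ("secondary", 100)):
        print(f"# {name}")
        print("\n".join(lan_vrrp_commands(prio)))
```

The only difference between the two nodes is the priority. LAN clients point at the virtual address, and whichever gateway is master NATs out of its own DHCP-assigned WAN address. This assumes the ISP will actually hand out a lease to both gateways, which the single-session behaviour described earlier in the thread suggests is not always the case.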