I’ve been tackling this problem for 3 days now and have read a ton of forum posts and many (outdated / pre-1.3) guides, so my head is spinning a bit.
VyOS 1.4-rolling locally acts as firewall, gateway, dhcpd, the usual stuff. It has 2 WAN uplinks, one pppoe and one LTE over another modem. For sake of ease let’s call it “Homey”.
In the datacenter there’s another VyOS (same version) acting as a VPN / routing instance. Let’s call him “Speedy”.
Homey should establish two distinct wireguard tunnels to Speedy (in the datacenter) via two different WANs. From there, Speedy routes traffic either to servers on the VPN or out 0.0.0.0/0 via its gateway.
Pretty easy, one might think. Before this post gets any longer than it should be, I’ll refrain from telling you all the methods I tried over the last few days. I’d rather ask: How would you tackle this problem?
Addition: Speedy has two wireguard daemons listening on different ports. Also, Homey currently establishes both tunnels successfully, but via the default gateway, not via the two separate links.
As a starter: does it work if you have a single wg-tunnel at Homey towards Speedy and then toggle between the two uplinks (one WAN at a time, by connecting/disconnecting the uplink cable)?
Just to rule things out for both transmission methods (so it isn’t some ISP along the road trying to block your encrypted VPN attempts).
Second, I would attempt to use this as two different tunnels with different passphrase/PKI.
Since the source address cannot be specified, I would then look at using fwmark and let the firewall act on that mark to decide which WAN IP each tunnel gets SNATed to (so that tunnel1 only uses ISP_A and tunnel2 only uses ISP_B).
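At the Linux level, the routing half of the fwmark idea might look like the following non-persistent test. This is only a sketch: the interface names (pppoe0, eth2), the LTE modem gateway 192.0.2.1, the mark values and the table numbers are all assumptions, and on VyOS the same thing would normally be expressed through the config frontend rather than typed by hand.

```shell
# Mark the encrypted (outer) packets of each tunnel.
sudo wg set wg0 fwmark 0x34dd
sudo wg set wg1 fwmark 0x34ee

# Send each mark to its own routing table.
sudo ip rule add fwmark 0x34dd lookup 54
sudo ip rule add fwmark 0x34ee lookup 55

# One default route per table, each pointing out a different WAN.
sudo ip route add default dev pppoe0 table 54
sudo ip route add default via 192.0.2.1 dev eth2 table 55
```

Everything added this way is gone after a reboot or a reconfiguration of the interfaces, which makes it a cheap way to check the idea before committing it to the config.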
If that isn’t it, then at least check the MTU: a PPPoE ISP for sure won’t let a 1500-byte MTU through, while a cellphone provider might.
When in doubt I would try an MTU of 1280 bytes (the lowest allowable for IPv6; for IPv4, 576 bytes is the size every host must be able to accept). That isn’t optimized (you want as large an MTU as possible), but 1280 bytes on the clear side of the wg-tunnel should work as a bare minimum, and it leaves room for all sorts of encapsulation on the crypto side of the tunnel (tunnel in tunnel in tunnel in PPPoE and so on).
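To get closer to the largest MTU that still fits, the WireGuard overhead can simply be subtracted from the link MTU. The helper below is a small sketch (the function name wg_mtu is made up for illustration); it assumes no extra encapsulation beyond the WAN link itself, so on a carrier that tunnels several layers deep the real value may be lower.

```shell
# Estimate the largest inner (clear-side) MTU for a WireGuard tunnel.
# Per-packet overhead: outer IP header (20 bytes IPv4, 40 bytes IPv6)
# + 8 bytes UDP + 32 bytes WireGuard (16-byte header, 16-byte auth tag).
wg_mtu() {
    link_mtu=$1      # MTU of the underlying WAN link
    outer_ipver=$2   # 4 or 6: IP version carrying the encrypted packets
    case "$outer_ipver" in
        4) ip_hdr=20 ;;
        6) ip_hdr=40 ;;
        *) echo "usage: wg_mtu <link_mtu> <4|6>" >&2; return 1 ;;
    esac
    echo $(( link_mtu - ip_hdr - 8 - 32 ))
}

wg_mtu 1492 4   # typical PPPoE link, IPv4 transport: 1432
wg_mtu 1492 6   # same link, IPv6 transport: 1412
wg_mtu 1500 6   # plain Ethernet, IPv6 transport: 1420
```

The last value matches the 1420 bytes WireGuard tooling commonly uses as a default.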
Yes, once I manually set Homey’s default gateway to “the other” WAN link, both wg-tunnels get established and are handshaking normally over this specific WAN link. However, the local-policy tables never get applied. For clarification, Homey initiates the wireguard tunnel, not another LAN client.
Already did, different pki for both peers Homey (2 wg clients) and Speedy (2 wg servers). Additionally, different ports.
That was exactly my latest idea before I opted to post on this forum. After a break I’ll attempt the fwmarks in order to funnel traffic out a specific wg-tunnel. Thank you again, have a great weekend!
Exactly one of my first thoughts. Since cell phone connections use carrier grade NAT almost always and encapsulate like ten times, I already reduced the MTU. However, this wasn’t the cause as the wg-tunnel successfully establishes over either WAN link once I set a default gateway and traffic flows over it perfectly fine.
Maybe you’re right, my thinking got overcomplicated within the last 24h.
I had an idea. Since I control both endpoints (Homey and Speedy), what about setting two unique IPs on the loopback of Homey and pushing them to the main routing table (/32)? Valid point, or a quick thought that goes up in smoke?
(EDIT: corrected the names)
Edit2: For a better understanding: The WAN2 / cell phone link has a modem I control. Once I push a static route to Homey via that modem, the modem in turn has another static routing entry. And the moment the packet destined for the 2nd wg-tunnel @ Speedy leaves Homey over WAN2, my modem forwards it according to its static routing table, which also points to Speedy.
Spot on! As we speak I’m in the WebUI of the provider trying to find the menu for IP management. That would be much easier and also universal to any machines joining the network at a later point.
FYI, an additional IP for the same VPS costs $1 less than another VPS node. IMO that defeats the purpose. I should rather create another VPS instance for nearly the same price and have double redundancy on both sides… at least when it comes to links/routes. Of course I’d place Speedy 2 at another physical datacenter.
[ Homey ] ----> WAN PPPOE ----> [ wg0 @ Speedy ]
[ Homey ] ----> WAN GSM --------> [ wg1 @ Speedy ]
Two VPS nodes:
[ Homey ] ----> WAN PPPOE ----> [ wg0 @ Speedy 1 ]
[ Homey ] ----> WAN GSM --------> [ wg1 @ Speedy 2 ]
set system ip multipath layer4-hashing
set system ipv6 multipath layer4-hashing
So it should be possible on Homey to define wg0 with Speedy’s lo0 (well, in VyOS terminology dum0) as its peer (the public IP of Speedy) and default route, while wg1 goes for the same lo0 aka dum0 of Speedy as its destination.
That is, on Homey you will see two possible nexthops for your traffic once both tunnels are up at the same time. ECMP will do the loadsharing per session (aka 5-tuple: each combination of proto+srcip+dstip+srcport+dstport will use the same physical path).
and tunnel B is down, then you might run into blackholing, where half of your ECMP sessions end up with tunnel_ip_B as nexthop and get dropped since the tunnel is down.
The same goes if you use regular cost-based routing, such as sending to tunnel_ip_A first and to tunnel_ip_B only if that fails: if the IP is reachable even while the tunnel is down, then everything is sent to tunnel_ip_A and gets dropped, even if ISP_A is unplugged and ISP_B is plugged in.
But that’s an easy test to verify how wireguard behaves.
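A minimal sketch of the ECMP part in VyOS config syntax, assuming the far ends of the tunnels are 10.33.23.1 (wg0) and 10.33.23.5 (wg1), and assuming the encrypted packets towards Speedy are already pinned to their WANs by other means (otherwise a default route through the tunnels would also capture the tunnels’ own transport traffic):

```shell
# Two equal-cost default routes, one per tunnel (addresses are assumptions).
set protocols static route 0.0.0.0/0 next-hop 10.33.23.1
set protocols static route 0.0.0.0/0 next-hop 10.33.23.5
```

In operational mode, `show ip route 0.0.0.0/0` should then list both next-hops. Taking one tunnel down and repeating the command is the easy test for whether its next-hop disappears or keeps attracting sessions.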
I think the reason is that there is nothing to PBR at.
Also, compared with other vendors, PBR is a bad thing, since it often means that the packets will be punted to the mgmt-plane rather than getting the proper acceleration of the data-plane. As in, PBR should always be avoided if possible.
On the other hand, VyOS is a software router, so it doesn’t really matter (until DPDK/VPP support becomes fully available; then it depends on whether PBR can be offloaded to the DPDK/VPP stack or not).
As I interpret this use case, there is a VyOS at both locations (Homey and Speedy).
Homey has 2 ISPs to choose from (one is PPPoE and the other is a cellular provider).
The VyOS at Homey should take whatever the LAN clients are doing, encrypt it, and send it as wireguard over WAN to Speedy.
This is “normally” done with a single wireguard tunnel, letting the OS decide which uplink is preferred/reachable and use that.
But in this usecase the user wants to utilize both uplinks at once.
Without wireguard the solution is to simply adjust the ECMP settings for IPv4 and IPv6 and then have two default routes towards destination.
But the question here is how to achieve this with wireguard using the config frontend of VyOS at Homey (and Speedy)?
It is clean routing + PBR.
Each source address should connect to the destination address via its own ISP.
Doesn’t matter if you use source address policy or marking.
It doesn’t matter if you then use OSPF for load balancing, WAN load balancing or a failover route.
An FQDN doesn’t help: in IPsec it works like an ID that strongSwan parses, whereas wireguard just resolves the domain name to an IP address.
Spot on, VyOS 1.4-rolling (as of July 2023) running on both endpoints, acting as the local gateway, router, firewall on bare metal hardware (Homey) and on the other side as a KVM/QEMU VPS instance at a data center with a fixed IPv4 (Speedy).
Correct, both wg-tunnels should remain established with a keepalive of 30 secs due to the fact that both the PPPoE and the GSM link are NAT’ed with dynamic IPs.
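For reference, the Homey side of the two tunnels could look roughly like this in VyOS 1.4 config syntax. It is a sketch only: the peer name SPEEDY, the documentation address 203.0.113.10, the tunnel addresses, the ports and the key placeholders are all assumptions, and the exact option names are worth confirming with tab completion on a rolling release.

```shell
# Tunnel 1, meant to leave via PPPoE
set interfaces wireguard wg0 address '10.33.23.2/30'
set interfaces wireguard wg0 private-key '<homey-wg0-private-key>'
set interfaces wireguard wg0 peer SPEEDY public-key '<speedy-wg0-public-key>'
set interfaces wireguard wg0 peer SPEEDY address '203.0.113.10'
set interfaces wireguard wg0 peer SPEEDY port '4566'
set interfaces wireguard wg0 peer SPEEDY allowed-ips '0.0.0.0/0'
set interfaces wireguard wg0 peer SPEEDY persistent-keepalive '30'

# Tunnel 2, meant to leave via GSM/LTE (same remote IP, different port and keys)
set interfaces wireguard wg1 address '10.33.23.6/30'
set interfaces wireguard wg1 private-key '<homey-wg1-private-key>'
set interfaces wireguard wg1 peer SPEEDY public-key '<speedy-wg1-public-key>'
set interfaces wireguard wg1 peer SPEEDY address '203.0.113.10'
set interfaces wireguard wg1 peer SPEEDY port '2534'
set interfaces wireguard wg1 peer SPEEDY allowed-ips '0.0.0.0/0'
set interfaces wireguard wg1 peer SPEEDY persistent-keepalive '30'
```

Nothing in this fragment decides which WAN each tunnel uses; that still has to come from routing policy.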
Once both tunnels are up and running, I’d load-balance / failover from there running ping health checks through the wireguard tunnel, which a) tests the WAN uplink itself and b) the existence of an established wg-tunnel at the same time. Two birds with one stone.
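A rough sketch of such a health check as a plain shell loop. The peer addresses inside the tunnels (10.33.23.1 and 10.33.23.5) are assumptions; a real setup would feed the result into whatever load-balancing or failover mechanism is in use rather than just printing it.

```shell
# Ping the far end of each tunnel through that tunnel's interface.
# A reply proves both the WAN uplink and the wg-tunnel on top of it.
for pair in wg0:10.33.23.1 wg1:10.33.23.5; do
    ifname=${pair%%:*}   # part before the colon: interface name
    target=${pair##*:}   # part after the colon: address to ping
    if ping -c 3 -W 2 -I "$ifname" "$target" >/dev/null 2>&1; then
        echo "$ifname up"
    else
        echo "$ifname down"
    fi
done
```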
Baby steps! I believe once Homey is able to establish wg0 over PPPoE and wg1 over GSM, the following steps are easy.
Yes, I think the solution would be different if the Homey VyOS weren’t a jack-of-all-trades system that also initiates both wg-tunnels. If the two local wireguard instances were two LAN clients, things would be much easier, since then it’d be a matter of source IP / destination IP -----> tunnel A / tunnel B routing.
I’ll post the relevant config parts once I’ve cleaned up the mess later today.
Since this is an established thread for the same issue I have I’ll add on.
As I mentioned, this scenario works well in another Linux router OS, OpenWrt, but similar settings that work in OpenWrt have an issue in VyOS.
Currently my VyOS router isn’t easily accessible because it shares the same hardware as the OpenWrt router, so unless I see something substantial I can’t tinker with it at will.
Reading this thread, I see there was a suggestion to use another IP and set up PBR for each tunnel. I think that’s the easy solution, and I used it initially, but paying for another IP just for a single app isn’t reasonable.
I think the winning config is pinning each WAN into its own routing table with the vrf commands. However, looking at my working OpenWrt config, I notice that even though I can route across each wireguard tunnel from the default table, the wireguard interfaces are actually in separate route tables.
During my troubleshooting in VyOS, I removed the wireguard interfaces from the vrf assignments.
Create a vrf/route table for each WAN
Configure a fwmark on each WG interface
Set local-route policies that select the vrf/table matching each wireguard tunnel interface
Create a blackhole route for 0.0.0.0/0 (and ::/0) in each vrf with a distance of 254
Assign each WAN interface to a separate vrf/table
Establish a routing protocol on each end of the tunnel and set costs/preferences as appropriate (equal for ECMP or tiered for failover)
Optional: include a default route that points across the tunnel, for example a static route to a loopback/dummy address advertised by the Speedy/remote side, and allow your local side’s networks to masquerade
Suggested: assign the individual wireguard interfaces to the matching WAN vrf/table.
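The first few steps above might be sketched in VyOS 1.4 config syntax as follows. To keep it short this uses plain policy-routing tables instead of VRFs, which is a simplification of the list above. The interface names (pppoe0 for DSL, an LTE modem at 192.0.2.1), the table numbers and the mark values (decimal for 0x34dd and 0x34ee) are assumptions, and the option names are written as I understand the rolling release, so tab completion is the authority here.

```shell
# fwmark on the encrypted packets of each tunnel
set interfaces wireguard wg0 fwmark '13533'
set interfaces wireguard wg1 fwmark '13550'

# local-route policy: mark -> table
set policy local-route rule 10 fwmark '13533'
set policy local-route rule 10 set table '54'
set policy local-route rule 20 fwmark '13550'
set policy local-route rule 20 set table '55'

# per-WAN default route plus a last-resort blackhole, so marked
# packets never fall through to the other WAN when a link is down
set protocols static table 54 route 0.0.0.0/0 interface pppoe0
set protocols static table 54 route 0.0.0.0/0 blackhole distance '254'
set protocols static table 55 route 0.0.0.0/0 next-hop 192.0.2.1
set protocols static table 55 route 0.0.0.0/0 blackhole distance '254'
```

The blackhole only takes over when the real default route in that table disappears, because its distance of 254 loses against any normal route.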
You should see UDP packets to/from your remote IP address and the expected wireguard port on only one interface of your local VyOS. For example, if the DSL wireguard tunnel is port 4566 and the LTE tunnel is 2534, running
sudo tcpdump -i eth1 host your.remote.ip.here
where eth1 is the DSL interface, you should only see 4566 to and from your remote IP
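Narrowing the capture to one port at a time makes the check less ambiguous. In this sketch 203.0.113.10 stands in for the remote IP and eth2 is an assumed name for the LTE interface:

```shell
sudo tcpdump -ni eth1 'udp and host 203.0.113.10 and port 4566'   # DSL tunnel: expect traffic
sudo tcpdump -ni eth1 'udp and host 203.0.113.10 and port 2534'   # LTE tunnel on DSL: expect silence
sudo tcpdump -ni eth2 'udp and host 203.0.113.10 and port 2534'   # LTE tunnel: expect traffic

# Handshake time per peer as a Unix timestamp (0 means no handshake yet)
sudo wg show wg0 latest-handshakes
sudo wg show wg1 latest-handshakes
```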
I’m curious if you are able to get your wireguard peers to handshake successfully with this, including my suggestion to add the wireguard interfaces to the same vrf/table
Here’s more info about what I did on OpenWrt, however adding the blackholes instead of firewall rules to pin wg traffic to the proper WAN interface if it’s offline was something I discovered yesterday.
Even in VyOS, the output of sudo ip rule show should look similar:
sudo ip rule show
0: from all lookup local
1: from all iif lo lookup default
2: from all fwmark 0x34dd lookup 54
3: from all fwmark 0x34ee lookup 55
4: from all fwmark 0x34ff lookup 56
5: from all to 192.168.100.1 lookup 54
6: from all to 192.168.1.1 lookup 54
7: from all to 192.168.117.1 lookup 55
10000: from 10.33.23.10 lookup 56
10000: from 10.33.23.2 lookup 54
10000: from 10.33.23.6 lookup 55
10000: from 192.168.117.24 lookup 55
10000: from 100.97.22.65 lookup 54
20000: from all to 10.33.23.10/30 lookup 56
20000: from all to 10.33.23.2/30 lookup 54
20000: from all to 10.33.23.6/30 lookup 55
20000: from all to 192.168.117.24/24 lookup 55
20000: from all to 100.97.22.65/10 lookup 54
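With rules like these in place, the kernel can be asked directly which path a marked packet would take. A small sketch, using 203.0.113.10 as a stand-in for the remote wireguard endpoint and the marks and tables from the listing above:

```shell
# Which route would a packet carrying each tunnel's fwmark use?
sudo ip route get 203.0.113.10 mark 0x34dd
sudo ip route get 203.0.113.10 mark 0x34ee

# What is actually in the per-WAN tables?
sudo ip route show table 54
sudo ip route show table 55
```

If both `ip route get` calls report the same outgoing device, the marks are not reaching the rules, or the tables are missing their default routes.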