Two WANs, two wireguards, one datacenter instance, many attempts, no joy

Power · September 30, 2023, 3:49pm

Good day,
long time user, first time poster here.

I’ve been tackling this problem for 3 days now and read a ton of forum posts and many (outdated / pre 1.3) guides. So my head is spinning a bit.

Situation:
VyOS 1.4-rolling locally acts as firewall, gateway, dhcpd, the usual stuff. It has 2 WAN uplinks, one pppoe and one LTE over another modem. For sake of ease let’s call it “Homey”.

In the datacenter there’s another VyOS (same version) acting as a VPN / routing instance. Let’s call him “Speedy”.

Goal:
Homey should establish two distinct wireguard tunnels to Speedy (in the datacenter) via two different WANs. From there, Speedy routes traffic over to either servers on the VPN or out 0.0.0.0/0 to its gateway.

Pretty easy, one might think. Before this post gets any longer than it should be, I’ll refrain from telling you all the methods I tried over the last few days. I’d rather ask: How would you tackle this problem?

Addition: Speedy has two wireguard daemons listening on different ports. Also, currently Homey successfully establishes the two tunnels via the default gateway, not via the two separate links.

Thank you very much folks!

Apachez · September 30, 2023, 5:27pm

As a starter - does it work if you got a single wg-tunnel at Homey towards Speedy and then toggle between the two uplinks (as in one WAN at a time by connecting/disconnecting uplink cable)?

Just to rule things out for both transmission methods (so it isnt some ISP along the road who tries to block your encrypted VPN attempts).

Second I would attempt to use this as two different tunnels with different passphrase/pki.

Since source-address cannot be specified I would then look at using fwmark and then let the firewall act on that in order to which WAN ip to SNAT each tunnel into (so that tunnel1 only uses ISP_A and tunnel2 only uses ISP_B).

Apachez · September 30, 2023, 5:37pm

If not, then at least check the MTU: a PPPoE ISP for sure wont let through 1500 bytes MTU while a cellphone provider might.

When in doubt I would try 1280 bytes as MTU (the minimum link MTU that IPv6 requires; for IPv4, 576 bytes is only the datagram size every host must accept, the minimum link MTU being 68 bytes) even if that isnt optimized (you want as large MTU as possible) but 1280 bytes on the clearside of the wg-tunnel should work as bare minimum (and that would allow for all sort of encapsulation on the cryptoside of the tunnel like tunnel in tunnel in tunnel in PPPoE and so on).
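
To put numbers on this: WireGuard wraps each packet in an outer IP header (20 bytes for IPv4, 40 for IPv6), an 8-byte UDP header and 32 bytes of its own framing. A minimal sketch of that arithmetic, assuming the usual PPPoE MTU of 1492 as one input; the function name wg_mtu is made up for illustration:

```shell
#!/bin/sh
# Largest MTU that fits inside a WireGuard tunnel for a given underlay MTU.
# Usage: wg_mtu UNDERLAY_MTU 4|6   (4 or 6 = IP version of the outer packet)
wg_mtu() {
    case "$2" in
        4) outer_ip=20 ;;   # IPv4 header without options
        6) outer_ip=40 ;;   # fixed IPv6 header
        *) echo "usage: wg_mtu UNDERLAY_MTU 4|6" >&2; return 1 ;;
    esac
    # 8 bytes UDP + 32 bytes WireGuard data-message overhead
    echo $(( $1 - outer_ip - 8 - 32 ))
}

wg_mtu 1492 4   # PPPoE underlay, IPv4 endpoint
wg_mtu 1500 6   # plain Ethernet underlay, IPv6 endpoint
```

Anything the cellular side adds on top (CGNAT itself does not change the packet size, extra encapsulation does) has to be subtracted as well, which is why 1280 is a safe starting point.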

Power · September 30, 2023, 5:39pm

Hi Apachez,
thanks for the speedy reply

Yes, once I manually set Homey’s default gateway to “the other” WAN link, both wg-tunnels get established and are handshaking normally over this specific WAN link. However, the local-policy tables never get applied. For clarification, Homey initiates the wireguard tunnel, not another LAN client.

Already did, different pki for both peers Homey (2 wg clients) and Speedy (2 wg servers). Additionally, different ports.

That was exactly my latest idea before I opted to post on this forum. After a break I’ll attempt the fwmarks in order to funnel traffic out a specific wg-tunnel. Thank you again, have a great weekend!

Power · September 30, 2023, 5:45pm

Exactly one of my first thoughts. Since cell phone connections use carrier grade NAT almost always and encapsulate like ten times, I already reduced the MTU. However, this wasn’t the cause as the wg-tunnel successfully establishes over either WAN link once I set a default gateway and traffic flows over it perfectly fine.

Apachez · September 30, 2023, 6:10pm

Regarding my first question I was thinking about if you remove all the PBR or whatever you were attempting.

That is setup Homey with a single wg-tunnel towards Speedy.

Now use WAN1, does it work? Unplug WAN1 and use WAN2 instead, does it still work?

Power · September 30, 2023, 8:53pm

Maybe you’re right, my thinking got overcomplicated within the last 24h.

I had an idea. Since I control both endpoints (Homey and Speedy), what about setting two unique IPs on loopback of Homey and push it to the main routing table (/32). Valid point or quick thought that blows up in smoke?

(EDIT: corrected the names)

Edit2: For a better understanding: The WAN2 / Cell phone link has a modem I control. Once I push a static route to Homey via that modem, the modem in turn has another static routing entry. And the moment the packet destined for the 2nd wg-tunnel @ Speedy leaves Homey over WAN2, my modem forwards it according to its static routing table, which also points to Speedy.

16again · September 30, 2023, 9:42pm

Consider using a 2nd public IP address at speedy for 2nd WG listener.
Then on Homey, you can add 2 separate /32 routes for both WG endpoints
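
A sketch of what those two host routes could look like in VyOS configure mode on Homey. The addresses 203.0.113.10 and 203.0.113.11 (Speedy's two public IPs), the interface name pppoe0 and the LTE modem address 192.168.8.1 are all placeholders, not values from this thread:

```shell
# Run in VyOS configure mode on Homey; every address and interface name is a placeholder.
# First WG listener on Speedy is only reachable via the PPPoE link.
set protocols static route 203.0.113.10/32 interface pppoe0
# Second WG listener on Speedy is only reachable via the LTE modem.
set protocols static route 203.0.113.11/32 next-hop 192.168.8.1
```

wg0 would then use 203.0.113.10 as its peer address and wg1 would use 203.0.113.11, so the main routing table alone decides which WAN each tunnel takes.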

Power · September 30, 2023, 9:44pm

Hey 16again,
spot on! As we speak I’m in the WebUI of the provider trying to find the menu for IP management. That would be much easier and also universal to any machines joining the network at a later point.

EDIT:
FYI, an additional IP for the same VPS costs $1 less than another VPS node. IMO that defeats the purpose. I should rather create another VPS instance for nearly the same price and have double redundancy on both sides… at least when it comes to links/routes. Of course I’d place Speedy 2 at another physical datacenter.

Current setup:
[ Homey ] ----> WAN PPPOE ----> [ wg0 @ Speedy ]
[ Homey ] ----> WAN GSM --------> [ wg1 @ Speedy ]

Two VPS nodes:
[ Homey ] ----> WAN PPPOE ----> [ wg0 @ Speedy 1 ]
[ Homey ] ----> WAN GSM --------> [ wg1 @ Speedy 2 ]

Apachez · September 30, 2023, 10:00pm

VyOS supports ECMP:

set system ip multipath layer4-hashing
set system ipv6 multipath layer4-hashing

So it should be possible on Homey to have wg0 use lo0 (in VyOS terminology, dum0) on Speedy as its destination and default route, with Speedy’s public IP as the peer, while wg1 uses the same lo0 (dum0) on Speedy as its destination.

That is on Homey you will see two possible nexthops for your traffic once both tunnels are up at the same time. ECMP will make the loadsharing per session (aka 5-tuple that is combo of proto+srcip+dstip+srcport+dstport will use the same physical path).
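
To make "per session" concrete, here is a toy illustration of how a 5-tuple can be mapped to one of several next hops. It is not the kernel's actual hash function; cksum is only a stand-in, and the function name and addresses are invented for the example:

```shell
#!/bin/sh
# Illustration only: map a 5-tuple to one of N next hops.
# The kernel uses its own hash function; cksum is just a stand-in here.
pick_nexthop() {
    # args: proto src_ip dst_ip src_port dst_port nexthop_count
    hash=$(printf '%s|%s|%s|%s|%s' "$1" "$2" "$3" "$4" "$5" | cksum | cut -d' ' -f1)
    echo $(( hash % $6 ))
}

pick_nexthop tcp 192.168.1.10 198.51.100.7 40000 443 2
pick_nexthop tcp 192.168.1.10 198.51.100.7 40000 443 2   # same flow, same index
pick_nexthop tcp 192.168.1.10 198.51.100.7 40001 443 2   # new source port, may differ
```

The point is only that identical tuples always land on the same path, so a single TCP or UDP session never gets split across both tunnels.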

Power · September 30, 2023, 10:48pm

Just as a reference and context for future readers, here’s someone with a comparable scenario:
https://vyos.dev/T3702

Power · September 30, 2023, 11:02pm

Question to avoid confusion: Are connection-mark and fwmark logically the same thing under different names?

set firewall name VYOS-PPPOE rule 50 connection-mark 10

and

set policy local-route rule 10 fwmark 10

Power · September 30, 2023, 11:33pm

I found this:

Imagine that as a feature for wireguard. It seems the ipsec implementation of VyOS supports it. Currently, one can

set interfaces wireguard wg0 peer speedy address 1.2.3.4

but not a FQDN.

Viacheslav · October 1, 2023, 1:47am

What is the problem to use PBR?
I didn’t see your rules. As you have different source ip/ports it should work fine,

Apachez · October 1, 2023, 6:20am

According to the wireguard-tools documentation (wg(8) and wg-quick(8)), fwmark is the way to go if you want to specify egress interface.

For example lock so wg0 only uses ISP_A and wg1 only uses ISP_B.

And since you define a local tunnel ip (when wgX is configured) you should be able to use that as nexthop for your regular routing (mostly as device as nexthop).

Note however Im not sure if the local tunnel ip is reachable if the tunnel is down so you might need to implement a dynamic routing protocol of your choice in order to not blackhole traffic.

For example if your routing table on the local VyOS looks something like this:

0.0.0.0/0 nexthop <tunnel_ip_A>
0.0.0.0/0 nexthop <tunnel_ip_B>

and tunnel B is down then you might run into blackholing where half of your ECMP sessions ends up with tunnel_ip_B as nexthop and gets dropped since the tunnel is down.

Same goes if you use regular cost based routing such as first send to tunnel_ip_A and if that fails then tunnel_ip_B - if the ip is reachable even if the tunnel is down then everything is sent to tunnel_ip_A and if the tunnel is down that would get dropped even if ISP_A is unplugged and ISP_B is plugged in.

But thats an easy test to verify how wireguard behaves.

Apachez · October 1, 2023, 6:28am

I think the reason is that there is nothing to PBR at.

But also comparing with other vendors PBR is a bad thing since this often means that the packets will be punted to the mgmt-plane rather than have the proper acceleration of the data-plane. As in PBR should always be avoided if possible.

On the other hand VyOS is a software router so it doesnt really matter (until DPDK/VPP support is complete, at which point it depends on whether PBR can be offloaded to the DPDK/VPP stack or not).

As I interpret this usecase, there is a VyOS at both locations (Homey and Speedy).

Homey has 2 ISP’s to choose from (one is PPPoE and the other is a cellular provider).

The VyOS at Homey should take whatever the LAN clients are doing and encrypt it and send as wireguard over WAN to Speedy.

This is “normally” done with a single wireguard tunnel and then let the OS decide which uplink is prefered/reachable and use that.

But in this usecase the user wants to utilize both uplinks at once.

Without wireguard the solution is to simply adjust the ECMP settings for IPv4 and IPv6 and then have two default routes towards destination.

But the question here is how to achieve this with wireguard using the config frontend of VyOS at Homey (and Speedy)?

Correct me if I have misunderstood the assignment

Viacheslav · October 1, 2023, 7:27am

It is clean routing + PBR.
Each source address should connect to the destination address via its own ISP.
Doesn’t matter if you use source address policy or marking.
It doesn’t matter if you then use OSPF for loadbalancing, wan load balancing or failover route

An FQDN doesn’t help: in IPsec it works as an ID that strongSwan parses, whereas WireGuard just resolves the domain name to an IP address

Power · October 1, 2023, 12:07pm

Spot on, VyOS 1.4-rolling (as of July 2023) running on both endpoints, acting as the local gateway, router, firewall on bare metal hardware (Homey) and on the other side as a KVM/QEMU VPS instance at a data center with a fixed IPv4 (Speedy).

Correct, both wg-tunnels should remain established with a keepalive of 30 secs due to the fact that both the PPPoE and the GSM link are NAT’ed with dynamic IPs.

Once both tunnels are up and running, I’d load-balance / failover from there running ping health checks through the wireguard tunnel, which a) tests the WAN uplink itself and b) the existence of an established wg-tunnel at the same time. Two birds with one stone.
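
A rough sketch of such a health check, run from Homey's shell. The tunnel-side addresses 10.0.0.1 and 10.0.1.1 are placeholders for Speedy's addresses inside wg0 and wg1, the function name is invented, and the PING override exists only so the function can be exercised without a live tunnel:

```shell
#!/bin/sh
# tunnel_alive IFACE TARGET: succeed if TARGET answers pings sent out via IFACE.
# PING can be overridden (e.g. for testing); it defaults to the system ping.
tunnel_alive() {
    ${PING:-ping} -c 3 -W 2 -I "$1" "$2" >/dev/null 2>&1
}

# Placeholder addresses of Speedy inside each tunnel.
for pair in "wg0 10.0.0.1" "wg1 10.0.1.1"; do
    set -- $pair            # split "iface address" into $1 and $2
    if tunnel_alive "$1" "$2"; then
        echo "$1 up"
    else
        echo "$1 down"
    fi
done
```

Because the ping is bound to the wg interface, a reply proves both the WAN uplink and the tunnel on top of it.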

Baby steps! I believe once Homey is able to establish wg0 over PPPoE and wg1 over GSM, the following steps are easy.

Yes, I think the solution would be different if the Homey VyOS weren’t a jack-of-all-trades system also initiating both wg-tunnels. If both local wireguard instances were two LAN clients, things would be much easier since then it’d be a matter of source IP / destination IP -----> tunnel A / tunnel B routing.

I’ll post the relevant config parts once I cleaned up the mess later today.

Power · October 1, 2023, 12:14pm

As a side note, I tested different ports for the wireguard tunnel since I heard that some ISPs like to throttle, mangle, even outright block the usual suspects UDP 1194 / 51820 and so forth.

ACiD_GRiM · October 2, 2023, 7:01am

Since this is an established thread for the same issue I have I’ll add on.

As I mentioned this scenario works well in another linux router OS, openWrt, and configuring similar settings that work in OpenWrt have an issue in vyos.

Currently my vyos router isn’t easily accessible because it shares the same hardware as the openwrt router, so unless I see something substantial I can’t tinker with it at will.

Reading this thread I see there was a suggestion to use another IP and setup PBR for each tunnel, I think that’s the easy solution and I used that initially, but paying for another IP just for a single app isn’t reasonable.

I think the winning config is pinning each WAN into its own routing table with the vrf commands. However, looking at my working openwrt config, I notice that even though I can route across each wireguard tunnel from the default table, the wireguard interfaces are actually in separate route tables.
During my troubleshooting in vyos, I removed the wireguard interfaces from vrf assignments

Key elements:

Create a vrf/route table for each WAN
Configure FW mark on each WG interface
Set local-route policies to set the vrf/table of the matching wireguard tunnel interfaces
create a blackhole route for 0.0.0.0/0 (and ::/0 ) in each vrf with a distance of 254
assign each WAN interface to separate vrf/tables
establish a routing protocol on each end of the tunnel and set costs/preferences as appropriate (equal for ECMP or tiered for failover)
Optional: include a route for default that points across the tunnel, either a static route to a loopback/dummy address advertised by the speedy/remote side and allow your local side’s networks to masquerade
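
The core of the list above (marks, per-tunnel tables, blackhole fallback) can be approximated with plain routing tables instead of full VRFs. A sketch for Homey in VyOS 1.4 configure mode; the mark values, table numbers, interface names and the modem address 192.168.8.1 are assumptions, and the exact command paths are worth confirming with tab completion on your build:

```shell
# Tag the encrypted (outer) packets of each tunnel.
set interfaces wireguard wg0 fwmark 10
set interfaces wireguard wg1 fwmark 20

# Locally generated packets carrying a mark use that tunnel's table.
set policy local-route rule 10 fwmark 10
set policy local-route rule 10 set table 10
set policy local-route rule 20 fwmark 20
set policy local-route rule 20 set table 20

# Each table holds exactly one way out, plus a last-resort blackhole so the
# tunnel cannot fall back to the other WAN through the main table.
set protocols static table 10 route 0.0.0.0/0 interface pppoe0
set protocols static table 10 route 0.0.0.0/0 blackhole distance 254
set protocols static table 20 route 0.0.0.0/0 next-hop 192.168.8.1
set protocols static table 20 route 0.0.0.0/0 blackhole distance 254
```

The routing protocol and the optional default route across the tunnels are left out here; they sit on top of this once both handshakes succeed on their own WAN.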

Suggested: assign the individual wireguard interfaces to the matching WAN vrf/table.
You should see udp packets to/from your remote IP address and expected wireguard port only on one interface of your local vyos. For example, if the DSL wireguard tunnel uses port 4566 and the LTE tunnel uses 2534, then when running

sudo tcpdump -i eth1 host your.remote.ip.here

where eth1 is the DSL interface, you should only see 4566 to and from your remote IP
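
The same check can be narrowed per port, which makes a leak easier to spot. A sketch reusing the example values above (eth1 as the DSL interface, 4566 for the DSL tunnel, 2534 for the LTE tunnel); -n only turns off name resolution:

```shell
# Expect traffic here: the DSL tunnel's port on the DSL interface.
sudo tcpdump -ni eth1 host your.remote.ip.here and udp port 4566
# Expect silence here: the LTE tunnel's port must not show up on DSL.
sudo tcpdump -ni eth1 host your.remote.ip.here and udp port 2534
```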

I’m curious if you are able to get your wireguard peers to handshake successfully with this, including my suggestion to add the wireguard interfaces to the same vrf/table

Here’s more info about what I did on OpenWrt, however adding the blackholes instead of firewall rules to pin wg traffic to the proper WAN interface if it’s offline was something I discovered yesterday.

Even in VyOS, the output of sudo ip rule show should look similar:

sudo ip rule show
0:	from all lookup local 
1:	from all iif lo lookup default 
2:	from all fwmark 0x34dd lookup 54 
3:	from all fwmark 0x34ee lookup 55 
4:	from all fwmark 0x34ff lookup 56 
5:	from all to 192.168.100.1 lookup 54 
6:	from all to 192.168.1.1 lookup 54 
7:	from all to 192.168.117.1 lookup 55 
10000:	from 10.33.23.10 lookup 56 
10000:	from 10.33.23.2 lookup 54 
10000:	from 10.33.23.6 lookup 55 
10000:	from 192.168.117.24 lookup 55 
10000:	from 100.97.22.65 lookup 54 
20000:	from all to 10.33.23.10/30 lookup 56 
20000:	from all to 10.33.23.2/30 lookup 54 
20000:	from all to 10.33.23.6/30 lookup 55 
20000:	from all to 192.168.117.24/24 lookup 55 
20000:	from all to 100.97.22.65/10 lookup 54
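
For comparing this against a VyOS box, it can help to boil the output down to just the mark-to-table mappings. A small sketch; the function name fwmark_tables is invented, and the sample lines are taken from the output above:

```shell
#!/bin/sh
# fwmark_tables: read `ip rule show` output on stdin and print "MARK TABLE"
# for every rule that selects a routing table by firewall mark.
fwmark_tables() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "fwmark") mark = $(i + 1)
            if ($i == "lookup") table = $(i + 1)
        }
        if (mark != "") print mark, table
        mark = ""; table = ""
    }'
}

# On a live router: sudo ip rule show | fwmark_tables
printf '%s\n' \
    '0: from all lookup local' \
    '2: from all fwmark 0x34dd lookup 54' \
    '10000: from 10.33.23.2 lookup 54' | fwmark_tables
```

If a tunnel's mark is missing from that list on VyOS, its outer packets are being routed by the main table, which matches the "both tunnels follow the default gateway" symptom described earlier in the thread.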