I’ve had an issue where clients with dynamic IPs can’t reconnect to site-to-site WireGuard interfaces. Rebooting the core router was the only workaround.
This happens with keepalives both on and off.
I recently tried disabling and re-enabling the interfaces after discovering that the old IP was still listed as the peer endpoint. But VyOS doesn’t actually remove these wg interfaces when they are disabled, so re-enabling them with ‘del int wireg wg10 dis’ changes nothing.
Only disabling them in the WireGuard config, then removing the links with ‘ip link del wg10’, and then re-enabling them fixes the issue without rebooting the whole router.
Can the root issue be improved so peers aren’t persisted indefinitely? Or, so I can script around the issue more simply, can disabling the WireGuard interface also remove the peers while leaving the interface up?
I’ve had this issue for several months across multiple rolling nightlies. I also moved from a standalone gateway VPS, which then tunneled to the core, to terminating the tunnels directly on the core edge.
Current build is VyOS 1.4-rolling-202308060317 on the core edge
The clients are various OpenWrt 23.05 routers.
Here’s the relevant config on the VyOS router:
set interfaces wireguard wg100 address '10.33.23.1/30'
set interfaces wireguard wg100 address 'fd00:f9a8:9a7e:300::1/64'
set interfaces wireguard wg100 ip adjust-mss 'clamp-mss-to-pmtu'
set interfaces wireguard wg100 ipv6 adjust-mss 'clamp-mss-to-pmtu'
set interfaces wireguard wg100 mtu '1340'
set interfaces wireguard wg100 peer npancwangw01-wan allowed-ips '0.0.0.0/0'
set interfaces wireguard wg100 peer npancwangw01-wan allowed-ips '::/0'
set interfaces wireguard wg100 peer npancwangw01-wan preshared-key
set interfaces wireguard wg100 peer npancwangw01-wan public-key
set interfaces wireguard wg100 port '9468'
set firewall name WAN_v4-to-LOCAL rule 1 action 'accept'
set firewall name WAN_v4-to-LOCAL rule 1 destination address 'x.x.x.11-x.x.x.14'
set firewall name WAN_v4-to-LOCAL rule 1 destination port '443,9468-9470,24680-24690'
set firewall name WAN_v4-to-LOCAL rule 1 protocol 'udp'
set firewall name WAN_v4-to-LOCAL rule 1 state new 'enable'
I re-enabled keepalives and the issue still persists. I wonder why the peers are stuck.
I’m not sure how to dynamically monitor WireGuard uptime so I can reset the interface if the peer isn’t available for more than 5 minutes.
It would be nice if the disable command for WireGuard executed the following to remove the peers, which would disable the interface more effectively than just setting the interface down:
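One way to monitor this might be to compare the timestamps from "wg show <ifname> latest-handshakes" against the current time. The sketch below is only an illustration: the function name stale_peers and its arguments are made up for this example, and it assumes the usual output of one "<public-key> <unix-timestamp>" pair per line.

```shell
#!/usr/bin/env bash
# stale_peers MAX_AGE [NOW]   (hypothetical helper, not part of VyOS or wg)
# Reads "wg show <ifname> latest-handshakes" output on stdin and prints the
# public key of every peer whose last handshake is older than MAX_AGE seconds.
# NOW defaults to the current Unix time; passing it explicitly helps testing.
# Peers with timestamp 0 (no handshake completed yet) are skipped.
stale_peers() {
    local max_age=$1
    local now=${2:-$(date +%s)}
    awk -v max="$max_age" -v now="$now" \
        '$2 > 0 && (now - $2) > max { print $1 }'
}
```

On the router this could be used as "wg show wg100 latest-handshakes | stale_peers 300"; any key it prints belongs to a peer that has been silent for more than 5 minutes.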
for i in $(wg show wg0 | grep 'peer:' | awk '{print $2}'); do wg set wg0 peer "$i" remove; done
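A slightly more cautious variant of the peer-removal loop could print the commands first so they can be reviewed before anything is changed. This is a rough sketch: peer_remove_cmds is a name invented here, and it expects one public key per line on stdin, which is what "wg show <ifname> peers" prints.

```shell
#!/usr/bin/env bash
# peer_remove_cmds IFNAME   (hypothetical helper)
# Reads peer public keys on stdin, one per line, and prints one
# "wg set IFNAME peer KEY remove" command per key. Nothing is executed.
peer_remove_cmds() {
    local ifname=$1 key
    while IFS= read -r key || [ -n "$key" ]; do
        # Skip blank lines so no malformed command is printed.
        if [ -n "$key" ]; then
            printf 'wg set %s peer %s remove\n' "$ifname" "$key"
        fi
    done
}
```

For example, "wg show wg0 peers | peer_remove_cmds wg0" lists the commands, and piping that output to sh would apply them. Note that this only changes the running WireGuard state; the VyOS configuration still lists the peers.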
Something more elegant and built into VyOS would be wonderful, but I worked around this seemingly well-known WireGuard drawback with dynamic clients.
Maybe something like:
set interfaces wireguard wg0 peer remote-peer01 max-handshake-age 300
This would automatically reset the WireGuard interface if the last handshake is older than the specified time.
I created the script below in /config/scripts. It is called as reset-vbond.sh ifname max-handshake-age-in-sec:
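#!/usr/bin/bash
# Usage: reset-vbond.sh <ifname> <max-handshake-age-in-sec>
WG=$1
DEAD=$2
# "wg show <ifname> latest-handshakes" prints "<public-key> <unix-timestamp>" per peer.
# Only the first peer is checked, so this assumes one peer per interface.
HANDSHAKE=$(wg show "$WG" latest-handshakes | head -n 1)
PEER=$(echo "${HANDSHAKE}" | awk '{print $1}')
TIME=$(echo "${HANDSHAKE}" | awk '{print $2}')
# No peer listed, or no handshake completed yet (timestamp 0): nothing to reset.
if [[ -z $TIME || $TIME == 0 ]]; then exit; fi
if [ $(($(date +'%s') - TIME)) -gt "$DEAD" ]; then
# Disable the interface, delete the kernel link to drop the stale peer, then re-enable it.
/usr/bin/vbash -c "
source /opt/vyatta/etc/functions/script-template
configure
set interfaces wireguard $WG disable
commit
sudo ip link del dev $WG
delete interfaces wireguard $WG disable
commit
exit"
fi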
Then I set up a recurring task-scheduler entry for each interface:
set system task-scheduler task check-wg0 executable arguments "wg0 300"
set system task-scheduler task check-wg0 executable path /config/scripts/reset-vbond.sh
set system task-scheduler task check-wg0 interval 5
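With several tunnels, the three task-scheduler lines have to be repeated per interface. A small generator along these lines might save some typing; it is only a sketch, scheduler_cmds is a made-up name, and the script path and interval of 5 are simply copied from the lines above.

```shell
#!/usr/bin/env bash
# scheduler_cmds MAX_AGE IFNAME...   (hypothetical helper)
# Prints the three task-scheduler "set" commands for each interface given,
# using /config/scripts/reset-vbond.sh and an interval of 5 as in the post.
scheduler_cmds() {
    local max_age=$1
    shift
    local ifname
    for ifname in "$@"; do
        printf 'set system task-scheduler task check-%s executable arguments "%s %s"\n' \
            "$ifname" "$ifname" "$max_age"
        printf 'set system task-scheduler task check-%s executable path /config/scripts/reset-vbond.sh\n' \
            "$ifname"
        printf 'set system task-scheduler task check-%s interval 5\n' "$ifname"
    done
}
```

Running "scheduler_cmds 300 wg0 wg100" prints six lines that can be pasted into a configure session and committed.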