WireGuard Intermittent Packet Loss

jbhardman · November 1, 2022, 8:54pm

I set up a tunnel to a remote WG instance. I have other clients on that side of the tunnel that I work with. Randomly, but frequently, I lose access to those instances. I will ssh in to a box on that side and everything will be working, then all of a sudden I lose my session. I tested it with ping, and I see this pattern over and over:

PING 10.0.6.3 (10.0.6.3): 56 data bytes
64 bytes from 10.0.6.3: icmp_seq=0 ttl=62 time=27.737 ms
64 bytes from 10.0.6.3: icmp_seq=1 ttl=62 time=25.056 ms
64 bytes from 10.0.6.3: icmp_seq=2 ttl=62 time=24.426 ms
64 bytes from 10.0.6.3: icmp_seq=3 ttl=62 time=24.478 ms
64 bytes from 10.0.6.3: icmp_seq=4 ttl=62 time=24.931 ms
64 bytes from 10.0.6.3: icmp_seq=5 ttl=62 time=25.199 ms
64 bytes from 10.0.6.3: icmp_seq=6 ttl=62 time=24.346 ms
64 bytes from 10.0.6.3: icmp_seq=7 ttl=62 time=24.064 ms
64 bytes from 10.0.6.3: icmp_seq=8 ttl=62 time=24.749 ms
64 bytes from 10.0.6.3: icmp_seq=9 ttl=62 time=24.363 ms
64 bytes from 10.0.6.3: icmp_seq=10 ttl=62 time=23.786 ms
Request timeout for icmp_seq 11
Request timeout for icmp_seq 12
Request timeout for icmp_seq 13
Request timeout for icmp_seq 14
Request timeout for icmp_seq 15
Request timeout for icmp_seq 16
Request timeout for icmp_seq 17
64 bytes from 10.0.6.3: icmp_seq=18 ttl=62 time=24.136 ms
64 bytes from 10.0.6.3: icmp_seq=19 ttl=62 time=24.655 ms
64 bytes from 10.0.6.3: icmp_seq=20 ttl=62 time=24.910 ms
64 bytes from 10.0.6.3: icmp_seq=21 ttl=62 time=24.587 ms
64 bytes from 10.0.6.3: icmp_seq=22 ttl=62 time=24.245 ms
64 bytes from 10.0.6.3: icmp_seq=23 ttl=62 time=24.518 ms
64 bytes from 10.0.6.3: icmp_seq=24 ttl=62 time=24.197 ms

This rate of dropped packets is common:

--- 10.0.6.3 ping statistics ---
135 packets transmitted, 112 received, 17.037% packet loss, time 134745ms
rtt min/avg/max/mdev = 24.818/32.591/258.612/26.369 ms
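
The trace above shows loss arriving in one contiguous burst rather than as scattered drops, which is a useful thing to quantify before looking at configs. Here is a minimal sketch, assuming ping output in the format shown in this thread. The helper name `loss_bursts` and the uniform 24.0 ms timings in the sample are made up for illustration.

```python
import re


def loss_bursts(ping_output):
    """Return (first_missing_seq, run_length) for each run of lost replies.

    Only reply lines contain "icmp_seq=<n>"; the "Request timeout" lines
    use "icmp_seq <n>" and are therefore not matched.
    """
    seqs = sorted(int(n) for n in re.findall(r"icmp_seq=(\d+)", ping_output))
    bursts = []
    for prev, cur in zip(seqs, seqs[1:]):
        if cur - prev > 1:
            bursts.append((prev + 1, cur - prev - 1))
    return bursts


# Simplified reconstruction of the trace above: replies for seq 0-10 and
# 18-24, timeouts for seq 11-17. Timings are placeholders, not measured.
sample = "\n".join(
    [f"64 bytes from 10.0.6.3: icmp_seq={n} ttl=62 time=24.0 ms" for n in range(0, 11)]
    + [f"Request timeout for icmp_seq {n}" for n in range(11, 18)]
    + [f"64 bytes from 10.0.6.3: icmp_seq={n} ttl=62 time=24.0 ms" for n in range(18, 25)]
)

print(loss_bursts(sample))  # [(11, 7)]
```

A single seven-packet burst with normal latency on either side points more toward something periodically taking over or resetting the tunnel than toward a congested or lossy link.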

I’m not sure where to start diagnosing, because it seems my tunnel config, NAT, and firewall rules are good, or it wouldn’t work at all. When I use other connections to the same remote tunnel, like the WireGuard client on Mac, iOS, or even directly from a Linux laptop, there is no such issue. It only happens when routing through VyOS. Which configs would be helpful to see? Here’s what I can think of:

nat {
    source {
        rule 100 {
            outbound-interface eth0
            translation {
                address masquerade
            }
        }
        rule 101 {
            outbound-interface wg0
            translation {
                address masquerade
            }
        }
    }
}

wireguard wg0 {
    address 10.0.6.2/24
    description "DO VPN SFO2"
    peer dosfo2 {
        address 'remoteserverip'
        allowed-ips 0.0.0.0/0
        persistent-keepalive 15
        port 51820
        public-key 'key'
    }
    port 51821
    private-key 'key'
}

policy {
    local-route {
        rule 10 {
            destination 10.0.6.0/24
            set {
                table 10
            }
        }
    }
}

tjh · November 1, 2022, 9:42pm

What version of VyOS?

What does show log kernel show you (if anything)?

Dmitry · November 2, 2022, 8:03am

Hi @jbhardman, I saw a similar picture when two WG clients with the same tunnel IP were trying to work simultaneously. Try checking for this.

echowings · November 2, 2022, 9:33pm

Try pinging the WireGuard server IP address from the client side. If you get the same packet loss, then this is a connection issue, not a VyOS or WireGuard issue.

jbhardman · November 2, 2022, 11:44pm

No duplicate IPs.
Pinging the server IP (10.0.6.1) yields the same result. I disabled the wg0 interface on the router and used the WireGuard software on macOS to connect directly. Here are the ping results:

--- 10.0.6.1 ping statistics ---
107 packets transmitted, 107 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 22.286/22.954/23.842/0.280 ms

I turned that off, re-enabled the interface on the VyOS router, and here are the ping results (pinging from the router):

--- 10.0.6.1 ping statistics ---
101 packets transmitted, 86 received, 14.8515% packet loss, time 100480ms
rtt min/avg/max/mdev = 21.731/22.651/26.220/0.648 ms

Not a thing mentioned in the logs.
This is the “table 10” that I reference in the local-route rule above. Is this wrong? I got it online:

 route 0.0.0.0/0 {
     blackhole {
         distance 255
     }
     interface wg0 {
     }
     next-hop 10.0.6.1 {
     }
 }

The thing that gets me is, if it was a bad config, I’d expect nothing. This intermittent thing is odd to me.

Viacheslav · November 3, 2022, 6:26am

With distance 255, the route should be inactive.

jbhardman · November 3, 2022, 12:56pm

Thanks for the tip. I’m not even really sure what that does. I removed it, but it’s still not working right.

--- 10.0.6.1 ping statistics ---
86 packets transmitted, 71 received, 17.4419% packet loss, time 85469ms
rtt min/avg/max/mdev = 25.218/28.626/53.477/3.263 ms

 table 10 {
     route 0.0.0.0/0 {
         interface wg0 {
         }
         next-hop 10.0.6.1 {
         }
     }
 }

jbhardman · November 3, 2022, 9:27pm

Well, major egg on my face.
When I checked for duplicate IPs, I only checked the wg0.conf on the remote server, and no duplicates were listed. But a VM that I had completely forgotten about was connected using the config that I had given to the VyOS machine, so I had two machines connected using the same IP.
So sorry I missed that and it was even pointed out to me.
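
For anyone hitting the same symptom: a shared config does not show up as a duplicate in the server's wg0.conf, because both machines present the same key and map to a single peer entry. One way to spot it from the server is to sample `wg show wg0 dump` a few times and check whether that peer's endpoint keeps changing, since WireGuard updates a peer's endpoint to the source of the most recent authenticated packet. Below is a rough sketch, assuming the tab-separated dump format (an interface line first, then one line per peer with the public key in the first field and the endpoint in the third). The function names, keys, and addresses are hypothetical and only there to make the example run.

```python
def peer_endpoints(dump_text):
    """Map peer public key -> endpoint from `wg show <iface> dump` output."""
    endpoints = {}
    # The first line describes the interface itself, so skip it.
    for line in dump_text.strip().splitlines()[1:]:
        fields = line.split("\t")
        if len(fields) >= 3:
            endpoints[fields[0]] = fields[2]
    return endpoints


def endpoint_changes(samples):
    """Count endpoint changes per peer across consecutive dump samples."""
    changes = {}
    previous = {}
    for dump_text in samples:
        current = peer_endpoints(dump_text)
        for key, endpoint in current.items():
            if key in previous and previous[key] != endpoint:
                changes[key] = changes.get(key, 0) + 1
        previous = current
    return changes


def fake_dump(endpoint):
    """Build a made-up dump with one peer; every value is a placeholder."""
    iface = "\t".join(["(hidden)", "SERVERPUB=", "51820", "off"])
    peer = "\t".join(["PEERPUB=", "(none)", endpoint, "10.0.6.2/32",
                      "1667500000", "1024", "2048", "off"])
    return iface + "\n" + peer + "\n"


# Two machines sharing one config take turns being "the" peer.
samples = [
    fake_dump("198.51.100.10:51821"),
    fake_dump("203.0.113.7:40000"),
    fake_dump("198.51.100.10:51821"),
]

print(endpoint_changes(samples))  # {'PEERPUB=': 2}
```

In practice the samples would come from running `wg show wg0 dump` as root on the server every few seconds. A client that legitimately roams also changes endpoint, so an endpoint that keeps alternating between the same two addresses is the more telling sign.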

Thanks!