VyOS router silently dropping ISAKMP packets on WAN interface

Hi,

Some Windows 7 clients are having trouble connecting to our IPsec/IKEv2 VPN server from an LTE (4G) mobile data network. A VyOS router located at the edge of our Autonomous System (AS) is discarding packets marked as fragmented on the WAN interface instead of forwarding them to the VPN server.
As we can see from the capture below, the first two packets exchanged on UDP port 500 are forwarded normally. When the client sends the first fragmented packet destined for UDP port 4500 containing the IKE_AUTH MID = 01 Initiator Request, this packet and subsequent packets are discarded by our VyOS WAN interface.
Capturing the packets on the VyOS WAN interface, we can see that tcpdump reports a “len mismatch” (isakmp 1952 / ip 1468) in these packets, and for some reason VyOS discards them.
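
As a sanity check on the numbers in that message, here is a minimal sketch of the arithmetic. It assumes a 20-byte IPv4 header without options, an 8-byte UDP header, the 4-byte non-ESP marker that precedes IKE messages on UDP port 4500, and a 1500-byte MTU; none of these sizes are stated in the capture itself, and the function names are only illustrative.

```python
# Assumed sizes (not stated in the post): IPv4 header without options,
# UDP header, and the 4-byte non-ESP marker used on UDP port 4500.
IP_HEADER = 20
UDP_HEADER = 8
NON_ESP_MARKER = 4


def first_fragment_isakmp_bytes(ip_total_length):
    """ISAKMP bytes carried by a first fragment of the given IP total length."""
    return ip_total_length - IP_HEADER - UDP_HEADER - NON_ESP_MARKER


def fragment_payload_sizes(isakmp_length, mtu=1500):
    """Split the UDP datagram into IP payload sizes, one entry per fragment."""
    remaining = UDP_HEADER + NON_ESP_MARKER + isakmp_length
    # Every fragment except the last carries a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    sizes = []
    while remaining > 0:
        size = min(per_fragment, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


if __name__ == "__main__":
    print(first_fragment_isakmp_bytes(1500))  # ISAKMP bytes in a 1500-byte first fragment
    print(fragment_payload_sizes(1952))       # IP payload bytes per fragment
```

Under these assumptions a 1500-byte first fragment carries 1468 ISAKMP bytes, which matches the tcpdump figure, and a 1952-byte IKE_AUTH message needs a second fragment with 484 bytes of IP payload. So the "len mismatch" by itself may simply mean that tcpdump is decoding a first fragment that holds only part of the message.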

These packets pass through multiple routers along their path, but none of them discards the packets; only VyOS does.
When we route these packets through a Cisco router on another ISP link, the packets are routed to the VPN server and everything works normally.

Does anyone have any idea how to solve this?

Thanks

Capture with packets arriving at the VyOS router WAN:

vyos@MY-VYOS-ROUTER:~$ sudo tcpdump -i eth0 host 179.172.90.90
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
19:09:11.371087 IP 179-172-90-90.user.vivozap.com.br.isakmp > my.vpn.server.isakmp: isakmp: parent_sa ikev2_init[I]
19:09:11.373363 IP my.vpn.server.isakmp > 179-172-90-90.user.vivozap.com.br.isakmp: isakmp: parent_sa ikev2_init[R]
19:09:11.566815 IP 179-172-90-90.user.vivozap.com.br.ipsec-nat-t > my.vpn.server.ipsec-nat-t: NONESP-encap: isakmp: child_sa ikev2_auth[I]
19:09:13.591781 IP 179-172-90-90.user.vivozap.com.br.ipsec-nat-t > my.vpn.server.ipsec-nat-t: NONESP-encap: isakmp: child_sa ikev2_auth[I]
19:09:16.590730 IP 179-172-90-90.user.vivozap.com.br.ipsec-nat-t > my.vpn.server.ipsec-nat-t: NONESP-encap: isakmp: child_sa ikev2_auth[I]
^C
5 packets captured
7 packets received by filter
0 packets dropped by kernel

Capture on the LAN interface (eth1), showing that the packets have been dropped:

vyos@MY-VYOS-ROUTER:~$ sudo tcpdump -i eth1 host 179.172.90.90
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 262144 bytes
19:09:30.438363 IP 179-172-90-90.user.vivozap.com.br.isakmp > my.vpn.server.isakmp: isakmp: parent_sa ikev2_init[I]
19:09:30.440416 IP my.vpn.server.isakmp > 179-172-90-90.user.vivozap.com.br.isakmp: isakmp: parent_sa ikev2_init[R]
^C
2 packets captured
15 packets received by filter
0 packets dropped by kernel
1 packet dropped by interface

Verbose capture on WAN interface showing more details (len mismatch) of dropped packets:

19:10:16.019806 IP (tos 0x0, ttl 109, id 6742, offset 0, flags [+], proto UDP (17), length 1500)
179-172-90-90.user.vivozap.com.br.ipsec-nat-t > my.vpn.server.ipsec-nat-t: NONESP-encap: isakmp 2.0 msgid 00000001: child_sa ikev2_auth[I]: [|v2e] (len mismatch: isakmp 1952/ip 1468)
19:10:18.013782 IP (tos 0x0, ttl 109, id 6818, offset 0, flags [+], proto UDP (17), length 1500)
179-172-90-90.user.vivozap.com.br.ipsec-nat-t > my.vpn.server.ipsec-nat-t: NONESP-encap: isakmp 2.0 msgid 00000001: child_sa ikev2_auth[I]: [|v2e] (len mismatch: isakmp 1952/ip 1468)
19:10:20.023659 IP (tos 0x0, ttl 109, id 6886, offset 0, flags [+], proto UDP (17), length 1500)
179-172-90-90.user.vivozap.com.br.ipsec-nat-t > my.vpn.server.ipsec-nat-t: NONESP-encap: isakmp 2.0 msgid 00000001: child_sa ikev2_auth[I]: [|v2e] (len mismatch: isakmp 1952/ip 1468)

VyOS configuration (VyOS 1.2.3 on Dell PowerEdge 2950):

high-availability {
vrrp {
group eth1-10 {
advertise-interval 1
interface eth1
priority 110
rfc3768-compatibility
virtual-address x.x.x.x/28
vrid 10
}
}
}
interfaces {
ethernet eth0 {
address 10.6.5.124/29
description WAN
duplex auto
hw-id 00:1e:4f:19:3d:1c
smp-affinity auto
speed auto
}
ethernet eth1 {
address y.y.y.y/28
description LAN
duplex auto
hw-id 00:1e:4f:19:3d:1e
smp-affinity auto
speed auto
traffic-policy {
out DOWNSTREAM
}
}
ethernet eth2 {
duplex auto
hw-id 00:15:17:47:e0:08
smp-affinity auto
speed auto
}
loopback lo {
address 10.0.0.1/32
}
}
policy {
as-path-list AS99999 {
rule 10 {
action permit
regex ^
}
}
as-path-list FilterIn-AS88888 {
rule 100 {
action permit
description "Filtra AS-path com tamanho maximo de 4"
regex "^88888(_[0-9]+){0,3}"
}
}
prefix-list MEU-BLOCO {
rule 10 {
action permit
prefix Y.Y.Y.0/24
}
}
route-map LOCAL-PREF {
rule 100 {
action permit
set {
local-preference 200
}
}
}
route-map MYORG-OUT {
rule 10 {
action permit
match {
ip {
address {
prefix-list MEU-BLOCO
}
}
}
}
}
}
protocols {
bgp 99999 {
address-family {
ipv4-unicast {
network X.X.X.0/24 {
}
}
}
neighbor 10.0.0.2 {
address-family {
ipv4-unicast {
nexthop-self
}
}
remote-as 99999
timers {
connect 15
holdtime 30
keepalive 10
}
update-source lo
}
neighbor 10.6.5.121 {
address-family {
ipv4-unicast {
filter-list {
export AS99999
import FilterIn-AS88888
}
route-map {
export MYORG-OUT
}
}
}
remote-as 88888
timers {
holdtime 90
keepalive 30
}
}
}
static {
route 0.0.0.0/0 {
next-hop X.X.X.2 {
distance 25
}
}
route 10.0.0.2/32 {
next-hop X.X.X.2 {
}
}
route 172.17.1.19/32 {
next-hop X.X.X.6 {
}
}
route X.X.X.0/24 {
blackhole {
}
}
route X.X.X.64/26 {
next-hop X.X.X.6 {
distance 1
}
}
route y.y.y.128/25 {
next-hop X.X.X.6 {
}
}
route a.a.a.a/32 {
next-hop x.x.x.2 {
}
}
}
}
service {
snmp {
community abcde {
authorization ro
client k.k.k.k
network X.X.X.0/24
}
community public {
authorization ro
client k.k.k.k
network X.X.X.0/24
}
}
ssh {
port 22
}
}
system {
config-management {
commit-revisions 20
}
console {
device ttyS0 {
speed 9600
}
}
flow-accounting {
interface eth0
netflow {
engine-id 100
server y.y.y.y4 {
port 3001
}
version 5
}
syslog-facility daemon
}
host-name MYROUTER
login {
user vyos {
authentication {
encrypted-password $6$F9xeE2sWNRjL.RSs$h6GF87hszfyiRUq//Z7IK13Enqf3Nslz5PpjFz.
plaintext-password ""
}
level admin
}
user nagios {
authentication {
encrypted-password $6$6ZwU3JMhzPNGBo$l/elc1WZdMHDDD8i2ep1Nxet/pnyhKnjCMjIkg6.
plaintext-password ""
}
full-name "Nagios ssh"
level operator
}
}
name-server 8.8.8.8
name-server 1.1.1.1
ntp {
server a.ntp.br {
}
server b.ntp.br {
}
server ntp.pop-rs.rnp.br {
prefer
}
}
syslog {
global {
facility all {
level notice
}
facility protocols {
level debug
}
}
host 172.17.1.19 {
facility all {
level info
}
}
}
time-zone America/Sao_Paulo
}
traffic-policy {
shaper DOWNSTREAM {
bandwidth 250Mbit
class 80 {
bandwidth 80%
burst 45k
ceiling 90%
match MONITOR {
description "servicos de monitoramento do link"
ip {
protocol icmp
source {
address m.m.m.m/32
}
}
}
queue-type priority
}
class 90 {
bandwidth 80%
burst 45k
ceiling 90%
description "Trafego local"
match LOCAL-ETH1 {
ip {
source {
address y.y.y.y/32
}
}
}
queue-type priority
}
class 100 {
bandwidth 20%
burst 15k
ceiling 30%
description "Rede Wireless"
match WIFI-DOWN {
description "Range Wireless"
ip {
destination {
address x.x.x.7/32
}
}
}
queue-type priority
}
default {
bandwidth 80%
burst 45k
ceiling 90%
queue-type fair-queue
}
description "Politica de downstream para o link ISP1"
}
}

We have been able to get a little further in diagnosing the problem. It occurs when a client connects through one specific LTE mobile data carrier. We have two traffic captures: one of a successful connection through operator “X”, and one of a failed connection through operator “Y”. Looking at the captures, the only hypothesis that seems plausible is that one or more of the fragmented packets are being dropped somewhere along the route between operator “Y” and the edge VyOS router. In the failed capture, the three fragmented packets have the same content. This leads us to believe that they are three attempts by the client to transmit the same message, and that only the first fragment of each attempt got through.
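
One way to check this hypothesis against a verbose capture is to group the IPv4 header lines by IP id and see whether the fragments of each datagram are all present. The following is a rough sketch, assuming the `tcpdump -v` header format shown earlier in this post (offset printed in bytes, `+` in the flags meaning "more fragments") and a 20-byte IPv4 header; the function name is hypothetical.

```python
import re
from collections import defaultdict

# Matches the IPv4 header line printed by `tcpdump -v`, for example:
# "... IP (tos 0x0, ttl 109, id 6742, offset 0, flags [+], proto UDP (17), length 1500)"
HEADER = re.compile(r"id (\d+), offset (\d+), flags \[([^\]]*)\].*length (\d+)\)")


def incomplete_datagrams(lines):
    """Return the IP ids whose captured fragments do not form a whole datagram."""
    fragments = defaultdict(list)
    for line in lines:
        match = HEADER.search(line)
        if match:
            ip_id, offset, flags, length = match.groups()
            payload = int(length) - 20  # assumes a 20-byte IPv4 header
            fragments[int(ip_id)].append((int(offset), payload, "+" in flags))

    missing = []
    for ip_id, parts in fragments.items():
        expected_offset = 0
        complete = False
        for offset, payload, more in sorted(parts):
            if offset != expected_offset:
                break  # gap: an earlier fragment was not captured
            expected_offset += payload
            if not more:
                complete = True  # last fragment reached with no gaps
                break
        if not complete:
            missing.append(ip_id)
    return sorted(missing)


if __name__ == "__main__":
    capture = [
        "19:10:16.019806 IP (tos 0x0, ttl 109, id 6742, offset 0, flags [+], proto UDP (17), length 1500)",
        "19:10:18.013782 IP (tos 0x0, ttl 109, id 6818, offset 0, flags [+], proto UDP (17), length 1500)",
        "19:10:20.023659 IP (tos 0x0, ttl 109, id 6886, offset 0, flags [+], proto UDP (17), length 1500)",
    ]
    print(incomplete_datagrams(capture))
```

Fed with the three header lines from the verbose WAN capture, every IP id is reported as incomplete: each one has a first fragment flagged "more fragments" and nothing after it.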

Since VyOS does not receive all the fragments needed to reassemble the packet, it discards the fragments that did arrive; this is consistent with the “packet reassembles failed” and “fragments dropped after timeout” counters having been incremented.
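
For anyone who wants to watch these counters while a client retries, here is a small sketch that reads them from `/proc/net/snmp` on the router. It assumes, as far as we understand the Linux tooling, that "packet reassembles failed" corresponds to the `ReasmFails` field and "fragments dropped after timeout" to `ReasmTimeout`; the function names are hypothetical.

```python
def ip_reassembly_counters(snmp_text):
    """Extract the IPv4 reassembly counters from /proc/net/snmp-style text."""
    # The file holds two "Ip:" lines: field names first, then the values.
    ip_lines = [line.split() for line in snmp_text.splitlines() if line.startswith("Ip:")]
    names, values = ip_lines[0][1:], ip_lines[1][1:]
    counters = dict(zip(names, map(int, values)))
    wanted = ("ReasmReqds", "ReasmOKs", "ReasmFails", "ReasmTimeout")
    return {name: counters[name] for name in wanted}


def read_ip_reassembly_counters(path="/proc/net/snmp"):
    """Read the counters from the running system (Linux only)."""
    with open(path) as handle:
        return ip_reassembly_counters(handle.read())
```

Calling `read_ip_reassembly_counters()` before and after a failed connection attempt and comparing the two dictionaries should show whether `ReasmFails` and `ReasmTimeout` grow with each attempt.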