Hi all,
Long-time EdgeOS/VyOS user here, currently struggling with intermittent IPSec drops on VyOS 1.3.0-rc5. I’ve had the same problem across a few versions; I’m testing RC5 because it’s the latest and might include fixes for my issues.
Scenario
I’ve got a couple of VPNs up, each to a Ubiquiti EdgeRouter on the other end. Periodically (sometimes after a day or more), a VPN will drop, the IPSec SA will disappear, and the VyOS router will never try to re-initiate the VPN until I explicitly tell it to (via reset vpn ipsec-peer or similar).
Environment
ryanb@ubnt01-sec:~$ show version
Version: VyOS 1.3.0-rc5
Release Train: equuleus
Built by: Sentrium S.L.
Built on: Tue 29 Jun 2021 08:26 UTC
Build UUID: 36f7c218-6ebb-497f-9ec5-676241e5c13a
Build Commit ID: 892e8689b3234e
Architecture: x86_64
Boot via: installed image
System type: VMware guest
Hardware vendor: VMware, Inc.
Hardware model: VMware Virtual Platform
Hardware S/N: VMware-42 39 17 9d b5 1f a4 1b-94 7f a3 b1 00 c7 51 5c
Hardware UUID: 9d173942-1fb5-1ba4-947f-a3b100c7515c
Copyright: VyOS maintainers and contributors
VyOS IPSec configuration:
set vpn ipsec esp-group esp-azure compression 'disable'
set vpn ipsec esp-group esp-azure lifetime '3600'
set vpn ipsec esp-group esp-azure mode 'tunnel'
set vpn ipsec esp-group esp-azure pfs 'disable'
set vpn ipsec esp-group esp-azure proposal 1 encryption 'aes256'
set vpn ipsec esp-group esp-azure proposal 1 hash 'sha1'
set vpn ipsec esp-group esp-azure proposal 2 encryption 'aes256'
set vpn ipsec esp-group esp-azure proposal 2 hash 'sha256'
set vpn ipsec esp-group esp-azure proposal 3 encryption 'aes128'
set vpn ipsec esp-group esp-azure proposal 3 hash 'sha1'
set vpn ipsec ike-group ike-azure ikev2-reauth 'no'
set vpn ipsec ike-group ike-azure key-exchange 'ikev2'
set vpn ipsec ike-group ike-azure lifetime '28800'
set vpn ipsec ike-group ike-azure proposal 1 dh-group '2'
set vpn ipsec ike-group ike-azure proposal 1 encryption 'aes256'
set vpn ipsec ike-group ike-azure proposal 1 hash 'sha1'
set vpn ipsec ike-group ike-azure proposal 2 dh-group '2'
set vpn ipsec ike-group ike-azure proposal 2 encryption 'aes256'
set vpn ipsec ike-group ike-azure proposal 2 hash 'sha256'
set vpn ipsec ike-group ike-azure proposal 3 dh-group '2'
set vpn ipsec ike-group ike-azure proposal 3 encryption 'aes128'
set vpn ipsec ike-group ike-azure proposal 3 hash 'sha1'
set vpn ipsec ike-group ike-azure proposal 4 dh-group '2'
set vpn ipsec ike-group ike-azure proposal 4 encryption 'aes128'
set vpn ipsec ike-group ike-azure proposal 4 hash 'sha256'
set vpn ipsec ipsec-interfaces interface 'eth0'
set vpn ipsec logging log-level '1'
set vpn ipsec logging log-modes 'any'
set vpn ipsec logging log-modes 'ike'
set vpn ipsec logging log-modes 'esp'
set vpn ipsec logging log-modes 'net'
set vpn ipsec nat-traversal 'disable'
set vpn ipsec options disable-route-autoinstall
set vpn ipsec site-to-site peer 2.2.2.2 authentication mode 'pre-shared-secret'
set vpn ipsec site-to-site peer 2.2.2.2 authentication pre-shared-secret 'secret'
set vpn ipsec site-to-site peer 2.2.2.2 connection-type 'initiate'
set vpn ipsec site-to-site peer 2.2.2.2 default-esp-group 'esp-azure'
set vpn ipsec site-to-site peer 2.2.2.2 ike-group 'ike-azure'
set vpn ipsec site-to-site peer 2.2.2.2 ikev2-reauth 'inherit'
set vpn ipsec site-to-site peer 2.2.2.2 local-address '1.1.1.1'
set vpn ipsec site-to-site peer 2.2.2.2 vti bind 'vti10'
EdgeOS peer config:
set vpn ipsec allow-access-to-local-interface disable
set vpn ipsec auto-firewall-nat-exclude enable
set vpn ipsec disable-uniqreqids
set vpn ipsec esp-group FOO0 compression disable
set vpn ipsec esp-group FOO0 lifetime 3600
set vpn ipsec esp-group FOO0 mode tunnel
set vpn ipsec esp-group FOO0 pfs disable
set vpn ipsec esp-group FOO0 proposal 1 encryption aes256
set vpn ipsec esp-group FOO0 proposal 1 hash sha256
set vpn ipsec esp-group FOO2 compression disable
set vpn ipsec esp-group FOO2 lifetime 27000
set vpn ipsec esp-group FOO2 mode tunnel
set vpn ipsec esp-group FOO2 pfs disable
set vpn ipsec esp-group FOO2 proposal 1 encryption aes256
set vpn ipsec esp-group FOO2 proposal 1 hash sha256
set vpn ipsec ike-group FOO0 ikev2-reauth no
set vpn ipsec ike-group FOO0 key-exchange ikev2
set vpn ipsec ike-group FOO0 lifetime 28800
set vpn ipsec ike-group FOO0 proposal 1 dh-group 2
set vpn ipsec ike-group FOO0 proposal 1 encryption aes256
set vpn ipsec ike-group FOO0 proposal 1 hash sha256
set vpn ipsec ike-group FOO2 ikev2-reauth no
set vpn ipsec ike-group FOO2 key-exchange ikev2
set vpn ipsec ike-group FOO2 lifetime 28800
set vpn ipsec ike-group FOO2 proposal 1 dh-group 2
set vpn ipsec ike-group FOO2 proposal 1 encryption aes256
set vpn ipsec ike-group FOO2 proposal 1 hash sha256
set vpn ipsec site-to-site peer 1.1.1.1 authentication mode pre-shared-secret
set vpn ipsec site-to-site peer 1.1.1.1 authentication pre-shared-secret secret
set vpn ipsec site-to-site peer 1.1.1.1 connection-type respond
set vpn ipsec site-to-site peer 1.1.1.1 default-esp-group FOO2
set vpn ipsec site-to-site peer 1.1.1.1 description ipsec
set vpn ipsec site-to-site peer 1.1.1.1 ike-group FOO2
set vpn ipsec site-to-site peer 1.1.1.1 ikev2-reauth inherit
set vpn ipsec site-to-site peer 1.1.1.1 local-address 2.2.2.2
set vpn ipsec site-to-site peer 1.1.1.1 vti bind vti3
set vpn ipsec site-to-site peer 1.1.1.1 vti esp-group FOO2
Logs (VyOS, newest entries first)
7/19/2021 00:20,Info,ubnt01-sec.lan,06[IKE] <peer-2.2.2.2-tunnel-vti|13> giving up after 5 retransmits
7/19/2021 00:19,Info,ubnt01-sec.lan,%ADJCHANGE: neighbor 192.168.1.1(Unknown) in vrf default Down BGP Notification send
7/19/2021 00:19,Info,ubnt01-sec.lan,%NOTIFICATION: sent to neighbor 192.168.1.1 4/0 (Hold Timer Expired) 0 bytes
7/19/2021 00:19,Info,ubnt01-sec.lan,05[NET] <peer-2.2.2.2-tunnel-vti|13> sending packet: from 1.1.1.1[4500] to 2.2.2.2[4500] (464 bytes)
7/19/2021 00:19,Info,ubnt01-sec.lan,05[IKE] <peer-2.2.2.2-tunnel-vti|13> retransmit 5 of request with message ID 22
7/19/2021 00:18,Info,ubnt01-sec.lan,12[NET] <peer-2.2.2.2-tunnel-vti|13> sending packet: from 1.1.1.1[4500] to 2.2.2.2[4500] (464 bytes)
7/19/2021 00:18,Info,ubnt01-sec.lan,12[IKE] <peer-2.2.2.2-tunnel-vti|13> retransmit 4 of request with message ID 22
7/19/2021 00:18,Info,ubnt01-sec.lan,08[NET] <peer-2.2.2.2-tunnel-vti|13> sending packet: from 1.1.1.1[4500] to 2.2.2.2[4500] (464 bytes)
7/19/2021 00:18,Info,ubnt01-sec.lan,08[IKE] <peer-2.2.2.2-tunnel-vti|13> retransmit 3 of request with message ID 22
7/19/2021 00:17,Info,ubnt01-sec.lan,02[NET] <peer-2.2.2.2-tunnel-vti|13> sending packet: from 1.1.1.1[4500] to 2.2.2.2[4500] (464 bytes)
7/19/2021 00:17,Info,ubnt01-sec.lan,02[IKE] <peer-2.2.2.2-tunnel-vti|13> retransmit 2 of request with message ID 22
7/19/2021 00:17,Info,ubnt01-sec.lan,11[NET] <peer-2.2.2.2-tunnel-vti|13> sending packet: from 1.1.1.1[4500] to 2.2.2.2[4500] (464 bytes)
7/19/2021 00:17,Info,ubnt01-sec.lan,11[IKE] <peer-2.2.2.2-tunnel-vti|13> retransmit 1 of request with message ID 22
7/19/2021 00:17,Info,ubnt01-sec.lan,14[NET] <peer-2.2.2.2-tunnel-vti|13> sending packet: from 1.1.1.1[4500] to 2.2.2.2[4500] (464 bytes)
7/19/2021 00:17,Info,ubnt01-sec.lan,14[ENC] <peer-2.2.2.2-tunnel-vti|13> generating CREATE_CHILD_SA request 22 [ SA No KE ]
7/19/2021 00:17,Info,ubnt01-sec.lan,14[IKE] <peer-2.2.2.2-tunnel-vti|13> initiating IKE_SA peer-2.2.2.2-tunnel-vti[15] to 2.2.2.2
7/19/2021 00:05,Info,ubnt01-sec.lan,06[IKE] <peer-2.2.2.2-tunnel-vti|13> CHILD_SA closed
7/19/2021 00:05,Info,ubnt01-sec.lan,06[IKE] <peer-2.2.2.2-tunnel-vti|13> received DELETE for ESP CHILD_SA with SPI c999eefe
7/19/2021 00:05,Info,ubnt01-sec.lan,06[ENC] <peer-2.2.2.2-tunnel-vti|13> parsed INFORMATIONAL response 21 [ D ]
7/19/2021 00:05,Info,ubnt01-sec.lan,06[NET] <peer-2.2.2.2-tunnel-vti|13> received packet: from 2.2.2.2[4500] to 1.1.1.1[4500] (80 bytes)
7/19/2021 00:05,Info,ubnt01-sec.lan,07[NET] <peer-2.2.2.2-tunnel-vti|13> sending packet: from 1.1.1.1[4500] to 2.2.2.2[4500] (80 bytes)
7/19/2021 00:05,Info,ubnt01-sec.lan,07[ENC] <peer-2.2.2.2-tunnel-vti|13> generating INFORMATIONAL request 21 [ D ]
7/19/2021 00:05,Info,ubnt01-sec.lan,07[IKE] <peer-2.2.2.2-tunnel-vti|13> sending DELETE for ESP CHILD_SA with SPI c12191de
7/19/2021 00:05,Info,ubnt01-sec.lan,07[IKE] <peer-2.2.2.2-tunnel-vti|13> closing CHILD_SA peer-2.2.2.2-tunnel-vti{134} with SPIs c12191de_i (5289 bytes) c999eefe_o (5289 bytes) and TS 0.0.0.0/0 === 0.0.0.0/0
7/19/2021 00:05,Info,ubnt01-sec.lan,07[IKE] <peer-2.2.2.2-tunnel-vti|13> outbound CHILD_SA peer-2.2.2.2-tunnel-vti{136} established with SPIs cd9504ec_i c4734dd7_o and TS 0.0.0.0/0 === 0.0.0.0/0
7/19/2021 00:05,Info,ubnt01-sec.lan,07[IKE] <peer-2.2.2.2-tunnel-vti|13> inbound CHILD_SA peer-2.2.2.2-tunnel-vti{136} established with SPIs cd9504ec_i c4734dd7_o and TS 0.0.0.0/0 === 0.0.0.0/0
7/19/2021 00:05,Info,ubnt01-sec.lan,07[CFG] <peer-2.2.2.2-tunnel-vti|13> selected proposal: ESP:AES_CBC_256/HMAC_SHA2_256_128/NO_EXT_SEQ
7/19/2021 00:05,Info,ubnt01-sec.lan,07[ENC] <peer-2.2.2.2-tunnel-vti|13> parsed CREATE_CHILD_SA response 20 [ SA No TSi TSr ]
7/19/2021 00:05,Info,ubnt01-sec.lan,07[NET] <peer-2.2.2.2-tunnel-vti|13> received packet: from 2.2.2.2[4500] to 1.1.1.1[4500] (208 bytes)
7/19/2021 00:05,Info,ubnt01-sec.lan,15[NET] <peer-2.2.2.2-tunnel-vti|13> sending packet: from 1.1.1.1[4500] to 2.2.2.2[4500] (288 bytes)
7/19/2021 00:05,Info,ubnt01-sec.lan,15[ENC] <peer-2.2.2.2-tunnel-vti|13> generating CREATE_CHILD_SA request 20 [ N(REKEY_SA) SA No TSi TSr ]
7/19/2021 00:05,Info,ubnt01-sec.lan,15[IKE] <peer-2.2.2.2-tunnel-vti|13> establishing CHILD_SA peer-2.2.2.2-tunnel-vti{136} reqid 2
7/19/2021 00:05,Info,ubnt01-sec.lan,15[KNL] creating rekey job for CHILD_SA ESP/0xc12191de/1.1.1.1
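In case it helps anyone pull the same pattern out of a longer export, here is a minimal Python sketch that finds each "giving up after N retransmits" event and the message ID that went unanswered. It assumes the four-column layout shown above (timestamp, level, host, message) and that the export is newest-first, as this one is. The function name find_abandoned_exchanges is my own and not part of any VyOS or strongSwan tooling.

```python
import re

# Patterns copied from the charon messages quoted in the VyOS log above.
RETRANSMIT = re.compile(
    r"<(?P<conn>[^|>]+)\|(?P<sa>\d+)> retransmit (?P<n>\d+) "
    r"of request with message ID (?P<mid>\d+)"
)
GIVE_UP = re.compile(
    r"<(?P<conn>[^|>]+)\|(?P<sa>\d+)> giving up after (?P<n>\d+) retransmits"
)


def find_abandoned_exchanges(lines):
    """Return one dict per 'giving up' event, in chronological order."""
    last_mid = {}  # (connection, IKE SA id) -> message ID last retransmitted
    events = []
    # Assumption: the export is newest-first, so reverse it to walk forward in time.
    for line in reversed(list(lines)):
        parts = line.rstrip("\n").split(",", 3)
        if len(parts) != 4:
            continue  # not a "timestamp,level,host,message" row
        timestamp, _level, _host, message = parts
        m = RETRANSMIT.search(message)
        if m:
            last_mid[(m["conn"], m["sa"])] = int(m["mid"])
            continue
        m = GIVE_UP.search(message)
        if m:
            events.append({
                "time": timestamp,
                "connection": m["conn"],
                "ike_sa": int(m["sa"]),
                "retransmits": int(m["n"]),
                "message_id": last_mid.get((m["conn"], m["sa"])),
            })
    return events


if __name__ == "__main__":
    sample = [
        "7/19/2021 00:20,Info,ubnt01-sec.lan,06[IKE] <peer-2.2.2.2-tunnel-vti|13> giving up after 5 retransmits",
        "7/19/2021 00:19,Info,ubnt01-sec.lan,05[IKE] <peer-2.2.2.2-tunnel-vti|13> retransmit 5 of request with message ID 22",
    ]
    for event in find_abandoned_exchanges(sample):
        print(event)
```

Run against the excerpt above, this should point at message ID 22, which is the CREATE_CHILD_SA request sent at 00:17 that the peer never answered.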
EdgeOS logs (peer)
Jul 19 00:00:39 08[KNL] creating acquire job for policy 192.168.1.1/32[tcp/34697] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:04:28 14[KNL] creating acquire job for policy 192.168.1.1/32[tcp/33693] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:05:42 05[IKE] <peer-1.1.1.1-tunnel-vti|11930> inbound CHILD_SA peer-1.1.1.1-tunnel-vti{1521} established with SPIs c4734dd7_i cd9504ec_o and TS 0.0.0.0/0 === 0.0.0.0/0
Jul 19 00:05:43 14[IKE] <peer-1.1.1.1-tunnel-vti|11930> closing CHILD_SA peer-1.1.1.1-tunnel-vti{1519} with SPIs c999eefe_i (5289 bytes) c12191de_o (5289 bytes) and TS 0.0.0.0/0 === 0.0.0.0/0
Jul 19 00:05:43 14[IKE] <peer-1.1.1.1-tunnel-vti|11930> outbound CHILD_SA peer-1.1.1.1-tunnel-vti{1521} established with SPIs c4734dd7_i cd9504ec_o and TS 0.0.0.0/0 === 0.0.0.0/0
Jul 19 00:08:01 08[KNL] creating acquire job for policy 192.168.1.1/32[tcp/40371] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:11:25 12[KNL] creating acquire job for policy 192.168.1.1/32[tcp/40541] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:14:24 09[IKE] <peer-1.1.1.1-tunnel-vti|11930> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12071] to 1.1.1.1
Jul 19 00:14:53 06[KNL] creating acquire job for policy 192.168.1.1/32[tcp/45073] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:17:50 15[KNL] creating acquire job for policy 192.168.1.1/32[tcp/bgp] === 10.1.0.13/32[tcp/42177] with reqid {4}
Jul 19 00:17:50 10[IKE] <peer-1.1.1.1-tunnel-vti|12073> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12073] to 1.1.1.1
Jul 19 00:18:34 06[KNL] creating acquire job for policy 192.168.1.1/32[tcp/42995] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:20:35 06[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1
Jul 19 00:21:54 05[KNL] creating acquire job for policy 192.168.1.1/32[tcp/36997] === 10.1.0.13/32[tcp/bgp] with reqid {4}
Jul 19 00:21:54 05[IKE] <peer-1.1.1.1-tunnel-vti|12075> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12075] to 1.1.1.1
Jul 19 00:22:23 15[KNL] creating acquire job for policy 192.168.1.1/32[tcp/37755] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:24:39 12[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1
Jul 19 00:25:37 04[KNL] creating acquire job for policy 192.168.1.1/32[tcp/44323] === 10.1.0.13/32[tcp/bgp] with reqid {4}
Jul 19 00:25:37 04[IKE] <peer-1.1.1.1-tunnel-vti|12077> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12077] to 1.1.1.1
Jul 19 00:25:50 12[KNL] creating acquire job for policy 192.168.1.1/32[tcp/38955] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:28:22 09[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1
Jul 19 00:29:28 13[KNL] creating acquire job for policy 192.168.1.1/32[tcp/44937] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:29:31 09[KNL] creating acquire job for policy 192.168.1.1/32[tcp/44945] === 10.1.0.13/32[tcp/bgp] with reqid {4}
Jul 19 00:29:31 07[IKE] <peer-1.1.1.1-tunnel-vti|12079> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12079] to 1.1.1.1
Jul 19 00:32:16 07[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1
Jul 19 00:32:16 07[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1
Jul 19 00:32:41 05[KNL] creating acquire job for policy 192.168.1.1/32[tcp/41633] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:33:04 09[KNL] creating acquire job for policy 192.168.1.1/32[tcp/33455] === 10.1.0.13/32[tcp/bgp] with reqid {4}
Jul 19 00:33:04 07[IKE] <peer-1.1.1.1-tunnel-vti|12082> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12082] to 1.1.1.1
Jul 19 00:35:49 13[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1
Jul 19 00:36:11 07[KNL] creating acquire job for policy 192.168.1.1/32[tcp/39795] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:36:29 10[KNL] creating acquire job for policy 192.168.1.1/32[tcp/35383] === 10.1.0.13/32[tcp/bgp] with reqid {4}
Jul 19 00:36:29 06[IKE] <peer-1.1.1.1-tunnel-vti|12083> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12083] to 1.1.1.1
Jul 19 00:39:14 07[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1
Jul 19 00:39:36 06[KNL] creating acquire job for policy 192.168.1.1/32[tcp/42913] === 10.10.0.1/32[tcp/bgp] with reqid {2}
Jul 19 00:40:07 14[KNL] creating acquire job for policy 192.168.1.1/32[tcp/33337] === 10.1.0.13/32[tcp/bgp] with reqid {4}
Jul 19 00:40:07 05[IKE] <peer-1.1.1.1-tunnel-vti|12085> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12085] to 1.1.1.1
Jul 19 00:42:52 06[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1
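One pattern in the EdgeOS log worth quantifying: each "initiating IKE_SA" line appears to be followed by a "creating delete job" line a fixed interval later. The sketch below measures that gap and compares it with the retransmission budget strongSwan uses by default (retransmit_timeout 4.0, retransmit_base 1.8, retransmit_tries 5). I am assuming neither router overrides those defaults and that the delete job marks the attempt being abandoned; both function names are mine, and the year 2021 is taken from the VyOS log because these timestamps omit it.

```python
from datetime import datetime


def attempt_durations(lines, year=2021):
    """Seconds between each 'initiating IKE_SA' and the next 'creating delete job'."""
    durations = []
    started = None
    for line in lines:
        # Timestamps look like 'Jul 19 00:17:50' (15 characters, no year).
        stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        if "initiating IKE_SA" in line:
            started = stamp
        elif "creating delete job" in line and started is not None:
            durations.append(int((stamp - started).total_seconds()))
            started = None  # ignore further delete jobs until the next attempt
    return durations


def retransmit_budget(timeout=4.0, base=1.8, tries=5):
    """Total wait before giving up: sum of timeout * base**n for n = 0..tries."""
    return sum(timeout * base ** n for n in range(tries + 1))


if __name__ == "__main__":
    sample = [
        "Jul 19 00:17:50 10[IKE] <peer-1.1.1.1-tunnel-vti|12073> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12073] to 1.1.1.1",
        "Jul 19 00:20:35 06[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1",
        "Jul 19 00:21:54 05[IKE] <peer-1.1.1.1-tunnel-vti|12075> initiating IKE_SA peer-1.1.1.1-tunnel-vti[12075] to 1.1.1.1",
        "Jul 19 00:24:39 12[KNL] creating delete job for CHILD_SA ESP/0x00000000/1.1.1.1",
    ]
    print(attempt_durations(sample))
    print(round(retransmit_budget()))
```

If the measured gaps match the computed budget, that would be consistent with the EdgeOS side sending its IKE_SA_INIT, retransmitting without ever getting a reply, and timing out each time.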
Notes
- I’ve got monitoring set up against 2.2.2.2 and have never received a down notification for that endpoint, so I don’t think it’s actually down or not listening.
- EdgeOS logs show it continually trying to create an acquire job for the VPN. I can confirm that even now, many hours after the VPN went down, I can see IKE packets coming in on my WAN interface, and the VyOS device effectively ignores them:
ryanb@ubnt01-sec:~$ sudo tcpdump -vvv -ni eth0 host 2.2.2.2
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
08:34:50.609267 IP (tos 0x0, ttl 53, id 50840, offset 0, flags [DF], proto UDP (17), length 364)
    2.2.2.2.500 > 1.1.1.1.500: [udp sum ok] isakmp 2.0 msgid 00000000 cookie 459e4349059a92c1->0000000000000000: parent_sa ikev2_init[I]:
    (sa: len=44
        (p: #1 protoid=isakmp transform=4 len=44
            (t: #1 type=encr id=aes (type=keylen value=0100))
            (t: #2 type=integ id=#12 )
            (t: #3 type=prf id=#5 )
            (t: #4 type=dh id=modp1024 )))
    (v2ke: len=128 group=modp1024 681c74c49c11cc9ee42a67b91cd8b3f4fcbdfddfe530bba3276bdc0023556a06fc687659c5bf0922c681c24e15389062b54b597257382eeb26308631b38a56b670d95872ae84d005ae21b266fab6cb7bab685702255a636851d6e60d23ab44f577021ccb803dc272e12c4713e0c7cd7a6beb7ec5cb2129b7bb6f5655da79c3ed)
    (nonce: len=32 nonce=(0f68cde8bd8dbf317fb897f60261b33fb7bd77c80b9080141f7f2eb133f6f4f0) )
    (n: prot_id=#0 type=16388(nat_detection_source_ip))
    (n: prot_id=#0 type=16389(nat_detection_destination_ip))
    (n: prot_id=#0 type=16430(status))
    (n: prot_id=#0 type=16431(status))
    (n: prot_id=#0 type=16406(status))
08:35:32.599670 IP (tos 0x0, ttl 53, id 52672, offset 0, flags [DF], proto UDP (17), length 364)
    2.2.2.2.500 > 1.1.1.1.500: [udp sum ok] isakmp 2.0 msgid 00000000 cookie 459e4349059a92c1->0000000000000000: parent_sa ikev2_init[I]:
    (sa: len=44
        (p: #1 protoid=isakmp transform=4 len=44
            (t: #1 type=encr id=aes (type=keylen value=0100))
            (t: #2 type=integ id=#12 )
            (t: #3 type=prf id=#5 )
            (t: #4 type=dh id=modp1024 )))
    (v2ke: len=128 group=modp1024 681c74c49c11cc9ee42a67b91cd8b3f4fcbdfddfe530bba3276bdc0023556a06fc687659c5bf0922c681c24e15389062b54b597257382eeb26308631b38a56b670d95872ae84d005ae21b266fab6cb7bab685702255a636851d6e60d23ab44f577021ccb803dc272e12c4713e0c7cd7a6beb7ec5cb2129b7bb6f5655da79c3ed)
    (nonce: len=32 nonce=(0f68cde8bd8dbf317fb897f60261b33fb7bd77c80b9080141f7f2eb133f6f4f0) )
    (n: prot_id=#0 type=16388(nat_detection_source_ip))
    (n: prot_id=#0 type=16389(nat_detection_destination_ip))
    (n: prot_id=#0 type=16430(status))
    (n: prot_id=#0 type=16431(status))
    (n: prot_id=#0 type=16406(status))
- Even running clear vpn ipsec-peer on the EdgeOS side never results in anything being logged in VyOS. It just ignores the INIT request entirely.
Thoughts on how I can proceed here?