Site-to-site VPN over VTI keeps flapping: where is the problem?

Hi,

I'm having a problem running a site-to-site VPN over VTI interfaces between two datacenters.
The connection regularly drops and then re-establishes after a few minutes, roughly 2 to 6 times per hour.
We're using BGP to route between the sites, with each side's BGP neighbor set to the far-side VTI IP.

I'm wondering whether I'm missing anything that is needed to establish a reliable tunnel.

Here is the relevant config, with the public IPs changed:

Site V:

[code]interfaces {
    ethernet eth0 {
        address 1.2.3.204/27
    }
    vti vti100 {
        address 10.1.64.1/28
        mtu 1436
    }
}

vpn {
    ipsec {
        esp-group my-esp {
            compression disable
            lifetime 3600
            mode tunnel
            pfs enable
            proposal 1 {
                encryption aes128
                hash sha1
            }
        }
        ike-group my-ike {
            dead-peer-detection {
                action restart
                interval 15
                timeout 30
            }
            ikev2-reauth no
            key-exchange ikev1
            lifetime 28800
            proposal 1 {
                dh-group 2
                encryption aes128
                hash sha1
            }
        }
        ipsec-interfaces {
            interface eth0
        }
        site-to-site {
            peer 3.2.1.213 {
                authentication {
                    mode pre-shared-secret
                    pre-shared-secret ****************
                }
                connection-type initiate
                description "Site to Site VPN"
                ike-group my-ike
                local-address 1.2.3.204
                vti {
                    bind vti100
                    esp-group my-esp
                }
            }
        }
    }
}[/code]

Site T:

[code]interfaces {
    ethernet eth0 {
        address 3.2.1.213/28
    }
    vti vti101 {
        address 10.1.64.66/28
        disable
        mtu 1436
    }
}

vpn {
    ipsec {
        esp-group my-esp {
            compression disable
            lifetime 3600
            mode tunnel
            pfs enable
            proposal 1 {
                encryption aes128
                hash sha1
            }
        }
        ike-group my-ike {
            dead-peer-detection {
                action restart
                interval 15
                timeout 30
            }
            ikev2-reauth no
            key-exchange ikev1
            lifetime 28800
            proposal 1 {
                dh-group 2
                encryption aes128
                hash sha1
            }
        }
        ipsec-interfaces {
            interface eth0
        }
        site-to-site {
            peer 1.2.3.204 {
                authentication {
                    mode pre-shared-secret
                    pre-shared-secret ****************
                }
                connection-type initiate
                description "Site to Site VPN"
                ike-group my-ike
                ikev2-reauth inherit
                local-address 3.2.1.213
                vti {
                    bind vti100
                    esp-group my-esp
                }
            }
        }
    }
}[/code]

With the above configurations, I see this in the logs at Site T:

[code]Jul 3 22:09:07 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #665: replacing stale IPsec SA
Jul 3 22:09:07 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #667: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #665 {using isakmp#664}
Jul 3 22:10:17 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #667: max number of retransmissions (2) reached STATE_QUICK_I1
Jul 3 22:10:17 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #667: starting keying attempt 2 of an unlimited number
Jul 3 22:10:17 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #668: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #667 {using isakmp#664}
Jul 3 22:11:27 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #668: max number of retransmissions (2) reached STATE_QUICK_I1
Jul 3 22:11:27 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #668: starting keying attempt 3 of an unlimited number
Jul 3 22:11:27 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #669: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #668 {using isakmp#664}
Jul 3 22:12:37 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #669: max number of retransmissions (2) reached STATE_QUICK_I1
Jul 3 22:12:37 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #669: starting keying attempt 4 of an unlimited number
Jul 3 22:12:37 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #670: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #669 {using isakmp#664}
Jul 3 22:13:47 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #670: max number of retransmissions (2) reached STATE_QUICK_I1
Jul 3 22:13:47 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #670: starting keying attempt 5 of an unlimited number
Jul 3 22:13:47 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #671: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #670 {using isakmp#664}
Jul 3 22:14:57 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #671: max number of retransmissions (2) reached STATE_QUICK_I1
Jul 3 22:14:57 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #671: starting keying attempt 6 of an unlimited number
Jul 3 22:14:57 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #672: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #671 {using isakmp#664}
Jul 3 22:16:07 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #672: max number of retransmissions (2) reached STATE_QUICK_I1
Jul 3 22:16:07 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #672: starting keying attempt 7 of an unlimited number
Jul 3 22:16:07 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #673: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #672 {using isakmp#664}
Jul 3 22:17:17 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #673: max number of retransmissions (2) reached STATE_QUICK_I1
Jul 3 22:17:17 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #673: starting keying attempt 8 of an unlimited number
Jul 3 22:17:17 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #674: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #673 {using isakmp#664}
Jul 3 22:18:27 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #674: max number of retransmissions (2) reached STATE_QUICK_I1
Jul 3 22:18:27 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #674: starting keying attempt 9 of an unlimited number
Jul 3 22:18:27 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #675: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #674 {using isakmp#664}
Jul 3 22:18:52 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #676: responding to Quick Mode
Jul 3 22:18:52 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #676: Dead Peer Detection (RFC 3706) enabled
Jul 3 22:18:52 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #676: IPsec SA established {ESP=>0xc73099e6 <0xcf098a15}
Jul 3 22:18:57 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #675: Dead Peer Detection (RFC 3706) enabled
Jul 3 22:18:57 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #675: sent QI2, IPsec SA established {ESP=>0xc3cf8d38 <0xc8a2f149}
Jul 3 22:23:22 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #665: IPsec SA expired (superseded by #675)
Jul 3 22:23:22 T-VYOS01 pluto[7696]: "peer-1.2.3.204-tunnel-vti" #664: ignoring Delete SA payload: PROTO_IPSEC_ESP SA(0xc7e23901) not found (maybe expired)[/code]
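To make the retry pattern easier to see, here is a small Python sketch (my own helper, not anything shipped with VyOS) that pulls the "starting keying attempt" lines out of a pluto log and reports how long after the previous attempt each one started. It assumes the syslog line format shown above, and that the lines are in chronological order within a single year, since syslog omits the year.

```python
import re
from datetime import datetime

# Matches pluto syslog lines like:
#   Jul 3 22:10:17 T-VYOS01 pluto[7696]: "peer-...-tunnel-vti" #667: starting keying attempt 2 of an unlimited number
ATTEMPT_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ pluto\[\d+\]: "
    r'"[^"]+" #\d+: starting keying attempt (?P<n>\d+)'
)


def keying_attempt_gaps(lines):
    """Return (attempt number, seconds since the previous attempt) pairs.

    Syslog omits the year, so an arbitrary leap year is prefixed before
    parsing; lines are assumed to be chronological and within one year.
    """
    gaps = []
    prev = None
    for line in lines:
        m = ATTEMPT_RE.match(line)
        if not m:
            continue  # not a "starting keying attempt" line
        ts = datetime.strptime("2000 " + m.group("ts"), "%Y %b %d %H:%M:%S")
        if prev is not None:
            gaps.append((int(m.group("n")), int((ts - prev).total_seconds())))
        prev = ts
    return gaps
```

Run over the excerpt above, every gap from attempt 3 through attempt 9 comes out at 70 seconds, i.e. Site T keeps re-sending Quick Mode and getting no answer until the Quick Mode exchange at 22:18:52 finally succeeds.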

Now, if I add:

[code]vpn {
    ipsec {
        auto-update 60
    }
}[/code]
things seem to stabilize: the tunnel still drops every hour or two, but definitely less often.
However, that feels like an ugly workaround, and I don't think it should be necessary.
It also floods the logs with entries like these:

[code]Jul 4 20:56:45 T-VYOS01 pluto[28022]: loading secrets from "/etc/ipsec.secrets"
Jul 4 20:56:45 T-VYOS01 pluto[28022]: loaded PSK secret for 10.1.64.50 10.1.64.49
Jul 4 20:56:45 T-VYOS01 pluto[28022]: loaded PSK secret for 3.2.1.213 1.2.3.204
Jul 4 20:56:45 T-VYOS01 pluto[28022]: loading secrets from "/etc/dmvpn.secrets"
Jul 4 20:56:45 T-VYOS01 pluto[28022]: Changing to directory '/etc/ipsec.d/crls'
Jul 4 20:57:45 T-VYOS01 pluto[28022]: forgetting secrets
Jul 4 20:57:45 T-VYOS01 pluto[28022]: loading secrets from "/etc/ipsec.secrets"
Jul 4 20:57:45 T-VYOS01 pluto[28022]: loaded PSK secret for 10.1.64.50 10.1.64.49
Jul 4 20:57:45 T-VYOS01 pluto[28022]: loaded PSK secret for 3.2.1.213 1.2.3.204
Jul 4 20:57:45 T-VYOS01 pluto[28022]: loading secrets from "/etc/dmvpn.secrets"
Jul 4 20:57:45 T-VYOS01 pluto[28022]: Changing to directory '/etc/ipsec.d/crls'
Jul 4 20:58:45 T-VYOS01 pluto[28022]: forgetting secrets
Jul 4 20:58:45 T-VYOS01 pluto[28022]: loading secrets from "/etc/ipsec.secrets"
Jul 4 20:58:45 T-VYOS01 pluto[28022]: loaded PSK secret for 10.1.64.50 10.1.64.49
Jul 4 20:58:45 T-VYOS01 pluto[28022]: loaded PSK secret for 3.2.1.213 1.2.3.204
Jul 4 20:58:45 T-VYOS01 pluto[28022]: loading secrets from "/etc/dmvpn.secrets"
Jul 4 20:58:45 T-VYOS01 pluto[28022]: Changing to directory '/etc/ipsec.d/crls'[/code]
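With auto-update enabled, the messages that matter (rekeys, DPD, SA establishment) get buried under this reload cycle. A minimal Python filter, assuming the four message prefixes below (copied from the excerpt above) are the only ones the reload produces on this box, could strip them back out before reading the log:

```python
import re

# Message prefixes that pluto repeats on every auto-update reload cycle
# (copied from the log excerpt above; other setups may log more).
RELOAD_NOISE = (
    'loading secrets from "',
    "loaded PSK secret for ",
    "Changing to directory '/etc/ipsec.d/crls'",
    "forgetting secrets",
)

# Captures the message text after the "pluto[PID]: " tag.
PLUTO_MSG_RE = re.compile(r"pluto\[\d+\]: (?P<msg>.*)$")


def drop_reload_noise(lines):
    """Yield every line except pluto's periodic secrets-reload messages."""
    for line in lines:
        m = PLUTO_MSG_RE.search(line)
        if m and m.group("msg").startswith(RELOAD_NOISE):
            continue  # part of the reload cycle, skip it
        yield line
```

Lines from other daemons, and pluto lines that don't match one of the prefixes, pass through unchanged, so the flap-related entries stay in their original order.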

Any thoughts on where the problem might be, or what needs to be changed?