GRE over IPSec sporadic interruptions

Hi,

I am facing a strange, sporadic stability issue with one of my GRE over IPSec tunnels. In my environment I have two tunnels from my VyOS router (1.1.8), which is located behind NAT, terminating on a Mikrotik router on two different IPs (routed via two different ISPs). On the VyOS side, both tunnels terminate on the same IP.

The tunnels usually work fine. For some reason, at seemingly random times, there is some sort of interruption in the tunnel that is routed over ISP2. Immediately afterwards IPSec phase 1 establishes, but it then takes ages (many minutes) for phase 2 to establish. When I bring the tunnels down manually, in any of several different ways, both phases always re-establish immediately.

I want to blame ISP2, but while the issue is occurring the IPs that IPSec terminates on are reachable from both sides, the traceroute does not change, P1 is up, etc.

My configuration and logs are below. Any idea why this could be occurring, and perhaps how to resolve it?

interfaces {
    dummy dum1 {
        address 169.254.200.2/32
        description "ISP1"
    }
    dummy dum2 {
        address 169.254.200.6/32
        description "ISP2"
    }

    tunnel tun1 {
        address 169.254.100.106/30
        description "GRE ISP1"
        encapsulation gre
        local-ip 169.254.200.2
        multicast disable
        remote-ip 169.254.200.1
    }
    tunnel tun2 {
        address 169.254.100.110/30
        description "GRE ISP2"
        encapsulation gre
        local-ip 169.254.200.6
        multicast disable
        remote-ip 169.254.200.5
    }
}

esp-group ESP-GROUP {
    compression disable
    lifetime 3600
    mode tunnel
    pfs enable
    proposal 1 {
        encryption aes128
        hash sha1
    }
}

ike-group IKE-GROUP {
    dead-peer-detection {
        action restart
        interval 15
        timeout 30
    }
    ikev2-reauth no
    key-exchange ikev1
    lifetime 28800
    proposal 1 {
        dh-group 2
        encryption aes128
        hash sha1
    }
}

peer 12.28.12.33 {
    authentication {
        mode pre-shared-secret
        pre-shared-secret ****************
    }
    connection-type initiate
    default-esp-group ESP-GROUP
    description "ISP1"
    ike-group IKE-GROUP
    ikev2-reauth inherit
    local-address 172.16.2.22
    tunnel 1 {
        allow-nat-networks disable
        allow-public-networks disable
        local {
            prefix 169.254.200.2/32
        }
        protocol gre
        remote {
            prefix 169.254.200.1/32
        }
    }
}

peer 19.17.32.65 {
    authentication {
        mode pre-shared-secret
        pre-shared-secret ****************
    }
    connection-type initiate
    default-esp-group ESP-GROUP
    description "ISP2"
    ike-group IKE-GROUP
    ikev2-reauth inherit
    local-address 172.16.2.22
    tunnel 1 {
        allow-nat-networks disable
        allow-public-networks disable
        local {
            prefix 169.254.200.6/32
        }
        protocol gre
        remote {
            prefix 169.254.200.5/32
        }
    }
}
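
As a quick sanity check of the configuration above: each GRE tunnel's endpoints should fall inside the traffic selectors of exactly one IPsec peer. Below is a minimal Python sketch of that check. The addresses are copied by hand from the config; the dictionary layout and the function name `covering_peers` are assumptions made for illustration, not something VyOS exports.

```python
import ipaddress

# Values copied by hand from the configuration above. The dict layout is
# only an illustration, not a format that VyOS produces.
GRE_TUNNELS = {
    "tun1": {"local": "169.254.200.2", "remote": "169.254.200.1"},
    "tun2": {"local": "169.254.200.6", "remote": "169.254.200.5"},
}
IPSEC_SELECTORS = {
    "12.28.12.33": {"local": "169.254.200.2/32", "remote": "169.254.200.1/32"},
    "19.17.32.65": {"local": "169.254.200.6/32", "remote": "169.254.200.5/32"},
}


def covering_peers(tunnel):
    """Return the IPsec peers whose selectors cover both GRE endpoints."""
    local = ipaddress.ip_address(tunnel["local"])
    remote = ipaddress.ip_address(tunnel["remote"])
    return [
        peer
        for peer, sel in IPSEC_SELECTORS.items()
        if local in ipaddress.ip_network(sel["local"])
        and remote in ipaddress.ip_network(sel["remote"])
    ]


for name, tunnel in GRE_TUNNELS.items():
    print(name, "->", covering_peers(tunnel))
```

With the values above, each GRE tunnel maps to a single peer, so the selectors themselves look consistent on the VyOS side.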

Jan 17 02:02:34 vr1 pluto[3079]: packet from 19.17.32.65:500: ignoring Vendor ID payload [Cisco-Unity]
Jan 17 02:02:34 vr1 pluto[3079]: packet from 19.17.32.65:500: received Vendor ID payload [Dead Peer Detection]
Jan 17 02:02:34 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: responding to Main Mode
Jan 17 02:02:35 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Peer ID is ID_IPV4_ADDR: ‘19.17.32.65’
Jan 17 02:02:35 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sent MR3, ISAKMP SA established
Jan 17 02:02:35 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: cannot respond to IPsec SA request because no connection is known for 169.254.200.6/32===172.16.2.22[172.16.2.22]…19.17.32.65[19.17.32.65]===169.254.200.5/32
Jan 17 02:02:35 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_ID_INFORMATION to 19.17.32.65:500
Jan 17 02:02:45 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x1266518e (perhaps this is a duplicated packet)
Jan 17 02:02:45 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:02:55 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x1266518e (perhaps this is a duplicated packet)
Jan 17 02:02:55 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:03:06 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: cannot respond to IPsec SA request because no connection is known for 169.254.200.6/32===172.16.2.22[172.16.2.22]…19.17.32.65[19.17.32.65]===169.254.200.5/32
Jan 17 02:03:06 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_ID_INFORMATION to 19.17.32.65:500
Jan 17 02:03:16 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0xd5302499 (perhaps this is a duplicated packet)
Jan 17 02:03:16 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:03:26 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0xd5302499 (perhaps this is a duplicated packet)
Jan 17 02:03:26 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:03:39 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: cannot respond to IPsec SA request because no connection is known for 169.254.200.6/32===172.16.2.22[172.16.2.22]…19.17.32.65[19.17.32.65]===169.254.200.5/32
Jan 17 02:03:39 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_ID_INFORMATION to 19.17.32.65:500
Jan 17 02:03:49 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x2c2c06d0 (perhaps this is a duplicated packet)
Jan 17 02:03:49 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:03:59 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x2c2c06d0 (perhaps this is a duplicated packet)
Jan 17 02:03:59 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:04:11 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: cannot respond to IPsec SA request because no connection is known for 169.254.200.6/32===172.16.2.22[172.16.2.22]…19.17.32.65[19.17.32.65]===169.254.200.5/32
Jan 17 02:04:11 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_ID_INFORMATION to 19.17.32.65:500
Jan 17 02:04:21 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x1885828d (perhaps this is a duplicated packet)
Jan 17 02:04:21 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:04:31 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x1885828d (perhaps this is a duplicated packet)
Jan 17 02:04:31 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:04:44 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: cannot respond to IPsec SA request because no connection is known for 169.254.200.6/32===172.16.2.22[172.16.2.22]…19.17.32.65[19.17.32.65]===169.254.200.5/32
Jan 17 02:04:44 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_ID_INFORMATION to 19.17.32.65:500
Jan 17 02:04:54 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0xb83bd39b (perhaps this is a duplicated packet)
Jan 17 02:04:54 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:05:04 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0xb83bd39b (perhaps this is a duplicated packet)

log trimmed for repeated message

Jan 17 02:20:11 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0xab0d37f0 (perhaps this is a duplicated packet)
Jan 17 02:20:11 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:20:21 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0xab0d37f0 (perhaps this is a duplicated packet)
Jan 17 02:20:21 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:20:34 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: cannot respond to IPsec SA request because no connection is known for 169.254.200.6/32===172.16.2.22[172.16.2.22]…19.17.32.65[19.17.32.65]===169.254.200.5/32
Jan 17 02:20:34 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_ID_INFORMATION to 19.17.32.65:500
Jan 17 02:20:44 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x1b32ced9 (perhaps this is a duplicated packet)
Jan 17 02:20:44 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:20:54 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x1b32ced9 (perhaps this is a duplicated packet)
Jan 17 02:20:54 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_MESSAGE_ID to 19.17.32.65:500
Jan 17 02:21:06 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: cannot respond to IPsec SA request because no connection is known for 169.254.200.6/32===172.16.2.22[172.16.2.22]…19.17.32.65[19.17.32.65]===169.254.200.5/32
Jan 17 02:21:06 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sending encrypted notification INVALID_ID_INFORMATION to 19.17.32.65:500
Jan 17 02:21:09 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24190: initiating Quick Mode PSK+ENCRYPT+TUNNEL+PFS+UP to replace #24181 {using isakmp#24187}
Jan 17 02:21:10 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24190: Dead Peer Detection (RFC 3706) enabled
Jan 17 02:21:10 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24190: sent QI2, IPsec SA established {ESP=>0x00d643f4 <0xc9b1e94b}
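
To put a number on "ages", a log like the one above can be summarized with a small script. The following Python sketch counts the rejected and duplicate Quick Mode messages and measures the gap between the first "ISAKMP SA established" and the first "IPsec SA established". The event labels and the function name `summarize` are made up for illustration, the match strings are taken from the excerpt above, and an English locale is assumed for the month names.

```python
import re
from collections import Counter
from datetime import datetime

LINE_RE = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ pluto\[\d+\]: (.*)$")

# Substrings taken from the log excerpt above, mapped to made-up labels.
EVENTS = {
    "ISAKMP SA established": "phase1_up",
    "IPsec SA established": "phase2_up",
    "no connection is known": "qm_rejected",
    "previously used Message ID": "qm_duplicate",
}


def parse_time(stamp):
    # Syslog stamps carry no year; a fixed year is assumed so that two
    # stamps can be subtracted from each other.
    return datetime.strptime("2000 " + stamp, "%Y %b %d %H:%M:%S")


def summarize(lines):
    """Count known events and return (counts, phase1-to-phase2 gap)."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and anything that is not pluto output
        when, message = parse_time(match.group(1)), match.group(2)
        for needle, label in EVENTS.items():
            if needle in message:
                counts[label] += 1
                first_seen.setdefault(label, when)
    gap = None
    if "phase1_up" in first_seen and "phase2_up" in first_seen:
        gap = first_seen["phase2_up"] - first_seen["phase1_up"]
    return counts, gap


# Four lines copied from the excerpt above.
sample = [
    "Jan 17 02:02:35 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: sent MR3, ISAKMP SA established",
    "Jan 17 02:02:35 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: cannot respond to IPsec SA request because no connection is known for 169.254.200.6/32===172.16.2.22[172.16.2.22]…19.17.32.65[19.17.32.65]===169.254.200.5/32",
    "Jan 17 02:02:45 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24187: Quick Mode I1 message is unacceptable because it uses a previously used Message ID 0x1266518e (perhaps this is a duplicated packet)",
    "Jan 17 02:21:10 vr1 pluto[3079]: “peer-19.17.32.65-tunnel-1” #24190: sent QI2, IPsec SA established {ESP=>0x00d643f4 <0xc9b1e94b}",
]

counts, gap = summarize(sample)
print(dict(counts), gap)
```

For real use, the `sample` list would be replaced by the lines of the log file. In the excerpt, phase 1 comes up at 02:02:35 and phase 2 at 02:21:10.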

Hi!

Am I correct in assuming that both tunnels run between the same two devices?

If so:
you say that on one device you are using two IPs, one on each ISP, and on the other device only one IP. Is that right?
In that case I think you will have routing issues on the device that has two IPs.
The reason is that ONE IP can only be routed out of ONE interface on the VyOS device.
If you have multiple routes to that destination in the routing table, you are effectively load-balancing that traffic out of the device.
This can create strange issues when, for example, a tunnel is bound to the source IP of interface1 but the packet leaves via interface2.
To make this work you need to ensure that a packet with source IP X takes the correct path, using a policy route that hardcodes which interface and next hop are used for that packet.
Another solution is to use two IPs on both devices and then route each destination IP along the correct path on both devices… :)

Hi runar,

Thanks for the feedback. You are correct that the VyOS device has one public IP (eth0, behind NAT) and that the Mikrotik router has two public IPs assigned to two different interfaces. But please note that the public IPs are only used to terminate IPSec. As per the config, GRE then encapsulates traffic across the IPSec tunnels, which gives me a unique GRE interface per tunnel on both routers, over which I exchange private BGP routes. The BGP routes thus point to a specific GRE interface. On the IPSec side, I don't see the issue with terminating multiple IPSec tunnels on one interface (eth0). Unless I am misunderstanding you?

Thanks again

I am talking about issues on the device with two interfaces. Let me try to explain:

device1 int1 - 1.1.1.1 (isp1)
device1 int2 - 2.2.2.2 (isp2)
device2 int - 3.3.3.3

On device1, you configure two IPsec tunnels, both with a destination of 3.3.3.3, because that is the only IP the other device has.
You also configure one tunnel with source 1.1.1.1 and the other with source 2.2.2.2.
Now, to reach 3.3.3.3 you need an entry in your routing table. If you point that route at isp1, all your traffic will go out of the interface towards isp1, and vice versa for the other ISP. If you create a route to 3.3.3.3 pointing out of both interfaces (isp1 and isp2), you essentially create a load balancer, and traffic can leave through either interface.
In the first two cases, traffic sourced from the other ISP's address (the ISP that does not carry the 3.3.3.3 route) will not work, because it is sent to an ISP that does not own that source address. In the last case (a route to 3.3.3.3 out of both ISPs), whether it works depends on what the load-balancing algorithm decides: it might work or it might not, because traffic from either source address can leave through either interface without you knowing it.

Anyway, to make this work you need policy routing, so that traffic from 1.1.1.1 to 3.3.3.3 MUST exit through the interface towards isp1, and the same for 2.2.2.2 and isp2. This has to be done on the Mikrotik device, but I have never used their devices so I can't say how it is done…

Everything I am talking about here is the transport layer underneath the IPsec/GRE tunnels, not the traffic inside the tunnels. That's another story… :)
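
The difference between a plain destination route and a policy route can be shown with a toy model. This is only a Python sketch using the placeholder addresses from the example above; the function names and tables are invented for illustration, and it is not Mikrotik or VyOS configuration.

```python
# Toy model only: the addresses and ISP names are the placeholders from the
# example above, not real configuration.
DEST = "3.3.3.3"
SOURCE_OWNER = {"1.1.1.1": "isp1", "2.2.2.2": "isp2"}


def egress_destination_only(src, dst, route_table):
    """Plain routing: the source address plays no part in the lookup."""
    return route_table[dst]


def egress_policy(src, dst, policy, route_table):
    """Policy routing: a matching (src, dst) rule overrides the main table."""
    return policy.get((src, dst), route_table[dst])


# A single route to 3.3.3.3 pointing at isp1.
main_table = {DEST: "isp1"}
# Policy rules pinning each source address to the ISP that owns it.
policy_rules = {("1.1.1.1", DEST): "isp1", ("2.2.2.2", DEST): "isp2"}

for src, owner in SOURCE_OWNER.items():
    plain = egress_destination_only(src, DEST, main_table)
    pinned = egress_policy(src, DEST, policy_rules, main_table)
    print(
        f"src {src}: plain -> {plain} ({'ok' if plain == owner else 'wrong ISP'}), "
        f"policy -> {pinned} ({'ok' if pinned == owner else 'wrong ISP'})"
    )
```

With only the destination route, packets sourced from 2.2.2.2 leave towards isp1. With the policy rules in place, each source address leaves through its own ISP.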

Thanks, I understand what you are saying. I do have these policies in place (on Mikrotik they are referred to as mangle rules) to ensure that the traffic for IPSec ISP1 routes out of the ISP1 interface, and the same for ISP2. I have confirmed from packet captures that the traffic leaves via the correct interfaces.

I would advise you to start testing on the 1.2 rolling releases,
as we are not going to fix this in 1.1.x, which is already EOL.

Hi, can you perhaps tell me when 1.2 will be released to the AWS Marketplace?

Hello,
we expect the new listing to be live this week.
We are working with the AWS Marketplace team on that.

Cool, thank you