DMVPN still does not recover after a hub restart, even with DPD action restart

I tested DMVPN following the documentation, and it worked normally at first. But after the hub restarts, the DMVPN network no longer works, and I have to restart the spoke to restore it. From what I found, adding IKE DPD should solve this, but in my tests the problem persists.

I tested it in EVE-NG; the VyOS version is as follows:

vyos@vyos# run show version

Version:          VyOS 1.3-rolling-202009030118
Release Train:    equuleus

Built by:         autobuild@vyos.net
Built on:         Thu 03 Sep 2020 01:18 UTC
Build UUID:       83c8515b-213b-4ac6-9b9a-2a0f24bbbda7
Build Commit ID:  221fd153830307

Architecture:     x86_64
Boot via:         installed image
System type:      KVM guest

Hardware vendor:  QEMU
Hardware model:   Standard PC (i440FX + PIIX, 1996)
Hardware S/N:     Unknown
Hardware UUID:    Unknown

Copyright:        VyOS maintainers and contributors

The DPD commands I added are as follows:

set vpn ipsec ike-group IKE-SPOKE dead-peer-detection action restart
set vpn ipsec ike-group IKE-SPOKE dead-peer-detection interval 3
set vpn ipsec ike-group IKE-SPOKE dead-peer-detection timeout 3

Hello @toadzhou, DPD works for DMVPN in our lab. Can you describe the steps to reproduce?

My steps are basically the same as in the documentation; I just used OSPF instead of static routing. Below are my configuration files.

DMVPN-HUB

vyos@vyos:~$ show configuration
interfaces {
    ethernet eth0 {
        address 10.10.1.2/24
        hw-id 50:00:00:04:00:00
    }
    ethernet eth1 {
        address 192.168.1.1/24
        hw-id 50:00:00:04:00:01
    }
    ethernet eth2 {
        hw-id 50:00:00:04:00:02
    }
    ethernet eth3 {
        hw-id 50:00:00:04:00:03
    }
    loopback lo {
        address 7.7.7.1/32
    }
    tunnel tun0 {
        address 9.9.9.1/24
        encapsulation gre
        local-ip 10.10.1.2
        multicast enable
        parameters {
            ip {
                key ****************
            }
        }
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 192.168.1.0/24
            }
            translation {
                address masquerade
            }
        }
    }
}
policy {
    route-map CONNECT {
        rule 10 {
            action permit
            match {
                interface lo
            }
        }
    }
}
protocols {
    nhrp {
        tunnel tun0 {
            cisco-authentication fuckall
            holding-time 300
            multicast dynamic
            redirect
        }
    }
    ospf {
        area 0 {
            network 192.168.1.0/24
            network 9.9.9.0/24
        }
        default-information {
            originate {
                always
                metric 10
                metric-type 2
            }
        }
        log-adjacency-changes {
        }
        parameters {
            abr-type cisco
            router-id 7.7.7.1
        }
        redistribute {
            connected {
                metric-type 2
                route-map CONNECT
            }
        }
    }
    static {
        route 0.0.0.0/0 {
            next-hop 10.10.1.1 {
            }
        }
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name vyos
    login {
        user vyos {
            authentication {
                encrypted-password ****************
                plaintext-password ****************
            }
        }
    }
    ntp {
        server 0.pool.ntp.org {
        }
        server 1.pool.ntp.org {
        }
        server 2.pool.ntp.org {
        }
    }
    syslog {
        global {
            facility all {
                level info
            }
            facility protocols {
                level debug
            }
        }
    }
}
vpn {
    ipsec {
        esp-group ESP-HUB {
            compression disable
            lifetime 1800
            mode tunnel
            pfs dh-group2
            proposal 1 {
                encryption aes256
                hash sha1
            }
            proposal 2 {
                encryption 3des
                hash md5
            }
        }
        ike-group IKE-HUB {
            close-action none
            ikev2-reauth no
            key-exchange ikev1
            lifetime 3600
            proposal 1 {
                dh-group 2
                encryption aes256
                hash sha1
            }
            proposal 2 {
                dh-group 2
                encryption aes128
                hash sha1
            }
        }
        ipsec-interfaces {
            interface eth0
        }
        profile NHRPVPN {
            authentication {
                mode pre-shared-secret
                pre-shared-secret ****************
            }
            bind {
                tunnel tun0
            }
            esp-group ESP-HUB
            ike-group IKE-HUB
        }
    }
}

SPOKE1

vyos@vyos:~$ show configuration
interfaces {
    ethernet eth0 {
        address 10.10.2.2/24
        hw-id 50:00:00:05:00:00
    }
    ethernet eth1 {
        address 192.168.2.1/24
        hw-id 50:00:00:05:00:01
    }
    ethernet eth2 {
        hw-id 50:00:00:05:00:02
    }
    ethernet eth3 {
        hw-id 50:00:00:05:00:03
    }
    loopback lo {
        address 7.7.7.2/32
    }
    tunnel tun0 {
        address 9.9.9.2/24
        encapsulation gre
        local-ip 0.0.0.0
        multicast enable
        parameters {
            ip {
                key ****************
            }
        }
    }
}
nat {
    source {
        rule 100 {
            outbound-interface eth0
            source {
                address 192.168.2.0/24
            }
            translation {
                address masquerade
            }
        }
    }
}
policy {
    route-map CONNECT {
        rule 10 {
            action permit
            match {
                interface lo
            }
        }
    }
}
protocols {
    nhrp {
        tunnel tun0 {
            cisco-authentication fuckall
            map 9.9.9.1/24 {
                nbma-address 10.10.1.2
                register
            }
            multicast nhs
            redirect
            shortcut
        }
    }
    ospf {
        area 0 {
            network 192.168.2.0/24
            network 9.9.9.0/24
        }
        default-information {
            originate {
                always
                metric 10
                metric-type 2
            }
        }
        log-adjacency-changes {
        }
        parameters {
            abr-type cisco
            router-id 7.7.7.2
        }
        redistribute {
            connected {
                metric-type 2
                route-map CONNECT
            }
        }
    }
    static {
        route 0.0.0.0/0 {
            next-hop 10.10.2.1 {
            }
        }
    }
}
service {
    dhcp-server {
        shared-network-name LAN {
            subnet 192.168.2.0/24 {
                default-router 192.168.2.1
                dns-server 192.168.2.1
                domain-name internal-network
                lease 86400
                range 0 {
                    start 192.168.2.9
                    stop 192.168.2.254
                }
            }
        }
    }
    dns {
        forwarding {
            allow-from 192.168.2.0/24
            cache-size 0
            listen-address 192.168.2.1
        }
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name vyos
    login {
        user vyos {
            authentication {
                encrypted-password ****************
                plaintext-password ****************
            }
        }
    }
    ntp {
        server 0.pool.ntp.org {
        }
        server 1.pool.ntp.org {
        }
        server 2.pool.ntp.org {
        }
    }
    syslog {
        global {
            facility all {
                level info
            }
            facility protocols {
                level debug
            }
        }
    }
}
vpn {
    ipsec {
        esp-group ESP-SPOKE {
            compression disable
            lifetime 1800
            mode tunnel
            pfs dh-group2
            proposal 1 {
                encryption aes256
                hash sha1
            }
            proposal 2 {
                encryption 3des
                hash md5
            }
        }
        ike-group IKE-SPOKE {
            close-action none
            dead-peer-detection {
                action restart
                interval 15
                timeout 30
            }
            ikev2-reauth no
            key-exchange ikev1
            lifetime 3600
            proposal 1 {
                dh-group 2
                encryption aes256
                hash sha1
            }
            proposal 2 {
                dh-group 2
                encryption aes128
                hash sha1
            }
        }
        ipsec-interfaces {
            interface eth0
        }
        profile NHRPVPN {
            authentication {
                mode pre-shared-secret
                pre-shared-secret ****************
            }
            bind {
                tunnel tun0
            }
            esp-group ESP-SPOKE
            ike-group IKE-SPOKE
        }
    }
}

Hello @toadzhou, I think reducing the NHRP holding time will help in your case:

set protocols nhrp tunnel tun0 holding-time '30'

In my lab topology, I see that after the hub restarts, the IPsec connection is re-initialized, but NHRP still thinks the peer is alive.

Hi @Dmitry,
This parameter does not solve the underlying problem. After the hub is restarted, or disconnected and restored, DMVPN still does not recover.

Before restarting the NHRP service on the spoke, can you provide the output of these commands:

show nhrp tunnel 
ip neigh show
show vpn ipsec sa

and then try restarting NHRP:

sudo /etc/init.d/opennhrp.init restart

Also, when you increase the log level, you will see NHRP debug output:

configure
set system syslog global facility all level all
commit
run show log tail 300

Restarting the service did not help; it still does not work. But after I rebooted the VyOS system, it was normal again.

HUB

vyos@vyos:~$ show nhrp tunnel
Status: ok

Interface: tun0
Type: local
Protocol-Address: 9.9.9.255/32
Alias-Address: 9.9.9.1
Flags: up

Interface: tun0
Type: local
Protocol-Address: 9.9.9.1/32
Flags: up

Interface: tun0
Type: dynamic
Protocol-Address: 9.9.9.3/32
NBMA-Address: 10.10.3.2
Flags: up
Expires-In: 0:28

Interface: tun0
Type: dynamic
Protocol-Address: 9.9.9.2/32
NBMA-Address: 10.10.2.2
Flags: up
Expires-In: 0:22

vyos@vyos:~$ ip neigh
9.9.9.3 dev tun0 lladdr 10.10.3.2 REACHABLE
10.10.1.1 dev eth0 lladdr 50:00:00:01:00:00 DELAY
9.9.9.2 dev tun0 lladdr 10.10.2.2 REACHABLE
vyos@vyos:~$
vyos@vyos:~$ show vpn ipsec sa
Connection          State    Uptime    Bytes In/Out    Packets In/Out    Remote address    Remote ID    Proposal
------------------  -------  --------  --------------  ----------------  ----------------  -----------  ----------------------------------
dmvpn-NHRPVPN-tun0  up       46m20s    2K/2K           23/22             10.10.2.2         N/A          AES_CBC_256/HMAC_SHA1_96/MODP_1024
dmvpn-NHRPVPN-tun0  up       52m19s    26K/29K         236/237           10.10.3.2         N/A          AES_CBC_256/HMAC_SHA1_96/MODP_1024

Spoke1

vyos@vyos:~$ show nhrp tunnel
Status: ok

Interface: tun0
Type: local
Protocol-Address: 9.9.9.255/32
Alias-Address: 9.9.9.2
Flags: up

Interface: tun0
Type: local
Protocol-Address: 9.9.9.2/32
Flags: up

Interface: tun0
Type: static
Protocol-Address: 9.9.9.1/24
NBMA-Address: 10.10.1.2
Flags: up

vyos@vyos:~$
vyos@vyos:~$ ip neigh show
192.168.1.1 dev tun0  FAILED
10.10.2.1 dev eth0 lladdr 50:00:00:02:00:00 REACHABLE
9.9.9.1 dev tun0 lladdr 10.10.1.2 STALE
9.9.9.3 dev tun0  FAILED
vyos@vyos:~$
vyos@vyos:~$ show vpn ipsec sa
Connection          State    Uptime    Bytes In/Out    Packets In/Out    Remote address    Remote ID    Proposal
------------------  -------  --------  --------------  ----------------  ----------------  -----------  ----------------------------------
dmvpn-NHRPVPN-tun0  up       2s        12K/11K         104/102           10.10.1.2         N/A          AES_CBC_256/HMAC_SHA1_96/MODP_1024
dmvpn-NHRPVPN-tun0  up       N/A       N/A             N/A               N/A               N/A          N/A
vyos@vyos:~$
vyos@vyos:~$ sudo /etc/init.d/opennhrp.init restart
Restarting Next Hop Resolution Protocol: opennhrp.
vyos@vyos:~$
vyos@vyos:~$ ping 192.168.1.1
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
From 2.2.2.2 icmp_seq=1 Time to live exceeded
From 2.2.2.2 icmp_seq=2 Time to live exceeded

The log is as follows

vyos@vyos# sudo /etc/init.d/opennhrp.init restart
Restarting Next Hop Resolution Protocol: opennhrp.
[edit]
vyos@vyos#
[edit]
vyos@vyos# run show log tail 300
Sep 24 07:10:06 vyos pdns_recursor[1928]: Failed to update . records, RCODE=-1
Sep 24 07:11:36 vyos systemd[1]: Starting Cleanup of Temporary Directories...
Sep 24 07:11:36 vyos systemd-tmpfiles[2439]: [/usr/lib/tmpfiles.d/heartbeat.conf:3] Line references path below legacy directory /var/run/, updating /var/run/heartbeat → /run/heartbeat; please update the tmpfiles.d/ drop-in file accordingly.
Sep 24 07:11:36 vyos systemd-tmpfiles[2439]: [/usr/lib/tmpfiles.d/heartbeat.conf:4] Line references path below legacy directory /var/run/, updating /var/run/heartbeat/ccm → /run/heartbeat/ccm; please update the tmpfiles.d/ drop-in file accordingly.
Sep 24 07:11:36 vyos systemd-tmpfiles[2439]: [/usr/lib/tmpfiles.d/heartbeat.conf:5] Line references path below legacy directory /var/run/, updating /var/run/heartbeat/crm → /run/heartbeat/crm; please update the tmpfiles.d/ drop-in file accordingly.
Sep 24 07:11:36 vyos systemd-tmpfiles[2439]: [/usr/lib/tmpfiles.d/heartbeat.conf:6] Line references path below legacy directory /var/run/, updating /var/run/heartbeat/dopd → /run/heartbeat/dopd; please update the tmpfiles.d/ drop-in file accordingly.
Sep 24 07:11:36 vyos systemd-tmpfiles[2439]: [/usr/lib/tmpfiles.d/resource-agents.conf:1] Duplicate line for path "/run/resource-agents", ignoring.
Sep 24 07:11:36 vyos systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Sep 24 07:11:36 vyos systemd[1]: Started Cleanup of Temporary Directories.
Sep 24 07:12:09 vyos pdns_recursor[1928]: Failed to update . records, got an exception: Server Failure while retrieving DS records for net
Sep 24 07:12:09 vyos pdns_recursor[1928]: Failed to update . records, RCODE=-1

This is an old thread, but I believe the issue here may have been that DPD was missing on the hub. I was also having the same problem on 1.3.1-S1, where I had DPD set on the spoke, would reboot the hub, and the spoke would never re-register once the hub was back up. However, after adding DPD to the hub too, and rebooting the hub, this is working as expected. Now when I reboot the hub, within about 5 minutes the spoke has re-registered with the hub and is reachable again.
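For reference, a hub-side DPD block might look like the sketch below. It simply mirrors the spoke's DPD settings onto the IKE-HUB group from the first poster's hub config. The action and timer values are assumptions copied from that spoke config, not values confirmed by the post above, so adjust them to your own setup.

```shell
# Run in configuration mode on the hub.
# Assumption: the group name IKE-HUB and the 15/30 second timers are taken from
# the configs earlier in this thread; the post above does not state its own values.
set vpn ipsec ike-group IKE-HUB dead-peer-detection action restart
set vpn ipsec ike-group IKE-HUB dead-peer-detection interval 15
set vpn ipsec ike-group IKE-HUB dead-peer-detection timeout 30
commit
```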

I’m on 1.3.0-rc6, and configuring DPD on my hub and spokes does not result in spokes coming back online after a hub reset. I left them overnight last night to see if they would eventually come back online, and they didn’t.

@aohanian Can you share your hub and spoke config so I can see if there are any differences that may result in this behavior?

I am only able to get a spoke reconnected to a rebooted hub by calling sudo /etc/init.d/opennhrp.init restart (or rebooting the spoke). I could write a script to ping the hub and restart NHRP after a set number of failed pings, but this feels hacky and I would feel better if I could get the router to work correctly in the first place.

Thanks!
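The ping-and-restart workaround described above could be sketched roughly as follows. This is only a minimal illustration, not a tested fix: the variable names, the threshold of 5 failed pings and the 10 second interval are all hypothetical, and 9.9.9.1 is the hub tunnel address from the first poster's config. Note that earlier in this thread restarting opennhrp alone did not restore connectivity for the original poster, so this may not help in every setup.

```shell
#!/bin/sh
# Hypothetical watchdog sketch: restart opennhrp after repeated failed pings to the hub.
# HUB_TUNNEL_IP, MAX_FAILS and CHECK_INTERVAL are illustrative assumptions, not from the thread.
HUB_TUNNEL_IP="${HUB_TUNNEL_IP:-9.9.9.1}"
MAX_FAILS="${MAX_FAILS:-5}"
CHECK_INTERVAL="${CHECK_INTERVAL:-10}"

# Print the updated failure counter: 0 after a successful ping, previous + 1 otherwise.
# $1 = previous counter, $2 = exit status of the last ping
next_fail_count() {
    if [ "$2" -eq 0 ]; then
        echo 0
    else
        echo $(( $1 + 1 ))
    fi
}

# Succeed (exit status 0) once the counter has reached the threshold.
should_restart() {
    [ "$1" -ge "$MAX_FAILS" ]
}

watchdog_loop() {
    fails=0
    while true; do
        ping -c 1 -W 2 "$HUB_TUNNEL_IP" >/dev/null 2>&1
        status=$?
        fails=$(next_fail_count "$fails" "$status")
        if should_restart "$fails"; then
            logger -t nhrp-watchdog "hub $HUB_TUNNEL_IP unreachable, restarting opennhrp"
            sudo /etc/init.d/opennhrp.init restart
            fails=0
        fi
        sleep "$CHECK_INTERVAL"
    done
}

# Start the loop only when called with the argument "run", so the file can also be sourced.
if [ "${1:-}" = "run" ]; then
    watchdog_loop
fi
```

Saved under a name of your choice and started with the argument `run`, it keeps polling until it is stopped. Keeping the counter logic in small functions makes it easy to change the threshold without touching the loop.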

Here is an example hub and spoke config from my GNS3 virtual lab. Everything seems to reconnect fine for me a few minutes after rebooting the hub. I’m using a newer self-built image of VyOS 1.3.

hub1.config (6.5 KB)
spoke4.config (6.6 KB)

Thank you for sharing your configs! Unfortunately, when my spokes try to reconnect, the traffic is GRE only, and no attempt is made to establish an IPsec tunnel. Does your configuration still reconnect with your iptables rules in place to block GRE traffic?

Yes, my lab devices reconnect IPsec even with the iptables rules that block cleartext GRE.
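The rules from the attached configs are not reproduced in this thread. As an illustration only, a rule of this kind is often written with the iptables `policy` match, which drops locally generated GRE packets that are not going to be processed by IPsec. The chain and rule position here are assumptions, not the poster's actual rules.

```shell
# Assumed example, not taken from the attached configs:
# drop outgoing GRE that is not subject to an IPsec policy (i.e. cleartext GRE).
sudo iptables -I OUTPUT -p gre -m policy --dir out --pol none -j DROP
```

With such a rule in place, GRE leaves the router only once an IPsec SA covers it, which makes it easy to see whether a reconnect really re-establishes IPsec.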