VTI with BGP over IPsec, facing weird issue

Hi Team,

Here is my scenario. I am running VyOS 1.2.8 and have configured two IPsec tunnels to Azure over different ISPs, with BGP over IPsec using APIPA IP addresses. Let's call the two uplinks on VyOS ISP1 and ISP2: 169.254.21.1 is the BGP peer for ISP1 and 169.254.22.1 is the BGP peer for ISP2.

I have given ISP1 a higher weight, and traffic follows the weights as expected. When both links are up, traffic flows properly. But when ISP2, which is not very stable, goes down, BGP traffic stops completely: show ip bgp summary shows both peers in the Active state, while show vpn ike sa shows IKE and IPsec still up for ISP1 and down for ISP2. To repeat, both BGP peer IP addresses show as Active.

Any help is really appreciated. Here is my config:

set interfaces ethernet eth0 address 'xxx.xxx.97.110/29'
set interfaces ethernet eth0 duplex 'auto'
set interfaces ethernet eth0 firewall local name 'blockssh'
set interfaces ethernet eth0 hw-id 'XX:XX:XX:XX:XX:e0'
set interfaces ethernet eth0 smp-affinity 'auto'
set interfaces ethernet eth0 speed 'auto'
set interfaces ethernet eth1 address 'xxx.xxx.226.237/28'
set interfaces ethernet eth1 duplex 'auto'
set interfaces ethernet eth1 firewall local name 'blockssh'
set interfaces ethernet eth1 hw-id 'XX:XX:XX:XX:XX:e1'
set interfaces ethernet eth1 smp-affinity 'auto'
set interfaces ethernet eth1 speed 'auto'
set interfaces vti vti2 address 'xxx.xxx.21.9/30'
set interfaces vti vti4 address 'xxx.xxx.21.13/30'
set policy prefix-list accept-only-s4 rule 2 action 'permit'
set policy prefix-list accept-only-s4 rule 2 prefix 'xxx.xxx.11.0/24'
set policy prefix-list accept-only-s4 rule 3 action 'permit'
set policy prefix-list accept-only-s4 rule 3 prefix 'xxx.xxx.10.0/24'
set policy route-map accept-only-s4 rule 2 action 'deny'
set protocols bgp XXXXXX address-family ipv4-unicast network xxx.xxx.40.0/23
set protocols bgp XXXXXX neighbor xxx.xxx.21.1 address-family ipv4-unicast route-map import 'accept-only-s4'
set protocols bgp XXXXXX neighbor xxx.xxx.21.1 address-family ipv4-unicast weight '100'
set protocols bgp XXXXXX neighbor xxx.xxx.21.1 disable-connected-check
set protocols bgp XXXXXX neighbor xxx.xxx.21.1 remote-as '65515'
set protocols bgp XXXXXX neighbor xxx.xxx.21.1 timers holdtime '30'
set protocols bgp XXXXXX neighbor xxx.xxx.21.1 timers keepalive '15'
set protocols bgp XXXXXX neighbor xxx.xxx.21.1 update-source 'xxx.xxx.21.13'
set protocols bgp XXXXXX neighbor xxx.xxx.22.1 address-family ipv4-unicast route-map import 'accept-only-s4'
set protocols bgp XXXXXX neighbor xxx.xxx.22.1 disable-connected-check
set protocols bgp XXXXXX neighbor xxx.xxx.22.1 remote-as '65515'
set protocols bgp XXXXXX neighbor xxx.xxx.22.1 timers holdtime '30'
set protocols bgp XXXXXX neighbor xxx.xxx.22.1 timers keepalive '15'
set protocols bgp XXXXXX neighbor xxx.xxx.22.1 update-source 'xxx.xxx.21.9'
set protocols static interface-route xxx.xxx.21.1/32 next-hop-interface vti4
set protocols static interface-route xxx.xxx.22.1/32 next-hop-interface vti2
set protocols static route xxx.xxx.0.0/0 next-hop xxx.xxx.226.225
set protocols static route xxx.xxx.153.181/32 next-hop xxx.xxx.97.105
set protocols static route xxx.xxx.10.0/24 next-hop xxx.xxx.144.18
set protocols static route xxx.xxx.11.0/24 next-hop xxx.xxx.144.18
set protocols static route xxx.xxx.40.0/23 next-hop xxx.xxx.144.18
set vpn ipsec esp-group AZURE-IKE compression 'disable'
set vpn ipsec esp-group AZURE-IKE lifetime '3600'
set vpn ipsec esp-group AZURE-IKE mode 'tunnel'
set vpn ipsec esp-group AZURE-IKE proposal 1 encryption 'aes256'
set vpn ipsec esp-group AZURE-IKE proposal 1 hash 'sha1'
set vpn ipsec ike-group AZURE-IKE dead-peer-detection action 'clear'
set vpn ipsec ike-group AZURE-IKE dead-peer-detection interval '15'
set vpn ipsec ike-group AZURE-IKE dead-peer-detection timeout '30'
set vpn ipsec ike-group AZURE-IKE ikev2-reauth 'yes'
set vpn ipsec ike-group AZURE-IKE key-exchange 'ikev2'
set vpn ipsec ike-group AZURE-IKE lifetime '28800'
set vpn ipsec ike-group AZURE-IKE proposal 1 dh-group '2'
set vpn ipsec ike-group AZURE-IKE proposal 1 encryption 'aes256'
set vpn ipsec ike-group AZURE-IKE proposal 1 hash 'sha1'
set vpn ipsec ipsec-interfaces interface 'eth0'
set vpn ipsec ipsec-interfaces interface 'eth1'
set vpn ipsec site-to-site peer xxxxx.tld authentication id 'xxx.xxx.226.237'
set vpn ipsec site-to-site peer xxxxx.tld authentication mode 'pre-shared-secret'
set vpn ipsec site-to-site peer xxxxx.tld authentication pre-shared-secret xxxxxx
set vpn ipsec site-to-site peer xxxxx.tld authentication remote-id 'xxx.xxx.16.236'
set vpn ipsec site-to-site peer xxxxx.tld connection-type 'initiate'

set vpn ipsec site-to-site peer xxxxx.tld ike-group 'AZURE-IKE'
set vpn ipsec site-to-site peer xxxxx.tld ikev2-reauth 'inherit'
set vpn ipsec site-to-site peer xxxxx.tld local-address 'xxx.xxx.226.237'
set vpn ipsec site-to-site peer xxxxx.tld vti bind 'vti4'
set vpn ipsec site-to-site peer xxxxx.tld vti esp-group 'AZURE-IKE'
set vpn ipsec site-to-site peer xxxxx.tld authentication id 'xxx.xxx.97.110'
set vpn ipsec site-to-site peer xxxxx.tld authentication mode 'pre-shared-secret'
set vpn ipsec site-to-site peer xxxxx.tld authentication pre-shared-secret xxxxxx
set vpn ipsec site-to-site peer xxxxx.tld authentication remote-id 'xxx.xxx.153.181'
set vpn ipsec site-to-site peer xxxxx.tld connection-type 'initiate'

set vpn ipsec site-to-site peer xxxxx.tld ike-group 'AZURE-IKE'
set vpn ipsec site-to-site peer xxxxx.tld ikev2-reauth 'inherit'
set vpn ipsec site-to-site peer xxxxx.tld local-address 'xxx.xxx.97.110'
set vpn ipsec site-to-site peer xxxxx.tld vti bind 'vti2'
set vpn ipsec site-to-site peer xxxxx.tld vti esp-group 'AZURE-IKE'
show ip bgp
BGP table version is 259, local router ID is zxxxx.xxxx.226.237, vrf id 0
Default local pref 100, local AS 65506
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.11.44.0/22    169.254.21.1                         100 65515 i
*                   169.254.22.1                           0 65515 i
*> 192.168.40.0/23  0.0.0.0                  0         32768 i
show ip bgp summary

Neighbor        V         AS MsgRcvd MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd
169.254.21.1    4      65515   76084   66813        0    0    0 00:48:49            1
169.254.22.1    4      65515   66990   59003        0    0    0 00:48:49            1

Could it be that a router on the Azure side still sees your BGP peer via broken connections? IPsec is not very fast and can sometimes take a long time to react to events in the network.

I would recommend carefully checking the counters in the show vpn ipsec sa output when both links are up. You may find that all traffic from Azure actually uses only one IPsec connection, while you are sending over both.
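
As a rough sketch of that check, the operational-mode commands below (all of which appear elsewhere in this thread) can be run a few minutes apart while both links are up. Nothing here is specific to this setup beyond what was posted; the comments only describe what to look for.

```
# IKE state per peer: both tunnels should show as up
show vpn ike sa

# Per-tunnel byte counters: compare "Bytes In/Out" between two runs.
# A tunnel whose inbound counter barely moves is probably not being
# used by Azure for return traffic.
show vpn ipsec sa

# BGP session state and received prefixes for both neighbors
show ip bgp summary
```

If one tunnel's inbound counter stays almost flat while both BGP sessions remain established, that would be consistent with Azure sending its traffic for both sessions over a single tunnel.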

Hmmm - That seems to be correct

show vpn ipsec sa | strip-private
Connection                      State    Up       Bytes In/Out    Remote address    Remote ID    Proposal
------------------------------  -------  -------  --------------  ----------------  -----------  ------------------------------------------------
peer-xxx.xxx.153.181-tunnel-vti  up       2 hours  5M/25K          xxx.xxx.153.181    N/A          AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
peer-xxx.xxx.16.236-tunnel-vti   up       2 hours  11M/2M          xxx.xxx.16.236     N/A          AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024

Not sure, though, why both tunnels are being used.

Could this be caused by asymmetric routing? With two upstreams on a single router, you would normally expect policy routing to be needed to send traffic from ISP1’s IPs out via eth0 and traffic from ISP2’s IPs out via eth1. The default route may also play a part. What does your routing table currently show?

No, I added a static route for the other peer. Here is my routing table:

S>* 0.0.0.0/0 [1/0] via 111.125.226.225, eth1, 05w1d02h
B>  10.11.44.0/22 [20/0] via 169.254.21.1 (recursive), 05:19:58
  *                        via 169.254.21.1, vti4 onlink, 05:19:58
C>* 10.144.144.16/28 is directly connected, eth7, 05w1d02h
S>* xxx.xxx.153.181/32 [1/0] via 42.104.97.105, eth0, 01w2d10h
C>* xxx.xxx.97.104/29 is directly connected, eth0, 05w1d02h
C>* xx.xxx.226.224/28 is directly connected, eth1, 05w1d02h
S>* 169.254.21.1/32 [1/0] is directly connected, vti4, 05:19:58
C>* 169.254.21.8/30 is directly connected, vti2, 05:19:58
C>* 169.254.21.12/30 is directly connected, vti4, 05:19:58
S>* 169.254.22.1/32 [1/0] is directly connected, vti2, 05:19:58
S>* 192.168.10.0/24 [1/0] via 10.144.144.18, eth7, 05w1d02h
S>* 192.168.11.0/24 [1/0] via 10.144.144.18, eth7, 05w1d02h
S>* 192.168.40.0/23 [1/0] via 10.144.144.18, eth7, 05w1d02h

My peer IP addresses are:

set vpn ipsec site-to-site peer xxx.xxx.16.236 ike-group 'AZURE-IKE'
set vpn ipsec site-to-site peer xxx.xxx.153.181 ike-group 'AZURE-IKE'

peer-xxx.xxx.153.181-tunnel-vti up 2 hours 5M/25K xxx.xxx.153.181
Ignore the 25K and focus on the 5M, which is Bytes In on your side. It looks like the Azure side has two equally preferred routes to your network. Try AS-path prepending on your secondary VTI link.
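
A minimal sketch of such a prepend, assuming the secondary neighbor is 169.254.22.1 and the local AS is 65506 (the value shown in the show ip bgp output above). The route-map name PREPEND-SECONDARY and the rule number are made up for illustration. The usual convention is to prepend your own AS number, repeated, on the routes you advertise (export) to the less preferred neighbor:

```
# Hypothetical route-map name; 65506 is the local AS from "show ip bgp"
set policy route-map PREPEND-SECONDARY rule 10 action 'permit'
set policy route-map PREPEND-SECONDARY rule 10 set as-path-prepend '65506 65506'

# Apply outbound, towards the secondary neighbor only
set protocols bgp 65506 neighbor 169.254.22.1 address-family ipv4-unicast route-map export 'PREPEND-SECONDARY'
```

After committing, show ip bgp neighbors 169.254.22.1 advertised-routes should list the longer AS path for 192.168.40.0/23. Whether Azure then prefers the other tunnel is something to confirm on the Azure side.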

Hello,

What should the AS path be when prepending? Can it be anything? I think I am facing the same issue. Could you please validate these commands?

set policy route-map as-path-vodafone rule 5 action 'permit'
set policy route-map as-path-vodafone rule 5 set as-path-prepend '65506'
set protocols bgp 65506 neighbor 169.254.22.1 address-family ipv4-unicast route-map export 'as-path-vodafone'

I applied the route-map to one of the peers. However, the issue still persists.

set policy route-map as-path-vodafone rule 5 action 'permit'
set policy route-map as-path-vodafone rule 5 set as-path-prepend '65515'
set protocols bgp 65506 neighbor 169.254.22.1 address-family ipv4-unicast route-map import 'as-path-vodafone'

Have you checked the suggestion from @zsdc to ensure that the traffic is actually using both IPsec connections? It also sounds to me as if it may only be using one.

Yes, that is right. I am seeing connections on both links, and I am wondering how to use one of them as a backup VPN then. Since I am using BGP, shouldn't it automatically choose the primary/backup link?

The theory to be checked is that BOTH BGP sessions are running over only ONE of the IPsec connections. It would be best to troubleshoot this before looking at BGP.

That is true! Again, both BGP sessions go down when my secondary IPsec connection goes down, and I am still wondering why. I understand this link is flaky and goes down most of the time, so I am wondering how to eliminate the issue or troubleshoot it further.
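
One way this theory might be tested is sketched below. It uses the neighbor addresses and VTI names from the posted config. The show commands are standard, but the exact monitor traffic filter syntax is an assumption for this VyOS release and may need adjusting.

```
# Which interface does the router use to reach each BGP neighbor?
show ip route 169.254.21.1
show ip route 169.254.22.1

# Local and remote addresses of each established session
show ip bgp neighbors 169.254.21.1
show ip bgp neighbors 169.254.22.1

# Assumed filter syntax: watch BGP (TCP port 179) on each VTI in turn.
# Packets for a given neighbor should appear in both directions on the
# VTI that the static interface-route points at, and nowhere else.
monitor traffic interface vti2 filter 'tcp port 179'
monitor traffic interface vti4 filter 'tcp port 179'
```

If packets for 169.254.21.1 leave on vti4 but the replies arrive on vti2 (or the other way round), both sessions effectively depend on the same tunnel, which would match the behaviour described.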