VRRP check script succeeds when run in a shell, fails when run by VRRP

I have the check script below. It checks the link state of eth0 and, if the link is up, looks up the default gateway on eth0 and tests connectivity to it; otherwise it returns an error.

When I run the script manually, I see “succ” in the checkstate file; however, when I run “restart vrrp” I see a fault in the checkstate file. What could be the issue?

It seems that the output of vtysh is not valid in the environment the check script is run in. If I set eth0gw to "$(echo "127.0.0.1")" the check script succeeds.
What can I do to get the desired output?

#!/bin/vbash
source /opt/vyatta/etc/functions/script-template


eth0status="$(/sbin/ethtool eth0 | grep "Link detected: yes")"
if [[ ! -z ${eth0status} ]]; then
 eth0gw="$(run show ip route | /usr/bin/awk '/0.0.0.0\/0/&&/eth0/{print $5}' | /bin/sed -e 's/,//g')"
 if [[ ! -z ${eth0gw} ]]; then
  /bin/ping -I eth0 -c 1 -W 1 ${eth0gw} > /tmp/checkstatus
 else
  echo "$(date) eth0 up, ping failed"
  exit 1
 fi
else
 echo "$(date) eth0 down"
 exit 0
fi

Hello @ACiD_GRiM, I assume you are using this as a health-check script.
Can you change your script as follows and test again?

#!/bin/vbash
source /opt/vyatta/etc/functions/script-template

eth0status="$(/sbin/ethtool eth0 | grep 'Link detected: yes')"
if [[ ! -z ${eth0status} ]]; then
 eth0gw="$(run show ip route | /usr/bin/awk '/0.0.0.0\/0/&&/eth0/{print $5}' | /bin/sed -e 's/,//g')"
 if [[ ! -z ${eth0gw} ]]; then
  /bin/ping -I eth0 -c 1 -W 1 ${eth0gw} > /tmp/checkstatus
 else
  echo "$(date) eth0 up, ping failed"
  exit
 fi
else
 echo "$(date) eth0 down"
 exit
fi
exit

You are mixing vbash and bash, which in my opinion is not a good idea, but this script works on my test router.

Thanks for the help Dmitry,

I only started using vbash recently, in an attempt to troubleshoot. Originally eth0gw was set using vtysh -c “sh run ip route” and the script interpreter was /bin/bash. However, even with your modification the result is a fault state for VRRP.

Here is the latest version I made before logging out last night:

#!/bin/bash

eth0status="$(/sbin/ethtool eth0 | grep 'Link detected: yes')"
echo "$(date) ${eth0status}" >> /tmp/checkstatus
if [[ ! -z ${eth0status} ]]; then
 eth0gw="$(/usr/bin/vtysh -c 'show ip route' | /usr/bin/awk '/0.0.0.0\/0/&&/eth0/{print $5}' | /bin/sed -e 's/,//g')"
 echo "$(date) ${eth0gw}" >> /tmp/checkstatus
 if [[ ! -z ${eth0gw} ]]; then
  /bin/ping -I eth0 -c 1 -W 1 ${eth0gw} >> /tmp/checkstatus
 else
  echo "$(date) eth0 up, ping failed" >> /tmp/checkstatus
  exit 1
 fi
else
 echo "$(date) eth0 down" >> /tmp/checkstatus
 #Exit 0 because eth0 down is handled by vrrp transition
 exit 0
fi
exit

=====

$ sh ver
Version:          VyOS 1.2-rolling-201911180217
Built by:         autobuild@vyos.net
Built on:         Mon 18 Nov 2019 02:17 UTC
Build UUID:       1b67d568-983a-4200-beb4-2732a4d7879b
Build Commit ID:  e7a834c040cbd9

Architecture:     x86_64
Boot via:         installed image
System type:      VMware guest

Hardware vendor:  VMware, Inc.
Hardware model:   VMware7,1
Hardware S/N:     VMware-42 2a 01 58 71 99 0b c5-a4 22 60 63 a3 48 d6 f4
Hardware UUID:    58012a42-9971-c50b-a422-6063a348d6f4

Copyright:        VyOS maintainers and contributors

Can you show the log output from when you get the fault?

When I redirect stderr from vtysh to a file I see this:

cat /tmp/vty
% Can't open configuration file /etc/frr/vtysh.conf due to 'Permission denied'.
Exiting: failed to connect to any daemons.

The FRR documentation says that members of the frrvty group should have read access, and the keepalived_script user, which runs the check script, is a member of that group.
Permissions on the file seem fine:
-rw-r----- 1 root frrvty 32 Nov 3 14:57 vtysh.conf

Making the file readable by all users results in this error instead:
Exiting: failed to connect to any daemons.

If I switch to the keepalived_script user with sudo su - keepalived_script -s /bin/bash,
I can run the vtysh command successfully. Only the environment in which keepalived runs the script seems unable to run vtysh.
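
One way to narrow this down is to log the identity the check script actually runs under and compare it with the interactive session. The sketch below is only a diagnostic, not a fix. The function name log_identity and the log path /tmp/checkid are made up for illustration. If the group list logged under keepalived lacks frrvty while the interactive login shows it, that would point to the script being started without the user's supplementary groups, which would fit the 'Permission denied' on vtysh.conf.

```bash
#!/bin/bash
# Diagnostic sketch: record who the script runs as and whether
# vtysh.conf is readable from this environment.
# log_identity and /tmp/checkid are illustrative names (assumptions).

log_identity() {
    local logfile="$1"
    local conf="${2:-/etc/frr/vtysh.conf}"
    {
        echo "$(date) user=$(id -un) primary_group=$(id -gn)"
        echo "$(date) groups=$(id -Gn)"
        if [[ -r "${conf}" ]]; then
            echo "$(date) ${conf} readable=yes"
        else
            echo "$(date) ${conf} readable=no"
        fi
    } >> "${logfile}"
}

log_identity /tmp/checkid
```

Running the same script once from the interactive keepalived_script shell and once as the VRRP health check gives two sets of lines in /tmp/checkid that can be compared directly.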

I was able to get the script to work by adding keepalived_script to the sudoers file with NOPASSWD and using an init script to call the script below with sudo. However, this is outside the normal VyOS config and won’t persist across an upgrade.
Could this be filed as a bug report and resolved?
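
A possible alternative that avoids both vtysh and sudo is to read the gateway from the kernel routing table with iproute2. This is only a sketch under the assumption that the default route via eth0 is actually installed in the kernel table (a route that FRR knows about but has not selected would not appear there). The helper name parse_default_gw is hypothetical.

```bash
#!/bin/bash
# Sketch: get the default gateway from `ip route` output instead of vtysh.
# parse_default_gw is an illustrative helper name (assumption).

# Reads `ip route` output on stdin and prints the next hop of the
# first "default via <gateway> ..." line; prints nothing if none is found.
parse_default_gw() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Usage inside the check script (assumes the route is in the kernel table):
# eth0gw="$(ip -4 route show default dev eth0 | parse_default_gw)"
```

Because ip only reads the kernel table, it does not need access to /etc/frr/vtysh.conf or to the FRR daemon sockets.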

Here is the latest version of the script:

#!/bin/bash
echo "$(date) ==== Starting check script" >> /tmp/checkstatus
eth0status="$(/sbin/ethtool eth0 | grep 'Link detected: yes')"
echo "$(date) - eth0status: ${eth0status}" >> /tmp/checkstatus
if [[ ! -z ${eth0status} ]]; then
 /usr/bin/vtysh -c 'show ip route' &> /tmp/vty
 eth0gw="$(/usr/bin/vtysh -c 'show ip route' | grep -Po '^.*0\.0\.0\.0\/0.*via \K[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(?=, eth0)')"
 echo "$(date) - eth0gw: $eth0gw" >> /tmp/checkstatus
 if [[ ! -z $eth0gw ]]; then
  /bin/ping -I eth0 -c 1 -W 1 $eth0gw >> /tmp/checkstatus
 else
  echo "$(date) eth0 up, ping failed" >> /tmp/checkstatus
  exit 1
 fi
else
 echo "$(date) eth0 down" >> /tmp/checkstatus
 #Exit 0 because eth0 down is handled by vrrp transition
 exit 0
fi

Maybe you need to add sudo before /usr/bin/vtysh? Did you try this?

@ACiD_GRiM FYI, I configured VRRP with your script and everything works without any modifications.

vyos@R1# tail -n 9  /tmp/checkstatus 
Wed Dec 18 09:45:08 UTC 2019 ==== Starting check script
Wed Dec 18 09:45:08 UTC 2019 - eth0status: 	Link detected: yes
Wed Dec 18 09:45:08 UTC 2019 - eth0gw: 10.0.0.2
PING 10.0.0.2 (10.0.0.2) from 10.0.0.1 eth0: 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=1.93 ms

--- 10.0.0.2 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.936/1.936/1.936/0.000 ms
[edit]
vyos@R1# run show vrrp
Name    Interface      VRID  State    Last Transition
------  -----------  ------  -------  -----------------
group1  eth0              1  MASTER   4m38s

vyos@R1# run show version | match Version
Version:          VyOS 1.2-rolling-201912170217

But I’m not sure about the ping command in your script; it may be better to use:

/bin/ping -I eth0 -c 1 -W 1 $eth0gw >> /tmp/checkstatus && exit 0 || exit 1
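
keepalived judges a check script by its exit status, where 0 means success and anything else counts as a failure. The same idea as the one-liner above can be written with an explicit if, which also avoids the case where the command after && fails and the || branch runs unintentionally. This is a minimal sketch; check_result is a hypothetical helper name, not part of VyOS or keepalived.

```bash
#!/bin/bash
# Sketch: make the check result explicit.
# check_result is an illustrative helper name (assumption).

# Runs the given command, appends its output to the log file, and
# returns 0 if the command succeeded or 1 if it failed.
check_result() {
    local logfile="$1"
    shift
    if "$@" >> "${logfile}" 2>&1; then
        return 0
    fi
    echo "$(date) check failed: $*" >> "${logfile}"
    return 1
}

# Usage in the check script (eth0gw is set earlier in the script):
# check_result /tmp/checkstatus /bin/ping -I eth0 -c 1 -W 1 "${eth0gw}"
# exit $?
```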

I will update to the same build you’ve referenced tonight.
Thanks for the script suggestion, I will make the change.

edit:

I actually just made the change and still have the same problem. I’m only able to run the script if I run it with sudo AND the keepalived_script user is in the sudoers file with NOPASSWD.
This is an issue on both routers, which were configured individually.
I do see improvements with Keepalived in the latest version, such as all VRRP groups in a sync group changing state to backup/FAULT when the wangw state changes.

I’ll also note an issue I’m having where the transition scripts don’t execute, but that is separate from the issue I’m reporting here.

Here’s my config:

router 1:
set high-availability vrrp group dmz_gateway advertise-interval '1'
set high-availability vrrp group dmz_gateway authentication password 'mCD!343'
set high-availability vrrp group dmz_gateway authentication type 'ah'
set high-availability vrrp group dmz_gateway hello-source-address '10.255.255.234'
set high-availability vrrp group dmz_gateway interface 'eth2'
set high-availability vrrp group dmz_gateway preempt-delay '10'
set high-availability vrrp group dmz_gateway priority '200'
set high-availability vrrp group dmz_gateway virtual-address '172.16.0.1/24'
set high-availability vrrp group dmz_gateway vrid '2'
set high-availability vrrp group v6_dmz_gateway advertise-interval '1'
set high-availability vrrp group v6_dmz_gateway authentication password 'mCD!343'
set high-availability vrrp group v6_dmz_gateway authentication type 'ah'
set high-availability vrrp group v6_dmz_gateway hello-source-address 'fd00:f9a8:9a7e:3::234'
set high-availability vrrp group v6_dmz_gateway interface 'eth2'
set high-availability vrrp group v6_dmz_gateway preempt-delay '10'
set high-availability vrrp group v6_dmz_gateway priority '200'
set high-availability vrrp group v6_dmz_gateway virtual-address 'fe80::f345/128'
set high-availability vrrp group v6_dmz_gateway virtual-address 'fd00:f9a8:172:16::1/64'
set high-availability vrrp group v6_dmz_gateway vrid '3'
set high-availability vrrp group wan_gateway advertise-interval '1'
set high-availability vrrp group wan_gateway authentication password 'm*CD!343'
set high-availability vrrp group wan_gateway authentication type 'ah'
set high-availability vrrp group wan_gateway health-check failure-count '3'
set high-availability vrrp group wan_gateway health-check interval '2'
set high-availability vrrp group wan_gateway health-check script '/usr/bin/sudo /config/scripts/vrrp/wan_gateway-check.sh'
set high-availability vrrp group wan_gateway hello-source-address 'fd00:f9a8:feed:beef::1'
set high-availability vrrp group wan_gateway interface 'eth3'
set high-availability vrrp group wan_gateway preempt-delay '10'
set high-availability vrrp group wan_gateway priority '200'
set high-availability vrrp group wan_gateway transition-script backup '/config/scripts/vrrp/wan_gateway-fail.sh'
set high-availability vrrp group wan_gateway transition-script master '/config/scripts/vrrp/wan_gateway-master.sh'
set high-availability vrrp group wan_gateway virtual-address 'fe80::888/128'
set high-availability vrrp group wan_gateway vrid '1'
set high-availability vrrp sync-group wangw member 'wan_gateway'
set high-availability vrrp sync-group wangw member 'dmz_gateway'
set high-availability vrrp sync-group wangw member 'v6_dmz_gateway'
set service conntrack-sync failover-mechanism vrrp sync-group 'wangw'

router 2:
set high-availability vrrp group dmz_gateway advertise-interval '1'
set high-availability vrrp group dmz_gateway authentication password 'mCD!343'
set high-availability vrrp group dmz_gateway authentication type 'ah'
set high-availability vrrp group dmz_gateway hello-source-address '10.255.255.233'
set high-availability vrrp group dmz_gateway interface 'eth2'
set high-availability vrrp group dmz_gateway preempt-delay '10'
set high-availability vrrp group dmz_gateway virtual-address '172.16.0.1/24'
set high-availability vrrp group dmz_gateway vrid '2'
set high-availability vrrp group v6_dmz_gateway advertise-interval '1'
set high-availability vrrp group v6_dmz_gateway authentication password 'mCD!343'
set high-availability vrrp group v6_dmz_gateway authentication type 'ah'
set high-availability vrrp group v6_dmz_gateway hello-source-address 'fd00:f9a8:9a7e:3::233'
set high-availability vrrp group v6_dmz_gateway interface 'eth2'
set high-availability vrrp group v6_dmz_gateway preempt-delay '10'
set high-availability vrrp group v6_dmz_gateway virtual-address 'fe80::f345/128'
set high-availability vrrp group v6_dmz_gateway virtual-address 'fd00:f9a8:172:16::1/64'
set high-availability vrrp group v6_dmz_gateway vrid '3'
set high-availability vrrp group wan_gateway advertise-interval '1'
set high-availability vrrp group wan_gateway authentication password 'm*CD!343'
set high-availability vrrp group wan_gateway authentication type 'ah'
set high-availability vrrp group wan_gateway health-check failure-count '3'
set high-availability vrrp group wan_gateway health-check interval '2'
set high-availability vrrp group wan_gateway health-check script '/usr/bin/sudo /config/scripts/vrrp/wan_gateway-check.sh'
set high-availability vrrp group wan_gateway hello-source-address 'fd00:f9a8:feed:beef::2'
set high-availability vrrp group wan_gateway interface 'eth3'
set high-availability vrrp group wan_gateway preempt-delay '3'
set high-availability vrrp group wan_gateway transition-script backup '/config/scripts/vrrp/wan_gateway-fail.sh'
set high-availability vrrp group wan_gateway transition-script master '/config/scripts/vrrp/wan_gateway-master.sh'
set high-availability vrrp group wan_gateway virtual-address 'fe80::888/128'
set high-availability vrrp group wan_gateway vrid '1'
set high-availability vrrp sync-group wangw member 'wan_gateway'
set high-availability vrrp sync-group wangw member 'dmz_gateway'
set high-availability vrrp sync-group wangw member 'v6_dmz_gateway'
set service conntrack-sync failover-mechanism vrrp sync-group 'wangw'

Here’s a simple example:

#!/bin/bash
echo "$(date) test" >> /tmp/checkvtysh
/usr/bin/vtysh -c "show ip route" >> /tmp/checkvtysh 2>&1

Output:

Wed Dec 18 18:35:58 MST 2019 test
% Can't open configuration file /etc/frr/vtysh.conf due to 'Permission denied'.
Exiting: failed to connect to any daemons.
Wed Dec 18 18:36:00 MST 2019 test
% Can't open configuration file /etc/frr/vtysh.conf due to 'Permission denied'.
Exiting: failed to connect to any daemons.
Wed Dec 18 18:36:02 MST 2019 test
% Can't open configuration file /etc/frr/vtysh.conf due to 'Permission denied'.
Exiting: failed to connect to any daemons.
Wed Dec 18 18:36:04 MST 2019 test
% Can't open configuration file /etc/frr/vtysh.conf due to 'Permission denied'.
Exiting: failed to connect to any daemons.
Wed Dec 18 18:36:06 MST 2019 test
% Can't open configuration file /etc/frr/vtysh.conf due to 'Permission denied'.
Exiting: failed to connect to any daemons.

I don’t see any issues with your script.

set high-availability vrrp group group1 health-check script '/config/scripts/vrrp.sh'
set high-availability vrrp group group1 interface 'eth0'
set high-availability vrrp group group1 virtual-address '10.0.0.254/24'
set high-availability vrrp group group1 vrid '1'

vyos@R1# cat /config/scripts/vrrp.sh 
#!/bin/bash
echo "$(date) test" >> /tmp/checkvtysh
/usr/bin/vtysh -c "show ip route" >> /tmp/checkvtysh 2>&1

vyos@R1# run show version 
Version:          VyOS 1.3-rolling-201912190503
Built by:         autobuild@vyos.net
Built on:         Thu 19 Dec 2019 05:03 UTC

vyos@R1# sudo cat /tmp/checkvtysh 
Thu 19 Dec 2019 06:43:53 PM UTC test
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

S>* 0.0.0.0/0 [1/0] via 10.0.0.2, eth0, 11:02:52
S   0.0.0.0/0 [210/0] via 192.168.0.1, eth1, 11:03:09
C>* 10.0.0.0/24 is directly connected, eth0, 11:03:06
C>* 172.16.0.0/24 is directly connected, vti0, 11:02:44
C>* 192.168.0.0/24 is directly connected, eth1, 11:03:11
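
With route output in the form shown above, the gateway extraction step of the check script can be tested offline by feeding saved route lines to the parser instead of calling vtysh. The sketch below does this with awk; the function name default_gw_for_iface is hypothetical, and the field positions are an assumption based on the two default-route lines in this output.

```bash
#!/bin/bash
# Sketch: extract the default gateway for one interface from
# "show ip route" text. default_gw_for_iface is an illustrative name.

# Reads route lines on stdin; prints the next hop of the first
# 0.0.0.0/0 route whose outgoing interface matches $1.
default_gw_for_iface() {
    local iface="$1"
    awk -v ifc="${iface}," '
        $2 == "0.0.0.0/0" && $4 == "via" && $6 == ifc {
            sub(/,$/, "", $5)   # strip the trailing comma from the next hop
            print $5
            exit
        }'
}

# Sample lines copied from the route output above.
sample='S>* 0.0.0.0/0 [1/0] via 10.0.0.2, eth0, 11:02:52
S   0.0.0.0/0 [210/0] via 192.168.0.1, eth1, 11:03:09'

printf '%s\n' "${sample}" | default_gw_for_iface eth0
```

Matching on whole fields rather than on substrings keeps an interface such as eth0 from also matching a name like eth01.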