VRRP does not stay in BACKUP status after reboot

Hi team,

I am running VyOS 1.2.4 and have encountered two weird VRRP issues.

The setup is pretty simple: two VyOS boxes, vpn1a and vpn1b. The vpn1a interfaces have the higher VRRP priority (110) and vpn1b has 100. The config is shown below:

set high-availability vrrp group CLIENT4-1000 interface 'bond1.1000'
set high-availability vrrp group CLIENT4-1000 no-preempt
set high-availability vrrp group CLIENT4-1000 priority '110'  *(## vpn1b has 100 ##)*
set high-availability vrrp group CLIENT4-1000 virtual-address '172.30.104.1/24'
set high-availability vrrp group CLIENT4-1000 vrid '1'

set high-availability vrrp group UPLINK4-200 interface 'bond1.200'
set high-availability vrrp group UPLINK4-200 no-preempt
set high-availability vrrp group UPLINK4-200 priority '110'  *(## vpn1b has 100 ##)*
set high-availability vrrp group UPLINK4-200 transition-script backup '/config/scripts/vyos-restart-vpn.script dummy'
set high-availability vrrp group UPLINK4-200 transition-script fault '/config/scripts/vyos-restart-vpn.script dummy'
set high-availability vrrp group UPLINK4-200 transition-script master '/config/scripts/vyos-restart-vpn.script dummy'
set high-availability vrrp group UPLINK4-200 virtual-address '192.168.4.132/26'
set high-availability vrrp group UPLINK4-200 virtual-address '192.168.4.150/26'
set high-availability vrrp group UPLINK4-200 vrid '1'

set high-availability vrrp sync-group infravpn1 member 'CLIENT4-1000'
set high-availability vrrp sync-group infravpn1 member 'UPLINK4-200'

Issue-1. VRRP does not stay in BACKUP state after reboot.
Initially, vpn1a is the VRRP MASTER and vpn1b is the BACKUP. When I reboot vpn1a (using the reboot command), vpn1b takes over as MASTER (and sends GARP to update the ARP caches of other computers); this is correct.
However, when vpn1a boots up, it immediately grabs MASTER back even though I have “no-preempt” configured. Is this expected behavior?

Issue-2. VRRP VIP does not send out Gratuitous ARP (GARP) after bootup.
This is a follow-up to Issue-1. When vpn1a booted up and grabbed MASTER, I found that it did not send out GARP, so the ARP entries on other computers and uplink routers were not updated. They still point to the old MAC address, which belongs to vpn1b. As a result, no one can reach the VIPs and the traffic is blackholed.
As a workaround, I manually added “arping …” commands to /config/scripts/vyos-postconfig-bootup.script. However, this does not seem like correct VRRP behavior.
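
For reference, the workaround can be sketched roughly as below. This is a minimal sketch, not the exact script in use: it assumes the iputils version of arping (where -U sends unsolicited ARP, -I selects the interface and -c sets the packet count), and the helper names garp_commands and send_garp are made up for illustration. The interfaces and VIPs are copied from the config above.

```python
import subprocess

# VIPs per interface, copied from the VRRP groups above (prefix lengths dropped).
VIPS = {
    "bond1.1000": ["172.30.104.1"],
    "bond1.200": ["192.168.4.132", "192.168.4.150"],
}

def garp_commands(vips, count=3):
    """Build one arping command per VIP (assumes iputils arping flags)."""
    return [
        ["arping", "-U", "-c", str(count), "-I", iface, ip]
        for iface, ips in vips.items()
        for ip in ips
    ]

def send_garp(vips, count=3):
    """Run the commands; only meaningful on the node that holds the VIPs."""
    for cmd in garp_commands(vips, count):
        # check=False: a failed arping should not abort the remaining VIPs.
        subprocess.run(cmd, check=False)
```

Calling send_garp(VIPS) from the post-config boot script would announce all three VIPs, but only if the node really is MASTER at that point.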

You have bond interfaces, and I read this in the keepalived.conf file:

# On some systems when bond interfaces are created, they can start passing traffic
# and then have a several second gap when they stop passing traffic inbound. This
# can mean that if keepalived is started at boot time, i.e. at the same time as
# bond interfaces are being created, keepalived doesn't receive adverts and hence
# can become master despite an instance with higher priority sending adverts.
# This option specifies a delay in seconds before vrrp instances start up after
# keepalived starts,
vrrp_startup_delay 5.5

I wonder if that’s the problem you’re hitting - I notice you are using bond interfaces.
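
To put a rough number on that gap: under the VRRP spec (RFC 3768) a backup declares the master dead after 3 x the advertisement interval plus a skew time of (256 - priority) / 256 seconds. A small sketch, assuming the default 1 second advertisement interval (the posted config does not set one) and that keepalived follows the RFC formula here:

```python
def master_down_interval(priority, advert_int=1.0):
    """Seconds a backup waits without adverts before taking over (RFC 3768)."""
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time

print(master_down_interval(110))  # vpn1a -> 3.5703125
print(master_down_interval(100))  # vpn1b -> 3.609375
```

So if the bond stopped passing inbound traffic for a bit under 4 seconds right after keepalived started, vpn1a would hear no adverts and could promote itself. In that case no-preempt would not help, because from vpn1a's point of view there is no master to defer to.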

As for how you could go about testing this theory, that’s above my expertise, I’m sorry!

Thanks @tjh for the information. Let me do some research on it and run a test in the lab to see if it can be a remedy.

@tjh, it looks like vrrp_startup_delay has no effect. No matter what I do, keepalived.conf is still overwritten every time the system boots up or a new config is committed.

Yes, you’d need to modify the Python code that writes the config. Sorry, I didn’t mean to imply that modifying the config itself would work; as you’ve discovered, it’s regenerated on every reboot by parsing the config file.

You would need to edit

/lib/live/mount/rootfs/<vyos-ver>.squashfs/usr/libexec/vyos/conf_mode/vrrp.py

You’ll see there how it parses the config to generate the file - you can just (I think) add a line with what you want and it should insert it into the config file when you save/reboot.
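
As a rough illustration of the idea (this is not the actual code in vrrp.py, which I have not reproduced here), a generator could add the option to the global_defs block of the text it writes. The function name add_startup_delay and the sample input are hypothetical; vrrp_startup_delay is the option from the keepalived.conf comment quoted earlier, and I am assuming it belongs in global_defs.

```python
import re

def add_startup_delay(conf_text, delay=5.5):
    """Return keepalived config text with vrrp_startup_delay in global_defs."""
    option = f"    vrrp_startup_delay {delay}\n"
    if "vrrp_startup_delay" in conf_text:
        return conf_text  # already present, leave untouched
    match = re.search(r"^global_defs\s*\{[^\n]*\n", conf_text, flags=re.MULTILINE)
    if match:
        # Insert right after the opening line of the existing block.
        return conf_text[:match.end()] + option + conf_text[match.end():]
    # No global_defs block yet: prepend a new one.
    return "global_defs {\n" + option + "}\n\n" + conf_text

# Hypothetical generated config, just to show the effect.
sample = "vrrp_instance UPLINK4-200 {\n    state BACKUP\n    nopreempt\n}\n"
print(add_startup_delay(sample))
```

Whether the keepalived build shipped with a given VyOS release accepts this option is something to check before relying on it.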

I might be totally wrong though, this might just be better logged as a Phabricator request referencing this thread. Good luck!

The handler for VRRP is at /usr/libexec/vyos/conf_mode/vrrp.py

Wouldn’t another fix be to set both to preempt, which is typically best practice anyway?

You may still get the undesirable behavior on boot but it should flip back to the correct master assuming both are online. Might want to test that.