Cannot ping through VyOS to server behind it.

Good morning.
I have done something magically wrong and I do not know what.
It was working up until today, when we tried to push a lot of HTTP across it and it had a meltdown.
Last month we did the same thing without a problem.

There are 2 pairs of VRRP routers sharing a backend network, v1/2 and v3/4; they have different front ends though (staging and production).
These are hosted on ECL2 (NTT COM’s OpenStack solution).

There are 5 VRRP addresses on the front and 1 on each of the back interfaces, all in sync-group VY.
front (separate networks) 10,20,30,40,50 These have global IP addresses.
back (shared network 1) 11 for staging, 101 for production. These are my DMZ, they have 10.41.0.0/x addresses.
back (shared network 2) 12 for staging, 102 for production. These are internal, they can get out and have 10.41.10.0/x addresses.
(I checked, 1/2 use one multicast address, 3/4 use another.)

[There is a chance this is idiotic, but that does not explain why it worked last month…]

There are NAT rules, nothing special, cribbed from the user guide.
There are firewall rules for “to the firewall” and “in”.

I can ping all the routers just fine.
But as of a few hours ago, I suddenly became unable to ping the servers behind them, and rebooting (the old Windows trick) doesn’t fix it.
From the router itself, I can ping both the inside and the outside just fine with no loss.
From the inside server, I have the same problem.
This mysteriously went away after ~ 4 hours.

That implies the problem was with traffic crossing through VyOS.

What might I have done wrong? Besides crossing the streams, of course. Where might I look?

You are, of course, free to laugh at this. It’s been a decade since I last touched a router.

Hello,
Sorry to hear that; mysterious things happen all the time.
It’s hard to say anything definite here, but I recommend capturing packet dumps on all sides the next time the issue occurs.
You may also want to set up remote logging/monitoring, so you can get traces from them too.

Also, check the configs for consistency (for example, a failover may land on a second router that has the wrong configuration),
or you can post scrubbed configs.
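
As a rough illustration of the consistency check, here is a minimal Python sketch that compares two saved config dumps and prints only the lines that differ. The function name `differing_lines` and the sample lines are made up for the example (they are placeholders, not real VyOS syntax); in practice you would paste in the scrubbed configs of both routers in a pair.

```python
import difflib


def differing_lines(config_a: str, config_b: str) -> list[str]:
    """Return only the lines that differ between two config dumps.

    Lines present only in config_a are prefixed with "-",
    lines present only in config_b are prefixed with "+".
    """
    lines_a = [line.rstrip() for line in config_a.splitlines()]
    lines_b = [line.rstrip() for line in config_b.splitlines()]
    diff = list(
        difflib.unified_diff(lines_a, lines_b, lineterm="", n=0)
    )
    # The first two entries are the "---"/"+++" file headers;
    # "@@" entries are hunk markers. Neither is config content.
    return [line for line in diff[2:] if not line.startswith("@@")]


# Placeholder text standing in for two scrubbed config dumps
# (hypothetical content, NOT real VyOS commands).
router_1 = """\
option-a value 1
option-b value 2
option-c value 3
"""
router_2 = """\
option-a value 1
option-b value 9
option-c value 3
"""

for line in differing_lines(router_1, router_2):
    print(line)
```

An empty result means the two dumps match line for line. Some differences are expected between the members of a pair (priorities, interface addresses), so this only narrows down where to look.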

If you are looking for commercial support for VyOS, ping me via PM

Thanks!

I shall have to think about your offer seriously. Today I plan to dig through the logs and whimper.

One theory that occurred to me after a good night’s sleep.

ARP vs VRRP.

Although why it upset all 4 at the same time is a mystery, I’m wondering if the configuration I thought was committed was not. For example, the “packet to the firewall itself” rule was seeing lots of packets that should have gone to the VRRP address. Yet it was the master.
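
One way to test the ARP vs VRRP theory might be to look at which MAC address a server's ARP table holds for the virtual address. The VRRP RFCs (3768 and 5798) define the IPv4 virtual router MAC as 00-00-5E-00-01-{VRID}. The Python sketch below just does that arithmetic; the helper names and the VRID of 10 are assumptions for illustration, and whether the routers actually answer ARP with the virtual MAC rather than the interface's own MAC depends on how VRRP is configured.

```python
def vrrp_virtual_mac(vrid: int) -> str:
    """IPv4 VRRP virtual router MAC for a VRID (RFC 3768 / RFC 5798)."""
    if not 1 <= vrid <= 255:
        raise ValueError("VRID must be between 1 and 255")
    return f"00:00:5e:00:01:{vrid:02x}"


def normalize_mac(mac: str) -> str:
    """Lower-case a MAC and use ':' separators so formats compare equal."""
    return mac.strip().lower().replace("-", ":")


def is_virtual_mac(arp_mac: str, vrid: int) -> bool:
    """True if the MAC seen in an ARP table is the VRRP virtual MAC."""
    return normalize_mac(arp_mac) == vrrp_virtual_mac(vrid)


# Hypothetical values: VRID 10 and a MAC copied from a server's ARP table.
print(vrrp_virtual_mac(10))
print(is_virtual_mac("00-00-5E-00-01-0A", 10))
```

If the ARP entry for the virtual address keeps flipping between different MACs during the outage, that would point at the ARP side of the theory.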

Even with conntrack-sync disabled it was still loony tunes.