Each of which then connects to 2x Dell servers that host an instance of VyOS per server inside of Proxmox.
On each of the VyOS devices is an anycast IP. This anycast IP is part of a subnet that I have set up on Proxmox as a VNet for my VXLAN. If I ping between devices that are part of this subnet, I can see the traffic being encapsulated in VXLAN, so this seems fine.
Behind each of the VyOS devices is a VM with its gateway set to the anycast IP. If I ping 8.8.8.8 and the VM decides to send traffic out of the VyOS device on the same node as itself, I can’t reach 8.8.8.8. However, if it goes out via the VyOS device on the other node, it gets out fine. This is made even weirder by the fact that I can ping every single hop along the route to 8.8.8.8.
Another thing that makes me sure it isn’t the VXLAN/anycast setup itself is that if I set the gateway to another IP on the VyOS device, I get the exact same problem. This happens for both IPv4 and IPv6.
Something weird I have noticed is that when pinging 8.8.8.8, the reply reaches one of the Mikrotik distribution switches, but a tcpdump on the VyOS node only shows the request and never the reply, so it seems to be getting stuck at the Mikrotik somehow. My first thought was a firewall, but there are no forward rules on the Mikrotik or VyOS nodes, only input rules, so the default behaviour should be to forward everything. There is also no form of NAT at play. The Mikrotiks and VyOS nodes share routes via OSPF.
I would immediately blame the Mikrotiks but, as mentioned, there is no problem when traffic goes from a VM via the VyOS on the opposite node to 8.8.8.8.
EDIT: If I disable one of the VyOS VMs, I can ping 8.8.8.8 from both VMs behind both VyOS machines. No matter what I do, though, I can’t ping 8.8.8.8 from VyOS itself; pings from VyOS itself only ever get as far as the next hop.
Did you apply established/related to just the input filter, or as a global state policy?
set firewall global-options state-policy established action accept
set firewall global-options state-policy related action accept
set firewall global-options state-policy invalid action drop
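
To see which of these is already in place, something like the following in operational mode should list any matching lines. This is only a sketch, and it assumes a VyOS release that uses the global-options syntax shown above (older releases use a different config path):

```
show configuration commands | match state-policy
```

If nothing comes back, no global state policy is configured and any established/related handling lives only in the individual rule sets.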