VyOS Anycast with Proxmox SDN Service

Current setup overview:

  • Juniper router (Core) that connects to
    • 2x Mikrotik distribution switches
      • Each of which then connects to 2x Dell servers that host
        • an instance of VyOS per server inside of Proxmox

Each of the VyOS devices has an anycast IP. This anycast IP is part of a subnet that I have set up in Proxmox as a VNet for my VXLAN. If I ping between devices in this subnet, I can see the traffic being encapsulated in VXLAN, so this part seems fine.

Behind each of the VyOS devices is a VM with its gateway set to the anycast IP. If I ping 8.8.8.8 and the VM sends the traffic out via the VyOS device on the same node as itself, I can’t reach 8.8.8.8. However, if the traffic goes out via the VyOS device on the other node, it gets out fine. This is made even weirder by the fact that I can ping every single hop along the route to 8.8.8.8.

Another thing that makes me sure it isn’t the VXLAN/anycast setup itself is that if I set the gateway to another IP on the VyOS device, I get the exact same problem. This happens for both IPv4 and IPv6.

Something weird I have noticed is that when pinging 8.8.8.8, the reply reaches one of the Mikrotik distribution switches, but a tcpdump on the VyOS node only shows the request and never the reply, so the reply seems to be getting stuck at the Mikrotik somehow. My first thought was the firewall, but there are no forward rules on the Mikrotiks or the VyOS nodes, only input rules, so the default behaviour should be to forward everything. There is also no form of NAT at play. The Mikrotiks and VyOS nodes share routes via OSPF.
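
A minimal sketch of how the missing reply could be narrowed down from the VyOS side, using standard op-mode commands. It assumes the OSPF-facing uplink is eth0 (as in the config shared further down the thread); output format varies by VyOS version.

```
# Watch ICMP to/from 8.8.8.8 on the uplink while the VM is pinging
# (eth0 is an assumption; substitute the actual uplink interface)
monitor traffic interface eth0 filter "icmp and host 8.8.8.8"

# Check which next hop the route lookup for 8.8.8.8 selects
show ip route 8.8.8.8

# Check that the OSPF adjacencies toward the Mikrotiks are up
show ip ospf neighbor
```

Running the same capture on both VyOS VMs at once would show whether the reply is being delivered to the other node rather than being dropped.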

I would immediately blame the Mikrotiks, but as mentioned, there is no problem when traffic goes from the VM via the VyOS on the opposite node to 8.8.8.8.

EDIT: If I disable one of the VyOS VMs, I can ping 8.8.8.8 from both VMs behind both VyOS machines. No matter what I do, though, I can’t ping 8.8.8.8 from VyOS itself; pings sent from VyOS itself only ever get as far as the next hop.

Can you share some configuration for your VyOS VMs?

Please see below:

  • The config is the same on the other VM just obviously with different IPs in the same subnets and different router-id for OSPF etc.
  • Eth0 is the main interface that communicates with other bits of equipment and is used for OSPF
  • Eth1 is the anycast address that VMs behind VyOS use as their gateway
  • Eth2 has another IP in the anycast subnet (originally Eth1 and Eth2 were a single interface but I split them to see if that worked)
  • The hw-id on Eth1 is the same as the anycast interface on the other VyOS VM.
firewall {
    ipv4 {
        input {
            filter {
                default-action drop
                rule 100 {
                    action accept
                    source {
                        address ****
                    }
                }
                rule 200 {
                    action accept
                    protocol ospf
                    source {
                        address 1.1.1.0/28
                    }
                }
                rule 300 {
                    action accept
                    protocol icmp
                    source {
                        address 1.1.1.0/28
                    }
                }
            }
        }
    }
    ipv6 {
        input {
            filter {
                default-action drop
                rule 100 {
                    action accept
                    source {
                        address ****
                    }
                }
                rule 200 {
                    action accept
                    protocol ospf
                    source {
                        address 1:1:1:1::/80
                    }
                }
                rule 300 {
                    action accept
                    protocol ipv6-icmp
                    source {
                        address 1:1:1:1::/80
                    }
                }
                rule 400 {
                    action accept
                    protocol ospf
                    source {
                        address fe80::/10
                    }
                }
            }
        }
    }
}
interfaces {
    ethernet eth0 {
        address 1.1.1.1/28
        address 1:1:1:1::1/80
        mtu 1578
        offload {
            gro
            gso
            sg
            tso
        }
    }
    ethernet eth1 {
        address 2.2.2.14/28
        address 2:2:2:2:2:ffff:ffff:ffff/80
        hw-id bc:24:11:14:14:14
        mtu 1450
    }
    ethernet eth2 {
        address 2.2.2.1/28
        address 2:2:2:2::1/80
        mtu 1450
    }
    loopback lo {
    }
}
protocols {
    ospf {
        area 1 {
            authentication md5
        }
        interface eth0 {
            area 1
            authentication {
                md5 {
                    key-id * {
                        md5-key ****
                    }
                }
            }
        }
        interface eth2 {
            area 1
            passive {
            }
        }
        parameters {
            router-id 1.1.1.1
        }
    }
    ospfv3 {
        interface eth0 {
            area 1
        }
        interface eth2 {
            area 1
            passive
        }
        parameters {
            router-id 1.1.1.1
        }
    }
    static {
    }
}
service {
    ssh {
        port 22
    }
}
system {
    config-management {
        commit-revisions 100
    }
    console {
        device ttyS0 {
            speed 115200
        }
    }
    host-name vyos
    name-server 9.9.9.9
    name-server 1.1.1.1
    syslog {
        global {
            facility all {
                level info
            }
            facility local7 {
                level debug
            }
        }
    }
}

So is it the VyOS VM that can’t ping out from the VXLAN? Or is it a VM connected to VyOS that is part of it?

I can’t see any firewall rules for established/related traffic. Is this the whole config?

It is the connected VM that can’t ping out. It can ping out via the FRR on the opposite node, but not via the one on the same node.

I have tried with established/related rules, but it doesn’t seem to make a difference. This is the whole config.

Did you apply established/related to just the input filter, or as a global state policy?

set firewall global-options state-policy established action accept
set firewall global-options state-policy related action accept
set firewall global-options state-policy invalid action drop

I did this on the global state policy.
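
For comparison, the per-chain alternative mentioned in the question might look like the sketch below on the IPv4 input filter. The rule numbers 10 and 20 are arbitrary placeholders, and the `state` syntax has changed between VyOS releases (some versions expect `state established enable`), so it should be checked against the running version.

```
# Accept replies to connections the router itself initiated
# (rule numbers are placeholders, not taken from the posted config)
set firewall ipv4 input filter rule 10 action accept
set firewall ipv4 input filter rule 10 state established
set firewall ipv4 input filter rule 10 state related

# Drop packets that conntrack marks as invalid
set firewall ipv4 input filter rule 20 action drop
set firewall ipv4 input filter rule 20 state invalid
```

Unlike the global state policy, this only affects traffic addressed to the router itself, since it sits in the input chain. The IPv6 input filter would need equivalent rules.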