VyOS cluster design theory questions

Thanks for checking out this thread. I have been working on a huge VyOS deployment alone for the past several months, and I feel like I need feedback from others in terms of design guidance and making sure I’m not setting any traps for the future.

I have 4 of these physical boxes of the following configuration:

Each box is in a different rack in a different datacenter.

Each box has Layer 3 connectivity to the Internet and to the other 3 boxes over eth0. In other words, eth0 does not support VLANs, and the physical servers have to go through routers if they want to talk to each other over eth0. Also, those routers on eth0 have an MTU of 1500, so eth0 must be set to MTU 1500 on the physical boxes or else they cannot communicate over eth0.

eth2, on the other hand, has Layer 2 connectivity between the servers. In other words, VLANs are supported, the physical boxes can communicate directly without any routers in the way, and they can use MTU 9000 nicely!

Another difference between eth0 and eth2 is that on eth2 I can use any random autogenerated MAC address I want and use any public or private IP I want with it. I can migrate that VM to any other physical box and the traffic follows virtually uninterrupted. On eth0, on the other hand, every IP is bound to a specific MAC address provided by the ISP, and each is bound to a specific physical host, so if a VM is migrated, those IPs will not work without a request to the ISP to move the subnet, which is a lengthy process. In other words, eth0 should be used for management and Internet access only, and eth2 should be used for storage and all VM traffic.

With this much RAM and CPU power I can make as many VyOS VMs as I need to support the platform… I’m currently running several, including VyOS routers for Core, VPN, Proxy, and Access.

I decided to go with CentOS boxes to handle DNS and DHCP services, but VyOS is doing everything else right now including all routing, firewall, VPN, NAT, load-balancing, VRRP, OSPF, DHCP-relay and DNS Forwarding Cache.

I am not sure that the current routing topology that I have is ideal, so I guess my goal should be to settle on a viable topology.

I have some limitations to work with besides the ones listed above… For instance: I do not need the VyOS VMs to be able to live-migrate, but I do need my other VMs to be able to live-migrate, and I have an NFS shared storage server to store those VMs, while I plan on storing the VyOS VMs on local storage (so that at least the network stays up if the NFS goes down).

Each VM is limited to 7 virtual NICs, and the VyOS VMs do not do any VLAN tagging. All the VLAN tagging is done by the hypervisor. I may need to support hundreds of VLANs in the near future, so having some “big” routers with tons of VLAN interfaces is out of the question. I will need a core/distro/access model (I think).
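To put rough numbers on that NIC limit, here is a minimal back-of-the-envelope sketch. It assumes each access router reserves one of its 7 virtual NICs as an uplink toward the core and gives every VLAN its own NIC (since the hypervisor does the tagging). The function name and the uplink count are illustrative assumptions, not something already built.

```python
import math

NICS_PER_VM = 7  # per-VM virtual NIC limit stated above


def access_routers_needed(vlans: int, uplinks_per_router: int = 1) -> int:
    """Return how many router VMs are needed to give every VLAN its own NIC.

    uplinks_per_router is an assumption: the number of NICs each router
    reserves for links toward the core, leaving the rest for VLANs.
    """
    vlan_ports = NICS_PER_VM - uplinks_per_router
    if vlan_ports <= 0:
        raise ValueError("no NICs left for VLAN-facing interfaces")
    return math.ceil(vlans / vlan_ports)


if __name__ == "__main__":
    for vlans in (6, 100, 300):
        print(vlans, "VLANs ->", access_routers_needed(vlans), "router VMs")
```

Under these assumptions, 300 VLANs works out to 50 access routers, before doubling anything for VRRP redundancy.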

I “could” use SR-IOV here and make things run really slick by doing the VLAN tagging in the VyOS VMs, but I ran into some complications. Although I don’t need to live-migrate VyOS, using SR-IOV NICs means a VM cannot live-migrate, and if VyOS is on SR-IOV and regular VMs are not, then unfortunately they cannot communicate with each other.

So I guess the question I pose to the community first is: if you were building this yourself, and you had these 4 physical boxes that you could do whatever you want with, how would you set up your core routing?

You probably want to attach a high-level diagram too; it will just be easier.

I will admit I omitted a design because I didn’t want to set the stage or get people to think it needs to be one way or another…

I guess to formulate a good design, one would need to understand the requirements for the platform… So here are some basic requirements…

This is for a mixed-use multi-tenant hybrid virtualization environment.
At the core of it we have these 4 XenServers which form the multi-tenant pool where most VMs live.
Some tenants have leased dedicated XenServers.
Most tenants have 1 or 2 VLANs (Private and DMZ).
Some virtual servers (like PBX) exist, and it’s nice to have a public IP programmed directly on the VM.
Some other virtual servers (like web servers) exist and need public IPs NAT’d to a private IP.
Some virtual servers are only accessible by OpenVPN.
The OpenVPN service needs to support both split tunnel and full tunnel, as well as TCP 443 and UDP 1194.
Some tenants may have Ubiquiti EdgeRouters to create site-to-site OpenVPN with their private VLANs on this platform.
Some tenants require a load-balancing proxy like HAProxy for inbound connections.

OK, so I guess here is a basic overview of the physical connections that cannot be changed.

Every connection is 10GbE and low latency.

I can make up to 4094 VLANs on the private network (802.1Q has 4096 VLAN IDs, but 0 and 4095 are reserved).

I would like to set up VyOS VMs on these XenServers to power my multi-tenant lab environment.

For what it’s worth, there are actually 4 XenServers, but I left one out to keep the diagram simple.

So the idea is to have VM servers on the private network on different VLANs.

Obviously, at a minimum I could do 3 VyOS VMs and use VRRP.

Alternatively, I could create a separate VyOS VM for each VLAN, or a pair for each VLAN.

I could also make more VyOS VMs to separate out roles, for example separate VyOS VMs for VPN services/termination, since each VLAN will likely need some sort of VPN access in or out.
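To compare how these options scale, here is a rough sketch that only counts router VMs for a given number of VLANs. It ignores the dedicated VPN routers and any per-VM interface limits, and the function name and option labels are hypothetical, purely for illustration.

```python
def router_vm_counts(vlans: int, shared_group_size: int = 3) -> dict:
    """Count router VMs under each option for a given number of VLANs.

    shared_group_size is the "3 VyOS VMs with VRRP" minimum mentioned above;
    the other two options grow linearly with the number of VLANs.
    """
    return {
        "shared VRRP group": shared_group_size,
        "one VM per VLAN": vlans,
        "VRRP pair per VLAN": 2 * vlans,
    }


if __name__ == "__main__":
    for name, count in router_vm_counts(100).items():
        print(f"{name}: {count} VMs")
```

The shared group stays constant in VM count while the per-VLAN options grow with every tenant, so the real trade-off is isolation versus the number of VMs to manage.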

Just wondering, what did you end up doing?