vCenter keeps migrating my VyOS VMs from host to host, causing micro-outages

I have two completely separate ESXi clusters, one runs vSphere/ESXi 7, and the other cluster runs vSphere/ESXi 8.

I have one instance of VyOS installed on each cluster.

On both clusters, the VyOS VM keeps getting vMotioned between ESXi hosts within the same cluster.

So why am I posting this on a VyOS forum?

  • It’s only the VyOS VM that gets migrated.
  • All hosts in both ESXi clusters have plenty of free CPU, RAM and disk.
  • The VyOS VM has 2 vCPUs, 4 GB RAM and an 8 GB disk.
  • The VyOS VM barely uses any resources (50 MHz CPU / 440 MB RAM).
  • The VyOS VM forwards almost no traffic (a few kbps); it only participates in a very small OSPF topology.
  • Tested with fresh installs of VyOS 1.3.3 and VyOS 1.4 (self-built, Oct 17).
  • No other VMs with the same settings get migrated.
  • This happens on both of my clusters, which do not share settings.
  • First, DRS for some reason decides to hot-migrate the VM.
  • Then the vSphere log says the VM’s memory/CPU usage changed from Gray to Green (see screenshot).

Only the VyOS VM gets migrated, 5 to 20 times every day.

I’m no VMware expert, but is there something in VMware Tools that could trigger this?

$ dpkg -l | grep VMware
ii  open-vm-tools                        2:12.2.0-1+deb12u1               amd64        Open VMware Tools for virtual machines hosted on VMware (CLI)
ii  vyos-1x-vmware                       1.4dev1-33-gc5627b326            amd64        VyOS configuration scripts and data for VMware

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            1989         440        1188           1         511        1548
Swap:              0           0           0

$ show version 
Version:          VyOS 1.4-rolling-202310171102
Release train:    sagitta

Built by:         olofl
Built on:         Tue 17 Oct 2023 11:02 UTC
Build UUID:       a7600e85-15d8-45c7-a44a-17162ffadcd6
Build commit ID:  a03b5dbd3e3699

Isn’t there some kind of host affinity in VMware, so you can “lock” a specific guest VM to a specific host in your cluster (that is, no vMotion happens unless that host goes away completely for whatever reason)?
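In vSphere, this kind of pinning is done with DRS VM/Host rules: the VM goes into a VM group, the chosen host(s) into a host group, and a rule ties them together as either “must run on” (hard) or “should run on” (soft). A soft rule is closest to the behavior asked about above, since it keeps the VM on its host while still allowing it to run elsewhere if that host is gone. The Python below is only a toy model of that difference, not VMware code or the vSphere API; `RuleKind`, `VmHostRule`, `allowed_hosts` and the host names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

# Toy model only: hypothetical names, not the vSphere API.


class RuleKind(Enum):
    MUST_RUN_ON = "must"      # hard rule: never place the VM outside the group
    SHOULD_RUN_ON = "should"  # soft rule: prefer the group, fall back if it is unavailable


@dataclass(frozen=True)
class VmHostRule:
    vm: str
    host_group: frozenset[str]
    kind: RuleKind


def allowed_hosts(vm: str, up_hosts: set[str], rules: list[VmHostRule]) -> set[str]:
    """Return the hosts a VM may run on, given the hosts that are currently up."""
    candidates = set(up_hosts)
    for rule in rules:
        if rule.vm != vm:
            continue
        in_group = candidates & rule.host_group
        if rule.kind is RuleKind.MUST_RUN_ON:
            # Hard rule: if no group member is up, the VM has nowhere to run.
            candidates = in_group
        elif in_group:
            # Soft rule: only narrow the choice while a group member is available.
            candidates = in_group
    return candidates


if __name__ == "__main__":
    cluster = {"esxi01", "esxi02", "esxi03"}
    pin = [VmHostRule("vyos", frozenset({"esxi01"}), RuleKind.SHOULD_RUN_ON)]
    print(allowed_hosts("vyos", cluster, pin))               # pinned to esxi01
    print(allowed_hosts("vyos", cluster - {"esxi01"}, pin))  # esxi01 gone: falls back
```

With a single-host group, the soft rule keeps the VM on that host for as long as it is up; swapping in `MUST_RUN_ON` leaves the VM with no candidate host once that host disappears.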

For future reference:

My suspicion is that our settings were too aggressive, and the VyOS machine used the fewest resources.
So as soon as the ESXi hosts were even slightly out of balance on resources, DRS migrated the least resource-heavy VM, which was my VyOS VM.

Migrations have been less frequent since I lowered this setting.

Ref: DRS Migration Threshold
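The hypothesis above can be illustrated with a deliberately simplified model. This is NOT VMware’s DRS algorithm: the single-metric “imbalance”, the `tolerance_mhz` knob, the VM sizes and all names are hypothetical stand-ins. The point it shows is that when two hosts differ by only a little, moving a large VM would overshoot and create a bigger imbalance in the other direction, so the only move that actually improves balance is the smallest VM, which here is the VyOS router.

```python
from dataclasses import dataclass

# Toy model only: real DRS also weighs memory, migration cost and much more.


@dataclass(frozen=True)
class VM:
    name: str
    demand_mhz: int  # current CPU demand (hypothetical figures)


def imbalance(loads: dict[str, int]) -> int:
    """Spread between the busiest and the idlest host (crude stand-in for a DRS metric)."""
    return max(loads.values()) - min(loads.values())


def pick_migration(hosts: dict[str, list[VM]], tolerance_mhz: int):
    """Return (vm, src, dst) for the single move that most reduces imbalance, or None."""
    loads = {h: sum(vm.demand_mhz for vm in vms) for h, vms in hosts.items()}
    gap = imbalance(loads)
    if gap <= tolerance_mhz:
        return None  # "balanced enough": leave everything where it is
    src = max(loads, key=loads.get)  # busiest host
    dst = min(loads, key=loads.get)  # idlest host
    best, best_gap = None, gap
    for vm in hosts[src]:
        trial = dict(loads)
        trial[src] -= vm.demand_mhz
        trial[dst] += vm.demand_mhz
        new_gap = imbalance(trial)
        if new_gap < best_gap:  # only keep moves that improve on the current gap
            best, best_gap = vm, new_gap
    return (best.name, src, dst) if best is not None else None


if __name__ == "__main__":
    hosts = {
        "esxi01": [VM("vyos", 50), VM("db", 2000), VM("web", 1500)],  # 3550 MHz
        "esxi02": [VM("app", 3400)],                                   # 3400 MHz
    }
    print(pick_migration(hosts, tolerance_mhz=100))  # → ('vyos', 'esxi01', 'esxi02')
    print(pick_migration(hosts, tolerance_mhz=200))  # → None
```

In this toy, a larger tolerance plays the role of a more conservative migration threshold: the same 150 MHz gap no longer triggers any move, so the small VM stays put.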

