Help troubleshooting latency spikes to and through vyos

i run vyos on an esxi host and i notice that traffic going to and through vyos experiences spikes of bad latency, while traffic to other guest VMs on the same host is fine. i see latency of up to 1000ms when running mtr to and through vyos. i have tested traffic all through my network and the bottleneck is vyos.

any suggestions on how to troubleshoot this? i am not using bgp/ospf/eigrp/etc, maybe i can disable those services completely?

thanks :slight_smile:

might be my fault - the vm had 2 cores and 2gb of ram, so i kicked it up to 16 cores and 6gb of ram and it seems to be better. seeing the minimum specs so low maybe gave me a false sense of how to size the vm.

You will get some overhead since you run it as a VM, and performance is also in the hands of the VM host.

How loaded is your vm host?

Do you have ballooning enabled from the vm host?

What are the physical CPU and other specs?

How is this vm guest configured from the vm host (virtio or e1000 drivers etc)?

What is the interface speed and how many interfaces are configured for your VM guest?

Do you have other traffic generation, either from routing/firewalling or from services run locally on your VyOS, such as encrypted/unencrypted tunnels, bgp/isis/ospf sessions, dhcp etc? Perhaps vrrp failover?

Also try to log in and run atop or htop to spot what happens in your system when these high latency spikes occur.

They can still be caused by other things out in your network.
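If atop or htop scroll by too fast to catch a spike, a rough alternative is to sample the aggregate `cpu` line of `/proc/stat` yourself and log the percentages over time. The sketch below is only an illustration, not a VyOS tool: the function names (`parse_cpu_line`, `cpu_percentages`, `watch`) are made up for this example, and whether the `steal` counter is populated at all depends on the hypervisor and guest kernel.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into named tick counters."""
    names = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]
    fields = line.split()
    # fields[0] is the literal "cpu"; the next eight values are the counters above.
    return dict(zip(names, (int(v) for v in fields[1:9])))


def cpu_percentages(before, after):
    """Return busy, softirq and steal as percentages of the ticks between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {"busy": 0.0, "softirq": 0.0, "steal": 0.0}
    busy = total - deltas["idle"] - deltas["iowait"]
    return {
        "busy": 100.0 * busy / total,
        # softirq is where much of the kernel's packet processing is accounted.
        "softirq": 100.0 * deltas["softirq"] / total,
        # steal is time the hypervisor ran something else instead of this guest.
        "steal": 100.0 * deltas["steal"] / total,
    }


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate over all CPUs."""
    with open(path) as handle:
        return handle.readline()


def watch(interval=1.0, count=60):
    """Print one timestamped line of percentages per interval (hypothetical helper)."""
    previous = parse_cpu_line(read_cpu_line())
    for _ in range(count):
        time.sleep(interval)
        current = parse_cpu_line(read_cpu_line())
        stats = cpu_percentages(previous, current)
        print(time.strftime("%H:%M:%S"),
              " ".join(f"{key}={value:.1f}%" for key, value in stats.items()))
        previous = current
```

Running `watch()` in one terminal while mtr runs in another lets you line up the timestamps of the latency spikes with the CPU numbers.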

hi again :slight_smile:

  1. 25-50%
  2. yes, not enough load for an issue really
  3. epyc7262 190gb ram
  4. 4 nics, one is actually e1000 but not in use.
  5. enc tunnels
  6. did this and couldn't really identify a particular service, but could identify cpu spikes during the high latency

i realized this was likely a resource issue after i posted the issue. since cranking up the cpu and mem it's been smooth sailing. i think sometimes typing it out helps.

thanks again @Apachez

And I suppose you run other VMs as well?

What is the total vCPU count across all VMs compared to the physical cores available in the AMD EPYC 7262 (8C, 16T)?

https://www.amd.com/en/products/cpu/amd-epyc-7262
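To put a number on that comparison, here is a small sketch that sums the vCPUs assigned to each VM and divides by the 8 cores / 16 threads of the EPYC 7262. Only the 16 vCPUs for the VyOS guest come from this thread; the other VM names and sizes are invented for illustration, so substitute your own inventory.

```python
PHYSICAL_CORES = 8      # AMD EPYC 7262: 8 cores
HARDWARE_THREADS = 16   # AMD EPYC 7262: 16 threads


def overcommit(vcpus_per_vm, cores=PHYSICAL_CORES, threads=HARDWARE_THREADS):
    """Return (total vCPUs, vCPUs per physical core, vCPUs per hardware thread)."""
    total = sum(vcpus_per_vm.values())
    return total, total / cores, total / threads


# Hypothetical inventory: only the "vyos" entry reflects this thread.
vms = {"vyos": 16, "vm-a": 4, "vm-b": 4}

total, per_core, per_thread = overcommit(vms)
print(f"total vCPUs: {total}")
print(f"vCPUs per physical core: {per_core:.2f}")
print(f"vCPUs per hardware thread: {per_thread:.2f}")
```

A ratio above 1 per thread means the host cannot run every vCPU at the same time, so guests will sometimes wait for the scheduler. How much that hurts depends on how busy the other VMs actually are.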

Are the 4 nics 1G, 10G or 100G?

Enc tunnels, wireguard or ipsec or something else?

You could also run this command to try to spot if something funny happens in your network during these increased latencies:

monitor bandwidth interface *

After it has started, press “i” and “d” to get more stats.

Use the up/down arrows to select an interface, and press esc followed by “y” to exit.
