I’ve recently deployed a few VyOS instances, each running on a Dell PowerEdge R430:
CPU: 2x Xeon E5-2637 v3
Memory: 32GB DDR4
NICs: 2x Intel X520-DA2
VyOS is running directly on the bare metal, no virtualization layer.
When pinging just about anything to or from one of the instances, I get random ping spikes and about 1% packet loss.
When I check pcaps on the target I’m pinging, it responds normally, with very low latency.
But a pcap on the box with the issue shows the ICMP reply arriving “late”.
The other instance with exactly the same hardware has no issues.
The ping spikes do not occur at any regular interval.
The other day I rebooted the VyOS instance and the ping spikes and packet loss were gone, but they resumed today.
Any ideas as to what the issue could be?
The box with the issues is running 73 BGP sessions.
RIB entries 1674629, using 307 MiB of memory
Peers 73, using 1555 KiB of memory
Ping results towards 1.1.1.1 (direct peering with Cloudflare):
64 bytes from 1.1.1.1: icmp_seq=364 ttl=63 time=0.119 ms
64 bytes from 1.1.1.1: icmp_seq=365 ttl=63 time=0.197 ms
64 bytes from 1.1.1.1: icmp_seq=366 ttl=63 time=0.155 ms
64 bytes from 1.1.1.1: icmp_seq=367 ttl=63 time=0.138 ms
64 bytes from 1.1.1.1: icmp_seq=368 ttl=63 time=349 ms
64 bytes from 1.1.1.1: icmp_seq=369 ttl=63 time=0.179 ms
64 bytes from 1.1.1.1: icmp_seq=370 ttl=63 time=0.150 ms
64 bytes from 1.1.1.1: icmp_seq=371 ttl=63 time=0.210 ms
64 bytes from 1.1.1.1: icmp_seq=372 ttl=63 time=0.159 ms
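To put numbers on spikes like the one at icmp_seq=368, a long ping log can be summarized with a short script. This is only a rough sketch: the `summarize_ping` helper and the 10 ms `spike_ms` threshold are assumptions made for illustration, and it expects the Linux `ping` output format shown above.

```python
import re

# Matches lines such as "64 bytes from 1.1.1.1: icmp_seq=368 ttl=63 time=349 ms"
REPLY = re.compile(r"icmp_seq=(\d+) .*?time=([\d.]+) ms")


def summarize_ping(lines, spike_ms=10.0):
    """Return (spikes, missing) for a list of ping output lines.

    spikes  -- (icmp_seq, rtt_ms) pairs whose RTT is at or above spike_ms
    missing -- icmp_seq values with no reply between the first and last reply
    spike_ms is an arbitrary, assumed threshold; tune it to the link.
    """
    rtts = {}
    for line in lines:
        m = REPLY.search(line)
        if m:
            rtts[int(m.group(1))] = float(m.group(2))
    if not rtts:
        return [], []
    spikes = [(seq, rtt) for seq, rtt in sorted(rtts.items()) if rtt >= spike_ms]
    # Sequence numbers absent from the log count as lost replies.
    missing = [s for s in range(min(rtts), max(rtts) + 1) if s not in rtts]
    return spikes, missing


# Three lines taken from the ping output above.
sample = [
    "64 bytes from 1.1.1.1: icmp_seq=367 ttl=63 time=0.138 ms",
    "64 bytes from 1.1.1.1: icmp_seq=368 ttl=63 time=349 ms",
    "64 bytes from 1.1.1.1: icmp_seq=369 ttl=63 time=0.179 ms",
]
print(summarize_ping(sample))  # ([(368, 349.0)], [])
```

Running it over a log captured right after a reboot and another captured once the problem is back would show whether the spike rate really changes with uptime.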
Thank you, but the ping spikes are now back again… I fail to see how it could be hyper-threading, as it works for 12 hours or so after a reboot. I guess the reason it worked yesterday was that the interface was restarted when I applied the ring-buffer setting. So it seems to be tied to the uptime of the interface.
When running
ethtool -S ethx
The rx_missed_errors value is 1.5 million. That seems like a problem, but so far my research has only led me to the ring-buffer size, which is already increased to the maximum.
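A single rx_missed_errors reading does not say whether the counter is still climbing, so it may help to take two `ethtool -S` snapshots some time apart and compare them. The following is a minimal sketch: the helper names are made up for illustration, and the two snapshots are invented sample text (only the 1.5 million starting value comes from the output above).

```python
import re

# Matches counter lines such as "     rx_missed_errors: 1500000"
COUNTER = re.compile(r"^\s*([\w.\-\[\]]+):\s*(\d+)\s*$")


def parse_ethtool_stats(text):
    """Parse `ethtool -S <iface>` output into {counter_name: value}."""
    stats = {}
    for line in text.splitlines():
        m = COUNTER.match(line)
        if m:  # the "NIC statistics:" header has no value and is skipped
            stats[m.group(1)] = int(m.group(2))
    return stats


def counter_delta(before, after, name="rx_missed_errors"):
    """Return how much one counter grew between two snapshots."""
    return parse_ethtool_stats(after)[name] - parse_ethtool_stats(before)[name]


# Invented snapshots for illustration. Real ones could be captured with
# subprocess.run(["ethtool", "-S", "eth0"], capture_output=True, text=True).stdout
# where "eth0" stands in for the actual interface name.
before = """NIC statistics:
     rx_packets: 1000000
     rx_missed_errors: 1500000
"""
after = """NIC statistics:
     rx_packets: 1250000
     rx_missed_errors: 1500420
"""
print(counter_delta(before, after))  # 420
```

If the delta stays at zero while the ping spikes continue, the missed packets are probably left over from an earlier burst rather than related to the current latency problem.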
I guess missed packets (RX or TX) wouldn’t result in higher ping times, but in ping timeouts.
Do you have the same problem on internal interface?
Internally, mirror the switch port so you can see on a sniffer whether the outgoing packet is already delayed by 300 ms.
Yes, same issue on all interfaces. I’ve tried mirroring the port on the switch and there’s no delay. The delay happens when the VyOS box receives the packet.
Same issue even if I ping the next-hop address, no matter which interface it is.
I haven’t tried the link you sent, but I will take a look at it. In my opinion, though, ping spikes upwards of 1000 ms shouldn’t be OK no matter what performance profile we’re running.
I will try disabling hyper-threading; if that doesn’t work, I will try applying the performance profile.