I found this old case describing a problem I am currently having on the latest rolling…
7.5% packet loss for packets under a certain size…
It happens even on a basic install…
See the side-by-side using pings with payload sizes of 214 vs 215…
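For reference (my own arithmetic, not from the thread), the `ping -s` payload sizes map to on-the-wire IPv4 packet sizes by adding the 8-byte ICMP header and the 20-byte IP header. A quick sketch with a hypothetical helper:

```shell
# Hypothetical helper: on-the-wire IPv4 packet size for a given `ping -s`
# payload (20-byte IP header + 8-byte ICMP header + payload).
# Assumes IPv4 with no IP options.
pkt_size() {
  echo $(( 20 + 8 + $1 ))
}

pkt_size 214   # -> 242 (largest payload that still drops on the Intel driver)
pkt_size 215   # -> 243 (first payload that passes on the Intel driver)
pkt_size 222   # -> 250 (largest payload that still drops on the Realtek driver)
```

So the boundary sits at different total packet sizes for the two emulated NIC types, which is worth noting when comparing results.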
I should mention this only seems to happen when routing from Ethernet to Ethernet. In other words,
when I connect through OpenVPN to a VyOS instance and ping through it (in via OpenVPN, out via Ethernet), I don't see the problem.
This is causing issues with the ping-test software I use, which is reporting packet loss. It gets worse going through two VyOS instances…
@Dmitry I am running under Xen on XCP-NG 8
Originally I was using a fully PV VM. After hitting this issue, I tried HVM both with and without PV drivers, and I have the same issue. I also tried the Intel NIC mode instead of the default Realtek mode, and it does the same thing except the threshold packet size is smaller: the problem occurs with ping payload sizes up to 222 bytes on the Realtek driver versus 214 on the Intel driver.
The old bug I was referencing is about VyOS on AWS, which also runs on Xen as far as I recall, so I'm not surprised this is coming up on my own Xen host.
There are no packet drops on any interfaces in the hypervisor.
Also, the show hardware pci command above was run while I had the VM in HVM/PV mode for testing. I have reverted it back to full PV now (like I want it) and now when I run show hardware pci I get zero output.
Also, I changed the number of vCPUs to 5 (1 per NIC); same results.
I don't have anything configured under set interfaces ethernet ethX smp-affinity, but I tried that before and it still happens.
Also, commands like set interfaces ethernet ethX mtu 9000 don't work, but I would like it to support jumbo frames.
I was running full PV with SR-IOV, and everything was working great until I ran into a complication…
When using SR-IOV, the VMs will not live-migrate. I don't need to live-migrate my VyOS boxes, but I do need to live-migrate my other VMs/servers. And I found out the hard way that when VyOS is on SR-IOV and you have VMs on virtual interfaces that are not SR-IOV, the only thing that gets through is DHCP; after that, the VMs don't receive any ARP traffic from VyOS. So I had to scrap SR-IOV for now, until such time as these hypervisors run only VyOS instances and all the VM servers run on other hypervisors. That will happen eventually as I scale out, but for now the 4 physical boxes I have need SR-IOV off.
I can confirm I'm seeing this issue with the latest 1.3 rolling, and even as far back as the 1.2 December 19 snapshot, specifically over WireGuard. It doesn't happen when using 1.2.5 stable. I've seen this on bare metal (Protectli), VMware ESXi 6.7U3, and XCP-NG 8.0.
@hammerstud If you look at this ticket (since closed), you can see some commands to run while hitting the packet loss, to check whether the kernel is seeing packets being dropped.
The command, specifically, is: watch -tn 1 "ifconfig -a | grep -A 5 eth1 | grep 'TX packets' | sed 's/^.* dropped:\([0-9]\{1,\}\) .*\$/\1/g'"
You will need to replace the “eth1” with the specific interface you’re seeing problems with.
Do you see packet drops increasing there when you see the actual ping dropped packets?
A simpler command is just: watch sudo netstat -i
And yes, TX drop does increment when a packet is lost.
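To pull just that one counter out of the netstat output when scripting this, something like the following awk sketch works. The function name is my own, and it locates the TX-DRP column by its header name rather than hardcoding a field number, since net-tools builds differ in their column layout:

```shell
# Sketch: extract the TX-DRP counter for one interface from `netstat -i`
# output read on stdin. Line 1 is "Kernel Interface table", line 2 is the
# column header, so the header row is NR == 2.
tx_drops() {
  awk -v ifc="$1" '
    NR == 2 { for (i = 1; i <= NF; i++) if ($i == "TX-DRP") col = i }
    $1 == ifc { print $col }
  '
}
```

Usage would be something like `netstat -i | tx_drops eth1` (swap in your interface), or put that pipeline inside a loop with a one-second sleep to watch the counter climb.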
Since I have a new environment and the traffic is highly controlled, I can reboot my VyOS box and see there are 0 TX drops. I can run my ping flood with 250-byte packets and get 0% loss with no TX drops incrementing. Then I run the same test with 200-byte packets, and wham: 7.5% loss shown in the ping test, and the TX drops increment. TX drops show on both the inbound and outbound Ethernet interfaces, but not at the same time: about 3.75% on each NIC, resulting in the total 7.5% loss shown in the ping test.
This occurs in a local LAN environment, with no VPN involved and no physical interfaces involved. That is, running this on a hypervisor with virtual NICs on internal virtual networks and fully PV VMs, it's still happening, so I'm not even sure this traffic ever hits the real physical NIC in the box.
I have also started a thread on the XCP-NG Forum (the hypervisor)
Can’t repro here at home on vyos-1.3-rolling-202005040117-amd64.iso
This is on my laptop running VirtualBox (connected via wireless, even).
My VirtualBox is bridged to the adapter (running as a virtio interface), and the adapter got a DHCP address on the LAN. I'm pinging another host on the LAN:
-Create 2 networks; I'll call them Network1 and Network2.
-Create 2 Linux VMs (any flavor should work); I'll call them Server1 and Server2.
-Give Server1 a virtual NIC on Network1 with IP 192.168.1.111/24 gateway 192.168.1.1
-Give Server2 a virtual NIC on Network2 with IP 192.168.2.222/24 gateway 192.168.2.1
-Create a VyOS VM (will run OK from LiveCD) and assign a NIC on both networks.
-On Vyos:
–set interfaces ethernet eth0 address 192.168.1.1/24
–set interfaces ethernet eth1 address 192.168.2.1/24
–commit
–watch netstat -i
-You should observe initially TX drops is 0 on both interfaces
-On Server1:
–ping -f 192.168.2.222 -s 250 -c 1000
You should observe 0% packet loss and no TX drops incremented.
–ping -f 192.168.2.222 -s 56 -c 1000
You should now see about 7.5% packet loss (round trip) and about 37 or 38 TX drops on each of eth0 and eth1.
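When repeating runs like this, it's handy to reduce the ping summary line to a bare loss percentage for logging. A small sketch (helper name is my own):

```shell
# Sketch: extract the packet-loss percentage from a ping summary line,
# e.g. "1000 packets transmitted, 925 received, 7.5% packet loss, ...".
loss_pct() {
  grep -o '[0-9.]*% packet loss' | cut -d% -f1
}
```

Usage would be along the lines of `ping -f 192.168.2.222 -s 56 -c 1000 | loss_pct`, letting you tabulate loss across a sweep of -s values.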
Yup, I understand your topology/setup now. I don't have much free time at the moment, but I'll try to repro this over the next day or two, and then I think we'll just log a new Phabricator ticket for it (it must be kernel related).
I installed a fresh copy of Debian 10 Buster, which is running 4.19.0-8-amd64 (Debian 4.19.98-1+deb10u1, 2020-04-27).
I did the test and there are no TX drops at all. Since I had no TX drops with Debian, CentOS, or pfSense, it does seem that VyOS is the only affected distro.
What would it take to update the kernel in rolling release?
FWIW, I ran the same with a Physical Ubiquiti EdgeRouter and no TX drops.
Also, some interesting lab results:
I've been doing iperf3 tests and found that the TX drops show up during iperf runs as well as with the ping flood I was using before.
I was able to achieve 3.5Gb/sec forwarding through this VyOS box despite the dropped TX packets.
I dropped in a CentOS router to see the difference, and with it I'm able to achieve the full 10Gb/sec of forwarding.
So as minuscule a problem as it may sound when I say 3.75% of small packets below 222 bytes are being dropped, this affects things such as TCP ACKs and other short messages, and the net result is that I am losing 65% of my possible bandwidth…
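The 65% figure follows directly from the two iperf3 numbers above; sanity-checking the arithmetic:

```shell
# Throughput lost relative to the 10Gb/sec the same path achieves with a
# CentOS router standing in for VyOS: (10 - 3.5) / 10 = 65%.
awk 'BEGIN { printf "%.0f%%\n", (10 - 3.5) / 10 * 100 }'   # prints 65%
```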
I don't know how to get a copy of VyOS 1.2.5 stable to test with; I only seem to have 1.1.8 and 1.3-rolling.
I am going to try the iperf and ping tests against 1.1.8 again.
I would really love to use vyos in my environment but I can’t honestly go to production running like this.
So this morning I built the following two Rolling Routers, the first with a leg into my LAN.
LAN - ([192.168.0.225/24]-ROLLING-[10.10.10.1/30])====VBOX-BRIDGE====([10.10.10.2/30]-ROLLING)
The first router is simply rolling booted in LiveISO mode, with a /24 on the first interface and a /30 on the second, and no other config. The second is also LiveISO, with only a /30 configured on its interface.
From my box 192.168.0.5, pinging 10.10.10.2 (so Router 1 is routing this for me)
I still can’t repro the packet loss I’m afraid, try as I might. I’ve tried many different small values between 56 and 300 for -s and still haven’t lost a packet yet.
I wonder if it's something to do with the offload settings that are being applied. If you disable all offload settings on your NICs, does that alter/fix the problem?
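Nothing in the thread confirms this is the cause, but the usual way to rule offloads out is to switch them all off with ethtool from the VyOS shell and retest. A sketch (function name and the "eth0" usage are placeholders; not every virtual NIC driver supports every flag, so some toggles may be silently refused):

```shell
# Sketch: disable common NIC offload features on one interface for testing.
# tso/gso/gro/rx/tx/sg are standard ethtool -K feature flags; unsupported
# ones are skipped without aborting the loop.
disable_offloads() {
  local ifc="$1"
  ip link show "$ifc" >/dev/null 2>&1 || { echo "no such interface: $ifc"; return 1; }
  for feat in tso gso gro rx tx sg; do
    ethtool -K "$ifc" "$feat" off 2>/dev/null || true
  done
  # Show the resulting offload state for verification.
  ethtool -k "$ifc" | grep -E 'offload|segmentation'
}
```

You'd run something like `disable_offloads eth0` (as root) on each interface showing TX drops, then repeat the ping flood to see whether the loss pattern changes.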
I’m not in a position to fire up a Xen host, so I probably can’t help debug this problem any further I’m afraid.