Then, under the VyOS 1.3 Equuleus heading on the left, click the “Submit Bug Report” button and fill out the form provided.
Please provide as much detail as possible so that someone can reproduce the issue. You can link to this forum topic as well, but please include the relevant information in the bug report itself, not just a link to the forum.
Could you please provide more details regarding the machine specs where you ran those tests and saw a performance degradation from 1.2.5 LTS to 1.3 RC?
Also, was it a VM or a bare-metal install?
Would you be willing to test again with RC4 so I can add this information to the bug report?
iperf3 between two interfaces on my VyOS VM (Proxmox, J1900 4-core Celeron). The VM is identical in config and specs; I am literally just swapping the system image between tests.
v1.2.7:
2.90 Gbits/sec
ksoftirqd/0 using ~1% CPU
Likely limited by my host
v1.3.0-rc4:
306 Mbits/sec
ksoftirqd/0 using ~98% CPU
This is a massive performance degradation. About to test on bare metal.
Edit 1:
Same issue on bare metal. This is on an old PC with two NICs, so the test is obviously limited by the 1 gig NICs. 1.3.0-rc4 came close to line speed, but ksoftirqd was still using almost 100% CPU.
v1.2.7:
938 Mbits/sec
ksoftirqd/0 using ~3% CPU
v1.3.0-rc4:
905 Mbits/sec
ksoftirqd/0 using ~95% CPU
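To put the figures above in perspective, here is a rough sketch of the relative throughput loss in each test. `pct_drop` is a hypothetical helper written for this post, not part of iperf3 or VyOS; it just takes the old and new throughput in the same units.

```shell
# pct_drop (hypothetical helper): percent of throughput lost going from $1 to $2.
# Both arguments must be in the same unit (here Mbits/sec).
pct_drop() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.1f\n", (1 - new / old) * 100 }'
}

pct_drop 2900 306   # Proxmox VM: 2.90 Gbits/sec on 1.2.7 vs 306 Mbits/sec on 1.3.0-rc4
pct_drop 938 905    # bare metal: 938 vs 905 Mbits/sec
```

That works out to roughly an 89% drop on the VM and about 3.5% on bare metal, where the 1 gig NICs cap throughput and the regression shows up mainly as ksoftirqd CPU usage.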
Edit 2:
I can also confirm the same regression in 1.3 rolling and 1.4 rolling as of today.
Enabling GRO alone got my performance back to what I had in VyOS 1.2.
My NIC also supports SG and TSO. Enabling those further decreased the CPU usage from interrupts significantly (around 40% less CPU usage), which is great.
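For anyone wanting to try the same thing at runtime, these offloads can be toggled with `ethtool -K`. A minimal sketch, assuming the interface is `eth0` and that the NIC driver supports each feature; `offload_cmds` is a hypothetical dry-run helper that only prints the commands so they can be reviewed first.

```shell
# offload_cmds (hypothetical helper): print one "ethtool -K" command per feature
# instead of running it, so nothing changes until you decide to apply it.
offload_cmds() {
    iface="$1"
    shift
    for feature in "$@"; do
        printf 'ethtool -K %s %s on\n' "$iface" "$feature"
    done
}

# eth0 is an assumed interface name; gro, sg and tso are ethtool's short feature names.
offload_cmds eth0 gro sg tso

# On a real box (needs ethtool and root):
#   ethtool -k eth0                               # show current offload state
#   offload_cmds eth0 gro sg tso | sudo sh        # apply the printed commands
```

Settings applied this way live only in the running kernel.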
These settings reset on reboot. How can I apply them permanently?
@srnoth I was going to, but since I didn’t have a Phabricator account yet, I put it off and eventually forgot.
Thank you for opening the ticket.
This is what I use. It’s on VyOS 1.2.7, but I expect the same works on 1.3:
tim@ferrari# show interfaces ethernet eth0
    duplex auto
    mtu 9000
    offload-options {
        generic-receive on
        generic-segmentation on
        scatter-gather on
        tcp-segmentation on
    }
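The output above is from `show`. As a sketch, these are the configuration-mode commands that should produce it on 1.2.x, derived directly from the node names in that output. I have not verified them on 1.3, where the offload node names may differ (possibly `offload gro`, `offload tso` and so on), so check with tab completion before committing.

```
# Run from operational mode on VyOS 1.2.x; eth0 is the interface from the output above.
configure
set interfaces ethernet eth0 offload-options generic-receive on
set interfaces ethernet eth0 offload-options generic-segmentation on
set interfaces ethernet eth0 offload-options scatter-gather on
set interfaces ethernet eth0 offload-options tcp-segmentation on
commit
save
```

Because the options are saved in the configuration, they are reapplied at boot rather than being lost the way a manual runtime change is.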
I know this is not the most scientific way to test, but you can see the massive difference, even from 1.2.
With 1.2 (screenshots in the first post), I had 255% CPU usage; now, with 1.3 and the offloads enabled, I have around 120%, which is freaking incredible. Much better than the 380% on 1.3 without offloads.