1.3 RC1 Performance regression, high CPU usage

Hi there,

It may surprise a few of you, but I was still running 1.2 RC11 until yesterday, when I upgraded my router to 1.3 RC1. The upgrade went smoothly.
I’m using the exact same configuration and have made no changes to it.

However, while testing, I’ve noticed a big performance regression on 1.3 compared to 1.2.

While running a Speedtest (speedtest.net), I see much higher CPU usage during the download phase, to the point that the router does not even reach 1Gbit/s.

Here is my test with 1.3 RC1
Speedtest by Ookla - The Global Broadband Speed Test

Then I switched the image back to 1.2 RC11
Speedtest by Ookla - The Global Broadband Speed Test

As you can see, what I believe are the NIC interrupts (ksoftirqd) use much more CPU on 1.3, so VyOS reaches only around 690Mbit/s download from the internet.
On 1.2 there is CPU to spare and I can easily do 920Mbit/s (essentially maxing out 1Gbit; the result is lower due to overhead).
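
One way to check whether the extra load really is network receive work (the kind of softirq processing ksoftirqd picks up) is to compare two snapshots of /proc/softirqs, one taken before and one after a Speedtest run. Below is a minimal Python sketch, assuming the usual /proc/softirqs layout (a header row of CPUs, then one row per softirq type). The function names are made up for illustration and are not part of VyOS.

```python
def parse_softirqs(text):
    """Parse /proc/softirqs text into {softirq_name: [count_per_cpu, ...]}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    cpus = lines[0].split()              # header row: CPU0 CPU1 ...
    table = {}
    for ln in lines[1:]:
        name, _, rest = ln.partition(":")
        counts = [int(x) for x in rest.split()]
        table[name.strip()] = counts[:len(cpus)]
    return table


def net_rx_delta(before, after):
    """Per-CPU increase in NET_RX softirqs between two snapshots."""
    b = parse_softirqs(before)["NET_RX"]
    a = parse_softirqs(after)["NET_RX"]
    return [y - x for x, y in zip(b, a)]


def read_snapshot(path="/proc/softirqs"):
    """Read the current counters (only works on a Linux host)."""
    with open(path) as f:
        return f.read()
```

On the router you would call read_snapshot() before the test, run the Speedtest, call it again, and pass both strings to net_rx_delta(). Comparing the per-CPU numbers on 1.2 and 1.3 for the same test would show whether the same traffic now costs more softirq rounds, or whether it all lands on a single core.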

I’m calling this a performance regression mostly because the exact same configuration, moved from 1.2 to 1.3, results in almost double the CPU usage.

My router uses an Intel Celeron N3150 with an Intel ET2 quad-port NIC.

It’s probably because your Vyos 1.2rc didn’t have the Spectre/Meltdown mitigations, while the 1.3 Vyos kernel does.

There’s some discussion in this thread about it, see if that helps you.

Are those mitigations present in the final 1.2?
I can try to install the latest ISO and test it.

Yes, they are:

tim@ferrari:~$ grep . /sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/itlb_multihit:Not affected
/sys/devices/system/cpu/vulnerabilities/l1tf:Mitigation: PTE Inversion
/sys/devices/system/cpu/vulnerabilities/mds:Mitigation: Clear CPU buffers; SMT Host state unknown
/sys/devices/system/cpu/vulnerabilities/meltdown:Mitigation: PTI
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Mitigation: Speculative Store Bypass disabled via prctl and seccomp
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: usercopy/swapgs barriers and __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: disabled, RSB filling
/sys/devices/system/cpu/vulnerabilities/srbds:Unknown: Dependent on hypervisor status
/sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Not affected

Hm ok, yeah might be it.

On 1.2RC11 I have this:

vyos@vyos:~$ grep . /sys/devices/system/cpu/vulnerabilities/*
/sys/devices/system/cpu/vulnerabilities/l1tf:Not affected
/sys/devices/system/cpu/vulnerabilities/meltdown:Vulnerable
/sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Not affected
/sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
/sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline

I can see that “meltdown” is marked as Vulnerable instead of mitigated.

In 1.3, is there a way to disable the mitigations?

Yeah, follow the thread I linked :slightly_smiling_face:

Well, I just followed the thread, and it wasn’t very clear what I was supposed to do, since it was a troubleshooting thread with a lot of back and forth.

I’ve done some research alongside what was mentioned in the thread, and I still don’t know whether the mitigations are actually turned off or not.

I have to agree with this request, that this should be a lot easier:
:anchor: T3001 Disable spectre mitigation patches from CLI (vyos.net)

The steps I took were:

  1. Edited /boot/grub/grub.cfg, modifying both the VyOS 1.3.0-rc1 (KVM console) and VyOS 1.3.0-rc1 (Serial console) entries to include mitigations=off after boot=live.
  2. Rebooted VyOS; still no improvement in performance.
  3. Checked cat /proc/cmdline and got:
    BOOT_IMAGE=/boot/1.3.0-rc1/vmlinuz boot=live mitigations=off quiet rootdelay=5 noautologin net.ifnames=0 biosdevname=0 vyos-union=/boot/1.3.0-rc1 console=ttyS0,9600 console=tty0
  4. Checked grep . /sys/devices/system/cpu/vulnerabilities/* and it’s not clear whether the mitigations are active or not:
    /sys/devices/system/cpu/vulnerabilities/itlb_multihit:Not affected
    /sys/devices/system/cpu/vulnerabilities/l1tf:Not affected
    /sys/devices/system/cpu/vulnerabilities/mds:Vulnerable; SMT vulnerable
    /sys/devices/system/cpu/vulnerabilities/meltdown:Vulnerable
    /sys/devices/system/cpu/vulnerabilities/spec_store_bypass:Not affected
    /sys/devices/system/cpu/vulnerabilities/spectre_v1:Vulnerable: __user pointer sanitization and usercopy barriers only; no swapgs barriers
    /sys/devices/system/cpu/vulnerabilities/spectre_v2:Vulnerable, STIBP: disabled
    /sys/devices/system/cpu/vulnerabilities/srbds:Not affected
    /sys/devices/system/cpu/vulnerabilities/tsx_async_abort:Not affected
    
  5. Ran sudo /usr/sbin/update-initramfs -u and got the following in the console:
    update-initramfs: Generating /boot/initrd.img-5.4.99-amd64-vyos
    live-boot: core filesystems devices utils udev wget blockdev.
    
  6. Rebooted VyOS again, checked cat /proc/cmdline, and mitigations=off is still there. Still the same poor performance.

Could you shed some light on how to confirm whether the mitigations are actually off?
Or are they off, and the Spectre and Meltdown mitigations are not the reason for the performance difference vs 1.2?

Thanks.

You have disabled Mitigations.

You can tell because grep . /sys/devices/system/cpu/vulnerabilities/* now shows the affected entries as Vulnerable instead of Mitigation, and /proc/cmdline clearly contains the mitigations=off statement.
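
For anyone who wants to make that check less of an eyeball exercise, here is a rough Python sketch that does the same two things: looks for mitigations=off on the kernel command line and sorts each line of the grep output into "mitigated", "vulnerable" or "not affected". It only inspects the leading word of each status string as it appears in this thread; the helper names are invented for the example.

```python
def mitigations_off_on_cmdline(cmdline):
    """True if the kernel command line carries the mitigations=off parameter."""
    return "mitigations=off" in cmdline.split()


def classify(status):
    """Map a status string from the vulnerabilities directory to a coarse label."""
    if status.startswith("Not affected"):
        return "not affected"
    if status.startswith("Mitigation"):
        return "mitigated"
    if status.startswith("Vulnerable"):
        return "vulnerable"
    return "unknown"


def summarize(grep_output):
    """Turn `grep . /sys/devices/system/cpu/vulnerabilities/*` output into {name: label}."""
    result = {}
    for line in grep_output.splitlines():
        line = line.strip()
        if not line:
            continue
        path, _, status = line.partition(":")   # split at the first colon only
        name = path.rsplit("/", 1)[-1]          # e.g. "meltdown"
        result[name] = classify(status.strip())
    return result


def still_mitigated(grep_output):
    """Names that still report an active mitigation."""
    return sorted(n for n, label in summarize(grep_output).items() if label == "mitigated")
```

If still_mitigated() returns an empty list and mitigations_off_on_cmdline() is true, the mitigations are off. Fed with the output posted in step 4 above, nothing is reported as mitigated.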

If that hasn’t fixed it, then something else is also causing the issue/performance regression.

What it might be, though, I don’t know. I’m sorry.

Does anyone else have any idea?

If I were a maintainer of VyOS, I would be rather concerned about such a performance penalty from a simple update, but it seems that it’s not considered important.

Try this:

set system option performance throughput

Thank you for reaching out.

I’ve applied set system option performance throughput,
then ran commit and save.
Unfortunately, the issue seems to persist :frowning:

I’ve then rebooted the system, but same results.
Throughput is CPU-capped at around 760Mbit/s.

Any other ideas?

Thank you.

Additionally, I decided to remove mitigations=off from GRUB to see the difference.

With the mitigations on vs off, the difference is negligible: 727Mbit/s with them on vs 760Mbit/s with them off.

Now I start to wonder if the Intel driver changed from 1.2 to 1.3

The IGB driver changed from 1.2 to 1.3.

1.2 is using IGB driver version 5.4.0-k
1.3 is using IGB driver version 5.6.0-k
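
The driver name and version can be read from the output of ethtool -i on the interface in question. As a small sketch, assuming that output has been saved as text from each image, the following Python compares the two. The interface name and the helper names are assumptions for illustration; the sample strings only contain the driver and version values quoted above.

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i <iface>` output ("key: value" per line) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")   # values such as bus-info may contain colons
        if sep:
            info[key.strip()] = value.strip()
    return info


def driver_changed(old_text, new_text):
    """True if the driver name or driver version differs between two outputs."""
    old, new = parse_ethtool_info(old_text), parse_ethtool_info(new_text)
    return (old.get("driver"), old.get("version")) != (new.get("driver"), new.get("version"))


# Values as reported in this thread for 1.2 and 1.3.
on_1_2 = "driver: igb\nversion: 5.4.0-k\n"
on_1_3 = "driver: igb\nversion: 5.6.0-k\n"
print(driver_changed(on_1_2, on_1_3))
```

A version string ending in -k normally indicates the in-tree kernel driver, so a different version here may simply reflect the newer kernel in 1.3 rather than a separately packaged driver.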

I’ve never installed or messed around with drivers. Is there an easy way for me to switch the driver back to 5.4.0-k in VyOS 1.3 to validate a possible driver issue?

1.2:

1.3

We have tried all the options suggested here, but none of them seems to work for us. Can anyone here confirm the issue? Is there a plan to resolve it? We are seeing the following performance degradation in the 1.3 RC release compared to the 1.2.5 LTS release:

  • Throughput between two interfaces is reduced to 2.5 Gbps from 10 Gbps

  • IPSec throughput is reduced to 500 Mbps from 950 Mbps

  • VXLAN throughput is reduced to 1 Gbps from 3.5 Gbps

  • MACsec with VXLAN throughput is reduced to 370 Mbps from 1.3 Gbps
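
To put those four numbers on the same scale, here is a quick calculation of the relative drop for each case. This is only arithmetic on the figures listed above, converted to Mbps; nothing here is measured by the script.

```python
def reduction_percent(before_mbps, after_mbps):
    """Relative throughput loss, in percent of the original value."""
    return (before_mbps - after_mbps) / before_mbps * 100


# (before, after) in Mbps, taken from the list above: 1.2.5 LTS vs 1.3 RC.
RESULTS = {
    "interface to interface": (10000, 2500),
    "IPsec": (950, 500),
    "VXLAN": (3500, 1000),
    "MACsec with VXLAN": (1300, 370),
}

for name, (before, after) in RESULTS.items():
    print(f"{name}: {reduction_percent(before, after):.1f}% lower")
```

This gives roughly 75.0%, 47.4%, 71.4% and 71.5% respectively, so three of the four cases lose about the same fraction of their throughput, with IPsec losing somewhat less.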

Hello @hiteshhapani, yes, I can confirm some issues with high IRQs in a virtual environment. Next week we plan to deploy 1.3RC2 on bare metal and start researching deeper. I can also confirm this degradation in Debian bullseye (kernel 5.10.x), so I guess this issue is related to the kernel.

Just to clarify in my case, I’m using bare metal currently.