Vyos cpu0 100% ksoftirq+

sonicbx · April 22, 2020, 12:09pm

Putting some load on this hex-core VyOS box, I see CPU0 go to 100% in ksoftirq+ while CPU1-5 sit idle.

box1 <-> vyos <-> box2

from vyos to box2 i get iperf tests at 6.4Gb/sec
from box1 to box2 i get iperf tests at 1.75Gb/sec

olofl · April 22, 2020, 12:18pm

Did you try using more than one parallel stream?
iperf3 -c x.x.x.x -P 6

sonicbx · April 22, 2020, 1:44pm

yes i was using -P 3

darconada · April 23, 2020, 9:08am

With 6 cores you should have 4 TX/RX ring buffers active on the NIC, so you should be able to use 4 CPUs for traffic forwarding.
One reason for only one being used could be that the transfer was UDP. Was that the case?
If not, check this command:
sudo ethtool -S ethX (where X is your interface number).
Are you running VyOS on a bare-metal installation, or is it a VM?
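
Since ethtool -S does not expose per-queue counters on every driver, another way to see where receive processing lands is the kernel's per-CPU softirq counters. A minimal sketch, assuming the standard /proc/softirqs layout (a CPU header row, then one row per softirq type); the function name net_rx_per_cpu is made up for illustration:

```shell
# Print the NET_RX softirq count for each CPU, one "CPUn count" pair per line.
# Reads /proc/softirqs by default; a different file can be passed for testing.
net_rx_per_cpu() {
    awk '
        # First row holds the column names: CPU0 CPU1 ...
        NR == 1 { for (i = 1; i <= NF; i++) cpu[i] = $i; next }
        # Field 1 is the row label, so count i belongs to column i - 1.
        $1 == "NET_RX:" { for (i = 2; i <= NF; i++) print cpu[i - 1], $i }
    ' "${1:-/proc/softirqs}"
}
```

Running net_rx_per_cpu before and after an iperf test shows which counters grow; if only the CPU0 number moves, all receive work is landing on one core.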

sonicbx · April 27, 2020, 1:51pm

this is a Paravirtualized VM, I can choose 1 to 8 vCPUs as necessary.

I will check my processor and RX/TX queue optimizations.

I found this:

and this seems to be the problem I’m experiencing. I am only doing L3, no VLAN or QinQ.

I am also seeing TX packet drops, on small packets only. I opened a separate thread in the bugs forum because I think that relates to a previously resolved issue rearing its head again.

These VyOS VMs have eth0, 1, 2, 3, 4. Should I just set the vCPU cores to 5 to make things balanced?

I noticed when I set the vCPU cores to 8, eth4 doesn’t show up, but it shows up with 7.

ethtool ring settings seem to be unavailable to me, probably because I am using paravirtualized NIC drivers.

vyos@router3:~$ sudo ethtool -g eth1
Ring parameters for eth1:
Cannot get device ring settings: Operation not supported
vyos@router3:~$

as you requested:
vyos@router3:~$ sudo ethtool -S eth1
NIC statistics:
rx_gso_checksum_fixup: 0
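
When ethtool -g is not supported, the number of queues the driver actually created can still be read from sysfs. A rough sketch, assuming the usual /sys/class/net/<iface>/queues/ layout with rx-N and tx-N directories; count_queues is a hypothetical helper name, not a VyOS command:

```shell
# Count the rx-N and tx-N queue directories for one interface.
# Usage: count_queues eth1 [base_dir]   (base_dir defaults to /sys/class/net)
count_queues() {
    iface=$1
    base=${2:-/sys/class/net}
    rx=0
    tx=0
    for q in "$base/$iface/queues"/rx-*; do
        # An unmatched glob stays literal, so confirm the directory exists.
        if [ -d "$q" ]; then rx=$((rx + 1)); fi
    done
    for q in "$base/$iface/queues"/tx-*; do
        if [ -d "$q" ]; then tx=$((tx + 1)); fi
    done
    echo "$iface rx=$rx tx=$tx"
}
```

If this reports a single rx and tx queue per interface, the NIC has only one queue, so receive work cannot be spread across cores by the hardware queues alone.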

darconada · April 27, 2020, 3:22pm

The issue and tuning parameters in the link you mentioned are for the ixgbe driver, which means a physical installation using an Intel NIC (or SR-IOV on the VM presenting the physical interface directly). They also tune CPU pinning to interfaces, but I guess what we want to check in your environment is why multiple TX/RX buffers are not being used on the same NIC.

I have tested VyOS a lot in VMware virtual environments with the vmxnet3 driver, and in those environments RX/TX works fine without any special tuning. It also works fine in KVM.

What version of VyOS are you using? On which virtual environment?

sonicbx · April 28, 2020, 9:23am

I am using VyOS rolling (tested with one from March 30th and one from yesterday; same problems).

I am running it as a PV on XCP-NG8 (XenServer). I have tried as full HVM and as HVM with PV drivers, and as fully PV and the same issue persists.

It also happens both when I use the Xen Open vSwitch backend and when I use Xen bridging.

Dmitry · April 28, 2020, 9:31am

Can you execute the following commands and run the test again?

sudo su -l
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo f > /sys/class/net/eth1/queues/rx-0/rps_cpus
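
The value written to rps_cpus is a hexadecimal CPU bitmask, so f selects CPU0-3. A sketch for building a mask that covers the first N CPUs and writing it to every rx queue, assuming interfaces named eth* and root privileges; rps_mask and apply_rps are illustrative names, not VyOS commands:

```shell
# Print the hex bitmask covering the first N CPUs (bits 0 .. N-1 set).
rps_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Write a mask to the rps_cpus file of every rx queue on every eth* interface.
# Usage: apply_rps MASK [base_dir]   (base_dir defaults to /sys/class/net)
apply_rps() {
    mask=$1
    base=${2:-/sys/class/net}
    for f in "$base"/eth*/queues/rx-*/rps_cpus; do
        # Skip unmatched globs and files we are not allowed to write.
        if [ -w "$f" ]; then echo "$mask" > "$f"; fi
    done
}
```

For example, as root on a six-vCPU VM, apply_rps "$(rps_mask 6)" would write 3f to each queue. The setting does not survive a reboot.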

sonicbx · April 28, 2020, 9:40am

Done, I ran this for eth0 through eth4.
Same results, though it seems I have many more TCP retransmits.

This might be related to my other topic about packet loss, which I think you were also responding to.

I’m not sure what’s going on!

darconada · April 28, 2020, 10:17pm

I have to say… there are many strange things here, which makes the issue hard to narrow down. I'm not experienced with Xen virtualization, but it seems to be the main reason for the issues you are having. Is it possible to run the same test on KVM or VMware? What options are available for the NIC driver with Xen?

There are many reasons for lower performance when the path is server1 --> vyos --> server2 compared to a direct server1 --> server2 connection. In an ideal scenario performance could be very similar, but never as good as a direct connection.

Drops on TX are really strange… they are much more common in RX buffers. Maybe there is a mismatch between the TX buffer on the host and the one in the VM. However, you have many things to solve (like the number of ethX interfaces varying with the number of CPUs)… maybe everything is just related to CPU exhaustion, but it is hard to say…

If I were in your position, I would first try the setup in KVM (free) or VMware… then, once everything is clear on the network setup and VyOS config, I would move to Xen to compare results and performance…