Ipoe-server: ksoftirqd/n at 100% with moderate traffic

lxc · April 4, 2025, 10:34pm

Hello,
I am in the process of evaluating vyos (VyOS 1.5-stream-2025-Q1) as an ipoe BNG. I have a simple configuration that just accepts dhcp and starts serving users.
On my first tests I was using a small N100 based pc, i226, and could push about 700kpps (128 bytes) through it with bngblaster.

I now moved to a beefier box, with two 18 cores (36 total, MT disabled) xeon, a dual port i710 and a dual Connectix-3 10Gb/s NICs. Vyos running on bare metal.

With the same configuration as the N100 I can hardly reach 200kpps before one ksoftirqd/n process tops at 100% and starts dropping packets. Other cores are almost idle. Synthetic traffic is 50 users with a few streams each.

This happens with any combination of access/network interface between the dual intel and dual mellanox cards. The core that gets loaded is always the same on different runs, changes when I change ports roles

[edit service ipoe-server]
set authentication mode 'noauth'
set client-ip-pool IPOE range '100.100.0.192-100.100.0.249'
set default-pool 'IPOE'
set gateway-address '100.100.0.254/26'
set interface eth4.1010 mode 'l2'
set interface eth4.1010 network 'vlan'
set interface eth4.1010 vlan '1000-1200'
set interface eth4.1010 vlan-mon

[edit interfaces]
set ethernet eth4 mtu '9214'
set ethernet eth4 offload gro
set ethernet eth4 offload gso
set ethernet eth4 offload sg
set ethernet eth4 offload tso
set ethernet eth4 vif-s 1010 protocol '802.1q'
set ethernet eth5 address '100.100.1.1/30'
set ethernet eth5 mtu '9214'
set ethernet eth5 offload gro
set ethernet eth5 offload gso
set ethernet eth5 offload sg
set ethernet eth5 offload tso

Firewall/NAT doesn’t seem to be a factor. I also have option performance throughput set, also not changing much.
Everything seems to point out at softirqs being well distributed, /proc/softirqs show balance for pretty much everything.

I am confused.

tjh · April 6, 2025, 3:35am

Hi @lxc ! Welcome to the VyOS Forums.
My first suggestion would be to try turning off some of those offloads, they don’t always help.
Also try increasing the ring buffer size on the cards:

Use this to see what they are:

sudo ethtool -g ethX where X is the card.

If it is something low like 128, try setting it up to 1024, 2048 etc. Just a caution that higher values lead to higher latency.

To set it

sudo ethtool -G ethX tx 1024 rx 1024 again where X is the card.

You could also try turning off the kernel exploit mitigations which are known to slow things down.

set system option kernel disable-mitigations

Note you will need to commit and reboot for that to take effect.

Hope something here helps, please report back and let us know how you get on!

lxc · April 7, 2025, 8:39am

Hello @tjh and thanks for the help!

I tried disabling kernel mitigation and it’s definitely affecting the throughput, but, well, it’s going from ~220kpps to ~280kpps before the same ksoftirqd/n process hits 100% (and starts dropping frames). Seems like incremental optimization, but not affecting the core issue.

The ring buffer was already at 1024, tried larger values, not affecting it much. The same about the offloads.

What puzzles me is that on the smaller machine I can see the ksoftirqd processes grow in cpu utilization, but it’s all of them, not just one.

Something is different on the bigger box, and it’s not balancing cores properly, but can’t find where to look for it.

[Edit: with some squeezing, and accepting 0.2% packet loss, I can get to ~380kpps. Still, I would expect much more from this box, and definitely non bound to single core performance]

Apachez · April 7, 2025, 9:54am

As a troubleshooting remove all these offloading options and add them one by one to find out which have actually a positive effect or not on your box. Dont forget to reboot in between.

You can also (as troubleshoot) try to set mtu to default 1500 to see if that changes anything.

I am assuming you are running the VyOS on baremetal, right (as in not in as a VM-guest)?

Give or take 250kpps is what I would expect per core when doing interruptbased routing (modern CPU’s should be able to push more). Next step is to enable polling (about 1Mpps per core) and after that DPDK (about 10Mpps per core).

lxc · April 7, 2025, 10:06am

@Apachez yes. I agree that 250-350 kpps per core is what I would expect from it. And what I get from the N100, give or take.

I tried different combinations for offloading and with that, no mitigation, no nat, no firewall, I can get to almost 400kpps per core (just edited the previous post). Except it’s using only one core, that’s also my overall throughput.

Vyos is running baremetal on a Supermicro X10DRU-i+ and two Xeon E5-2699 v3.

I will try with a different mtu, but as a side note I am testing pps, so packet size is 64 bytes.

[Edit: mtu might have some marginal effect, but still not changing things]

Also: there might be some threshold/non linear effect with traffic. With a given configuration, a 280kpps traffic, the load on the affected core is 2-4%. At 290kpps is 50%. At 300kpps is 100%.

Apachez · April 7, 2025, 10:22am

Since you are using IPOE question might be if thats compiled/configured for multicore support?

Its not uncommon that for many applications a single stream will only be able to utilize a single core.

If you do regular routing without IPOE etc in between will it then be able to fully utilize all available cores?

lxc · April 7, 2025, 10:37am

ipoe.conf has

[core]
thread-count=36

which is configured automatically. Not sure how I can check if it’s compiled for multicore.

To be honest, I am not even sure how accel-ppp handles IPOE. Given the autoconfiguration of vlans (“vlan-mon”) and DHCP, the rest should be plain IP forwarding handled by the kernel? I see an “ipoe” kernel module, but does that mean the module is responsible for forwarding frames?

Apachez · April 7, 2025, 10:43am

When you are doing the tests is it with live clients or some kind of synthetic tests using iperf2 or so?

If possible try running multiple streams at once to see if they will be better loaded between available cores (threads) or not.

Viacheslav · April 7, 2025, 11:10am

As far as I know, upstream Intel drivers cannot balance SVLAN between cores from the box.
Correct balancing SVLAN by the driver out of tree driver can help GitHub - serhepopovych/ixgbe at ixgbe-5.19.9/double-vlan

lxc · April 7, 2025, 11:28am

Hello, and thank you!
That’s a good hint. Although the box has an xl710 nic installed, I am currently testing on the mellanox.

Sounds about an easy thing to check, though.

tjh · April 7, 2025, 9:06pm

Have you tried enabling RPS and/or RSS?

It certainly seems like something, probably in the NIC driver you’ve got, isn’t sharing the load over more than one CPU.

For your testing are you using a single source address/MAC address/VLAN? If so, it might be the way your hardware balances that. You might find if you can randomise that more, it might help.

You could also try enabling flow-offload as well, I’ve read other posts (unsure how of how true they are) that conntrack stuff can be the major cause of ksoftirqd issues. That still doesn’t explain why everything’s ending up on a a single core though.

lxc · April 7, 2025, 10:00pm

Thank you.
The environment we will want to insert the bng to has QinQ and one vlan/mac address per user. So my synthetic traffic has one s-vlan with 50 c-vlans in. I assume this is a factor.

IPoE server seems to handle this pretty fine. Except for the current issue.

I tried rfs (does nothing), rps (fails to commit). I also tried rps manually, even in each s-vlan/c-vlan, and disabling all kind of vlan offloading from the hardware. I assumed, maybe incorrectly, that if I disable every kind of offloading I’d get worse performance (per core), but less hardware limitations. That’s what I see with the N100.
I still only get it incrementally better, but still one core up.

Also, if it’s in the nic driver, it’s in both the mlx4 (the kernel one) and the ixgbe because they behave the same.

lxc · April 9, 2025, 5:15pm

Just a quick follow up. rps is a factor, if I make a script that sets rps on each vlan subinterface after it has been created by ipoe-server I can reach about 2M pps before the same core maxes out (which is roughly 10x what i was getting).

Also I now can see other ksoftirqd/n processes having some load, although most is still on the same core.

Moreover, setting rps on this specific machine through the VyOS configuration fails (could be because NUMA?)

foo@vyos# commit

WARNING: could not change "eth4" flow control setting!

[ interfaces ethernet eth4 ]

WARNING: could not change "eth4" flow control setting!

VyOS had an issue completing a command.

We are sorry [...]

Report time:      2025-04-09 17:01:51
Image version:    VyOS 1.5-stream-2025-Q1
Release train:    circinus

Built by:         VyOS Networks Iberia S.L.U.
Built on:         Thu 13 Feb 2025 18:06 UTC
Build UUID:       b38b28e0-a516-4f56-a596-5502ae094d3b
Build commit ID:  5128f5e45cdb73-dirty

Architecture:     x86_64
Boot via:         installed image
System type:      bare metal

Hardware vendor:  Supermicro
Hardware model:   Super Server
Hardware S/N:     0123456789
Hardware UUID:    00000000-0000-0000-0000-002590faaab4

OSError: [Errno 75] Value too large for defined data type

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/libexec/vyos/conf_mode/interfaces_ethernet.py", line 357, in <module>
    apply(c)
  File "/usr/libexec/vyos/conf_mode/interfaces_ethernet.py", line 338, in apply
    e.update(ethernet)
  File "/usr/lib/python3/dist-packages/vyos/ifconfig/ethernet.py", line 449, in update
    self.set_rps(dict_search('offload.rps', config) != None)
  File "/usr/lib/python3/dist-packages/vyos/ifconfig/ethernet.py", line 347, in set_rps
    self._write_sysfs(f'/sys/class/net/{self.ifname}/queues/rx-{i}/rps_cpus', rps_cpus)
  File "/usr/lib/python3/dist-packages/vyos/ifconfig/control.py", line 142, in _write_sysfs
    write_file(filename, str(value))
  File "/usr/lib/python3/dist-packages/vyos/utils/file.py", line 69, in write_file
    raise e
  File "/usr/lib/python3/dist-packages/vyos/utils/file.py", line 61, in write_file
    with open(fname, 'w' if not append else 'a') as f:
OSError: [Errno 75] Value too large for defined data type

noteworthy:
cmd 'ethtool --pause eth4 autoneg on tx on rx on'
returned (out):

returned (err):
netlink error: Invalid argument

[[interfaces ethernet eth4]] failed

Apachez · April 10, 2025, 10:46pm

I would look into the order of how each section is being applied within VyOS.

I currently dont recall where this is located in an installed VyOS box but that would be worth spending a minute or two to locate and change the order in there.

Im thinking so the (if possible) interface offloading commands are runned at the end or at least after ipoe stuff have been runned?

Edit: There it is:

https://docs.vyos.io/en/latest/contributing/debugging.html#priorities

system · May 10, 2025, 10:47pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.