Performance question for tcp throughput

Hi,

I have a question about performance: how many CPU cores do I need for VyOS to push approx. 25 Gbit/s of TCP traffic through the network card?

Is there a guideline value?

greetz


I don’t have any 25GbE gear, but I’d say that whoever does will need more context about the scenario.
Such as what type of traffic (will it be L3 traffic, or will you just be bridging two ports and letting traffic flow at L2), which interface offloads you can use, etc.

Agreed; in general you are limited by the number of packets per second (pps). That is influenced by which features you have enabled (how many firewall rules, NAT, etc.) and by what offloads your NIC hardware is capable of (as Ralm said). Additionally, how many TCP flows are you planning on processing, and is there sufficient entropy between them? Each TCP flow will be handled by a particular CPU core, so if you are hoping to get 25 Gbps through a single TCP flow you will probably be disappointed (and adding CPU cores won’t help, though faster cores will). Also, how big are the packets you are looking to transfer? Closer to 50 bytes or 1500 bytes?

For what it’s worth, on embedded Jasper Lake Celeron cores I can get ~400k pps/core with NAT and 16 firewall rules. So, rough math: at 1500-byte packets you’d need ~2M pps to saturate 25 Gbps, which works out to about 5 Jasper Lake Celeron cores. But again, your specific configuration and workload matter a lot.
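
To make that back-of-the-envelope arithmetic explicit, here is a minimal sketch in Python. It assumes full-size 1500-byte packets, reuses the ~400k pps/core Jasper Lake figure from above, and assumes roughly linear scaling across cores while ignoring Ethernet preamble/IFG and header overhead, so treat it as a ballpark only:

```python
# Rough estimate of pps and CPU cores needed for a target throughput.
# Assumes ~linear scaling across cores; ignores Ethernet framing overhead.

def cores_needed(target_gbps: float, packet_bytes: int, pps_per_core: float) -> tuple[float, float]:
    pps = (target_gbps * 1e9) / (packet_bytes * 8)   # packets per second required
    return pps, pps / pps_per_core

pps, cores = cores_needed(target_gbps=25, packet_bytes=1500, pps_per_core=400_000)
print(f"~{pps/1e6:.1f} Mpps -> ~{cores:.1f} cores")   # ~2.1 Mpps -> ~5.2 cores
```

With smaller packets the pps requirement (and therefore the core count) grows accordingly, which is why the packet-size question above matters.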

Testing with a Silicom XL710 dual 40G card on an E3-1230v3, with basic offloads and no firewall rules, I was able to do 40G iperf without issue from a single host with multiple streams. I have some other VMs on nodes with dual E5-2650v2s that can do 25G without issue on virtio interfaces, although the load is a bit higher on them since it’s a virtual NIC.


Unfortunately (in this case) OPNsense is based on FreeBSD while VyOS is based on Linux, so it’s like comparing wagyu beef with a Maine lobster :slight_smile:

But somewhat of a hint are the performance metrics available for the OPNsense appliances:

https://shop.opnsense.com/dec3800-series-update-2024/

Firewall Throughput: 17.4 Gbps
Firewall Packets Per Second: 1450Kpps

And the above runs on a:

https://www.deciso.com/netboard-a20/

CPU Model: EPYC 3201
AMD Embedded EPYC 3201 (octa-core, 1.5 GHz base, max turbo frequency 3.1 GHz, 30 W, no GPU)

https://www.amd.com/en/products/embedded/epyc/epyc-3000-series.html#specifications

Another data point (still not directly relevant to what you ask about, but it can give a hint) is to look at Mikrotik and their CRS518 vs CCR2216, which both use the same switch chip.

But the CRS518, when routing (fastpath), meaning traffic passes through the mgmt CPU, gets about 0.5 Gbps (while switching, where the packet only touches the switch chip, can push about 1.2 Tbps in total):

Architecture: MIPSBE
CPU: QCA9531
CPU core count: 1
CPU nominal frequency: 650 MHz
Switch chip model: 98DX8525

And in their case the CCR2216 uses the same switch chip, but the mgmt CPU is changed to:

Architecture: ARM 64bit
CPU: AL73400
CPU core count: 16
CPU nominal frequency: 2000 MHz
Switch chip model: 98DX8525

The result then becomes about 69.3 Gbps in total performance for traffic pushed through the mgmt CPU.

In short: changing from MIPS to ARM, from 1 core to 16 cores, and from 650 MHz to 2000 MHz per core took routing performance through the mgmt CPU from about 0.5 Gbps to about 69.3 Gbps in total.
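
Just as a rough sanity check of those numbers (my own sketch, not anything published by Mikrotik): the raw cores × clock ratio alone would only account for roughly a 49x improvement, so the per-core architecture change from MIPS to ARM clearly contributes as well:

```python
# Quick sanity check of the MikroTik numbers above: raw core-count x clock
# scaling alone does not explain the jump, so the MIPS->ARM change matters too.
old_gbps, new_gbps = 0.5, 69.3
clock_core_factor = (16 * 2000) / (1 * 650)      # cores x MHz ratio, ~49x
observed_factor = new_gbps / old_gbps            # ~139x
print(f"clock*cores factor: ~{clock_core_factor:.0f}x, observed: ~{observed_factor:.0f}x")
```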

In your case I would probably go for a Mellanox card instead of an Intel card.

Verify which offloading settings you can apply for each interface.
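
As a small illustration (not VyOS-specific, just assuming a Linux box with ethtool installed and a hypothetical interface name eth0), you can dump the offload features and channel/queue counts per interface like this:

```python
# Sketch: list offload features and channel (queue) counts for an interface
# using ethtool. Assumes ethtool is installed; interface name is hypothetical.
import subprocess

def ethtool(args: list[str]) -> str:
    return subprocess.run(["ethtool", *args], capture_output=True, text=True).stdout

iface = "eth0"                      # replace with your actual interface
print(ethtool(["-k", iface]))       # offload features (gro, gso, tso, lro, ...)
print(ethtool(["-l", iface]))       # RX/TX channel counts (RSS queues)
```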

Go for as fast a single-core speed as possible.

For example the F-series of CPUs from the AMD EPYC Genoa series.

https://www.amd.com/content/dam/amd/en/documents/products/epyc/epyc-9004-series-processors-data-sheet.pdf

When it comes to AMD, use 12 memory sticks to fully utilize the 12 memory channels.

There are also a few tweaks for the BIOS, like enabling “Performance mode”; on the other hand, this will mostly just consume more power and generate more heat that needs to be cooled off.

I probably need to write a little more about this, please excuse me.
VyOS is currently running as a VM under KVM. Inside it there is a VLAN network where the VMs run, and traffic then goes out to the network via NAT. On the KVM host I use a vSwitch for the networks.

The hosts are each equipped with 2x Xeon 6248R
Some have Mellanox mt27710, others have BCM57414 NetXtreme from Broadcom. The mix is due to the shortage of components in 2021/22.
I can see I’ll probably have to deal with SR-IOV and the network cards here:
https://docs.nvidia.com/networking/display/mlnxofedv581011/single+root+io+virtualization+(sr-iov)
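
For reference, on Linux SR-IOV virtual functions are typically enabled through sysfs. A minimal sketch follows; the interface name and VF count are placeholders, and it assumes the NIC, firmware, and BIOS all have SR-IOV enabled (both the ConnectX-4 Lx and BCM57414 generally support it), so treat it as an illustration rather than a drop-in script:

```python
# Sketch: enable N SR-IOV virtual functions on a physical interface via sysfs.
# Must run as root; "eth0" and the VF count are placeholders for illustration.
from pathlib import Path

iface = "eth0"
dev = Path(f"/sys/class/net/{iface}/device")

total = int((dev / "sriov_totalvfs").read_text())    # VFs supported by the NIC
print(f"{iface} supports up to {total} VFs")

# The kernel requires going back to 0 before setting a different nonzero count.
(dev / "sriov_numvfs").write_text("0")
(dev / "sriov_numvfs").write_text("4")                # create 4 VFs (example)
```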

More like lobster and crawfish. The BSD network stack and drivers are kind of trash in comparison, hence why I’m on Linux. Even if it’s a bit less simple to configure, I’ll opt for performance. For example, running 10 gigs through PF takes 100% of my standby box’s CPU, while the same box with more features enabled on my VyOS install takes 1-2% CPU. It’s a massive difference, and I’m running Intel cards, which are supposed to be the best on BSD.

I would say you probably have some other malfunction going on if the same bare metal with the latest FreeBSD uses 100% CPU to push 10 Gbps through a 10G NIC, while VyOS running a current stable Linux kernel (assuming VyOS 1.5 rolling) only uses 1-2% CPU to push 10 Gbps through the same 10G NIC (and the same motherboard, CPU, RAM, etc.).

It’s possible, yet I do know the kernel driver support is always way behind, and even when I compile the latest, it’s still the same. I really don’t know what the difference is. The only thing that could be the issue as far as CPU goes is that I was doing flow accounting on BSD and currently don’t on Linux. Perhaps if I enabled it, I’d see the same thing. Otherwise, it’s just a flat single-subnet setup; no extra services or routing are enabled on either platform. I can easily push 10 gigs without any struggle on 6.x-kernel Linux platforms, without offloads, so there’s something uniquely different in BSD’s stack, even with what is supposed to be an optimal NIC for the platform (500-series Intel). As I see it, with the one exception, there’s no real difference in enabled functionality. Netfilter, as I understand it, is considerably more efficient than PF, so that may have something to do with it as well. I never really got to the bottom of it, as I have no pain points on VyOS with regard to network performance with little to no tweaking.

I got so tired of struggling with it that I just moved to VyOS. I don’t even need offloads for full performance, and the CPU isn’t even close to breaking a sweat. I’ve seen plenty of other similar stories while trying to bend BSD to my will, yet I could never stop the CPU spikes, even if I severely limited my flows. The box is an appliance form-factor dual Xeon Cascade Lake, which should have plenty of power, yet it’s a drastically different experience between the two OSes.

After the initial learning curve of the CLI, I can configure everything the same, and with the docker/podman options in VyOS I get even more flexibility with regard to adding more services to the box. I haven’t tried HA yet on the VyOS platform as I’m waiting on another cluster node to arrive, but once it does, that’ll be my next step: a new KVM host as the primary, since it’s a newer-generation chip, and the current box as its partner in crime. I’m really hoping the HA in VyOS is fleshed out; I will find out soon!

Netflix uses FreeBSD for their CDN solution, named Open Connect, so if the difference in network performance between Linux and FreeBSD on the same hardware really were 50:1, I seriously doubt Netflix would have chosen to use FreeBSD at all:

https://papers.freebsd.org/2019/fosdem/looney-netflix_and_freebsd/

Also:

The “other” FreeBSD optimizations used by Netflix to serve video at 800Gb/s from a single server