Which has faster throughput - ( 11 Ethernet en# ports ) -or- ( 2 Ethernet ports where there are 10 Vlans )?

Example #1:
eth0 - WAN ( OSPF area )
eth1 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth2 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth3 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth4 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth5 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth6 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth7 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth8 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth9 - ( normal simple access port - external switch places this port on a specific access vlan ).
eth10 - ( normal simple access port - external switch places this port on a specific access vlan ).
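On VyOS, a minimal sketch of Example #1 would put one address on each physical port ( the subnets here are hypothetical ):

```text
set interfaces ethernet eth0 address '192.0.2.1/30'
set interfaces ethernet eth1 address '10.0.1.1/24'
set interfaces ethernet eth2 address '10.0.2.1/24'
# ... and so on, one address per physical port through eth10
```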

-or-
Example #2:
eth0 - WAN ( OSPF area )
eth1 - ( 802.1q trunk - external switch has this port configured as a vlan trunk interface).
eth1.100 - ( Vlan 100 via interface eth1 )
eth1.101 - ( Vlan 101 via interface eth1 )
eth1.102 - ( Vlan 102 via interface eth1 )
eth1.103 - ( Vlan 103 via interface eth1 )
eth1.104 - ( Vlan 104 via interface eth1 )
eth1.105 - ( Vlan 105 via interface eth1 )
eth1.106 - ( Vlan 106 via interface eth1 )
eth1.107 - ( Vlan 107 via interface eth1 )
eth1.108 - ( Vlan 108 via interface eth1 )
eth1.109 - ( Vlan 109 via interface eth1 )
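Under VyOS, Example #2 maps to `vif` sub-interfaces on the single trunk port; a minimal sketch with hypothetical addresses:

```text
set interfaces ethernet eth0 address '192.0.2.1/30'
set interfaces ethernet eth1 vif 100 address '10.0.100.1/24'
set interfaces ethernet eth1 vif 101 address '10.0.101.1/24'
# ... one vif per VLAN, 100 through 109
```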

My thoughts:

Example #1 uses more hardware resources, more hardware/software I/O buffers, and more hardware interrupts. Packets are slightly smaller ( 4 bytes smaller, the size of an 802.1Q tag ) because they are not 802.1Q VLAN encapsulated. ( Possibly less load on the router CPU, since the switch does 100 percent of the VLAN work. ) It is harder to add additional eth interfaces on a running system.

Example #2 uses less hardware resources, fewer hardware/software I/O buffers, and fewer hardware interrupts. Packets are slightly larger because all VLAN packets are 802.1Q encapsulated ( 4 bytes larger ). ( Possibly heavier load on the router CPU, because the router software uses additional 802.1Q VLAN drivers/resources. ) It is easier to add additional VLAN interfaces on a running system.
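For reference, the size difference between the two examples comes from the 4-byte 802.1Q tag inserted after the source MAC address. A quick shell sketch of the per-frame overhead ( assuming a standard 1500-byte MTU ):

```shell
# Ethernet frame overhead with and without an 802.1Q tag (1500-byte payload assumed)
payload=1500
untagged=$((payload + 14 + 4))      # 14-byte Ethernet header + 4-byte FCS
tagged=$((payload + 14 + 4 + 4))    # plus the 4-byte 802.1Q tag
echo "untagged=${untagged} tagged=${tagged} difference=$((tagged - untagged))"
# prints: untagged=1518 tagged=1522 difference=4
```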

In the distant past, I’ve done some hardware motherboard interface designs, but I am weak on software interface drivers. I can see advantages to both examples above. However, for a static router configuration that does not change, and to achieve the absolute fastest network throughput ( 100 Gig interfaces ), I just do not know…

So my questions are: which has the fastest throughput?

  • On bare-metal hardware, which is faster, Example 1 or 2?
  • In a hypervisor environment ( I use Proxmox these days ), which is faster, Example 1 or 2?

I look forward to any responses and ideas.

North Idaho Tom Jones

Since VyOS is a software-based router (currently with no hardware offloading, such as support for switch chips, etc.), on a microscopic level using 802.1Q (VLAN tagging) will consume more CPU cycles than untagged traffic (due to en/decapsulation).

But the main difference is that pushing all VLANs through a single interface will limit the performance of each VLAN whenever more than one of them is carrying traffic in the same direction at the same time.

So Example 1 would be preferred from a performance point of view (one interface per VLAN).

I/O buffers and interrupts are only used when a packet is sent or received, so they won’t matter whether you push all VLANs through one NIC or have one VLAN per NIC.

What you will eventually run out of with Example 1 is PCIe lanes, depending on which CPU vendor you select. Here AMD EPYC CPUs currently have a great win over Intel CPUs.

Another thing to consider is which NIC speeds you select and which PCIe slots you put them in, since not all PCIe slots are x16 (this starts to matter as you approach 100G NIC speeds). A multi-port NIC can also become a bottleneck in the scenario where all its ports push max load at the same time at full duplex. In short, you have to do your due diligence on which NICs you select (single- or multi-port) and which PCIe slots you put them in (this depends on the motherboard).
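A rough sanity check on slot sizing can be sketched with the per-lane numbers; this assumes PCIe Gen3 (8 GT/s per lane with 128b/130b line encoding) and ignores further protocol overhead:

```shell
# Rough usable PCIe Gen3 bandwidth per direction, in Gbit/s
awk 'BEGIN {
  lane = 8 * 128 / 130                          # ~7.88 Gbit/s per Gen3 lane
  printf "Gen3 x8  = %.0f Gbit/s\n",  8 * lane  # too little for a 100G NIC
  printf "Gen3 x16 = %.0f Gbit/s\n", 16 * lane  # enough for one 100G port
}'
# prints: Gen3 x8  = 63 Gbit/s
#         Gen3 x16 = 126 Gbit/s
```

So a 100G NIC wants a Gen3 x16 (or Gen4 x8) slot to avoid being bus-limited before the wire is even full.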

So no matter whether you run bare metal or a VM hypervisor, having one interface per VLAN will give you more performance than pushing all VLANs through a single interface (at the very least, the interface itself won’t be the limiting factor).

Apachez

I like your reply post.

Some of my thoughts re your post …

re: … 802.1Q (VLAN-tagging) would consume more CPU cycles …
Bare metal - I agree, more CPU cycles on the VyOS router.
Hypervisor - I doubly agree, more CPU cycles on the VyOS router -and- more CPU cycles on the hypervisor if the hypervisor does not have a dedicated physical access-mode network interface for each specific VLAN in use.
Hypervisor with SR-IOV network interfaces - I still agree; however, there is still some minor hypervisor CPU overhead.

re: … AMD EPYC CPUs currently have a great win over Intel CPUs …
IMO, that depends. High-end AMD CPUs might have more cores (to better support multi-threading) and more PCIe lanes, but high-end Intel Xeon CPUs have more cache, which helps with large route-table lookups and large, complex firewall configurations.

North Idaho Tom Jones

The AMD EPYC 9684X has 1152 MB of L3 cache; which Intel Xeon has that?

https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series.html#specs

AMD EPYC also has 128 PCIe 5.0 lanes and 12 memory channels.

The latest Intel Xeon peaks at 80 lanes and 8 memory channels.

So if you want performance today, go with AMD EPYC.

Apachez

I’ve never compared AMD and Intel CPUs to see which has greater maximum throughput in router throughput testing, whether running on bare metal or on a hypervisor.

hmmm - now you’ve got me re-thinking which CPU is better for router throughput…

We are getting ready to order about 13 high-end servers. I was thinking about the fastest Intel CPUs, but now I am wondering which is faster.

My budget was for quantity 13 with 10-Gig & 100-Gig NICs, the newest and fastest CPU clock speeds available, and 1 TB of RAM ( priorities are throughput first, cost last ).
Four servers would be bare-metal ( VyOS ) routers ( BGP & OSPF ).
The rest would be used as Proxmox hypervisor servers ( just about every VM server a medium-sized ISP would use, plus some additional virtual VyOS OSPF routers ).

Sooo, it’s now worth some investigative research into which CPU ( AMD vs Intel ) will get me the fastest throughput.

North Idaho Tom Jones

Here is some info regarding a five-year-old AMD EPYC 7502P CPU and what it is capable of when it comes to VyOS:

When it comes to performance, you can also look at performance over time, where Intel has large issues: every CVE found brings the performance down even more (and increases the distance to AMD) with every mitigation performed by the kernel or through microcode updates:
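You can see which mitigations a given box is currently paying for through the standard Linux sysfs interface:

```shell
# List the CPU vulnerability mitigations active on a Linux host; every
# "Mitigation:" entry was applied by the kernel or by microcode and
# may carry its own performance cost.
grep . /sys/devices/system/cpu/vulnerabilities/* 2>/dev/null
```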

https://security-tracker.debian.org/tracker/source-package/intel-microcode

https://security-tracker.debian.org/tracker/source-package/amd64-microcode

Note that microcode updates also come through BIOS updates.

The recently released AMD EPYC 4004 series can also be interesting if you are on a budget:

https://www.amd.com/en/products/processors/server/epyc/4004-series.html#specifications

You’re going to run into “on paper” vs “real world” differences when trying to evaluate a lot of this. For instance, your VLAN question: you do have to take an extra action on the packet to shim a dot1q header into it, so it is marginally more computationally expensive than just having separate interfaces (on paper). From experience, though, you’ll likely not see that penalty when running your router in production; it’s such a simple task computationally that it isn’t really observable. I’ve easily maxed line rate at 25Gbps with dot1q tagging (for a single flow; could easily max 100G with more parallel flows).
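One way to reproduce that kind of measurement yourself is with iperf3 against a host on the far side of the tagged interface (the server address here is a placeholder):

```shell
# Single TCP flow over the tagged interface
iperf3 -c 10.0.100.2 -t 30
# Eight parallel flows, to exercise multiple NIC queues/CPU cores
iperf3 -c 10.0.100.2 -t 30 -P 8
```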

The potential benefits of having multiple interfaces serving a single VLAN for redundancy far outweigh any minor performance penalty. You’d also pay a slight performance penalty by using bonding, but that is definitely preferable to losing an entire subnet to a single interface failure.
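On VyOS, that redundancy could be sketched as an LACP bond carrying the VLANs (the member interfaces and address here are hypothetical):

```text
set interfaces bonding bond0 mode '802.3ad'
set interfaces bonding bond0 member interface 'eth1'
set interfaces bonding bond0 member interface 'eth2'
set interfaces bonding bond0 vif 100 address '10.0.100.1/24'
```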

If you’re looking at 100G NICs, I highly suggest looking at Intel NICs. You might start to look at Mellanox/Nvidia, but most of those don’t support breakout. Almost all of the 100G Intel NICs I’ve seen support a wide variety of breakout (4x10, 4x25, 2x50, etc…). You may not even need breakout initially, but having the option later can be beneficial.

I think you are confused about the Mellanox capabilities:

15m (49ft) FS for Mellanox MFA7A50-C015 Compatible 100G QSFP28 to 4x25G SFP28 Active Optical Breakout Cable
https://www.fs.com/products/120559.html?attribute=13463&id=3532374

To me, Mellanox is currently ahead of Intel on my shortlist.

Intel’s major issue when it comes to 40G and above is their in-tree vs out-of-tree drivers and their crippling restrictions on 3rd-party transceivers (as seen in several threads on this forum).

Edit: It’s me who was confused; it turns out that Mellanox adapters don’t seem to support breakouts (unless that changed recently), while their switches do.

Ref: Using 100GbE to 4x25GbE breakout cables with ConnectX-5 EX NICs - Adapters and Cables - NVIDIA Developer Forums

Yeah, I generally prefer Mellanox; I’ve never really had any issues with them, and they have broad driver support. But the breakout piece can be a deal-breaker for me. Single flows greater than 25Gbps are generally rare, so I’d rather break SR4 into 4x25G instead of a single 100G.

There are some that support breakout, but if you got 10 Mellanox cards and 10 Intel cards, most of the Intel cards would have broad breakout support, while only a few of the Mellanox cards would.

The breakout capability is also nice if, for whatever reason, you need to turn a single 100G interface into a 25G or 10G one via a QSA adapter (that is, if you want to avoid the octopus of cables that a breakout will bring you).

Example:

Customized QSFP28 100G to SFP28 25G Adapter Converter Module with DDMI

Yeah, QSA is generally supported on Mellanox. Technically, from what I’ve read, you can use the breakout cables, but only one of the interfaces will work. So I guess if that’s cheaper than QSA adapters, you could go that way too.