10GbE tuning for best throughput between NAS and workstation

I’m using VyOS as a virtual switch for a 10GbE connection between my workstation and NAS. It runs on ESXi alongside OPNsense, which handles general-purpose routing.

With a direct connection between the NAS and the workstation (WS), I get about 1 GB/s when copying files over SMB. With VyOS in between, I only reach ~580 MB/s; before tuning it was ~450 MB/s. The tuning I applied:

ethtool -K eth1 rx on tx on sg on tso on gso on gro on lro on ntuple on rxhash on
ethtool -G eth1 tx 4096 rx 4096
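Note that ethtool -K does not guarantee that every requested feature ends up enabled: features the driver reports as [fixed] cannot be changed. `ethtool -k eth1` (lowercase k) prints the current feature state and `ethtool -g eth1` prints the ring sizes. A small sketch for filtering that output down to features that are still off; `features_off` is a hypothetical helper name, and it assumes the usual `name: on|off [fixed]` line format:

```shell
#!/bin/sh
# features_off (hypothetical helper): read `ethtool -k`-style output on stdin
# and print the names of all features whose state is "off".
features_off() {
    awk -F': ' '$2 ~ /^off/ { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage on the VyOS VM (eth1 is the interface from the commands above):
#   ethtool -k eth1 | features_off
#   ethtool -g eth1
```

Anything listed that was requested "on" (for example lro or ntuple) was either rejected or is not supported by the driver.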

Both the NAS and the WS use Mellanox ConnectX-3 cards. VyOS has an X520 via PCIe passthrough, plus a VMXNET3 adapter for the 1GbE Intel LAN link shared with OPNsense. All VyOS interfaces (eth0-2) are tied into a single bridge. MTU is set to 9000 on the 10GbE links on each side.
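A quick sanity check for the jumbo-frame setup is a ping with the don't-fragment bit set and a payload that exactly fills the MTU. The payload is the MTU minus 28 bytes (20-byte IPv4 header without options plus 8-byte ICMP header). A minimal sketch; `icmp_payload` and `NAS_IP` are names made up for illustration, and the ping flags shown are those of Linux iputils ping:

```shell
#!/bin/sh
# icmp_payload (hypothetical helper): largest ICMP echo payload that fits
# in a single unfragmented IPv4 packet for the given MTU.
icmp_payload() {
    mtu=$1
    # 20 bytes IPv4 header (no options) + 8 bytes ICMP header
    echo $(( mtu - 28 ))
}

# Example (NAS_IP is a placeholder for the NAS address):
#   ping -M do -c 3 -s "$(icmp_payload 9000)" "$NAS_IP"
```

If the 8972-byte ping fails while a 1472-byte one passes, something between the two hosts is not passing 9000-byte frames. From a Windows workstation the equivalent should be `ping -f -l 8972 <address>`.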

The ESXi host is modest (G4400, 8 GB DDR4), but VyOS CPU usage peaks at around 6% during copying. VyOS is assigned 2 vCPUs and 2 GB of RAM.

Is there anything I can do to improve performance? Right now I don’t know where the bottleneck in the 10GbE switching path is. I know I can always upgrade the hardware, but even the current limited resources don’t seem to be fully used.

Any advice will be appreciated!

Hello @nefph. Could you run top, press 1 to show per-CPU usage, and post the output while copying something through VyOS?
Do you use VLANs? Also try enabling RPS:

set interfaces ethernet eth0 offload rps

and set the system performance option to throughput:

set system option performance throughput
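
Both commands are entered in a configuration session and only take effect after a commit. A minimal sketch of the full sequence, assuming the two 10GbE ports are eth1 and eth2 (the thread only names eth1 in the ethtool commands and eth0 in the RPS example, so adjust the interface names to your setup):

```
# eth1/eth2 are assumed names for the two 10GbE ports
configure
set interfaces ethernet eth1 offload rps
set interfaces ethernet eth2 offload rps
set system option performance throughput
commit
save
exit
```

Re-running the SMB copy with top open afterwards shows whether the load is now spread across both vCPUs.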

