After upgrading to a newer image, I noticed that source NAT performance has dropped significantly. Below is a list of images and their performance. On the newer images, some hosts got up to 200mb/s download while others struggled to get 2mb/s. The same hosts on older images have no problem reaching the full speed of the WAN connection. Only download speeds are affected. Tested on two identical machines. Can anyone else replicate this issue? If so, I will open a Phabricator task for it.
Images tested:
1: 1.4-rolling-202209260217 200mb/s down to 2mb/s
2: 1.4-rolling-202209250705 200mb/s down to 2mb/s
3: 1.4-rolling-202209232340 200mb/s down to 2mb/s
4: 1.4-rolling-202208070707 900+mb/s
5: 1.4-rolling-202206120705 900+mb/s
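To narrow down where the regression was introduced, the list above can be reduced to a "last good / first bad" pair. This is only a rough sketch: it assumes the suffix of each image name is a YYYYMMDDHHMM build timestamp, and the good/bad flags are just my reading of the numbers above (900+mb/s is good, anything in the 200mb/s to 2mb/s range is bad).

```python
from datetime import datetime

# (image name, reaches full WAN speed?) taken from the list above.
results = [
    ("1.4-rolling-202209260217", False),
    ("1.4-rolling-202209250705", False),
    ("1.4-rolling-202209232340", False),
    ("1.4-rolling-202208070707", True),
    ("1.4-rolling-202206120705", True),
]


def build_time(image):
    # Assumption: the part after the last "-" is a YYYYMMDDHHMM timestamp.
    return datetime.strptime(image.rsplit("-", 1)[1], "%Y%m%d%H%M")


# Newest image that still performs well, oldest image that performs badly.
last_good = max((name for name, ok in results if ok), key=build_time)
first_bad = min((name for name, ok in results if not ok), key=build_time)

print("last good image:", last_good)
print("first bad image:", first_bad)
```

Any rolling image built between those two would be the next candidate to test.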
Just a suggestion - have you checked which offloads are enabled on the old image vs. the new one? A kernel change might enable or disable some offloads by default.
I don't see anything standing out. The NIC is an Intel X540-AT2, if it matters.
Newest image:
rx-checksumming on
tx-checksumming on
tx-checksum-ip-generic on
tx-checksum-sctp on
scatter-gather on
tx-scatter-gather on
tcp-segmentation-offload on
tx-tcp-segmentation on
tx-tcp-mangleid-segmentation off
tx-tcp6-segmentation on
generic-segmentation-offload on
generic-receive-offload on
large-receive-offload on
rx-vlan-offload on
tx-vlan-offload on
ntuple-filters off
receive-hashing on
rx-vlan-filter on
tx-gre-segmentation on
tx-gre-csum-segmentation on
tx-ipxip4-segmentation on
tx-ipxip6-segmentation on
tx-udp_tnl-segmentation on
tx-udp_tnl-csum-segmentation on
tx-gso-partial on
tx-esp-segmentation on
tx-udp-segmentation on
tx-nocache-copy off
rx-all off
l2-fwd-offload off
hw-tc-offload off
esp-hw-offload on
esp-tx-csum-hw-offload on
rx-gro-list off
rx-udp-gro-forwarding off
Working image:
rx-checksumming on
tx-checksumming on
tx-checksum-ip-generic on
tx-checksum-sctp on
scatter-gather on
tx-scatter-gather on
tcp-segmentation-offload on
tx-tcp-segmentation on
tx-tcp-mangleid-segmentation off
tx-tcp6-segmentation on
generic-segmentation-offload on
generic-receive-offload on
large-receive-offload off
rx-vlan-offload on
tx-vlan-offload on
ntuple-filters off
receive-hashing on
rx-vlan-filter on
tx-gre-segmentation on
tx-gre-csum-segmentation on
tx-ipxip4-segmentation on
tx-ipxip6-segmentation on
tx-udp_tnl-segmentation on
tx-udp_tnl-csum-segmentation on
tx-gso-partial on
tx-esp-segmentation on
tx-udp-segmentation on
tx-nocache-copy off
rx-all off
l2-fwd-offload off
hw-tc-offload off
esp-hw-offload on
esp-tx-csum-hw-offload on
rx-udp_tunnel-port-offload on
rx-gro-list off
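Two listings this long are hard to compare by eye, so a small script can do the comparison. The sketch below is not a VyOS or ethtool tool; `parse_features` and `diff_features` are hypothetical helper names, and the two strings are only an excerpt of the listings above (paste the full output in their place).

```python
def parse_features(text):
    """Turn 'name state' lines into a {name: state} dict."""
    features = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) >= 2:
            # Tolerate the 'name: state' form that ethtool itself prints.
            features[parts[0].rstrip(":")] = parts[1]
    return features


def diff_features(old, new):
    """Return {name: (old_state, new_state)} for every feature that differs."""
    changes = {}
    for name in sorted(old.keys() | new.keys()):
        before = old.get(name, "absent")
        after = new.get(name, "absent")
        if before != after:
            changes[name] = (before, after)
    return changes


# Excerpt of the "working" listing above.
working = """
generic-receive-offload on
large-receive-offload off
rx-udp_tunnel-port-offload on
rx-gro-list off
"""

# Excerpt of the "newest" listing above.
newest = """
generic-receive-offload on
large-receive-offload on
rx-gro-list off
rx-udp-gro-forwarding off
"""

changes = diff_features(parse_features(working), parse_features(newest))
for name, (before, after) in changes.items():
    print(f"{name}: {before} -> {after}")
```

Run against the listings above, this reports three differences: large-receive-offload is off on the working image and on on the newest one, rx-udp_tunnel-port-offload is only listed on the working image, and rx-udp-gro-forwarding is only listed on the newest image. Whether any of these explains the slowdown has not been tested here.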
It’s a guess, but have you tried applying a performance profile to see if it makes a difference? I had a performance issue on 1.3 and went down the offload options path without result (Intel NICs and bonding in use).
Using the latency or throughput options fixed the throughput problem for me:
set system option performance < throughput | latency >