NAT performance issue after recent update

After upgrading to a newer image, I noticed source NAT performance has dropped significantly. Below is a list of the images I tested and the throughput seen on each. Some hosts got up to 200 Mb/s download while others struggled to reach 2 Mb/s; the same hosts on older images have no problem reaching the full speed of the WAN connection. Only download speeds are affected. Tested on two identical machines. Can anyone else replicate this issue? If so, I will open a Phabricator task for it.

Images tested:
1: 1.4-rolling-202209260217  200 Mb/s down to 2 Mb/s
2: 1.4-rolling-202209250705  200 Mb/s down to 2 Mb/s
3: 1.4-rolling-202209232340  200 Mb/s down to 2 Mb/s
4: 1.4-rolling-202208070707  900+ Mb/s
5: 1.4-rolling-202206120705  900+ Mb/s

NAT config:

rule 100 {
    outbound-interface eth6
    source {
        address 172.16.0.0/16
    }
    translation {
        address masquerade
    }
}
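For anyone wanting to reproduce, the same rule in set-command form should be roughly the following on 1.4 rolling images from that period (the interface name eth6 is taken from the config above):

```
set nat source rule 100 outbound-interface 'eth6'
set nat source rule 100 source address '172.16.0.0/16'
set nat source rule 100 translation address 'masquerade'
```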

Hardware:
Hardware vendor: Supermicro
Hardware model: X10SLH-N6-ST031
CPU: Intel(R) Xeon(R) CPU E3-1230 v3 @ 3.30GHz
RAM: 32 GB

Just a suggestion - have you checked which offloads are enabled on the old image vs the new one? It might be that a kernel change enables or disables some offloads by default.

show interfaces ethernet ethX physical offload
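If it helps, the same flags can also be read straight from the OS shell with ethtool (eth6 used here only as an example of the WAN NIC):

```shell
# -k (lowercase) lists the current offload feature states for the device
sudo ethtool -k eth6
```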

I don't see anything standing out. The NIC is an Intel X540-AT2, if it matters.

Newest Image

rx-checksumming               on
tx-checksumming               on
tx-checksum-ip-generic        on
tx-checksum-sctp              on
scatter-gather                on
tx-scatter-gather             on
tcp-segmentation-offload      on
tx-tcp-segmentation           on
tx-tcp-mangleid-segmentation  off
tx-tcp6-segmentation          on
generic-segmentation-offload  on
generic-receive-offload       on
large-receive-offload         on
rx-vlan-offload               on
tx-vlan-offload               on
ntuple-filters                off
receive-hashing               on
rx-vlan-filter                on
tx-gre-segmentation           on
tx-gre-csum-segmentation      on
tx-ipxip4-segmentation        on
tx-ipxip6-segmentation        on
tx-udp_tnl-segmentation       on
tx-udp_tnl-csum-segmentation  on
tx-gso-partial                on
tx-esp-segmentation           on
tx-udp-segmentation           on
tx-nocache-copy               off
rx-all                        off
l2-fwd-offload                off
hw-tc-offload                 off
esp-hw-offload                on
esp-tx-csum-hw-offload        on
rx-gro-list                   off
rx-udp-gro-forwarding         off

Working Image

rx-checksumming               on
tx-checksumming               on
tx-checksum-ip-generic        on
tx-checksum-sctp              on
scatter-gather                on
tx-scatter-gather             on
tcp-segmentation-offload      on
tx-tcp-segmentation           on
tx-tcp-mangleid-segmentation  off
tx-tcp6-segmentation          on
generic-segmentation-offload  on
generic-receive-offload       on
large-receive-offload         off
rx-vlan-offload               on
tx-vlan-offload               on
ntuple-filters                off
receive-hashing               on
rx-vlan-filter                on
tx-gre-segmentation           on
tx-gre-csum-segmentation      on
tx-ipxip4-segmentation        on
tx-ipxip6-segmentation        on
tx-udp_tnl-segmentation       on
tx-udp_tnl-csum-segmentation  on
tx-gso-partial                on
tx-esp-segmentation           on
tx-udp-segmentation           on
tx-nocache-copy               off
rx-all                        off
l2-fwd-offload                off
hw-tc-offload                 off
esp-hw-offload                on
esp-tx-csum-hw-offload        on
rx-udp_tunnel-port-offload    on
rx-gro-list                   off
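Diffing the two listings makes the change easier to spot: large-receive-offload is on in the newest image but off in the working one, and rx-udp_tunnel-port-offload appears only in the working listing. A sketch, assuming each listing was first saved to a file (the file names are made up):

```shell
# Capture on each image first, e.g.:
#   show interfaces ethernet eth6 physical offload | tee offloads-new.txt
# Then, with both files on one host:
diff offloads-working.txt offloads-new.txt
```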

Also, the issue is specific to IPs behind NAT. If I test using a routed public IPv4 or IPv6 address behind the router, it works fine.

Can you please check with these 2 versions:

  • vyos-1.4-rolling-202209151133
  • vyos-1.4-rolling-202209090217

Also, if possible, it would be much better if you could share the full config so we can try to reproduce this issue.

I was able to replicate the issue in both of those versions.
The full config is here: edge-1.txt (21.3 KB)

It’s a guess but have you tried applying a performance profile to see if it makes a difference? I had a performance issue on 1.3 and followed the offload options path without result (Intel NICs and bonding in use).

Using the latency or throughput options fixed the throughput problem for me:

set system option performance < throughput | latency >

Unfortunately, that does not seem to make a difference.

After some more testing: having LRO (large-receive-offload) enabled on the WAN interface seems to cause the issue.
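That would fit the offload listings above, where the working image has large-receive-offload off. LRO is also generally recommended off on boxes that forward traffic, which could explain why only NATed (forwarded) flows suffer. A sketch of turning it off, with eth6 assumed to be the WAN interface:

```shell
# One-off, from the OS shell (-K sets a feature state):
sudo ethtool -K eth6 lro off
```

If the VyOS config has an `offload lro` node on that interface, deleting it (`delete interfaces ethernet eth6 offload lro`) should make the change persistent across reboots.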
