Slow PPPoE throughput, high CPU with APU3C4 and VyOS 1.3 Rolling

Hello there,

Last week I installed VyOS for the first time. I chose it to replace OPNsense on my APU3C4 board (AMD GX-412TC 1 GHz, 4 GB RAM, 60 GB mSATA). Sadly, the throughput isn’t much higher than on OPNsense, which was the main reason I switched. I get a peak of around 690 Mbit/s downstream and around 500 Mbit/s upstream on speedtest.net. However, since I switched to VyOS I can’t establish an iperf3 session to my ISP. Locally, from VyOS to my PC, I get ~950 Mbit/s with iperf3 and 4 parallel sessions. The firewall has only the default rules from the documentation (NAT masquerade, and allow inbound traffic only if there is an outgoing connection first). The line itself is a 1 Gbit/s FTTH connection. These are the results after I applied the optimisation mentioned in this thread.

The APU board has three Ethernet ports provided by Intel I211AT controllers. eth0 is the WAN interface for the PPPoE connection; eth1 and eth2 are configured as a bond to my Cisco SG300 switch. eth0 and eth1 have 4 hardware queues each (2 RX, 2 TX), eth2 has 2 (1 RX, 1 TX):

ei8ht@brandwall:~$ ls /sys/class/net/eth0/queues/
rx-0 rx-1 tx-0 tx-1
ei8ht@brandwall:~$ ls /sys/class/net/eth1/queues/
rx-0 rx-1 tx-0 tx-1
ei8ht@brandwall:~$ ls /sys/class/net/eth2/queues/
rx-0 tx-0
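
The three listings above can be condensed into one line per interface. This is only a minimal sketch: `count_queues` and the `SYSFS_NET` override are hypothetical names introduced here (the override exists so the function can be tried against a scratch directory), and it assumes the usual `/sys/class/net/<if>/queues/` layout shown above.

```shell
#!/bin/sh
# Sketch: count RX and TX queue directories per interface.
# SYSFS_NET is a hypothetical override; it defaults to the real sysfs path.
count_queues() {
    root="${SYSFS_NET:-/sys/class/net}"
    for ifname in "$@"; do
        rx=0
        tx=0
        for q in "$root/$ifname"/queues/rx-*; do
            if [ -d "$q" ]; then rx=$((rx + 1)); fi
        done
        for q in "$root/$ifname"/queues/tx-*; do
            if [ -d "$q" ]; then tx=$((tx + 1)); fi
        done
        echo "$ifname rx=$rx tx=$tx"
    done
}
```

Calling `count_queues eth0 eth1 eth2` prints one summary line per interface.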

ei8ht@brandwall:~$ sudo cat /proc/softirqs
CPU0 CPU1 CPU2 CPU3
HI: 0 0 2 1
TIMER: 1738972 3210606 1696851 1477021
NET_TX: 57573 81883 163671 30690
NET_RX: 6350629 3591436 12268594 11800880
BLOCK: 0 0 0 0
IRQ_POLL: 0 0 0 0
TASKLET: 220463 132907 20 54
SCHED: 1422473 2635577 1304981 1102749
HRTIMER: 463 466 469 205
RCU: 965068 1985195 850226 809194
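
The NET_RX row is easier to judge as a per-CPU share than as raw counters. The following is a rough sketch, not a definitive tool: `net_rx_share` is a hypothetical helper name, the optional file argument is only there so saved output can be analysed, and the counters are cumulative since boot, so they do not describe a single speed test.

```shell
#!/bin/sh
# Sketch: print each CPU's percentage of all NET_RX softirqs.
# Reads /proc/softirqs by default, or a saved copy given as $1.
net_rx_share() {
    LC_ALL=C awk '$1 == "NET_RX:" {
        total = 0
        for (i = 2; i <= NF; i++) total += $i
        for (i = 2; i <= NF; i++)
            printf "CPU%d %.1f%%\n", i - 2, (total > 0 ? 100 * $i / total : 0)
    }' "${1:-/proc/softirqs}"
}
```

On the counters posted above this gives roughly 19%, 11%, 36% and 35% for CPU0 to CPU3.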

ei8ht@brandwall:~$ sudo ethtool -g eth0
Ring parameters for eth0:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096

ei8ht@brandwall:~$ sudo ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096

ei8ht@brandwall:~$ sudo ethtool -g eth2
Ring parameters for eth2:
Pre-set maximums:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
Current hardware settings:
RX: 4096
RX Mini: 0
RX Jumbo: 0
TX: 4096
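
To check at a glance whether the rings are already at their maximum, the `ethtool -g` output can be reduced to current/maximum pairs. This is a small sketch under the assumption that the output has the two sections shown above; `ring_usage` is a hypothetical helper name that reads the `ethtool -g` text from standard input.

```shell
#!/bin/sh
# Sketch: reduce "ethtool -g" output to "RX cur/max" and "TX cur/max".
# Lines such as "RX Mini:" and "RX Jumbo:" are skipped because their
# first field is "RX", not "RX:".
ring_usage() {
    awk '
        /^Pre-set maximums:/          { section = "max"; next }
        /^Current hardware settings:/ { section = "cur"; next }
        $1 == "RX:" || $1 == "TX:" {
            name = substr($1, 1, 2)
            if (section == "max") max[name] = $2
            if (section == "cur") print name, $2 "/" max[name]
        }
    '
}
```

Usage would be `sudo ethtool -g eth0 | ring_usage`; with the values above, both rings are already at 4096 of 4096.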

Do you see any further potential for optimization, or is the hardware at its limit?

Sadly I’m not allowed to add attachments at the moment, so I can’t show you a screenshot of top or my configuration.

Kind regards

ei8ht

Hi @ei8ht, can you enable RPS manually? I think 1.3-rolling does not enable it automatically.

echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo "f" > /sys/class/net/eth1/queues/rx-0/rps_cpus
echo "f" > /sys/class/net/eth2/queues/rx-0/rps_cpus
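
The value "f" is a hexadecimal CPU bitmask: with four cores, bits 0 to 3 are set, which gives binary 1111. The commands above cover only rx-0; a loop can apply the same mask to every RX queue that exists. The following is a minimal sketch, assuming it runs as root; `cpu_mask`, `set_rps` and the `SYSFS_NET` override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Sketch: build a hex bitmask with the lowest N bits set (4 CPUs -> f).
cpu_mask() {
    printf '%x\n' $(( (1 << $1) - 1 ))
}

# Sketch: write the mask to rps_cpus of every RX queue of the given interfaces.
# SYSFS_NET is a hypothetical override; it defaults to the real sysfs path.
set_rps() {
    mask="$1"
    shift
    root="${SYSFS_NET:-/sys/class/net}"
    for ifname in "$@"; do
        for q in "$root/$ifname"/queues/rx-*; do
            if [ -e "$q/rps_cpus" ]; then
                echo "$mask" > "$q/rps_cpus"
            fi
        done
    done
}
```

For example, `set_rps "$(cpu_mask "$(nproc)")" eth0 eth1 eth2` would write "f" on a four-core board.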

Can you capture a screencast of top (press 1 to show per-CPU usage) while you run a speed test?

Hi @Dmitry,
thanks for your reply.
I hope you don’t mind that I also applied this setting to the rx-1 queues of the interfaces. The following statements are in my /config/scripts/vyos-postconfig-bootup.script:

ethtool -G eth0 tx 4096 rx 4096
ethtool -G eth1 tx 4096 rx 4096
ethtool -G eth2 tx 4096 rx 4096
sudo echo "f" > /sys/class/net/eth0/queues/rx-0/rps_cpus
sudo echo "f" > /sys/class/net/eth1/queues/rx-0/rps_cpus
sudo echo "f" > /sys/class/net/eth0/queues/rx-1/rps_cpus
sudo echo "f" > /sys/class/net/eth1/queues/rx-1/rps_cpus
sudo echo "f" > /sys/class/net/eth2/queues/rx-0/rps_cpus

In the meantime I managed to run some speed tests: some on speedtest.net, some by downloading a 30 GB file directly from my ISP (as an alternative to iperf).

The result remains the same. I get around 500-600 Mbit/s (actually a bit less at the moment, because a lot of people are at home and it’s a shared FAN node from Swisscom).

All 4 cores go up. Sadly, I think this really is the hardware limit, unless you have some more ideas. I’m not deep enough into Linux to know such tweaks.

Screencast with wget
Screencast with Speedtest.net

Kind Regards

ei8ht

P.S.: Maybe enable WebM for uploads :)

Hi @ei8ht, I think it is possible to disable some Spectre patches to gain some performance.
Try disabling the mitigations: add mitigations=off to the kernel command line of the active boot entry in /boot/grub/grub.cfg.
This requires a reboot. The change could add about 30% performance.
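
After the reboot it is worth confirming that the parameter actually reached the kernel. A minimal sketch, assuming a standard Linux `/proc/cmdline`; `mitigations_off` is a hypothetical helper name, and the optional file argument is only there so a saved command line can be checked.

```shell
#!/bin/sh
# Sketch: succeed if "mitigations=off" is a whole word on the kernel command line.
# Reads /proc/cmdline by default, or a file given as $1.
mitigations_off() {
    cmdline_file="${1:-/proc/cmdline}"
    tr ' ' '\n' < "$cmdline_file" | grep -qx 'mitigations=off'
}
```

For example, `mitigations_off && echo "mitigations=off is active"`. The per-vulnerability status can then be inspected with `grep . /sys/devices/system/cpu/vulnerabilities/*`.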