Slower download speeds after rebuilding machine from Vyatta to VyOS

We recently upgraded two of our routers from Vyatta (VSE 6.7R12) to VyOS 1.2.4. Both routers run on an HP DL360 G7.

Since the upgrade we have noticed a few irregularities. First, there is slight packet loss where there was none before. Second, download speeds are slower in general and tend to fluctuate, whether traffic goes through IP transit or peering (both use the same physical 10GbE NIC).

Overall traffic throughput looks normal in monitoring, but there is a definite drop in download speeds.

Any idea what might be up? Or how to troubleshoot this further?

@Bilal Can you share a screenshot?
Run top and press 1.

Hi Viacheslav,

I’ve added a screenshot of top.
We also had monitoring set up, and at no point did CPU utilization go above 5%.

The strange thing is that download speeds are back to normal now, and I can’t see the packet loss I was seeing before. No change was made on our end.

To give a bit more info, we are using a 10gig port to connect the router to the switch.
The NIC is on an Intel 82599ES 10-Gigabit SFI/SFP+ card.

It’s the same configuration for the other router as well.

We are seeing download speeds randomly drop to 1-3 Mbps and then go back up. It does not matter whether the destination is reached via IP transit or peering.

How are you running the tests?
Are you downloading a file locally to the VyOS router, or is traffic passing through the system to some host?
Can you download a file hosted on your side of the provider link and compare it with a download from the Internet?
Take a look at the logs of the switch where the 10G port is connected (errors, discards, etc.). Check the CPU load on the switch when the problem recurs.
Do both routers show this behavior after the upgrade?
You can use mtr to check which hop might be the problem.
Try to gather more information, including NIC statistics:
grep -Ei "cpu|rx" /proc/interrupts
sudo ethtool -S ethX
where ethX is your 10G port.
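
When reading long ethtool -S output, it can help to hide the counters that are zero. The sketch below is a hypothetical helper (nonzero_problem_counters is not a VyOS or ethtool command) that keeps only non-zero counters whose names look like errors, drops or misses. The name pattern is an assumption and may need adjusting for your driver.

```shell
# Print only the non-zero counters whose names suggest loss or errors.
# Reads `ethtool -S` style output ("name: value" per line) on stdin.
nonzero_problem_counters() {
    awk -F': *' '
        # Lines without a purely numeric value (e.g. the header) are skipped.
        $2 ~ /^[0-9]+$/ && $2 > 0 {
            name = $1
            gsub(/^[ \t]+/, "", name)      # strip the leading indentation
            # Assumed name pattern; extend it if your driver uses other names.
            if (name ~ /(err|drop|miss|discard|no_buffer|no_dma|fifo)/)
                print name ": " $2
        }
    '
}

# Usage on the router (eth4 is the interface named in this thread):
#   sudo ethtool -S eth4 | nonzero_problem_counters
```

Running it twice a few minutes apart and comparing the values shows which counters are still growing.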

Hi Viacheslav,

We are testing from machines sitting behind the router and testing traffic going out to various IP transit providers or peering.

We are seeing sudden drops in download speeds for those files, which then return to normal after a while. Both routers have exhibited this behavior since the upgrade. We can still reach peak speeds, but the behavior becomes more apparent during periods of load.

I can see about 1% packet loss on the switch port but it sometimes jumps up to about 7%. This is only for the port going to one of the routers.

I have attached the NIC stats below. FYI we are using a DAC cable going to the 10gig port, eth4.

ethtool results (DAC cable).txt (10.8 KB)

Hello @Bilal, I don’t see any errors or CRC errors in the ethtool statistics.
Can you run an iperf test from a PC or server in your network to this router?

Hi Dmitry,

I ran some iperf tests and the results were odd.
I initially ran a test from a device in our office to the upstream router RT02, and it maxed out at about 500 Mbit/s, which is the line speed of the Ethernet link.

[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 64.1 MBytes 538 Mbits/sec 307 129 KBytes
[ 4] 1.00-2.00 sec 57.4 MBytes 482 Mbits/sec 258 126 KBytes
[ 4] 2.00-3.00 sec 49.8 MBytes 417 Mbits/sec 367 163 KBytes
[ 4] 3.00-4.00 sec 60.3 MBytes 506 Mbits/sec 297 192 KBytes
[ 4] 4.00-5.00 sec 51.0 MBytes 428 Mbits/sec 202 212 KBytes
[ 4] 5.00-6.00 sec 61.3 MBytes 514 Mbits/sec 338 150 KBytes
[ 4] 6.00-7.00 sec 57.0 MBytes 478 Mbits/sec 154 255 KBytes
[ 4] 7.00-8.00 sec 55.2 MBytes 463 Mbits/sec 122 219 KBytes
[ 4] 8.00-9.00 sec 52.3 MBytes 438 Mbits/sec 254 219 KBytes
[ 4] 9.00-10.00 sec 60.0 MBytes 503 Mbits/sec 363 66.5 KBytes

[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 568 MBytes 477 Mbits/sec 2662 sender
[ 4] 0.00-10.00 sec 566 MBytes 475 Mbits/sec receiver

Running iperf between the two routers only got us about 480 Mbit/s, even though they have a 10G link between them.

[ ID] Interval Transfer Bandwidth
[ 5] 0.00-1.00 sec 46.7 MBytes 392 Mbits/sec
[ 5] 1.00-2.00 sec 51.6 MBytes 433 Mbits/sec
[ 5] 2.00-3.00 sec 90.0 MBytes 755 Mbits/sec
[ 5] 3.00-4.00 sec 60.5 MBytes 507 Mbits/sec
[ 5] 4.00-5.00 sec 51.5 MBytes 432 Mbits/sec
[ 5] 5.00-6.00 sec 61.3 MBytes 514 Mbits/sec
[ 5] 6.00-7.00 sec 47.0 MBytes 394 Mbits/sec
[ 5] 7.00-8.00 sec 44.8 MBytes 376 Mbits/sec
[ 5] 8.00-9.00 sec 49.9 MBytes 419 Mbits/sec
[ 5] 9.00-10.00 sec 64.2 MBytes 538 Mbits/sec
[ 5] 10.00-10.04 sec 1.55 MBytes 310 Mbits/sec

[ ID] Interval Transfer Bandwidth Retr
[ 5] 0.00-10.04 sec 573 MBytes 478 Mbits/sec 58 sender
[ 5] 0.00-10.04 sec 569 MBytes 475 Mbits/sec receiver

Now that it is later in the evening and the load has died down, running iperf between the two routers over their interconnect (eth5) gives 2 to 4.5 Gbits/sec, which is still well short of the 10G link capacity.

During times of load on the router I was only getting 0.48 Gbits/sec.

Hi @Bilal, can you try increasing the ring buffers (as in this article) and run iperf with 4-8 parallel streams (option -P 8 for iperf3)?

Hi Dmitry,

The value was set to 512, which I have changed to 4096.

I ran iperf with 8 streams and here is the result I got:
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.04 GBytes 8.89 Gbits/sec 129 754 KBytes
[ 4] 1.00-2.00 sec 1.05 GBytes 9.06 Gbits/sec 62 588 KBytes
[ 4] 2.00-3.00 sec 1.07 GBytes 9.17 Gbits/sec 1 829 KBytes
[ 4] 3.00-4.00 sec 1.06 GBytes 9.07 Gbits/sec 302 877 KBytes
[ 4] 4.00-5.00 sec 1.08 GBytes 9.24 Gbits/sec 113 856 KBytes
[ 4] 5.00-6.00 sec 1.06 GBytes 9.10 Gbits/sec 425 721 KBytes
[ 4] 6.00-7.00 sec 1.08 GBytes 9.24 Gbits/sec 207 847 KBytes
[ 4] 7.00-8.00 sec 1.07 GBytes 9.22 Gbits/sec 280 427 KBytes
[ 4] 8.00-9.00 sec 1.08 GBytes 9.28 Gbits/sec 70 916 KBytes
[ 4] 9.00-10.00 sec 1.04 GBytes 8.95 Gbits/sec 160 690 KBytes

[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 10.6 GBytes 9.12 Gbits/sec 1749 sender
[ 4] 0.00-10.00 sec 10.6 GBytes 9.12 Gbits/sec receiver

The readings were all similar, as there is barely any load going through the router at the moment.
Let’s see how it performs tomorrow under load, and if this makes any difference.

Hi @Bilal, nice result. Please take a screenshot of top (press 1) while the test is running, and also provide the output of these commands:

show interfaces ethernet eth4 statistics 
show interfaces ethernet eth5 statistics 

Hi Dmitry,

Considering it’s almost midnight here, this is the time of minimum load.
Peak speeds tended to fall off under load, so let’s see if the increased buffer size makes a change tomorrow.

In the meantime I’ve attached the current top and statistics for eth4 (link going to switch, has most of the traffic) and eth5 (inter-connect between the 2 routers).

eth4 statistics.txt (10.8 KB) eth5 statistics.txt (10.6 KB)

Hi @Bilal, I see many fdir_miss in your output. Can you enable ntuple if the problem comes back?

sudo ethtool --features eth4 ntuple on
sudo ethtool --features eth5 ntuple on

Be careful: you may get 2-3 seconds of packet loss after these commands.
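
To judge whether enabling ntuple changes anything, the flow director counters can be compared before and after. This is a minimal sketch, assuming the driver exposes both fdir_match and fdir_miss in ethtool -S (only fdir_miss is mentioned in this thread). fdir_miss_percent is a hypothetical helper name.

```shell
# Print the share of flow director lookups that missed, as a percentage.
# Reads `ethtool -S` output on stdin; prints "n/a" if no lookups are recorded.
fdir_miss_percent() {
    awk -F': *' '
        $1 ~ /fdir_match$/ { match_count = $2 }
        $1 ~ /fdir_miss$/  { miss_count  = $2 }
        END {
            total = match_count + miss_count
            if (total == 0) { print "n/a"; exit }
            printf "%.1f\n", 100 * miss_count / total
        }
    '
}

# Usage (eth4 as in this thread):
#   sudo ethtool -S eth4 | fdir_miss_percent
```

The counters are cumulative since the driver was loaded, so the difference between two samples says more than a single reading.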

Hi Dmitry,

I’ve enabled ntuple on both interfaces on both routers.
I’ll see how performance goes as load ramps up to peak in the next 2 hours.

Hi Dmitry,

No change, as soon as the load came back up to peak levels everything dropped again.
Inter-connect traffic is down to 0.4 Gbits/sec again :frowning:

How many real cores do you have?
Do you use hyper-threading?
Show the current offloads:
sudo ethtool -k ethX
where ethX is the interface carrying the traffic.

The server has 2 processors with 4 cores each (I believe with HT, so 8 threads each).
I have attached the results of running the command on eth4 and eth5.

As soon as the load really picks up the bandwidth available drops.

At night under minimum load I can get 9.3 Gbits/sec when running iperf3 between the routers but during peak load I can only get about 0.4 Gbits/sec, even though that link has only 200 Mbits/sec flowing on a 10G link.

ethool k eth4.txt (1.7 KB) ethool k eth5.txt (1.7 KB)

The difference between iperf and real traffic is the number of packets being processed.
First, you need to exclude logical cores from packet processing.
To do that, find the real cores with:
show hardware cpu
Or disable hyper-threading in the BIOS.
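
As a cross-check from the Linux shell, the number of real cores can also be derived from /proc/cpuinfo. This is a rough sketch, assuming an x86 system where each processor entry lists "physical id" before "core id". count_physical_cores is a hypothetical helper name.

```shell
# Count distinct (physical id, core id) pairs, i.e. physical cores,
# from /proc/cpuinfo-style input on stdin.
count_physical_cores() {
    awk -F': *' '
        /^physical id/ { socket = $2 }                 # remember the socket
        /^core id/     { seen[socket "," $2] = 1 }     # one entry per real core
        END { n = 0; for (k in seen) n++; print n }
    '
}

# Usage:
#   count_physical_cores < /proc/cpuinfo
```

If the result is half the number of "processor" entries, hyper-threading is enabled.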

Are eth4 and eth5 in one PCI slot or in different ones (one network adapter, or 2/3/4)?
If you have 4 real cores per socket, you need 4 combined rx/tx queues on eth4 and 4 on eth5.
Ideally, each queue should be handled by one real core.
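
To see which CPU is actually servicing each queue, /proc/interrupts can be summarized per queue. The sketch below assumes the queue interrupts are named like eth4-TxRx-0, which is typical for this driver family but not confirmed by the thread. busiest_cpu_per_queue is a hypothetical helper name.

```shell
# For every interrupt line whose name starts with "<ifname>-", print the
# queue name, the CPU with the highest count, and that count.
# Reads /proc/interrupts-style input on stdin.
busiest_cpu_per_queue() {
    awk -v ifname="$1" '
        # Header row: one field per CPU column.
        NR == 1 { ncpu = NF; for (i = 1; i <= NF; i++) cpu[i] = $i; next }
        index($NF, ifname "-") == 1 {
            best = 2                                   # field 2 is the first CPU column
            for (i = 3; i <= ncpu + 1; i++)
                if ($i + 0 > $best + 0) best = i
            print $NF, cpu[best - 1], $best
        }
    '
}

# Usage:
#   busiest_cpu_per_queue eth4 < /proc/interrupts
# The number of combined queues can usually be viewed and set with
# `ethtool -l eth4` and `sudo ethtool -L eth4 combined 4` (driver permitting).
```

If several queues report the same CPU, or a hyper-threaded sibling of it, the interrupts are not spread one queue per real core.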

Recommended ethtool settings (connectivity may drop for a few seconds).
Disable pause frames (flow control) and offloads:
sudo /sbin/ethtool -A eth4 rx off tx off
sudo /sbin/ethtool -K eth4 rx off tx off
sudo /sbin/ethtool -K eth4 tso off
sudo /sbin/ethtool -K eth4 gro off
sudo /sbin/ethtool -K eth4 sg off
sudo ifconfig eth4 txqueuelen 10000
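
After running the commands above, it is worth confirming that the offloads really are off, since some features are fixed by the driver. A small sketch, assuming the usual feature names printed by ethtool -k. offloads_still_on is a hypothetical helper name.

```shell
# List the top-level offload features targeted above that are still "on".
# Reads `ethtool -k` output on stdin; prints nothing if all are off.
offloads_still_on() {
    awk -F': *' '
        BEGIN {
            split("rx-checksumming tx-checksumming scatter-gather tcp-segmentation-offload generic-receive-offload", names, " ")
            for (i in names) wanted[names[i]] = 1
        }
        # Indented sub-features keep their leading tab in $1, so they never match.
        ($1 in wanted) && $2 ~ /^on/ { print $1 }
    '
}

# Usage:
#   sudo ethtool -k eth4 | offloads_still_on
```

Note that these settings do not survive a reboot unless they are also applied through the router configuration or a startup script.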

Also check ring buffers:
sudo /sbin/ethtool -g eth4
If the current settings are lower than the pre-set maximums, you can increase them.
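
The comparison can be scripted. This is a minimal sketch that reads ethtool -g output and reports any ring that is below its pre-set maximum; ring_headroom is a hypothetical helper name, and the 4096 in the usage comment is the value reported earlier in this thread, not a general recommendation.

```shell
# Report RX/TX rings whose current size is below the pre-set maximum.
# Reads `ethtool -g` output on stdin.
ring_headroom() {
    awk -F':[ \t]*' '
        /^Pre-set maximums/          { section = "max"; next }
        /^Current hardware settings/ { section = "cur"; next }
        $1 == "RX" || $1 == "TX" {
            if (section == "max") max[$1] = $2
            if (section == "cur" && $2 + 0 < max[$1] + 0)
                print $1 ": current " $2 ", maximum " max[$1]
        }
    '
}

# Usage:
#   sudo ethtool -g eth4 | ring_headroom
# To raise the rings (expect a brief link interruption):
#   sudo ethtool -G eth4 rx 4096 tx 4096
```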

@Bilal please take a screenshot of top (press 1) while the issue is occurring.
Also provide the output of:

show interfaces ethernet eth4
show interfaces ethernet eth5

How many firewall or NAT rules do you have?
Are you using any VPN tunnels?