We recently upgraded two of our routers from Vyatta (VSE 6.7R12) to VyOS 1.2.4. Both routers run on a HP DL360 G7.
Since the upgrade we have noticed a few irregularities. Firstly, there is slight packet loss where there was none before. Secondly, we are seeing slower download speeds in general (they tend to fluctuate) when going through IP transit or peering (using the same physical 10GbE NIC).
Overall traffic throughput looks normal in monitoring, but there is a definite drop in download speeds.
Any idea what might be up? Or how to troubleshoot this further?
How are you running your tests?
Are you downloading a file locally to the VyOS or is traffic passing through the system to some host?
Can you download a file that is located before your connection to the provider and compare that with a file downloaded from the Internet?
Take a look at the logs and counters (errors, discards, etc.) on the switch port the 10G link is connected to. Check the CPU load on the switch when the problem recurs.
Do both routers show this behavior after the upgrade?
You can use mtr 1.1.1.1 to check which hop might be the problem.
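If the interactive view is awkward to share, mtr can also run in report mode and the output can be filtered for hops that show loss. This is only a sketch: it assumes the usual report layout (hop, host, Loss%, ...), and the function name `lossy_hops` is made up for illustration.

```shell
#!/bin/sh
# lossy_hops: read an mtr report on stdin and print the hops whose Loss%
# is above zero. Assumes report lines look like
# "  2.|-- 10.0.0.2   5.0%   100 ...".
lossy_hops() {
    awk '$1 ~ /\|--$/ {
        loss = $3
        sub(/%$/, "", loss)
        if (loss + 0 > 0) print $1, $2, $3
    }'
}

# Usage (needs network access; -n skips DNS lookups, -c sets the probe count):
#   mtr --report -n -c 100 1.1.1.1 | lossy_hops
```

Loss that shows up on one intermediate hop but not on the hops after it is often just that device rate-limiting its ICMP replies, so the last few hops are the ones to watch.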
Try to gather more information, including NIC statistics:
sudo cat /proc/interrupts | egrep -i "cpu|rx"
sudo ethtool -S ethX
where ethX is your 10G port.
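The ethtool -S output can run to hundreds of lines, so it may help to cut it down to the error and drop counters that are not zero. A small sketch follows; counter names differ per driver, so the name pattern is a guess, and `nonzero_errors` is a made-up helper name.

```shell
#!/bin/sh
# nonzero_errors: read "ethtool -S" output on stdin and print only the
# error/drop-style counters whose value is not zero.
# Counter names are driver specific, so the pattern below is an assumption.
nonzero_errors() {
    awk -F': *' 'tolower($1) ~ /err|drop|miss|discard|fifo/ && $2 + 0 != 0 {
        gsub(/^[ \t]+/, "", $1)
        print $1 ": " $2
    }'
}

# Usage on the router (eth4 being the 10G port in this thread):
#   sudo ethtool -S eth4 | nonzero_errors
```

Running it twice a minute or so apart shows which counters are still climbing rather than left over from an earlier event.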
We are testing from machines sitting behind the router and testing traffic going out to various IP transit providers or peering.
We are seeing sudden drops in the download speeds for those files, which then tend to return to normal values after a while. Both routers have exhibited this behavior since the upgrade. We are still able to reach peak speeds, but the behavior becomes more apparent during periods of load.
I can see about 1% packet loss on the switch port but it sometimes jumps up to about 7%. This is only for the port going to one of the routers.
I have attached the NIC stats below. FYI we are using a DAC cable going to the 10gig port, eth4.
I ran some iperf tests and the results were odd.
I initially ran a test from a device in our office going to the upstream router RT02 and it was maxing out at 500 Mbit/s, which is the line speed of the Ethernet link.
Now that it is later in the evening and the load has died down, running iperf between the 2 routers via their interconnect (eth5) gives speeds of 2 - 4.5 Gbits/sec, which is still well below the 10G link capacity.
During times of load on the router I was only getting 0.48 Gbits/sec.
I was getting a similar sort of reading, as there is barely any load going through the router.
Let’s see how it performs tomorrow under load, and if this makes any difference.
Considering it’s almost midnight here, this is the time of minimum load.
The peak speeds tended to fall off during load; let’s see if the increased buffer size makes a change tomorrow.
In the meantime I’ve attached the current top and statistics for eth4 (link going to switch, has most of the traffic) and eth5 (inter-connect between the 2 routers).
The server has 2 processors with 4 cores each (with HT, I believe, for 8 threads each).
I have attached the results of running the command on eth4 and eth5.
As soon as the load really picks up the bandwidth available drops.
At night under minimum load I can get 9.3 Gbits/sec when running iperf3 between the routers but during peak load I can only get about 0.4 Gbits/sec, even though that link has only 200 Mbits/sec flowing on a 10G link.
The difference between iperf and real traffic is the number of packets that have to be processed.
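To put a number on that, the packet rate on an interface can be sampled from the kernel counters under /sys/class/net. A rough sketch, with eth4 assumed as the interface; the helper names `rate_per_sec` and `sample_pps` are made up here.

```shell
#!/bin/sh
# rate_per_sec OLD NEW SECONDS: integer per-second rate between two counter reads.
rate_per_sec() {
    echo $(( ($2 - $1) / $3 ))
}

# sample_pps IFACE [SECONDS]: print received packets per second on IFACE,
# read from /sys/class/net/<iface>/statistics/rx_packets.
sample_pps() {
    iface=$1
    secs=${2:-1}
    before=$(cat "/sys/class/net/$iface/statistics/rx_packets")
    sleep "$secs"
    after=$(cat "/sys/class/net/$iface/statistics/rx_packets")
    rate_per_sec "$before" "$after" "$secs"
}

# Usage: sample_pps eth4 5
```

For scale, 200 Mbits/sec made up of 1500-byte packets is roughly 16,700 packets per second, while the same bandwidth in 64-byte packets is about 390,000 (ignoring Ethernet framing overhead), so the bandwidth graph alone says little about how hard the router is working.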
First, you need to exclude logical cores from packet processing.
To do that, find the real cores with: show hardware cpu
Or disable hyper-threading in the BIOS.
Are eth4 and eth5 in one PCI slot or in different ones (one network adapter or 2/3/4)?
If you have 4 real cores per socket, you need 4 combined rx/tx queues on eth4 and 4 rx/tx queues on eth5.
Ideally, each queue should be handled by one real core.
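As a cross-check on the core count, the physical cores per socket can also be counted from /proc/cpuinfo before choosing a queue count. This is a sketch only: `cores_per_socket` is a made-up helper, and whether the NIC accepts a "combined" channel count depends on the driver.

```shell
#!/bin/sh
# cores_per_socket: read /proc/cpuinfo-style text on stdin and print
# "socket <id>: <n> physical cores" for each socket. Each
# (physical id, core id) pair is counted once, so hyper-threads
# are not double counted.
cores_per_socket() {
    awk -F': *' '
        /^physical id/ { sock = $2 }
        /^core id/     { if (!((sock, $2) in seen)) { seen[sock, $2] = 1; n[sock]++ } }
        END { for (s in n) print "socket " s ": " n[s] " physical cores" }
    ' | sort
}

# Usage:
#   cores_per_socket < /proc/cpuinfo
# With 4 physical cores per socket, the queue count could then be checked
# and set like this (assumes the driver exposes combined channels):
#   sudo ethtool -l eth4
#   sudo ethtool -L eth4 combined 4
```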
Recommended ethtool settings (connectivity may drop for a few seconds):
Disable pause frames (flowcontrol) / disable offloads:
sudo /sbin/ethtool -A eth4 rx off tx off
sudo /sbin/ethtool -K eth4 rx off tx off
sudo /sbin/ethtool -K eth4 tso off
sudo /sbin/ethtool -K eth4 gro off
sudo /sbin/ethtool -K eth4 sg off
sudo ifconfig eth4 txqueuelen 10000
Also check ring buffers: sudo /sbin/ethtool -g eth4
If the current settings are lower than the pre-set maximums, you can increase them.
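That comparison can be scripted. The sketch below reads the ethtool -g output and only prints the ethtool -G command that would raise the rings to their maximums; it changes nothing itself. The function name `ring_suggest` is made up, and the parsing assumes the usual "Pre-set maximums" / "Current hardware settings" layout.

```shell
#!/bin/sh
# ring_suggest IFACE: read "ethtool -g" output on stdin. If the current
# RX or TX ring size is below the pre-set maximum, print an "ethtool -G"
# command that would raise both to the maximum. Prints nothing otherwise.
ring_suggest() {
    awk -v ifc="$1" '
        /^Pre-set maximums/ { sect = "max" }
        /^Current hardware/ { sect = "cur" }
        $1 == "RX:" { rx[sect] = $2 }
        $1 == "TX:" { tx[sect] = $2 }
        END {
            if (rx["cur"] + 0 < rx["max"] + 0 || tx["cur"] + 0 < tx["max"] + 0)
                print "ethtool -G " ifc " rx " rx["max"] " tx " tx["max"]
        }'
}

# Usage:
#   sudo /sbin/ethtool -g eth4 | ring_suggest eth4
```

Ring sizes set directly with ethtool normally do not survive a reboot, so a value that helps still needs to be made permanent afterwards.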