Fairly new to VyOS but pretty familiar with other Linux stuff. I recently installed it on an inexpensive Core 2 Duo 3 GHz machine with 4 GB RAM and two 1 Gbit Intel Ethernet ports. I started with a simple VyOS configuration: source and destination NAT, DHCP, a few simple firewall rules and so on.
Everything went well in the beginning (a few days), but then I experienced a huge performance drop. Speedtest went from 20 ms latency and 300 Mbit/100 Mbit to 80 ms and 25 Mbit/25 Mbit.
OK, I thought, maybe there is an internet issue, or maybe some NAT rule is slowing things down.
By now I have eliminated the following (it took a lot of hours):
The internet connection works: if I switch to another device it is fine, and when I put VyOS back it is slow
all switches (managed and unmanaged) tested, cables changed too
configuration rolled back to the minimum
power saving options disabled in the PC BIOS
VyOS reinstalled from scratch
a lot of other things
What is probably left is that the Ethernet card just broke down? I'm really out of ideas on how to debug this. I have not seen any symptoms other than the slowness and latency. I have tried to google similar issues but have not found any. Why is it +65 ms? Why does it shape to almost exactly 25 Mbit/25 Mbit? It really looks like a software issue, but I can't figure out what is causing it.
I have looked through the logs, the eth statistics and monitoring, and tried to learn enough to understand and fix things, but I don't know what to look at next. Of course, if it is an obvious hardware fault then the card should be replaced, but the behavior looks really odd in my experience.
Any tips are welcome, and I will post my findings here when I solve this.
Even when shaped at 25/25, ping time should be much better when the link is idle.
Use traceroute to 8.8.8.8 (pathping on Windows) to get a clue about which hop is causing the delay.
This was really good advice. I had kind of tested it before but had not understood it this clearly.
So when I ping something reliable like ftp.funet.fi in Finland from the VyOS box itself while running speedtest from a Windows client, the ping latency is OK and not affected much. It is about 24 ms and goes up to 40 or 60 ms. If the speedtest were running at full speed, it would be hundreds of ms.
And Windows tracert reveals it:
tracert 8.8.8.8
Tracing route to dns.google [8.8.8.8]
over a maximum of 30 hops:
1 16 ms 103 ms 103 ms 192.168.100.x
2 82 ms 103 ms 103 ms ww-xx-yy-zz.aaaaaaaa.fi [ww.xx.yy.zz]
3 * * * Request timed out.
4 103 ms 118 ms 91 ms 10.64.192.25
5 140 ms 46 ms 127 ms 213.192.186.82
6 57 ms 98 ms 119 ms ae3.bbr1.hel2.fi.ip4.elisa.net [213.192.186.81]
7 107 ms 118 ms 99 ms 213.192.184.95
8 107 ms 126 ms 100 ms 213.192.185.93
9 114 ms 99 ms 98 ms 142.250.213.179
10 118 ms 107 ms 91 ms 209.85.241.29
11 64 ms 100 ms 101 ms dns.google [8.8.8.8]
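For anyone repeating this test, here is a rough Python sketch that reads Windows tracert output and reports the first hop whose median round-trip time is above a threshold. The function names and the 50 ms threshold are my own assumptions for illustration, and the sample is just the first four hops of the trace above. Since hop 1 is presumably the router's LAN address, a slow hop 1 means the delay is added before traffic even leaves the local network.

```python
import re

# One tracert hop line: hop number, three probe fields ("NN ms" or "*"), then the host.
HOP_RE = re.compile(r"^\s*(\d+)\s+((?:(?:<?\d+\s+ms|\*)\s+){3})(.*)$")

def parse_tracert(text):
    """Return a list of (hop, rtts_ms, host); timed-out probes give no RTT."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header lines such as "Tracing route to ..."
        rtts = [int(x) for x in re.findall(r"<?(\d+)\s+ms", m.group(2))]
        hops.append((int(m.group(1)), rtts, m.group(3).strip()))
    return hops

def first_slow_hop(hops, threshold_ms=50):
    """Return (hop, host) for the first hop whose median RTT exceeds the threshold."""
    for hop, rtts, host in hops:
        if rtts and sorted(rtts)[len(rtts) // 2] > threshold_ms:
            return hop, host
    return None

# First four hops copied from the trace above.
SAMPLE = """\
1 16 ms 103 ms 103 ms 192.168.100.x
2 82 ms 103 ms 103 ms ww-xx-yy-zz.aaaaaaaa.fi [ww.xx.yy.zz]
3 * * * Request timed out.
4 103 ms 118 ms 91 ms 10.64.192.25
"""

print(first_slow_hop(parse_tracert(SAMPLE)))  # prints (1, '192.168.100.x')
```

A median is used instead of the first probe because single probes are noisy (hop 1 shows 16 ms once and 103 ms twice).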
OK, now I have some clarity, though I don't know yet how to fix it.
I changed the external PCI card to a different one, as I thought it was faulty. I booted, configured it (eth1 changed to eth2), ran speedtest and thought it was fixed, as throughput and latency were good. Then, as I was wrapping up, I heard a beep from the machine and found out from the log…
OK, so I think what triggered it was plugging a USB device, like a keyboard, into one of the USB ports on the front of the PC (USB 1 & 2), which shares an IRQ with eth1.
I have not added irqpoll to the kernel parameters yet, but I moved the keyboard to the back of the PC.
That behavior was weird. Maybe after that the kernel falls back to polling the PCI device, in some kind of "failsafe" mode for IRQs, and that is what adds the +65 ms and "shapes" traffic to 25 Mbps.
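To check for this kind of IRQ sharing without guessing, one option is to look at /proc/interrupts, where an interrupt line shared by several devices lists them separated by commas. Below is a minimal Python sketch of that check. The function name is hypothetical and the sample text is made up for illustration (it is not output from my machine); the parsing assumes the usual layout of one counter column per CPU followed by the controller columns and the device names.

```python
def shared_irqs(interrupts_text):
    """Return {irq_number: [device, ...]} for IRQ lines used by more than one device."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    shared = {}
    for line in lines[1:]:
        irq, sep, rest = line.partition(":")
        if not sep or not irq.strip().isdigit():
            continue  # skip NMI, LOC, ERR and similar summary rows
        # Drop the per-CPU counters; what remains is controller info plus device names.
        tail = " ".join(rest.split()[ncpu:])
        chunks = [c.strip() for c in tail.split(",") if c.strip()]
        if len(chunks) < 2:
            continue  # only one device on this line
        # The first chunk still carries the controller columns; the device is its last word.
        shared[int(irq)] = [chunks[0].split()[-1]] + chunks[1:]
    return shared

# Invented sample in /proc/interrupts layout, for illustration only.
SAMPLE = """\
           CPU0       CPU1
  0:         42          0   IO-APIC   2-edge      timer
 16:     100000     200000   IO-APIC  16-fasteoi   uhci_hcd:usb3, eth1
 17:       5000       6000   IO-APIC  17-fasteoi   eth0
NMI:          0          0   Non-maskable interrupts
"""

print(shared_irqs(SAMPLE))  # prints {16: ['uhci_hcd:usb3', 'eth1']}
```

On the router itself the same function could be fed the real file, for example `shared_irqs(open("/proc/interrupts").read())`. If a NIC shows up next to a USB controller, moving the USB device to another port (as I did) or moving the card to another slot are the first things to try.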