I found my VyOS firewall buried in my VM library yesterday and gave it another shot. The last thing I had to figure out (two, counting policy routing) was how to NAT.
Before, I had discovered that whenever it was NATting, it started dropping traffic, but the oddest thing is that it only happens with specific domains. I’ve asked about it several times, but other issues were more important at the time and I was able to work around it anyway.
My infrastructure is largely the same as before; it kind of has to be. I’m on an Active Directory domain with zero Windows clients. It’s very mixed, and it must be stable. Infrastructure services (DNS, DHCP, RADIUS, NAC, NTP, all that) are each their own thing or a tiny cluster. This allows me to quickly eliminate them as causes. I also tried changing NICs: VMXNET3 backed by Intel, Mellanox, and Broadcom. Then I tried them all, plus some from TP-Link, as PCI passthrough, but it still be droppin’ (I may have just regretted that). Then I tried again with SR-IOV virtual functions… no improvement.
I discovered, as I also mentioned the last time I was around here, that using another firewall to handle the PPPoE session and NAT, leaving VyOS as a firewall only with no NAT, eliminates the issue.
So that leaves either NAT or the PPPoE session. I don’t know how to separate the two, since I’d have to create some sort of transparent firewall/bridge, and I have no idea how I would get the dynamically assigned address onto VyOS if PPPoE is what gets me to layer 3. So instead, I gathered all the errors I could find to come ask for help.
There weren’t a lot of them. The first is a packet capture. I already had one of these, but I didn’t [and still don’t] know what to make of it.
I took a fresh capture straight from VyOS using tcpdump without selecting any interface. I just set a capture filter for traffic from the machine I was on towards TCP ports 80 and 443, i.e. ‘src host 10.9.0.32 and (dst port 80 or dst port 443)’, then cleaned up the normal traffic.
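In case anyone wants to reproduce the capture, here is a rough sketch of how the filter and command line fit together. The `-i any` interface, the `-n` flag, and the `capture.pcap` file name are my assumptions for illustration, not necessarily what I typed at the time; the helper names are made up.

```python
import shlex


def build_capture_filter(src_host, ports):
    """Build a BPF expression for traffic from src_host to any of the given destination ports."""
    port_expr = " or ".join(f"dst port {p}" for p in ports)
    return f"src host {src_host} and ({port_expr})"


def build_tcpdump_command(bpf, interface="any", outfile="capture.pcap"):
    # Assumed flags: -i any (all interfaces on Linux), -n (no name lookups),
    # -w (write raw packets to a file). Adjust to taste.
    return ["tcpdump", "-i", interface, "-n", "-w", outfile, bpf]


capture_filter = build_capture_filter("10.9.0.32", [80, 443])
command = build_tcpdump_command(capture_filter)

# shlex.join quotes the filter so it can be pasted into a shell as-is.
print(shlex.join(command))
```

One thing I noticed while writing this down: a filter built on `src host` and `dst port` only matches the client-to-server direction, so replies from the remote side would not show up in a capture taken this way.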
Again, I’m not sure what to look for here, but the only thing that got my attention (besides the timeouts and resets, of course) was the outdated protocol version in the Client Hello packets.
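For anyone checking the same thing, this is a minimal sketch of where the two version fields sit at the start of a Client Hello, as far as I understand the TLS record layout. My understanding is that the record-layer version is often left at TLS 1.0 (0x0301) purely for compatibility, while modern clients advertise what they really support in the supported_versions extension, so the "outdated" value may be a red herring. The sample bytes are hand-made for illustration and are NOT taken from my capture.

```python
import struct

TLS_VERSIONS = {
    0x0300: "SSL 3.0",
    0x0301: "TLS 1.0",
    0x0302: "TLS 1.1",
    0x0303: "TLS 1.2",
    0x0304: "TLS 1.3",
}


def client_hello_versions(record: bytes):
    """Return (record-layer version, Client Hello legacy version) as readable names."""
    if len(record) < 11:
        raise ValueError("need at least 11 bytes of the TLS record")
    # Record header: 1-byte content type, 2-byte version, 2-byte length.
    content_type, record_version, _length = struct.unpack("!BHH", record[:5])
    if content_type != 22:
        raise ValueError("not a TLS handshake record")
    if record[5] != 1:
        raise ValueError("handshake message is not a Client Hello")
    # Bytes 6..8 hold the 24-bit handshake length; the version field follows.
    (hello_version,) = struct.unpack("!H", record[9:11])
    return (
        TLS_VERSIONS.get(record_version, hex(record_version)),
        TLS_VERSIONS.get(hello_version, hex(hello_version)),
    )


# Hand-made header (hypothetical): record version 0x0301, hello version 0x0303.
sample = bytes([0x16, 0x03, 0x01, 0x02, 0x00, 0x01, 0x00, 0x01, 0xFC, 0x03, 0x03])
print(client_hello_versions(sample))
```

This only reads the fixed header fields; it does not walk the extensions, which is where supported_versions would be found.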
Attached is the cleaned-up capture; it is the *.pcapng file with a txt extension stacked on it. It wouldn’t let me attach it otherwise. The other file is VyOS’ CLI output of show conntrack statistics, which was the only other place where it showed errors. show interfaces detail shows no errors.
Though it just shows some columns with unlabeled rows. Hopefully somebody can tell me what they mean. Neither the PPPoE interface nor the VLAN interface it is on has errors or drops.
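If labeled counters are easier to reason about, here is a small sketch that totals them per name. It assumes the per-CPU `key=value` line format that `conntrack -S` from conntrack-tools prints, which may not match the layout of my attached file. The sample lines and the choice of which counters count as "suspicious" are made up for illustration.

```python
import re
from collections import Counter

# Counters I would look at first; this selection is my own guess.
SUSPICIOUS = {"invalid", "insert_failed", "drop", "early_drop", "error"}


def sum_conntrack_counters(text):
    """Add up every key=value counter across all per-CPU lines, skipping the cpu index."""
    totals = Counter()
    for line in text.splitlines():
        for key, value in re.findall(r"(\w+)=(\d+)", line):
            if key != "cpu":
                totals[key] += int(value)
    return totals


# Hypothetical sample, NOT real output from my system.
sample = """\
cpu=0 found=0 invalid=12 insert=0 insert_failed=0 drop=0 early_drop=0 error=3 search_restart=0
cpu=1 found=0 invalid=5 insert=0 insert_failed=1 drop=0 early_drop=0 error=0 search_restart=2
"""

totals = sum_conntrack_counters(sample)
suspicious = {k: v for k, v in totals.items() if k in SUSPICIOUS and v}
print(suspicious)
```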
droppedtraffic.pcapng.txt (8.6 KB)
run-show-conntrack-statistics.txt (13.4 KB)
Thanks for your help.
— Almost forgot —
It’s version:
Version: VyOS 1.5-rolling-202407100021
Release train: current
Release flavor: generic
Built by: [email protected]
Built on: Wed 10 Jul 2024 02:43 UTC
Build UUID: 931ee68f-2f60-439e-97ab-ae9d2e91b433
Build commit ID: 16753c9d3a6138
Architecture: x86_64
Boot via: installed image
System type: VMware guest
Hardware vendor: VMware, Inc.
Hardware model: VMware7,1
Hardware S/N: VMware-56 4d 8a 02 d4 2f 34 d9-b4 73 c2 78 e5 58 03 da
Hardware UUID: 028a4d56-2fd4-d934-b473-c278e55803da
I know it’s kind of outdated, but it’s been upgraded and rebuilt several times, so I don’t think the version will be much of an issue at this point. I’ll upgrade it if I’m told to, though. Thanks again.