I have a VyOS firewall. We are experiencing intermittent accessibility problems with our web server (it works about 2 times out of 10). A NAT rule forwards port 443 from our public IP on the eth0 interface to the server’s internal IP behind the eth1 interface.
The server is a virtual machine on ESXi with VMXNET3 drivers. The VyOS version is 1.2.1.
Below you will find the connection logs. At the top a failed request and at the bottom an OK request.
The “Not OK” packets show responses from the server with the RST (reset) flag set, which signifies that the connection is down or possibly that the service is not accepting requests.
Requests must first come to the server for this to work. Thus, it looks like VyOS is passing packets to the server, but the server refuses to establish a connection for some reason.
Is there some triangular routing? In the first KO log, I only see traffic from the server to the client, not the other way around.
In the first OK log, traffic flows in both directions.
Requests from the client to the web server show up on the firewall with the R (reset) flag.
Yes, eth0 is the WAN interface.
I tried with another WAN IP and it’s the same thing.
I have seen situations where WISP CPE equipment randomly(?) injected RST packets into TCP sessions for no apparent reason. Try performing a packet capture that writes to a file (e.g. tcpdump -i eth0 -w /tmp/eth0-data-capture.cap), then copy it to your computer and examine it in Wireshark. (Personally, I find the data much easier to read in Wireshark than in raw tcpdump output.)
In the case I was troubleshooting, we were able to determine the responsible equipment thanks to the TTL of the injected RST packet (with a spoofed source IP) not matching the TTL of the other packets in the TCP session. For more information on such troubleshooting, see this Medium post by Shriram Sharma.
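If Wireshark is not at hand, the same TTL comparison can be roughed out with a few lines of Python. This is only a minimal sketch, assuming the capture is a classic (microsecond) pcap file of Ethernet frames with no VLAN tags, which is what tcpdump -w normally writes on an Ethernet interface; the function name tcp_ttls is made up for illustration. It lists the TTL of every IPv4/TCP packet and marks the ones carrying the RST flag, so an RST whose TTL differs from the rest of its session stands out.

```python
import struct


def tcp_ttls(pcap_bytes):
    """Yield (src, dst, ttl, is_rst) for each IPv4/TCP packet in a classic pcap.

    Assumptions: Ethernet link type, no VLAN tags, microsecond pcap format.
    """
    magic = pcap_bytes[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond pcap file")

    # The link type is the last 4 bytes of the 24-byte global header.
    linktype = struct.unpack(endian + "I", pcap_bytes[20:24])[0]
    if linktype != 1:
        raise ValueError("only Ethernet captures are handled")

    offset = 24
    while offset + 16 <= len(pcap_bytes):
        # Record header: ts_sec, ts_usec, incl_len, orig_len (4 bytes each).
        incl_len = struct.unpack(endian + "I", pcap_bytes[offset + 8:offset + 12])[0]
        frame = pcap_bytes[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len

        # Skip anything that is not IPv4 over Ethernet.
        if len(frame) < 34 or frame[12:14] != b"\x08\x00":
            continue
        ip = frame[14:]
        ihl = (ip[0] & 0x0F) * 4  # IP header length in bytes
        # Skip non-TCP packets and truncated TCP headers.
        if ip[9] != 6 or len(ip) < ihl + 14:
            continue

        ttl = ip[8]
        src = ".".join(str(b) for b in ip[12:16])
        dst = ".".join(str(b) for b in ip[16:20])
        flags = ip[ihl + 13]  # TCP flags byte
        yield src, dst, ttl, bool(flags & 0x04)  # 0x04 is the RST bit
```

To use it, read the capture file in binary mode (for example the /tmp/eth0-data-capture.cap file from the tcpdump command above), pass the bytes to tcp_ttls, and print each tuple. Wireshark remains the better tool for anything beyond this quick check.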
OK session: the SYN/ACK is likely starting out with an origin TTL of 128, so it’s making 15 hops to get to the client.
Reset session: the RST packet is likely starting out with an origin TTL of 64, so it’s making 12 hops to get to the client.
Also, since the OK traffic seems to be using a TTL of 128 at each end, that would seem to be a dead giveaway of injection, though pinpointing the exact origin beyond this could be difficult, unless you have access to the routers the sessions are traversing. If you do have access, you can do more captures along the path and look at Source MAC addresses to 100% confirm the source of the unwanted RST packets.
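The arithmetic behind those two estimates can be sketched as follows. The assumption is that a sender starts from one of the usual default TTLs (64, 128 or 255) and that every router hop decrements it by one. The observed values 113 and 52 are not quoted from the capture; they are back-calculated from the figures in this post (128 minus 15 hops, 64 minus 12 hops), and estimate_origin is a hypothetical helper name.

```python
# Common default initial TTLs: 64 (Linux, macOS), 128 (Windows), 255 (some network gear).
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_origin(observed_ttl):
    """Return (likely initial TTL, hop count) for a TTL seen at the receiver."""
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise ValueError("TTL cannot exceed 255")


print(estimate_origin(113))  # (128, 15): matches the OK SYN/ACK estimate
print(estimate_origin(52))   # (64, 12): matches the RST estimate
```

Two packets that claim the same source IP but resolve to different initial TTLs and hop counts most likely did not come from the same machine, which is the reasoning used above.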
I’m curious: if you run a packet capture on your VyOS WAN interface while replicating the problem, are you getting an RST packet too? And if so, what TTL does it have in comparison to the OK traffic?
FWIW, you can right-click about any packet parameter in the Wireshark details pane and select Apply as a Column for easier visibility and comparison.
Does that mean “interface on VyOS that connects to server”, or “interface on server that connects to VyOS”?
In any case, you’ve got a packet capture with a TTL of 64 on it: so if that’s from your webserver, it’s the one resetting the connection; I have no idea why.
I have some new information that might help.
I added a new public IP on the eth0 interface and changed the destination IP in my NAT rule. With that address I no longer see any problems.
IP with KO requests: X.X.124.197/27
IP with OK requests: X.X.124.205/27
I also pointed the problematic IP at a new web server, and the problem is still present.
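As a quick sanity check, the two addresses can be compared to confirm that they sit in the same /27. A minimal sketch using Python’s standard ipaddress module; the first three octets are masked in this thread, so 203.0.113 below is purely a hypothetical placeholder taken from the documentation range. For a /27 only the last octet decides the result.

```python
import ipaddress

# Hypothetical placeholder: the real prefix is masked as "X.X.124" in the thread.
PREFIX = "203.0.113"

ko = ipaddress.ip_interface(f"{PREFIX}.197/27")  # address with failing requests
ok = ipaddress.ip_interface(f"{PREFIX}.205/27")  # address that works

same_subnet = ko.network == ok.network
print(ko.network, ok.network, same_subnet)
```

Both addresses land in the .192/27 block, so the KO and OK addresses differ only in the host part, not in the subnet they belong to.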