So this is a weird one. Some cellular carriers enforce source IP violation rules (see bottom of page 5 for discussion), which means that if any packet makes it onto their network with a source IP other than that assigned to the modem, they disconnect the modem.
I’m having an issue with WAN load balancing where it appears that a packet is occasionally leaked onto the wwan0 interface without NAT. The weird part is that NAT seems to be working fine for packets preceding the leaked packet. Here’s a screenshot of the pcap from the wwan0 interface (.143 is the IP assigned to the WWAN interface):
The client is 10.254.254.11, and you can see that prior to the leaked packet, the client traffic was being NAT’ed. As soon as the leaked packet hits the interface, the cellular provider drops the connection and the modem needs to be rebooted to come back online.
To try and mitigate this, I added a redundant NAT masquerade policy (WAN load balancing should do the masquerade already):
When this is active, no client-outbound traffic works at all.
I have confirmed that client traffic is what triggers the issue: when I leave the client PC off and generate traffic to and from the internet using wwan0 on VyOS itself, the problem does not occur.
Linux netfilter will not NAT traffic marked as INVALID. This often confuses people into thinking that Linux (or specifically VyOS) has a broken NAT implementation because non-NATed traffic is seen leaving an external interface. This is actually working as intended, and a packet capture of the “leaky” traffic should reveal that it is an additional TCP “RST”, “FIN,ACK”, or “RST,ACK” sent by client systems after Linux netfilter considers the connection closed. The most common is the additional TCP RST that some hosts send after terminating a connection (this behavior is implementation-specific).
In other words, connection tracking has already observed the connection being closed, so it classifies any later packets for that flow as INVALID to prevent attacks that attempt to reuse the connection.
You can avoid the “leaky” behavior by using a firewall policy that drops “invalid” state packets.
Having control over the matching of INVALID state traffic, e.g. the ability to selectively log, is an important troubleshooting tool for observing broken protocol behavior. For this reason, VyOS does not globally drop invalid state traffic, instead allowing the operator to make the determination on how the traffic is handled.
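For reference, the drop policy described above could look roughly like the following. This is only a sketch, assuming the VyOS 1.2/1.3 firewall syntax (newer releases restructure the firewall tree, so the paths will differ there). The rule-set name `LAN-IN`, the rule number, and the LAN-facing interface `eth1` are hypothetical placeholders, not values from this thread. The `#` lines are annotations only and are not meant to be pasted into the CLI.

```
# Assumed: VyOS 1.2/1.3 syntax; LAN-IN, rule 10 and eth1 are placeholders
set firewall name LAN-IN default-action accept
# Drop anything connection tracking classifies as INVALID
set firewall name LAN-IN rule 10 action drop
set firewall name LAN-IN rule 10 state invalid enable
# Optional: log matches so the stray RST / FIN,ACK packets can be observed
set firewall name LAN-IN rule 10 log enable
# Apply to traffic arriving from the LAN clients, before it is routed out wwan0
set interfaces ethernet eth1 firewall in name LAN-IN
```

Attaching the rule set in the `in` direction on the client-facing interface means the invalid packets are dropped before they are forwarded toward the cellular link, which is the point where an un-NATed source address would trigger the carrier's disconnect.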
Thanks for this! Oddly enough, it seems it may not have been the problem. I added a firewall rule to drop invalid-state traffic and it made no difference. Out of frustration, I removed the WAN load balancing configuration altogether and this is still happening. I cannot figure it out! I will make a new thread.
In any case, I appreciate the response, as this is good information for preventing cellular source IP violations.