WWAN interface dies with client NAT

I originally thought this issue was with load balancing, but it is something else. For some reason, as soon as NAT’ed client traffic goes through the wwan0 interface, the interface stops passing traffic and does not recover until a reboot. This is on 1.4-rolling-20220824, and I confirmed the same behavior on 1.4-rolling-20220916 as well. I cannot figure it out!

Here is the scenario: a basic setup with a client at 10.254.254.11/24 on eth1 and source NAT masquerade on wwan0.

When VyOS is booted but the client is offline, there is no issue. VyOS can use the wwan0 interface with no problems, including downloading files. The instant the client comes online, the wwan0 connection stops working, not only for the client but also for VyOS itself. The only way to recover is to disconnect the client and reboot VyOS.

In a packet capture taken on the client-facing interface (eth1), the connection goes dead right after the client’s HTTP request.

With a constant ping running from the VyOS router to 1.1.1.1, the ping stops immediately after the client comes online. After a minute or so, it starts reporting “no buffer space available”.
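
To pin down exactly when the ping dies relative to the client coming online, something like the following could log each state change with a timestamp. This is only a rough sketch and not what was used for the test above: it assumes Python 3 and the iputils `ping` are available on the router, and the names `ping_once`, `classify`, and `monitor` are made up for illustration.

```python
import subprocess
import time
from datetime import datetime


def ping_once(target="1.1.1.1", interface="wwan0", timeout_s=1):
    """Send a single ICMP echo out of the given interface.

    Returns (ok, output), where output is ping's combined stdout/stderr.
    Assumes the iputils ping flags -c (count), -W (reply timeout in
    seconds) and -I (source interface).
    """
    try:
        proc = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), "-I", interface, target],
            capture_output=True,
            text=True,
            timeout=timeout_s + 2,
        )
    except subprocess.TimeoutExpired:
        return False, "timeout"
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


def classify(ok, output):
    """Map one ping result to 'ok', 'enobufs' or 'fail'."""
    if ok:
        return "ok"
    if "no buffer space available" in output.lower():
        return "enobufs"
    return "fail"


def monitor(interval_s=1.0):
    """Ping forever and print a timestamped line on every state change."""
    previous = None
    while True:
        state = classify(*ping_once())
        if state != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp} {previous} -> {state}")
            previous = state
        time.sleep(interval_s)


if __name__ == "__main__":
    monitor()
```

Running it before bringing the client online should print one line when replies stop and another when the “no buffer space available” error begins, which can then be lined up against the packet capture timestamps.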

Here is the full configuration, the minimum needed to reproduce:

 firewall {
     name drop-invalid {
         default-action accept
         rule 1 {
             action drop
             state {
                 invalid enable
             }
         }
     }
 }
 interfaces {
     ethernet eth0 {
         address dhcp
         hw-id 00:0c:29:03:37:de
     }
     ethernet eth1 {
         address 10.254.254.1/24
         firewall {
             in {
                 name drop-invalid
             }
         }
         hw-id 00:0c:29:03:37:e8
     }
     loopback lo {
     }
     wwan wwan0 {
         address 167.XX.XX.143/27
         apn b2b.static
     }
 }
 nat {
     source {
         rule 200 {
             outbound-interface wwan0
             source {
                 address 10.254.254.0/24
             }
             translation {
                 address masquerade
             }
         }
     }
 }
 protocols {
     static {
         route 0.0.0.0/0 {
             next-hop 167.XX.XX.144 {
                 distance 215
             }
         }
     }
 }
 service {
     dhcp-server {
         shared-network-name lan {
             name-server 1.1.1.1
             subnet 10.254.254.0/24 {
                 default-router 10.254.254.1
                 name-server 1.1.1.1
                 range 0 {
                     start 10.254.254.10
                     stop 10.254.254.250
                 }
             }
         }
     }
     ssh {
     }
 }
 system {
     config-management {
         commit-revisions 100
     }
     conntrack {
         modules {
             ftp
             h323
             nfs
             pptp
             sip
             sqlnet
             tftp
         }
     }
     console {
         device ttyS0 {
             speed 115200
         }
     }
     host-name vyos
     login {
         user vyos {
             authentication {
                 encrypted-password <>
                 plaintext-password ""
             }
         }
     }
     ntp {
         server time1.vyos.net {
         }
         server time2.vyos.net {
         }
         server time3.vyos.net {
         }
     }
     syslog {
         global {
             facility all {
                 level info
             }
             facility protocols {
                 level debug
             }
         }
     }
 }

I just built a 1.3 ISO and the same exact thing happens there. Very confused. Hardware issue?

Does “sudo dmesg” tell you anything?

Nope, nothing at all around the time it happens.

Well, it’s not hardware. I tried it on a completely different set of hardware (NUC instead of Supermicro, same modem model but different modem) and got the same behavior.

Any ideas here? I’m at a loss.

I’m sorry but I have no idea myself. Might be worth a Phabricator ticket if you can easily repro it.