Destination NAT randomly stops working on version 1.4-rolling-202204240217

Hello,

I am encountering a strange issue with destination NAT on a router running under Hyper-V, on version 1.4-rolling-202204240217.

Config:
$ show configuration commands
set firewall name EdgeAclIn-v4 default-action 'drop'
set firewall name EdgeAclIn-v4 rule 10 action 'accept'
set firewall name EdgeAclIn-v4 rule 10 state established 'enable'
set firewall name EdgeAclIn-v4 rule 10 state related 'enable'
set firewall name EdgeAclIn-v4 rule 12 action 'accept'
set firewall name EdgeAclIn-v4 rule 12 protocol 'icmp'
set firewall name EdgeAclIn-v4 rule 20 action 'drop'
set firewall name EdgeAclIn-v4 rule 20 state invalid 'enable'
set firewall name EdgeAclIn-v4 rule 101 action 'accept'
set firewall name EdgeAclIn-v4 rule 101 destination address '10.0.0.4'
set firewall name EdgeAclIn-v4 rule 101 destination port '80'
set firewall name EdgeAclIn-v4 rule 101 protocol 'tcp'
set interfaces ethernet eth0 address '10.97.14.211/24'
set interfaces ethernet eth0 firewall in name 'EdgeAclIn-v4'
set interfaces ethernet eth0 hw-id '00:15:5d:0e:01:0a'
set interfaces ethernet eth1 address '10.0.0.1/24'
set interfaces ethernet eth1 hw-id '00:15:5d:0e:01:0b'
set interfaces loopback lo
set nat destination rule 108 description 'http'
set nat destination rule 108 destination port '80'
set nat destination rule 108 inbound-interface 'eth0'
set nat destination rule 108 protocol 'tcp'
set nat destination rule 108 translation address '10.0.0.4'
set nat source rule 100 outbound-interface 'eth0'
set nat source rule 100 protocol 'all'
set nat source rule 100 source address '10.0.0.0/24'
set nat source rule 100 translation address 'masquerade'
set protocols static route 0.0.0.0/0 next-hop 10.97.14.254
set service ssh listen-address '10.97.14.211'
set service ssh port '22'
set system config-management commit-revisions '100'
set system conntrack modules ftp
set system conntrack modules h323
set system conntrack modules nfs
set system conntrack modules pptp
set system conntrack modules sip
set system conntrack modules sqlnet
set system conntrack modules tftp

What I observe is that TCP pings to the DNAT rule randomly start and stop succeeding, with no change to the configuration:
Connecting to 10.97.14.211:80: from 10.97.14.1:52322: 0.41ms
Connecting to 10.97.14.211:80: from 10.97.14.1:52329: 0.41ms
Connecting to 10.97.14.211:80: from 10.97.14.1:52331: 0.39ms
Connecting to 10.97.14.211:80: from 0.0.0.0:52336:
This operation returned because the timeout period expired.
Connecting to 10.97.14.211:80: from 0.0.0.0:52358:
This operation returned because the timeout period expired.
Connecting to 10.97.14.211:80: from 0.0.0.0:52374:
This operation returned because the timeout period expired.
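For anyone trying to reproduce this, a timestamped probe left running on the client makes it easier to line the failures up with tcpdump captures and other events. This is only a minimal sketch, assuming Python 3 is available on the client; the names `probe` and `probe_loop` are made up for illustration and are not part of any tool used above.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=2.0):
    """Attempt one TCP connect; return (ok, elapsed_ms, error_text)."""
    start = time.monotonic()
    try:
        # A completed three-way handshake counts as success.
        with socket.create_connection((host, port), timeout=timeout):
            ok, err = True, ""
    except OSError as exc:  # covers timeouts and refused connections
        ok, err = False, str(exc) or type(exc).__name__
    return ok, (time.monotonic() - start) * 1000.0, err


def probe_loop(host, port, count=10, interval=1.0):
    """Print one timestamped line per connection attempt."""
    for _ in range(count):
        ok, ms, err = probe(host, port)
        stamp = datetime.now().isoformat(timespec="milliseconds")
        status = f"ok {ms:.2f}ms" if ok else f"FAILED ({err})"
        print(f"{stamp} {host}:{port} {status}")
        time.sleep(interval)
```

Calling `probe_loop("10.97.14.211", 80)` would probe the DNAT address from the configuration above once per second.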

Using tcpdump on eth0, I see the inbound TCP SYN packets arrive, but once the DNAT fails they go unanswered and are retransmitted. Ignore the port differences from the client output above; I took these snapshots at different times.
15:31:13.474606 IP 10.97.14.1.52174 > 10.97.14.211.http: Flags [S], seq 2786528018, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:13.474849 IP 10.97.14.211.http > 10.97.14.1.52174: Flags [S.], seq 2550764189, ack 2786528019, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
15:31:13.474971 IP 10.97.14.1.52174 > 10.97.14.211.http: Flags [.], ack 1, win 8212, length 0
15:31:13.475088 IP 10.97.14.1.52174 > 10.97.14.211.http: Flags [F.], seq 1, ack 1, win 8212, length 0
15:31:13.475324 IP 10.97.14.211.http > 10.97.14.1.52174: Flags [F.], seq 1, ack 2, win 502, length 0
15:31:13.475381 IP 10.97.14.1.52174 > 10.97.14.211.http: Flags [.], ack 2, win 8212, length 0
15:31:14.472702 IP 10.97.14.1.52175 > 10.97.14.211.http: Flags [S], seq 961444245, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:14.472935 IP 10.97.14.211.http > 10.97.14.1.52175: Flags [S.], seq 4042366975, ack 961444246, win 64240, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
15:31:14.473061 IP 10.97.14.1.52175 > 10.97.14.211.http: Flags [.], ack 1, win 8212, length 0
15:31:14.473120 IP 10.97.14.1.52175 > 10.97.14.211.http: Flags [F.], seq 1, ack 1, win 8212, length 0
15:31:14.473298 IP 10.97.14.211.http > 10.97.14.1.52175: Flags [F.], seq 1, ack 2, win 502, length 0
15:31:14.473343 IP 10.97.14.1.52175 > 10.97.14.211.http: Flags [.], ack 2, win 8212, length 0
15:31:15.479910 IP 10.97.14.1.52178 > 10.97.14.211.http: Flags [S], seq 452746413, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:16.494273 IP 10.97.14.1.52178 > 10.97.14.211.http: Flags [S], seq 452746413, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:18.506138 IP 10.97.14.1.52178 > 10.97.14.211.http: Flags [S], seq 452746413, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:20.495132 IP 10.97.14.1.52179 > 10.97.14.211.http: Flags [S], seq 1671964920, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:21.509607 IP 10.97.14.1.52179 > 10.97.14.211.http: Flags [S], seq 1671964920, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:23.517295 IP 10.97.14.1.52179 > 10.97.14.211.http: Flags [S], seq 1671964920, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:25.495504 IP 10.97.14.1.52182 > 10.97.14.211.http: Flags [S], seq 3213357895, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0
15:31:26.504653 IP 10.97.14.1.52182 > 10.97.14.211.http: Flags [S], seq 3213357895, win 64240, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0

If I run tcpdump on eth1, I do not see the translated packets (or any packets on port 80) when the DNAT fails.

$ sudo tcpdump -i eth1 port 80
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), snapshot length 262144 bytes
^C
0 packets captured
0 packets received by filter
0 packets dropped by kernel

Anyone have any idea what could be going on? What other data could I collect to debug this further?

Did you try removing the firewall and testing just the DNAT/SNAT rules?

del interfaces ethernet eth0 firewall in

Thanks for your reply. I think I figured it out. It looks like the target 10.0.0.4 was bringing down its interface for some reason, and the router then couldn't resolve its MAC address, per arp -a:

? (10.0.0.4) at <incomplete> on eth1

Sorry for the noise. I think this can be closed out.
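In case it helps someone with the same symptom: on Linux, `arp -a` normally prints `<incomplete>` in place of the hardware address when a neighbor cannot be resolved, so the check can be scripted. The following is a rough sketch, assuming the net-tools `arp -a` line format; the function name `unresolved` and the sample MAC address in the comment are illustrative, not taken from the router.

```python
import re

# Matches net-tools `arp -a` lines such as (the MAC here is a made-up example):
#   ? (10.97.14.254) at 00:15:5d:0e:01:01 [ether] on eth0
#   ? (10.0.0.4) at <incomplete> on eth1
ARP_LINE = re.compile(r"\((?P<ip>[^)]+)\) at (?P<mac>\S+).* on (?P<dev>\S+)")


def unresolved(arp_output, dev=None):
    """Return (ip, interface) pairs whose MAC address is unresolved.

    If dev is given, only entries on that interface are reported.
    """
    hits = []
    for line in arp_output.splitlines():
        match = ARP_LINE.search(line)
        if match is None:
            continue  # skip lines that are not neighbor entries
        if match["mac"] != "<incomplete>":
            continue  # entry has a resolved hardware address
        if dev is not None and match["dev"] != dev:
            continue
        hits.append((match["ip"], match["dev"]))
    return hits
```

Feeding it the output of `arp -a` while the DNAT is failing should list the translation address (10.0.0.4 here) if the router cannot reach the target at layer 2.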
