NAT masquerade doesn't work when enabling policy route

Hi Team,

I am running an off-path firewall PoC on EVE-NG using VyOS 1.4.0 epa2. However, the PC cannot get through NAT on the router to reach the internet node when the policy route is enabled.

My topology is as follows:
[image: topology]

I run VXLAN over BGP on the router for VPCs in several subnets, which works fine.
I also enable a policy route to redirect cross-subnet traffic to the firewall, giving a transparent off-path firewall topology.
The core config on the router is:

set interfaces ethernet eth0 address 10.1.1.1/24
set interfaces ethernet eth1 address 10.1.2.1/24
set interfaces ethernet eth4 vif 20
set interfaces bridge br20 address '10.64.2.1/24'
set interfaces bridge br20 member interface eth4.20
set interfaces bridge br20 member interface vxlan102
set policy route CROSS_ZONE interface br20
set policy route CROSS_ZONE rule 11 source address 10.64.0.0/11
set policy route CROSS_ZONE rule 11 destination address !10.64.0.0/11
set policy route CROSS_ZONE rule 11 protocol all
set policy route CROSS_ZONE rule 11 set table 10
set protocols static table 10 route 0.0.0.0/0 next-hop 10.1.2.254
set interfaces ethernet eth8 address 1.1.1.2/24
set nat source rule 2000 outbound-interface name eth8
set nat source rule 2000 source address 10.0.0.0/8
set nat source rule 2000 translation address 'masquerade'
set nat source rule 2000 protocol all
set protocols static route 0.0.0.0/0 next-hop 1.1.1.1

I traced the traffic path, and the capture shows that NAT is not performed on eth8:

14:47:35.529249 eth4  P   IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 2040, seq 322, length 64
14:47:35.529249 eth4.20 P   IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 2040, seq 322, length 64
14:47:35.529249 br20  In  IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 2040, seq 322, length 64
14:47:35.529292 eth1  Out IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 2040, seq 322, length 64
14:47:35.531928 eth0  In  IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 2040, seq 322, length 64
14:47:35.531947 eth8  Out IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 2040, seq 322, length 64

But when I remove the config:

set policy route CROSS_ZONE interface br20

The NAT works well:

14:46:43.353490 eth4  P   IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 54263, seq 271, length 64
14:46:43.353490 eth4.20 P   IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 54263, seq 271, length 64
14:46:43.353490 br20  In  IP 10.64.2.11 > 2.2.2.2: ICMP echo request, id 54263, seq 271, length 64
14:46:43.353533 eth8  Out IP 1.1.1.2 > 2.2.2.2: ICMP echo request, id 54263, seq 271, length 64

It looks like the policy route breaks the NAT.
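The two captures above differ only in the source address of the packet leaving eth8. As a rough aid for repeating that comparison, here is a minimal sketch of a filter over `tcpdump -ni any` output. The function name `check_snat` is hypothetical and not part of VyOS, and it assumes that any source still starting with `10.` was left untranslated.

```shell
# Hypothetical helper (not part of VyOS): classify packets leaving one interface.
# Reads `tcpdump -ni any` lines on stdin; $1 is the interface name, e.g. eth8.
check_snat() {
    awk -v ifc="$1" '
        # Fields: time, interface, direction, "IP", source, ">", destination, ...
        $2 == ifc && $3 == "Out" && $4 == "IP" {
            if ($5 ~ /^10\./) print "untranslated " $5   # still a 10.0.0.0/8 source
            else              print "translated " $5     # source was rewritten
        }'
}
```

It could be fed from a live capture, for example `sudo tcpdump -l -ni any icmp | check_snat eth8`, where `-l` keeps tcpdump's output line-buffered.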

Could you please help with this issue?

So, the policy route config and IPs:

## Policy route
set policy route CROSS_ZONE interface br20
set policy route CROSS_ZONE rule 11 source address 10.64.0.0/11
set policy route CROSS_ZONE rule 11 destination address !10.64.0.0/11
set policy route CROSS_ZONE rule 11 protocol all
set policy route CROSS_ZONE rule 11 set table 10

# Routing table used by policy route
set protocols static table 10 route 0.0.0.0/0 next-hop 10.1.2.254

# Interface used for the routing entry added in table 10:
set interfaces ethernet eth1 address 10.1.2.1/24

As described in the docs, policy routing happens before source NAT, which happens almost at the end of the packet flow.

This means that a connection from 10.64.2.11 towards the internet hits the policy route and uses table 10 to go out, so the router forwards the packet through eth1.
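The steering into table 10 can be checked from the Linux side. This is a minimal sketch, assuming VyOS implements the policy route as an fwmark-based `ip rule` that selects table 10. The mark value 0x7ffffff5 is the one shown in the vyos_mangle output later in this thread; the exact rule layout is an assumption.

```
# List policy routing rules; look for an fwmark entry that selects table 10
ip rule show

# Show the routes in table 10 (per the config, a default route via 10.1.2.254)
ip route show table 10

# Ask the kernel which route a marked packet from the PC would take
ip route get 2.2.2.2 from 10.64.2.11 iif br20 mark 0x7ffffff5
```

If the last command reports eth1 as the outgoing device, the policy route is doing what the explanation above describes.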

Thank you for the explanation.
Traffic going out of eth1 and coming back in on eth0 before NAT is what I expect, since it means the traffic passes through the firewall.

But I don’t quite understand why NAT is not performed on eth8 when the packet comes back in on eth0 and is sent out of eth8.
The policy route is set on br20 only, whereas the packet is received on eth0, which has no policy route. As I understand it, it should be treated as a new packet that is forwarded from eth0 to eth8 directly, with NAT applied on the way out.
One thing I suspect may be related: the source IP address is still 10.64.2.11, which looks like a packet from br20. Could that cause this behavior?

It seems something goes wrong when PBR is applied first: PBR uses firewall marks and generates an entry in conntrack, then the packet is received again on another interface and we try to apply NAT, which relies on conntrack.

Adding one generic conntrack ignore rule seems to work, at least in my quick PoC:

vyos@Router# sudo tcpdump -ni any icmp
tcpdump: data link type LINUX_SLL2
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on any, link-type LINUX_SLL2 (Linux cooked v2), snapshot length 262144 bytes
10:38:37.190828 eth4  P   IP 10.64.2.11 > 8.8.8.8: ICMP echo request, id 11605, seq 1, length 64
10:38:37.190828 eth4.20 P   IP 10.64.2.11 > 8.8.8.8: ICMP echo request, id 11605, seq 1, length 64
10:38:37.190828 br20  In  IP 10.64.2.11 > 8.8.8.8: ICMP echo request, id 11605, seq 1, length 64
10:38:37.190876 eth1  Out IP 10.64.2.11 > 8.8.8.8: ICMP echo request, id 11605, seq 1, length 64
10:38:37.191298 eth0  In  IP 10.64.2.11 > 8.8.8.8: ICMP echo request, id 11605, seq 1, length 64
10:38:37.191581 eth8  Out IP 1.1.1.2 > 8.8.8.8: ICMP echo request, id 11605, seq 1, length 64
10:38:37.216774 eth8  In  IP 8.8.8.8 > 1.1.1.2: ICMP echo reply, id 11605, seq 1, length 64
10:38:37.216784 br20  Out IP 8.8.8.8 > 10.64.2.11: ICMP echo reply, id 11605, seq 1, length 64
10:38:37.216788 eth4.20 Out IP 8.8.8.8 > 10.64.2.11: ICMP echo reply, id 11605, seq 1, length 64

^C
9 packets captured
9 packets received by filter
0 packets dropped by kernel
[edit]
vyos@Router# run show config comm | grep ignore
set system conntrack ignore ipv4 rule 10 inbound-interface 'br20'
[edit]
vyos@Router# 
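
The ignore rule above matches everything arriving on br20. A narrower variant might look like the sketch below. It is untested here and assumes the `source address` match and `description` options of `system conntrack ignore` behave as documented for 1.4; the description text is only illustrative.

```
# Untested sketch: skip connection tracking only for PC-sourced traffic entering br20
set system conntrack ignore ipv4 rule 10 description 'Skip conntrack before off-path firewall loop'
set system conntrack ignore ipv4 rule 10 inbound-interface 'br20'
set system conntrack ignore ipv4 rule 10 source address '10.64.0.0/11'
```

Packets matched by such a rule are untracked on their first pass, so any stateful firewall rules on that path would not see connection state for them.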

While debugging, you can check:

## Firewall mangle table used by PBR:
sudo nft list table ip vyos_mangle

## Firewall nat table used by NAT rules:
sudo nft list table ip vyos_nat

## Conntrack live events and conntrack entries:
sudo conntrack -E
sudo conntrack -L
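
On a busy router the unfiltered conntrack output can be noisy. The following sketch narrows it to the test flow, using filter options from the conntrack(8) man page; the address 10.64.2.11 is the PC from this thread.

```
# Live events for ICMP only
sudo conntrack -E -p icmp

# Entries whose original source is the test PC
sudo conntrack -L -p icmp --orig-src 10.64.2.11

# Only entries that have source NAT applied
sudo conntrack -L --src-nat
```

If the last command lists nothing for the test flow, its conntrack entry was created without a source NAT mapping.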

I have tested the config:

set system conntrack ignore ipv4 rule 10 inbound-interface 'br20'

It is working pretty well.

I also tried to check the command outputs you’ve provided.

Before applying the new config:

vyos@vyos:~$ sudo nft list table ip vyos_mangle

table ip vyos_mangle {
        chain VYOS_PBR_PREROUTING {
                type filter hook prerouting priority mangle; policy accept;
                iifname "br20" counter packets 27 bytes 2268 jump VYOS_PBR_UD_CROSS_ZONE
        }

        chain VYOS_PBR_POSTROUTING {
                type filter hook postrouting priority mangle; policy accept;
        }

        chain VYOS_PBR_UD_CROSS_ZONE {
                ip daddr != 10.64.0.0/11 ip saddr 10.64.0.0/11 counter packets 27 bytes 2268 meta mark set 0x7ffffff5 return comment "ipv4-route-CROSS_ZONE-11"
        }
}

vyos@vyos:~$ sudo nft list table ip vyos_nat

table ip vyos_nat {
        chain PREROUTING {
                type nat hook prerouting priority dstnat; policy accept;
                counter packets 33 bytes 2772 jump VYOS_PRE_DNAT_HOOK
        }

        chain POSTROUTING {
                type nat hook postrouting priority srcnat; policy accept;
                counter packets 81 bytes 5652 jump VYOS_PRE_SNAT_HOOK
                oifname "eth8" ip saddr 10.0.0.0/8 counter packets 0 bytes 0 masquerade comment "SRC-NAT-2000"
        }

        chain VYOS_PRE_DNAT_HOOK {
                return
        }

        chain VYOS_PRE_SNAT_HOOK {
                return
        }
}

vyos@vyos:~$ sudo conntrack -L

icmp     1 23 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=18239 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=18239 mark=0 use=1
icmp     1 29 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=19775 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=19775 mark=0 use=1
icmp     1 21 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=17727 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=17727 mark=0 use=1
icmp     1 27 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=19263 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=19263 mark=0 use=1
icmp     1 25 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=18751 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=18751 mark=0 use=1
icmp     1 19 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=17215 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=17215 mark=0 use=1
icmp     1 13 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=15679 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=15679 mark=0 use=1
icmp     1 17 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=16703 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=16703 mark=0 use=1
icmp     1 15 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=16191 [UNREPLIED] src=2.2.2.2 dst=10.64.2.11 type=0 code=0 id=16191 mark=0 use=1

After applying the new config:

vyos@vyos:~$ sudo nft list table ip vyos_mangle

table ip vyos_mangle {
        chain VYOS_PBR_PREROUTING {
                type filter hook prerouting priority mangle; policy accept;
                iifname "br20" counter packets 74 bytes 6216 jump VYOS_PBR_UD_CROSS_ZONE
        }

        chain VYOS_PBR_POSTROUTING {
                type filter hook postrouting priority mangle; policy accept;
        }

        chain VYOS_PBR_UD_CROSS_ZONE {
                ip daddr != 10.64.0.0/11 ip saddr 10.64.0.0/11 counter packets 74 bytes 6216 meta mark set 0x7ffffff5 return comment "ipv4-route-CROSS_ZONE-11"
        }
}

vyos@vyos:~$ sudo nft list table ip vyos_nat

table ip vyos_nat {
        chain PREROUTING {
                type nat hook prerouting priority dstnat; policy accept;
                counter packets 85 bytes 7140 jump VYOS_PRE_DNAT_HOOK
        }

        chain POSTROUTING {
                type nat hook postrouting priority srcnat; policy accept;
                counter packets 133 bytes 10020 jump VYOS_PRE_SNAT_HOOK
                oifname "eth8" ip saddr 10.0.0.0/8 counter packets 28 bytes 2352 masquerade comment "SRC-NAT-2000"
        }

        chain VYOS_PRE_DNAT_HOOK {
                return
        }

        chain VYOS_PRE_SNAT_HOOK {
                return
        }
}

vyos@vyos:~$ sudo conntrack -L

icmp     1 5 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=61758 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=61758 mark=0 use=1
icmp     1 1 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=60734 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=60734 mark=0 use=1
icmp     1 0 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=60478 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=60478 mark=0 use=1
icmp     1 9 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=62782 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=62782 mark=0 use=1
icmp     1 3 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=61246 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=61246 mark=0 use=1
icmp     1 8 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=62526 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=62526 mark=0 use=1
icmp     1 4 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=61502 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=61502 mark=0 use=1
icmp     1 7 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=62270 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=62270 mark=0 use=1
icmp     1 2 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=60990 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=60990 mark=0 use=1
icmp     1 10 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=63038 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=63038 mark=0 use=1
icmp     1 6 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=62014 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=62014 mark=0 use=1
icmp     1 12 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=63550 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=63550 mark=0 use=1
icmp     1 11 src=10.64.2.11 dst=2.2.2.2 type=8 code=0 id=63294 src=2.2.2.2 dst=1.1.1.2 type=0 code=0 id=63294 mark=0 use=1

All the rules look the same. The only difference I can see is that, before the change, the SRC-NAT-2000 rule in the POSTROUTING chain of vyos_nat counts no packets (counter packets 0), while after the change it counts 28.
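
To watch just that counter while a ping is running, the single chain can be listed and filtered by the rule comment. This is a small sketch using the table, chain, and comment names from the output above.

```
# Print only the masquerade rule and its packet/byte counters
sudo nft list chain ip vyos_nat POSTROUTING | grep SRC-NAT-2000
```

Running it twice a few seconds apart shows whether the rule is being hit at all.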

NAT rules are only evaluated for the initial packet of a flow.
So sending the packet out on eth1 creates an entry in the conntrack table, without NAT.
All subsequent packets use this conntrack entry, so routing the looped-back packet doesn’t check the NAT rules again; it uses the existing conntrack entry instead.

Workarounds:
- On eth1, use a source NAT rule, so the packet already gets the correct eth8 source address.
- On the remote firewall, do source NAT. The looped-back traffic will then be state=new in VyOS, and the normal masquerade will kick in. (If the remote firewall uses the eth8 address as its source NAT address, you can do without double NAT.)


@szb3597

I had a similar situation… I wanted nat to happen before policy route…

After much fighting… I stopped nat’ing, got what we needed with a few more firewall rules…

Simplified ipv4 example:

8< -- snip -- >8
set firewall group network-group LOCAL_NETs network '10.20.0.0/16'
set firewall group network-group LOCAL_NETs network '172.16.0.0/16'
set firewall group network-group LOCAL_NETs network '10.120.0.0/16'
set firewall group network-group LOCAL_NETs network '10.121.0.0/16'
set firewall group network-group LOCAL_NETs network '10.192.0.0/16'

set interfaces ethernet eth5 vif 42 policy route 'GWv4_CC'
set interfaces ethernet eth5 vif 515 policy route 'GWv4_CC'
set interfaces ethernet eth5 vif 1921 policy route 'GWv4_CC'

set policy route GWv4_CC rule 9 destination group network-group 'LOCAL_NETs'
set policy route GWv4_CC rule 9 set table 'main'
set policy route GWv4_CC rule 10 description 'Route to CC - A:fa5db2d0 Z:cc353d16'
set policy route GWv4_CC rule 10 set table '10'

set protocols static route 0.0.0.0/0 next-hop 10.20.250.17
set protocols static table 10 route 0.0.0.0/0 next-hop 10.20.245.10

show configuration commands | grep vif | grep -c address
21

(I’m not offering a solution to your problem… just acknowledging I could not get it to work either… )

I had buildings with local tech teams which I was nat’ing to keep people out of their ranges… again no nat, and added some fw rules… everything was great… so only those three vlans get the pbr out the 10.20.245.10…

1.3-rolling 20240215

YMMV