Policy route table selection causing NAT?

After upgrading from 1.3 to 1.4, I noticed that some of my internal traffic was being NATed to the router’s IP on the transit network it was routed over. I finally tracked this down to a policy route rule with set table main.

Here’s my abbreviated config:

set interfaces ethernet eth0 vif 42 address '192.168.42.1/24'
set interfaces ethernet eth0 vif 200 address '10.11.80.2/28'
set interfaces ethernet eth0 vif 200 description 'transit'
set protocols static route 0.0.0.0/0 next-hop 10.11.80.1

There are several other VIFs and WireGuard tunnel interfaces set up, using OSPF for routing, but there is no NAT config.

A really basic policy config demonstrates the surprising NAT behavior:

set policy route test interface 'eth0.42'
set policy route test rule 5 set table 'main'
set policy route test rule 5 source address '192.168.42.41'

Now I send a ping from 192.168.42.41 to 192.168.2.8. This will use an OSPF route that sends the traffic via 10.11.80.1 on eth0.200.

vyos@devrouter:~$ monitor traffic interface any filter "icmp and host 192.168.2.8"
20:36:36.086084 eth0.42 In  IP 192.168.42.41 > 192.168.2.8: ICMP echo request, id 21846, seq 1, length 64
20:36:36.086134 eth0.200 Out IP 10.11.80.2 > 192.168.2.8: ICMP echo request, id 21846, seq 1, length 64
20:36:36.086622 eth0.200 In  IP 192.168.2.8 > 10.11.80.2: ICMP echo reply, id 21846, seq 1, length 64
20:36:36.086636 eth0.42 Out IP 192.168.2.8 > 192.168.42.41: ICMP echo reply, id 21846, seq 1, length 64

But if I delete policy route test rule 5 set table main and re-send the ping, I get no NAT:

vyos@devrouter:~$ monitor traffic interface any filter "icmp and host 192.168.2.8"
20:38:14.007456 eth0.42 In  IP 192.168.42.41 > 192.168.2.8: ICMP echo request, id 34232, seq 1, length 64
20:38:14.007484 eth0.200 Out IP 192.168.42.41 > 192.168.2.8: ICMP echo request, id 34232, seq 1, length 64
20:38:14.007984 eth0.200 In  IP 192.168.2.8 > 192.168.42.41: ICMP echo reply, id 34232, seq 1, length 64
20:38:14.007997 eth0.42 Out IP 192.168.2.8 > 192.168.42.41: ICMP echo reply, id 34232, seq 1, length 64

Is this expected behavior? It seems really strange that I’m getting masquerade behavior by explicitly selecting the main routing table.

Do you have a nat source rule for the egress interface?

I don’t have a NAT source rule that should apply. I do have a disabled rule for a different outbound interface, but it shouldn’t match this traffic.

vyos@devrouter# show nat | commands
set source rule 101 disable
set source rule 101 outbound-interface name 'eth0.11'
set source rule 101 source address '192.168.42.0/24'
set source rule 101 translation address 'masquerade'

Hello Catlikesbest@satwell,

The behavior you’re observing is surprising, but it can be explained by how policy-based routing (PBR) and NAT interact.

PBR lets you route selected traffic according to defined policies rather than the normal routing table lookup. In your case, the policy route “test” matches traffic from source address 192.168.42.41 and assigns it to the main routing table.

NAT, on the other hand, is applied in netfilter’s nat hooks, separately from the routing decision. When a packet arrives, PBR runs first (in the mangle prerouting stage) to influence which route and egress interface are chosen. Source NAT rules are then evaluated in the postrouting stage, after the egress interface is known, and rewrite the packet’s source address if they match. So if any source NAT rule happens to match the policy-routed traffic leaving eth0.200, it will be masqueraded even though you only asked for a table selection.

To avoid this NAT behavior, you could:
Modify the NAT rules: adjust them to exclude traffic from 192.168.42.41 when it matches the policy route with set table ‘main’.
Use a different routing table: create a custom routing table specifically for PBR instead of selecting the main table in policy routes.

You’ve already verified that removing the policy route rule (set table main) eliminates the NAT behavior, so you can keep testing and adjusting your configuration from there. PBR and NAT interactions can sometimes lead to unexpected results; it’s worth carefully reviewing the order of operations in scenarios like this.
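For the custom-table suggestion, a minimal sketch of what that could look like (table number 10 and the default route are illustrative assumptions, not taken from the original config):

```shell
# Illustrative only: point the policy route at a dedicated table
# instead of main. Table 10 and the next-hop are assumptions based
# on the addressing shown earlier in this thread.
set protocols static table 10 route 0.0.0.0/0 next-hop 10.11.80.1
set policy route test rule 5 set table '10'
```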

Best Regards,

Seems like a bug then I guess.
Can you paste the relevant config for the policy route from the nft table?

Hi @betty4920taylor, they’ve mentioned they have no NAT rule for traffic egressing via their main table.

From the vyos_mangle table (ignoring the firewall groups that end up in there):

table ip vyos_mangle {
        chain VYOS_PBR_PREROUTING {
                type filter hook prerouting priority mangle; policy accept;
                iifname "eth0.42" counter packets 6 bytes 312 jump VYOS_PBR_UD_test
        }

        chain VYOS_PBR_POSTROUTING {
                type filter hook postrouting priority mangle; policy accept;
        }

        chain VYOS_PBR_UD_test {
                ip saddr 192.168.42.41 counter packets 6 bytes 312 meta mark set 0x7fffff01 return comment "ipv4-route-test-5"
        }
}

But this just marks the traffic? I’m not sure where this mark gets consumed.

Check and compare marks used in both sections:

sudo ip rule
sudo nft list table ip vyos_mangle
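As an illustration of what that comparison should surface (the fwmark rule in the comment is a guess at the typical shape, not captured output from this system):

```shell
# List the routing policy database; the PBR mark set in vyos_mangle
# should reappear as a fwmark selector pointing at the chosen table,
# roughly of the form:
#   <prio>: from all fwmark 0x7fffff01 lookup main
sudo ip rule show
```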

Can you share full configuration so we can check?

I think I’ve narrowed this down to an issue with container networking. I’ve got this container config:

set container name nginx image 'docker.io/library/nginx:mainline-alpine'
set container name nginx network nginx address '172.17.1.2'
set container name nginx restart 'always'
set container network nginx prefix '172.17.1.0/24'

If I remove the container config, PBR works correctly. That is, no NAT on the policy routed traffic.

Here’s the nftables ip nat table:

table ip nat {
        chain VYOS_PRE_SNAT_HOOK {
                type nat hook postrouting priority srcnat - 1; policy accept;
                return
        }

        chain NETAVARK-5BD504A99B1D3 {
                ip daddr 172.17.1.0/24 counter packets 0 bytes 0 accept
                ip daddr != 224.0.0.0/4 counter packets 0 bytes 0 masquerade
        }

        chain POSTROUTING {
                type nat hook postrouting priority srcnat; policy accept;
                counter packets 161 bytes 11326 jump NETAVARK-HOSTPORT-MASQ
                ip saddr 172.17.1.0/24 counter packets 1 bytes 64 jump NETAVARK-5BD504A99B1D3
        }

        chain NETAVARK-HOSTPORT-SETMARK {
                counter packets 0 bytes 0 meta mark set mark or 0x2000
        }

        chain NETAVARK-HOSTPORT-MASQ {
                 meta mark & 0x00002000 == 0x00002000 counter packets 31 bytes 2141 masquerade
        }

        chain NETAVARK-HOSTPORT-DNAT {
        }

        chain PREROUTING {
                type nat hook prerouting priority dstnat; policy accept;
                fib daddr type local counter packets 87 bytes 5156 jump NETAVARK-HOSTPORT-DNAT
        }

        chain OUTPUT {
                type nat hook output priority dstnat; policy accept;
                fib daddr type local counter packets 0 bytes 0 jump NETAVARK-HOSTPORT-DNAT
        }
}

See how the NETAVARK-HOSTPORT-MASQ chain is turning on masquerade if mark bit 0x2000 is set? And from above, PBR is using 0x7fffff01, which includes that bit. And every time I start a new TCP connection that’s routed by PBR, the packet counter for that masquerade rule increments by 1.
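The bit overlap is easy to confirm with a little arithmetic; a quick sketch using the two values from the thread:

```python
# Mark set by VyOS PBR (from the vyos_mangle table above)
pbr_mark = 0x7fffff01
# Mark bit tested by Netavark's NETAVARK-HOSTPORT-MASQ chain
masq_bit = 0x2000

# The PBR mark includes Netavark's masquerade bit, so policy-routed
# packets match the mark-based masquerade rule.
print(hex(pbr_mark & masq_bit))           # 0x2000
print((pbr_mark & masq_bit) == masq_bit)  # True
```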


Good catch. Seems like a bug to me. If marks are used for multiple purposes, specific bits should be devoted to specific functions.
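As a plain-arithmetic illustration of that “dedicated bits” idea (the bit assignments below are hypothetical, not VyOS’s or Netavark’s actual scheme):

```python
# Hypothetical partitioning of the 32-bit fwmark: the low byte holds
# PBR rule ids, and bit 13 is reserved for the container masquerade flag.
PBR_MASK = 0x000000FF
MASQ_BIT = 0x00002000

def set_pbr_mark(mark: int, rule_id: int) -> int:
    # Write the PBR rule id into its own bit range without
    # disturbing bits owned by other components.
    return (mark & ~PBR_MASK) | (rule_id & PBR_MASK)

mark = set_pbr_mark(0, 5)
# The masquerade bit stays clear, so a mark-based masquerade rule
# would not match policy-routed traffic.
print(mark & MASQ_BIT == 0)  # True
```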