[SOLVED] When PBR is used, NAT isn't applied

Hi,

I think I’ve found a problem in VyOS 1.2.5 (official ISO). I’ll explain it here, along with a simple reproducer.

What I want to do is force specific clients on my LAN through an OpenVPN tunnel. The catch is that the tunnel is a “client” VPN, so any traffic sent through it must be NAT’d to the IP address of the tunnel interface.

Simple reproducer:

I NAT all traffic leaving via vtun2 to its IP address:

tim@ferrari# show nat source
 rule 2000 {
     description "XXX VPN Traffic"
     outbound-interface vtun2
     translation {
         address masquerade
     }
 }

From my laptop (192.168.0.120) I ping the following IP:

C:\Users\tim>ping 139.130.4.5 -n 1
Pinging 139.130.4.5 with 32 bytes of data:
Reply from 139.130.4.5: bytes=32 time=40ms TTL=241

Router’s routing table is currently:

tim@ferrari# run show ip route 139.130.4.5
Routing entry for 0.0.0.0/0
  Known via "kernel", distance 0, metric 0, best
  Last update 3d12h18m ago
  * directly connected, pppoe0

40ms ping. Now I add a static route to force all traffic for 139.130.4.5 via vtun2.

tim@ferrari# set interface-route 139.130.0.0/16 next-hop-interface vtun2

Check the routing table:

tim@ferrari# run show ip route 139.130.4.5
Routing entry for 139.130.0.0/16
  Known via "static", distance 1, metric 0, best
  Last update 00:00:49 ago
  * directly connected, vtun2, weight 1

Looks good, another ping:

C:\Users\tim>ping 139.130.4.5 -n 1
Pinging 139.130.4.5 with 32 bytes of data:
Reply from 139.130.4.5: bytes=32 time=62ms TTL=241

Increase of about 20ms, about what I expected.

So this proves (just for the avoidance of doubt) that sending traffic destined for 139.130.4.5 via vtun2 works: it gets NAT’d and the replies come back.

The problem, of course, is that this affects ALL clients on my LAN, because the route is in the default routing table. I don’t want that. I only want to route some specific clients this way, so I’m going to use Policy Based Routing to achieve it.

tim@ferrari# show protocols static
 table 10 {
     interface-route 139.130.0.0/16 {
         next-hop-interface vtun2 {
         }
     }
 }

tim@ferrari# show policy route
 route test {
     rule 10 {
         set {
             table 10
         }
         source {
             address 192.168.0.120/32
         }
     }
 }

tim@ferrari# set interfaces ethernet eth1 policy route test

Note: eth1 has the IP 192.168.0.1/24 and is the interface my laptop is connected to.

Great! Now only 192.168.0.120 (my test PC I’ve been doing the pings from) is going to get traffic for 139.130.4.5 routed via vtun2.

But this is my problem:

C:\Users\tim>ping 139.130.4.5 -n 5
Pinging 139.130.4.5 with 32 bytes of data:
Request timed out.
Request timed out.
Request timed out.
Request timed out.
Request timed out.

It doesn’t work. Additionally, I get this in the dmesg/kernel logs:

[Sat Apr 18 21:10:29 2020] IPv4: martian source 192.168.0.120 from 139.130.4.5, on dev vtun2
[Sat Apr 18 21:10:34 2020] IPv4: martian source 192.168.0.120 from 139.130.4.5, on dev vtun2
[Sat Apr 18 21:10:39 2020] IPv4: martian source 192.168.0.120 from 139.130.4.5, on dev vtun2
[Sat Apr 18 21:10:44 2020] IPv4: martian source 192.168.0.120 from 139.130.4.5, on dev vtun2
[Sat Apr 18 21:10:49 2020] IPv4: martian source 192.168.0.120 from 139.130.4.5, on dev vtun2

Other details:

tim@ferrari# run show ip route 139.130.4.5
Routing entry for 0.0.0.0/0
  Known via "kernel", distance 0, metric 0, best
  Last update 3d12h44m ago
  * directly connected, pppoe0

tim@ferrari# run show ip route table 10
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued route, r - rejected route

VRF default table 10:
S>* 139.130.0.0/16 [1/0] is directly connected, vtun2, 00:51:21

How can I make this work? I want my client PC to be able to be policy-routed AND be NAT’d, while other clients on the same LAN aren’t.

Thanks!

Tim

After a good night’s sleep, the problem dawned on me:

I had

set firewall source-validation strict

Of course that’s not going to work: replies from 139.130.0.0/16 were coming back in via vtun2 while the main routing table said that network was reachable via pppoe0, so strict source validation dropped them as martians.

Deleting that line from the firewall fixed the problem. There’s a reason it’s not set by default.
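
For reference, the fix can be sketched in configuration mode as below. Option 1 is what is described above. Option 2 is an assumption, not something tested in this thread: loose mode only checks that the source address is reachable through some interface, so replies arriving on vtun2 should pass because the default route via pppoe0 covers them.

```
# Option 1: remove the setting entirely (back to the default)
delete firewall source-validation
commit

# Option 2 (untested here): keep source validation, but in loose mode
set firewall source-validation loose
commit
```

Use one option or the other, not both.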

This “bug” was of my own making.
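
If you see the same "martian source" messages with PBR, it may be worth checking the kernel's reverse-path filter before suspecting NAT. The commands below are only a sketch. They assume that VyOS applies firewall source-validation through the Linux rp_filter sysctl, and vtun2 is simply the interface name from this thread.

```
# 0 = off, 1 = strict, 2 = loose
# The kernel uses the higher of the "all" value and the per-interface value.
sysctl net.ipv4.conf.all.rp_filter
sysctl net.ipv4.conf.vtun2.rp_filter
```

As far as I can tell, the kernel prints the destination first in that log line, so "martian source 192.168.0.120 from 139.130.4.5" is a packet from 139.130.4.5 addressed to 192.168.0.120. That would mean the reply had already been translated back to the laptop's address, i.e. NAT was working and the packet was dropped afterwards by the reverse-path check.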