Cloudflare giving error 520 / NAT / SSL Handshake error (help request)

Hi,

I am in the process of migrating the config from my EdgeRouter Lite to VyOS in my homelab. My LAN services are working; however, I am having issues getting ingress from Cloudflare to work.

I am hosting a number of sites on Kubernetes in my homelab. The diagram below shows how requests get routed for external and internal users.

If a user is connected to the LAN, they are directed to an internal DNS server (CoreDNS), which resolves the address to the cluster load balancer (MetalLB).

For external access, the domain resolves via Cloudflare to the router's external IP.
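The two request paths described above can be modelled as a tiny sketch. `WAN_IP` is a hypothetical placeholder (documentation range), and the DNAT hop is an assumption about what the NAT rules in the linked config do. The point it illustrates is that only the external path goes through the router's destination NAT, so that is where it diverges from the working internal path.

```python
LB_IP = "10.1.0.152"      # Traefik's MetalLB address, as given in this post
WAN_IP = "203.0.113.10"   # hypothetical placeholder for the router's public IP


def request_path(client_on_lan: bool) -> list:
    """Return the hops a request takes in the setup described above (a model, not a trace)."""
    if client_on_lan:
        # CoreDNS answers with the load-balancer address directly.
        return [f"CoreDNS -> {LB_IP}", f"Traefik @ {LB_IP}"]
    # Cloudflare proxies the request to the origin (the router's WAN IP);
    # the router must then forward it to the load balancer (assumed DNAT rule).
    return ["Cloudflare edge", f"router WAN @ {WAN_IP}", f"DNAT -> {LB_IP}", f"Traefik @ {LB_IP}"]


if __name__ == "__main__":
    print("LAN:", " | ".join(request_path(True)))
    print("WAN:", " | ".join(request_path(False)))
```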

Internally I can access the sites without any problem. However, when I try to access them externally via Cloudflare, I get a 520 (Server is misbehaving).

I can see the requests hitting the Traefik pod and failing with a TLS handshake error. The SSL mode in Cloudflare is set to Strict; I’ve also tested setting it to Flexible from the Cloudflare console.

traefik-9bdfcd858-t2tkz traefik {"level":"debug","msg":"http: TLS handshake error from 162.158.159.119:48222: EOF","time":"2023-01-16T01:48:19Z"}
traefik-9bdfcd858-t2tkz traefik {"level":"debug","msg":"http: TLS handshake error from 172.70.82.74:20234: EOF","time":"2023-01-16T01:48:20Z"}
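As a quick sanity check on the Traefik log, the peer addresses can be tested against Cloudflare's published IPv4 ranges. The sketch below hard-codes only two of those ranges (162.158.0.0/15 and 172.64.0.0/13); the full list is published by Cloudflare and changes over time. Both peers fall inside them, which suggests Cloudflare's connections do reach Traefik and are closed ("EOF") before the handshake completes.

```python
import ipaddress

# Only two of Cloudflare's published IPv4 ranges; enough for the peers in the log above.
CLOUDFLARE_SUBSET = [
    ipaddress.ip_network("162.158.0.0/15"),
    ipaddress.ip_network("172.64.0.0/13"),
]


def from_cloudflare(addr: str) -> bool:
    """True if addr lies in one of the hard-coded Cloudflare ranges."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in CLOUDFLARE_SUBSET)


if __name__ == "__main__":
    for peer in ("162.158.159.119", "172.70.82.74"):
        print(peer, from_cloudflare(peer))  # prints True for both peers
```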

In the router logs I can see the following errors. It seems that traffic originating from the load balancer is detected as coming from interface eth1.20.

Jan 16 01:51:13 gateway kernel: IPv4: martian source 10.1.0.152 from 10.20.0.1, on dev eth1.20
Jan 16 01:51:13 gateway kernel: ll header: 00000000: e4 3a 6e 5b 06 2e 94 c6 91 af 21 f8 08 00 45 c0
Jan 16 01:51:13 gateway kernel: ll header: 00000010: 02 40

This appears correct to me, as the balancer address 10.1.0.152 is advertised by the node where the Traefik pod is running.

That address is advertised by MetalLB running on the node. The nodes are on VLAN 20, which is configured on interface eth1 (LAN). So the problem appears to be that the connection is dropped when the ‘martian’ IP is detected.
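The `ll header` lines in the router log are a hex dump of the offending packet's link-layer header. The sketch below decodes them, assuming the usual Ethernet layout with the 802.1Q tag already stripped, so the 18 dumped bytes are the 14-byte Ethernet header followed by the first four bytes of the IPv4 header. One caveat when reading the log line itself: if I read the kernel's logging code (`ip_handle_martian_source`) correctly, `martian source A from B` prints the packet's destination first and its source second, i.e. this packet was addressed to 10.1.0.152 and carried 10.20.0.1 as its source.

```python
# Hex bytes copied verbatim from the two "ll header" lines in the router log.
LL_HEADER_HEX = "e4 3a 6e 5b 06 2e 94 c6 91 af 21 f8 08 00 45 c0" " 02 40"


def fmt_mac(b: bytes) -> str:
    return ":".join(f"{x:02x}" for x in b)


def decode_ll_header(hex_str: str) -> dict:
    """Decode an Ethernet header, plus the start of an IPv4 header if present."""
    raw = bytes.fromhex(hex_str)  # whitespace between byte pairs is ignored
    result = {
        "dst_mac": fmt_mac(raw[0:6]),
        "src_mac": fmt_mac(raw[6:12]),
        "ethertype": int.from_bytes(raw[12:14], "big"),
    }
    # Assumption: the VLAN tag is already stripped, so IPv4 starts at byte 14.
    if result["ethertype"] == 0x0800 and len(raw) >= 18:
        ip = raw[14:18]
        result["ip_version"] = ip[0] >> 4
        result["ip_header_len"] = (ip[0] & 0x0F) * 4  # IHL is in 32-bit words
        result["tos"] = ip[1]
        result["ip_total_len"] = int.from_bytes(ip[2:4], "big")
    return result


if __name__ == "__main__":
    for key, value in decode_ll_header(LL_HEADER_HEX).items():
        print(f"{key}: {value}")
```

Decoded, this is an IPv4 packet with TOS 0xc0 and a total length of exactly 576 bytes. That combination matches how Linux marks and sizes ICMP error messages, so it may be an ICMP error rather than application traffic, although the dump alone cannot confirm it.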

I have tried disabling ip-source-validation on both the firewall and the interface; however, this has no effect and VyOS continues to log the message above. I feel sure I’ve made an error in the NAT rules or the interface configuration somehow.
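To confirm whether disabling source validation actually reached the kernel, the relevant settings can be read from procfs. This sketch assumes the standard Linux paths under `/proc/sys/net/ipv4/conf` and assumes VyOS's `source-validation` option maps onto `rp_filter`. Per the kernel's ip-sysctl documentation, the effective `rp_filter` is the maximum of the `all` and per-interface values, and `log_martians` is on if either is set; the sketch treats `accept_local` the same way (an assumption). Also worth knowing: Linux rejects a packet whose source address is one of the receiving host's own addresses unless `accept_local` is enabled, even with `rp_filter` at 0, so if 10.20.0.1 is the router's own address on eth1.20, that alone would produce the martian message.

```python
from pathlib import Path

CONF_ROOT = Path("/proc/sys/net/ipv4/conf")
KEYS = ("rp_filter", "accept_local", "log_martians")


def read_conf(iface: str, root: Path = CONF_ROOT) -> dict:
    """Read the raw integer value of each key for one interface (None if unreadable)."""
    values = {}
    for key in KEYS:
        try:
            values[key] = int((root / iface / key).read_text().strip())
        except (OSError, ValueError):
            values[key] = None
    return values


def effective_conf(iface: str, root: Path = CONF_ROOT) -> dict:
    """Combine conf/all and conf/<iface> roughly the way the kernel does."""
    all_vals, dev_vals = read_conf("all", root), read_conf(iface, root)
    result = {}
    for key in KEYS:
        present = [v for v in (all_vals[key], dev_vals[key]) if v is not None]
        if not present:
            result[key] = None
        elif key == "rp_filter":
            result[key] = max(present)  # documented: max of all/<iface>
        else:
            result[key] = any(present)  # on if either is set (accept_local: assumed)
    return result


if __name__ == "__main__":
    # VLAN interfaces keep the dot in their procfs directory name.
    print(effective_conf("eth1.20"))
```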

What’s troubling me is that it works internally and only fails when the traffic comes via Cloudflare. Also worth noting: I have a Plex server that is exposed externally, and it works completely fine!
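One way to separate a routing/NAT problem from a TLS problem is to reproduce Cloudflare's origin connection from outside the LAN (for example, from a phone hotspot): connect to the router's public IP on 443 and complete a handshake with the site's hostname as SNI. This is a minimal sketch using Python's standard `socket` and `ssl` modules. Certificate verification is deliberately disabled, so success only shows that the handshake completes, not that Cloudflare's Strict validation would pass.

```python
import socket
import ssl


def probe_tls(host: str, port: int, sni: str, timeout: float = 5.0) -> dict:
    """Attempt a TLS handshake to host:port, sending `sni` as the server name."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE  # test the handshake only, not the certificate
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=sni) as tls:
                return {"ok": True, "version": tls.version(), "cipher": tls.cipher()[0]}
    except OSError as exc:  # covers refused/timeout and ssl.SSLError
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


if __name__ == "__main__":
    # Hypothetical placeholders: substitute the router's public IP and a real site hostname.
    print(probe_tls("203.0.113.10", 443, "app.example.com"))
```

If this fails from outside but the same hostname works from the LAN, the problem is more likely in the WAN-side NAT/return path than in Traefik's TLS configuration.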

I am fairly new to VyOS, though I do have some experience with the EdgeRouter Lite, where the above setup has been working successfully for over a year. Hopefully somebody can point me in the right direction 🙂

Running the 1.4 rolling image.

Config: https://pastebin.com/tTGcW0CR

Hello @rust84,

Check the routing. Do you have a route to 10.1.0.152?

Hi @RyVolodya,

Here are my routes. 10.1.0.152 appears in the list 🤔

vyos@gateway:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

S>* 0.0.0.0/0 [210/0] is directly connected, pppoe0, weight 1, 01:06:10
B>* 10.1.0.1/32 [200/0] via 10.20.0.125, eth1.20, weight 1, 01:06:30
  *                     via 10.20.0.127, eth1.20, weight 1, 01:06:30
  *                     via 10.20.0.128, eth1.20, weight 1, 01:06:30
B>* 10.1.0.3/32 [200/0] via 10.20.0.127, eth1.20, weight 1, 01:06:49
B>* 10.1.0.151/32 [200/0] via 10.20.0.127, eth1.20, weight 1, 01:06:49
B>* 10.1.0.152/32 [200/0] via 10.20.0.125, eth1.20, weight 1, 01:06:30
  *                       via 10.20.0.127, eth1.20, weight 1, 01:06:30
B>* 10.1.0.153/32 [200/0] via 10.20.0.127, eth1.20, weight 1, 01:06:49
B>* 10.1.0.154/32 [200/0] via 10.20.0.127, eth1.20, weight 1, 01:06:49
B>* 10.1.0.155/32 [200/0] via 10.20.0.125, eth1.20, weight 1, 01:06:30
  *                       via 10.20.0.127, eth1.20, weight 1, 01:06:30
  *                       via 10.20.0.128, eth1.20, weight 1, 01:06:30
B>* 10.1.0.156/32 [200/0] via 10.20.0.127, eth1.20, weight 1, 01:06:49
B>* 10.1.0.160/32 [200/0] via 10.20.0.125, eth1.20, weight 1, 01:06:30
  *                       via 10.20.0.127, eth1.20, weight 1, 01:06:30
C>* 10.5.0.0/24 is directly connected, cni-services, 2d02h27m
C>* 10.9.0.0/24 is directly connected, eth1.9, 01:06:54
C>* 10.11.0.0/24 is directly connected, wg01, 2d02h27m
C>* 10.20.0.0/24 is directly connected, eth1.20, 01:06:54
C>* 10.30.0.0/24 is directly connected, eth1.30, 01:06:54
C>* 10.40.0.0/24 is directly connected, eth1.40, 01:06:54
C>* 10.50.0.0/24 is directly connected, eth1.50, 01:06:54
C>* 51.148.77.132/32 is directly connected, pppoe0, 01:06:12
C>* 192.168.1.0/24 is directly connected, eth1, 01:06:54
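To double-check the question of whether there is a route to 10.1.0.152, here is a minimal longest-prefix-match sketch over a subset of the routes above. It confirms the lookup resolves to the BGP /32 learned from MetalLB, with two ECMP next hops on eth1.20; it does not model which next hop a given flow is hashed to.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    prefix: str
    next_hops: List[str]  # empty list = directly connected
    interface: str


# Subset of `show ip route` above.
ROUTES = [
    Route("0.0.0.0/0", [], "pppoe0"),
    Route("10.1.0.1/32", ["10.20.0.125", "10.20.0.127", "10.20.0.128"], "eth1.20"),
    Route("10.1.0.152/32", ["10.20.0.125", "10.20.0.127"], "eth1.20"),
    Route("10.1.0.153/32", ["10.20.0.127"], "eth1.20"),
    Route("10.20.0.0/24", [], "eth1.20"),
    Route("192.168.1.0/24", [], "eth1"),
]


def lookup(address: str) -> Optional[Route]:
    """Return the most specific route whose prefix contains `address`."""
    ip = ipaddress.ip_address(address)
    matches = [r for r in ROUTES if ip in ipaddress.ip_network(r.prefix)]
    return max(matches, key=lambda r: ipaddress.ip_network(r.prefix).prefixlen, default=None)


if __name__ == "__main__":
    for addr in ("10.1.0.152", "10.20.0.1", "1.1.1.1"):
        print(addr, "->", lookup(addr))
```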