This relates to the open bug T5371 ("system name-server" is not vrf aware); I can move the discussion there if this isn’t the right place.
Polishing the rough edges off VRF support is one of my favourite things to think about on VyOS, so I was mulling a few ways to come at this:
- Having a local resolver daemon in each relevant source VRF: potentially resource heavy and complex. Even repurposing pdns would require patching to get it working across VRFs without losing existing functionality (it’s meant as a caching resolver for network clients, not something bound into a single VRF to handle local lookups)
- Writing an nsswitch plugin module that speaks to a resolver daemon living in the lookup source VRF: not everything uses glibc nss (e.g. Golang does its own DNS), and it still carries the complexity of writing a robust resolver; there’s nothing that’ll work off the shelf
- Getting nscd to run in shared-cache mode inside the lookup source VRF: exponentially worse than writing an nsswitch plugin, with the same issues, plus nscd is famously buggy
- Reading through the various systemd modules that can do DNS, like resolved: they either don’t appear flexible in how they would handle VRFs, or they have the same drawbacks as the options above
- Manipulating iproute2 & nftables to punch traffic between VRFs: this one made me wince, but seemed the most effective, so I spent some time on it today. Unfortunately, I can only get VRF<->global to work sanely so far; VRF<->VRF is a no go.
Using ip rules, we can provide a path for the service traffic:
ip rule add iif lo to <dns.ip.addr> ipproto udp dport 53 lookup <vrf-table-id>
ip rule add iif lo to <dns.ip.addr> ipproto tcp dport 53 lookup <vrf-table-id>
Locally generated traffic to the DNS servers is immediately placed into the right table to route out. This works regardless of the state of the global route table, and it applies only to the matched protocol and port. Return traffic arriving in the VRF finds its way back to processes running in global.
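As a rough sketch of how those two rules could be generated for every configured nameserver, the helper below only prints the commands rather than running them. The function name `dns_vrf_rules`, the table ID 1001 and the documentation-range addresses are placeholders of mine, not anything VyOS ships; the IPv6 branch assumes the same selectors behave identically under `ip -6 rule`.

```shell
#!/bin/sh
# Hypothetical helper: print (do not execute) the ip rule commands that steer
# locally generated DNS traffic for each nameserver into a VRF routing table.
# Usage: dns_vrf_rules <table-id> <server> [<server> ...]
dns_vrf_rules() {
    table="$1"
    shift
    for server in "$@"; do
        # Addresses containing ':' are treated as IPv6 and need `ip -6 rule`.
        case "$server" in
            *:*) family="-6" ;;
            *)   family="-4" ;;
        esac
        # One rule per transport, since DNS uses both UDP and TCP on port 53.
        for proto in udp tcp; do
            echo "ip $family rule add iif lo to $server ipproto $proto dport 53 lookup $table"
        done
    done
}

# Example (dry run): dns_vrf_rules 1001 192.0.2.53 2001:db8::53
```

Piping the output to `sh` as root would apply the rules, and swapping `add` for `del` in the printed lines gives the matching cleanup.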
The second thought was that it should be possible to do this from other VRFs too, not least because pdns should be able to run in a configurable VRF (not yet possible, but it should be) while doing lookups to the system name-servers, which might be in another. Other things, like add system image or commit-archive, usually depend on DNS and should work transparently from any VRF. With the right logic in place, we could even create VRF-specific views.
However, while the traffic makes it out and back on the network for VRF<->VRF, it never makes it back to the sending socket. I’ve tried some fiddling in nftables, but (most critically) there’s no easy way I can see to match local outbound traffic to a process’s source VRF in nft or iproute2:
- Rules have limited actions available even if that match worked. We only really have the option to throw it straight to a table lookup. By the time this hits nft, oif is the redirected source VRF interface and iif is blank
- Without matching ip rules, lookups are subject to the current VRF route table again and will just refuse to go anywhere without a valid route. We therefore can’t use nft at all without a rule or requiring a full valid route/route-leak, which I’d consider far more of a mess than some sneaky automatic protocol-specific PBR
- Without the ability to mark the outbound connection cleanly, I can’t manipulate return traffic to fix the VRF targeting.
So that’s where I’ve left that for now. I’m hoping someone either has an idea of how to do the outbound connection tracking so I can fix inter-VRF, or can poke enough holes in the whole idea to show it’s not workable.
There are a few more things that can or should be done. It might be useful for more protocols than just DNS, but DNS is somewhat unique in that it’s not a single process like ntpd or charon creating the traffic - it’s literally everything that wants to do a name lookup doing it independently. Automatic NAT rules could be used to emulate client source-address binding if that was a desired feature. Some extra work would be needed for DHCP-sourced nameservers and to make sure pdns_recursor integrates seamlessly when in use.
I had started to dummy up a configuration style along the lines of:
set system default-source dns vrf MGT
The separate tree may be useful for more protocols if it worked nicely, like HTTP/FTP client connections from add system image or commit-archive, and it avoids breaking the ancient set system name-server config element with extra sub-nodes.
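To turn a VRF name from a node like that into the table ID the ip rules need, one option is to read it from the detailed link output. This is only a sketch: `vrf_table_id` is a name I made up, and it assumes `ip -d link show dev <vrf>` prints a `vrf table <id>` token as it does on the iproute2 versions I've looked at.

```shell
#!/bin/sh
# Hypothetical helper: extract the routing table ID from the output of
# `ip -d link show dev <vrf>`, read on stdin. Prints nothing if no
# "vrf table <id>" token is present.
vrf_table_id() {
    sed -n 's/.*vrf table \([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Example, assuming a VRF named MGT exists and 192.0.2.53 is a placeholder server:
#   table="$(ip -d link show dev MGT | vrf_table_id)"
#   [ -n "$table" ] && echo "ip rule add iif lo to 192.0.2.53 ipproto udp dport 53 lookup $table"
```

Reading from stdin keeps the parsing separate from the `ip` call, so it can be exercised against captured output without touching a live system.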