Thinking about system name-server and VRFs


This is in relation to the open bug T5371 "system name-server" is not VRF aware; I can move the discussion there if this isn't the appropriate place.

Polishing the rough edges off VRF support is one of my favourite things to think about on VyOS, so I was mulling a few ways to come at this:

  • Having a local resolver daemon in each relevant source VRF: potentially resource-heavy and complex. Even repurposing pdns would require patching to make it work across VRFs without losing existing functionality (it's meant as a caching DNS for network clients, not to be bound into a single VRF handling local lookups)
  • Writing an nsswitch plugin module that speaks to a resolver daemon living in the lookup source VRF: not everything uses glibc NSS (e.g. Go does its own DNS resolution), and this still carries the complexity of writing a robust resolver; nothing will work off the shelf
  • Getting nscd to run in shared-cache mode inside the lookup source VRF: considerably worse than writing an nsswitch plugin, with the same issues, plus nscd is famously buggy
  • Reading through the various systemd components that can do DNS, like resolved: they don't appear flexible enough in how they react to VRFs, or they share the drawbacks of the options above
  • Manipulating iproute2 & nftables to punch traffic between VRFs: this one made me wince, but it seemed the most effective, so I spent some time on it today. Unfortunately, I can only get VRF<->global working sanely so far; VRF<->VRF is a no-go.

Using ip rules, we can provide a path for the service traffic:

ip rule add iif lo to <dns.ip.addr> ipproto udp dport 53 lookup <vrf-table-id>
ip rule add iif lo to <dns.ip.addr> ipproto tcp dport 53 lookup <vrf-table-id>

Locally generated traffic to the DNS servers is immediately placed into the right table to route out. This works regardless of the state of the global routing table, and only for the matched protocol and port. Return traffic from the VRF finds its way back to processes running in the global VRF.
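As a concrete instantiation of those rules (the nameserver address and table ID here are illustrative only), with a nameserver 192.0.2.53 reachable via a VRF whose table is 100:

```shell
ip rule add iif lo to 192.0.2.53 ipproto udp dport 53 lookup 100
ip rule add iif lo to 192.0.2.53 ipproto tcp dport 53 lookup 100

# Confirm the steering: this should report the route from table 100,
# not from the global table.
ip route get 192.0.2.53 ipproto udp dport 53
```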

My second thought was that it should be possible to do this from other VRFs too, not least because pdns should be able to run in a configurable VRF (not yet possible, but it should be) while doing lookups to the system name-servers, which might live in another. Other things, like add system image or commit-archive, usually depend on DNS and should work transparently from any VRF. With the right logic in place, we could even create VRF-specific views.

However, while the traffic makes it out and back on the network for VRF<->VRF, it never makes it back to the sending socket. I’ve tried to do some fiddling in nftables, but (most critically) there’s no easy way I can see to match local outbound traffic to a process source VRF in nft or iproute2:

  • Even if that match worked, ip rules have limited actions available; we really only have the option to throw the packet straight at a table lookup. By the time traffic hits nft, oif is already the redirected source VRF interface and iif is blank
  • Without matching ip rules, lookups are subject to the current VRF route table again and will simply refuse to go anywhere without a valid route. We therefore can't use nft at all without a rule, or without requiring a full valid route/route-leak, which I'd consider far more of a mess than some sneaky automatic protocol-specific PBR
  • Without the ability to mark the outbound connection cleanly, I can’t manipulate return traffic to fix the VRF targeting.
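For reference, this is the flavour of conntrack marking I was experimenting with (table/chain names, the mark value and the VRF table ID are all illustrative). The restore half is standard fwmark plumbing; the problem is the output side, where nothing identifies which VRF the originating socket belongs to, so the mark can only be applied to all DNS traffic indiscriminately:

```shell
# Mark locally generated DNS flows at output time. There is no
# selector here for "socket was created in VRF X" - that's the
# missing piece.
nft add table inet vrfdns
nft add chain inet vrfdns out '{ type route hook output priority mangle; }'
nft add rule inet vrfdns out udp dport 53 ct mark set 0x65
nft add rule inet vrfdns out tcp dport 53 ct mark set 0x65

# Restore the conntrack mark onto return packets...
nft add chain inet vrfdns pre '{ type filter hook prerouting priority mangle; }'
nft add rule inet vrfdns pre ct mark 0x65 meta mark set ct mark

# ...and steer marked traffic through the VRF table (101 here).
ip rule add fwmark 0x65 lookup 101
```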

So that's where I've left it for now. I'm hoping someone has an idea of how to do the outbound connection tracking so I can fix inter-VRF, or can poke enough holes in the whole idea to show it's not workable.

There are a few more things that could or should be done. This might be useful for more protocols than just DNS, but DNS is somewhat unique in that it's not a single process like ntpd or charon creating the traffic - it's literally everything that wants to do a name lookup, each doing it independently. Automatic NAT rules could be used to emulate client source-address binding if that were a desired feature. Some extra work would be needed for DHCP-sourced nameservers, and to make sure pdns_recursor integrates seamlessly when in use.

I had started to dummy up a configuration style along the lines of:

set system default-source dns vrf MGT

The separate tree may be useful for more protocols if it works nicely, like HTTP/FTP client connections from add system image or commit-archive, and it avoids breaking the ancient set system name-server config element with extra sub-nodes.
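To be clear about the shape, that tree could extend to other protocols along these lines (everything beyond the dns node is hypothetical - none of it exists in VyOS today):

```
set system default-source dns vrf MGT
set system default-source http vrf MGT
set system default-source ftp vrf MGT
```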

I have my routers set up with a single non-default management VRF (eth0 with a DHCP client, connected to the internal management network with private IP addresses, with the ssh service only in this VRF) and everything related to routing public IPs on the Internet (BGP, OSPF, PPPoE server) in the default VRF. I also use local DNS forwarding (allow-from and listen-on only, forwarding to my two caching nameservers on public IPs, with the source address set to the same public IP configured on loopback - also used as the router ID, for iBGP over OSPF, etc.).

So it works in the default VRF and resolves Internet DNS names correctly, and all DNS queries have a well-defined public source IP, though it wouldn't work if the internal management network had its own separate DNS on a private IP. I use numeric private IPs for management access - yes, I know it's not ideal, but after some time my fingers remember them automagically :slight_smile:

I've found a discussion about this on the MikroTik forum, where "/ip dns set vrf=…" has only been implemented very recently (RouterOS 7.15) and supports just a single VRF, while VRF support for other services was done much earlier. I also found a link to some Cisco documentation as an example of a more complete implementation of this feature.

Yup, that config (MGT + global WWW) is in line with my normal config for VyOS. I'd really like to be able to do things like move commit-archive into MGT, but that's only loosely related to the DNS work. Having a split-horizon DNS so the router can resolve internal systems behind MGT, without exposing those nameservers to the Internet, the pdns_recursor, etc., is a use case I can easily see. That's what I'm targeting here.

Under Cisco, which runs our customer L3 core (VyOS is the WWW BGP border and RS), our configs have everything living in VRFs. If something is visible in global, someone has made a mistake (with occasional exceptions for the MPLS/VXLAN underlay). IOS and its derivatives appear structured so that client connections occur in dedicated threads/processes that can be re-pointed easily; individual processes aren't generating them ad hoc. This model is what gave me the nsswitch idea, and VRF-aware DNS and DNS views on Cisco lend themselves to other parts of the imagined design.

Operational commands in VyOS do take a bit of effort to run DNS lookups from global. For example, ping does its host lookups in the op-mode wrapper, so the underlying command doesn't need to when run with its vrf option.
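As a rough illustration of that pattern (this is not the actual VyOS op-mode code - the function name and command layout are invented), resolution happens in the wrapper's own context and only the numeric address is handed to the command run inside the VRF:

```python
import socket

def build_vrf_ping(host: str, vrf: str) -> list[str]:
    # Name resolution happens here, in the caller's (global) context,
    # so the command executed inside the VRF never needs DNS itself.
    addr = socket.getaddrinfo(host, None, family=socket.AF_INET)[0][4][0]
    return ["sudo", "ip", "vrf", "exec", vrf, "ping", addr]

# e.g. ['sudo', 'ip', 'vrf', 'exec', 'MGT', 'ping', '127.0.0.1']
print(build_vrf_ping("localhost", "MGT"))
```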

So I could just support global lookups through a VRF, but the implementation doesn't seem complete without matching VRF support for the recursor, force vrf X and a few other edge cases; that's why I was so determined to make VRF-to-VRF lookups work seamlessly. I'm also a little worried that if I can't make the solution work more generally, kernel behaviour may regress and completely break even global<->VRF in a future update.

I was hoping someone might have an obvious answer that I was missing to bring the return traffic back into the origin VRF, without having to start digging through kernel code. I spent an hour or so running traffic through log actions in nft trying to find a way before making the forum post :slight_smile:.