It looks to me like the conntrack-sync feature works best when:
- VyOS boxes are in an active/standby configuration
- VRRP is running on all transit interfaces
- VRRP is synchronized so that a single VyOS node is the VRRP active node for all interfaces
All traffic flows through a single VyOS node until it fails. On failure of the active node, VRRP on the surviving node:
- assumes the gateway address on all transit interfaces
- triggers conntrackd to dump the flow state information it previously collected from the failed node into the survivor’s conntrack table
Because conntrackd is caching flow updates (not injecting them into the kernel on the receiving system in real time), this configuration of conntrack-sync is necessarily an active/standby mechanism, closely coupled with VRRP.
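To make the coupling I'm describing concrete, here is a rough sketch of the kind of configuration I have in mind. The group names, sync-group name, VRIDs and interface names are placeholders I made up, and the exact syntax may differ between VyOS releases, so treat it as an illustration rather than a tested config:

```
# Hypothetical names: VRRP groups "inside"/"outside", sync-group "fw-sync",
# transit interfaces eth0/eth1, dedicated sync link eth2.
set high-availability vrrp group inside interface eth0
set high-availability vrrp group inside vrid 10
set high-availability vrrp group outside interface eth1
set high-availability vrrp group outside vrid 20
# (virtual gateway addresses for each group omitted here)

# Tie both groups together so one node is master on every transit interface
set high-availability vrrp sync-group fw-sync member inside
set high-availability vrrp sync-group fw-sync member outside

# Let a state change of that sync-group drive conntrackd on failover
set service conntrack-sync interface eth2
set service conntrack-sync failover-mechanism vrrp sync-group fw-sync
```

The point of the sync-group is that the node which takes over the gateway addresses is also the node that commits the cached flow state to its kernel table.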
Is that a reasonable summary of the basic capability?
I’m trying to figure out whether I can make use of conntrack-sync for firewalling AWS site-to-site VPN connections within a topology like this:
Because I can’t reliably influence traffic to force flows only to the “active” VyOS node, populating the conntrack table with a VRRP-based trigger isn’t very helpful.
My experiments with using the DisableExternalCache feature to share flow state in real time are promising, but leave me with two questions:
- If conntrackd is injecting entries in real time so that return traffic can hit either router, what’s the role of the failover mechanism? Why is VRRP here?
- It looks like there’s a race at the start of a new asymmetric flow: Which will arrive first, the SYN/ACK, or the state table entry which allows it?
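For reference, the real-time sharing I've been experimenting with comes down to one extra setting on top of an existing conntrack-sync configuration. As I understand it, this is the VyOS knob that corresponds to conntrackd's DisableExternalCache option; applying it on both nodes is my assumption about how it would be deployed:

```
# Assumption: applied on both nodes, in addition to the usual
# conntrack-sync interface and failover-mechanism settings.
# Received state entries are injected straight into the kernel
# conntrack table instead of being held in the external cache.
set service conntrack-sync disable-external-cache
```

With this in place there is no cached state left to commit at failover time, which is what prompts the first question above.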
What’s the right way forward here?
- Decouple the firewall from the VPN, build the firewall as an active/standby pair with VRRP on both sides (symmetry!)
- Use routing metrics (BGP and OSPF) to force traffic onto one side, rely on DisableExternalCache (AKA risk the race condition) only during the short windows of OSPF/BGP reconvergence.
- Something else?