It looks to me like the conntrack-sync feature works best when:
- VyOS boxes are in an active/standby configuration
- VRRP is running on all transit interfaces
- VRRP is synchronized so that a single VyOS node is the VRRP active node for all interfaces
All traffic flows through a single VyOS node until it fails. On failure of the active node, VRRP on the surviving node:
- assumes the gateway address on all transit interfaces
- triggers conntrackd to dump flow state information previously collected from the failed node into the survivor’s conntrack table.
Because conntrackd is caching flow updates (not injecting them into the kernel on the receiving system in real time), this configuration of conntrack-sync is necessarily an active/standby mechanism, closely coupled with VRRP.
Is that a reasonable summary of the basic capability?
I’m trying to figure out whether I can make use of conntrack-sync for firewalling AWS site-to-site VPN connections within a topology like this:
Because I can’t reliably influence traffic to force flows only to the “active” VyOS node, populating the conntrack table with a VRRP-based trigger isn’t very helpful.
My experiments with using conntrackd's DisableExternalCache feature to share flow state in real time are promising, but leave me with two questions:
- If conntrackd is injecting entries in real time so that return traffic can hit either router, what’s the role of the failover mechanism? Why is VRRP here?
- It looks like there’s a race at the start of a new asymmetric flow: Which will arrive first, the SYN/ACK, or the state table entry which allows it?
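For anyone who wants to reproduce this, here is a minimal sketch of the relevant part of a hand-written conntrackd.conf with the external cache disabled. It assumes FTFW mode and a dedicated sync link; the multicast group, addresses, and interface name are placeholders for illustration, not values from my routers, and this is not necessarily what VyOS generates for conntrack-sync.

```
Sync {
    Mode FTFW {
        # Commit states received from the peer straight into the
        # kernel conntrack table instead of holding them in
        # conntrackd's external cache until a failover trigger.
        DisableExternalCache On
    }

    # Placeholder sync channel: adjust to the dedicated link
    # between the two routers.
    Multicast {
        IPv4_address 225.0.0.50
        Group 3780
        IPv4_interface 192.168.100.1
        Interface eth2
        Checksum on
    }
}
```

With this setting there is nothing left to commit at failover time, which is what prompts the first question above.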
What’s the right way forward here?
- Decouple the firewall from the VPN, build the firewall as an active/standby pair with VRRP on both sides (symmetry!)
- Use routing metrics (BGP and OSPF) to force traffic onto one side, rely on DisableExternalCache (AKA risk the race condition) only during the short windows of OSPF/BGP reconvergence.
- Something else?