Question regarding conntrack-sync

daniel_n · September 3, 2023, 3:18pm

I have a conceptual question about conntrack-sync.

I have been experimenting with a problem for a few days and suspect that I have misunderstood the capabilities of conntrack-sync. I want to implement a stateful active/active firewall with two or more redundant VyOS routers.

Is this possible?

My problem is that my routers don’t seem to know or use the firewall connection table of the other router. My ruleset drops packets that cannot be matched to a known active connection. Because of the active/active configuration, my packet flow is asymmetric. Packets belonging to a connection established via one router are dropped by the firewall on the other router, because that router does not know the connection.

Apachez · September 3, 2023, 3:59pm

It seems that conntrackd doesn't really support asymmetric active/active, according to:

https://conntrack-tools.netfilter.org/manual.html#sync-aa

Active-Active setups

The Active-Active setup consists of having more than one stateful firewall actively filtering traffic. Thus, we reduce the resource waste that implies to have a backup firewall which is spare.

We can classify the type of Active-Active setups in several families:

Symmetric path routing: The stateful firewalls share the workload in terms of flows, ie. the packets that are part of a flow are always filtered by the same firewall.

Asymmetric multi-path routing: The packets that are part of a flow can be filtered by whatever stateful firewall in the cluster. Thus, every flow-states have to be propagated to all the firewalls in the cluster as we do not know which one would be the next to filter a packet. This setup goes against the design of stateful firewalls as we define the filtering policy based on flows, not in packets anymore.

conntrackd allows you to deploy a symmetric Active-Active setup based on a static approach. For example, assume that you have two virtual IPs, vIP1 and vIP2, and two firewall replicas, FW1 and FW2. You can give the virtual vIP1 to the firewall FW1 and the vIP2 to the FW2.

The asymmetric path scenario is hard: races might occur between state synchronization and packet forwarding. If you would like to deploy an Active-Active setup with an asymmetric multi-path routing configuration, then make sure the same firewall forwards packets coming in the original and the reply directions. If you cannot guarantee this and you still would like to deploy an Active-Active setup, then you might have to consider downgrading your firewall ruleset policy to stateless filtering.

However, I think it should still work in theory (after all, conntrackd will sync the conntrack table of one firewall to the other), but the limitation is a race condition that can occur regardless.

That is, suppose a flow starts by going through FW1 (let's say a TCP SYN). If processing that packet, sending the state to FW2 and having FW2 update its conntrack table takes longer than it takes for the return packet (TCP SYN+ACK) to arrive at FW2, then FW2 has no option but to drop the return traffic, because at that moment the local conntrack table at FW2 has not yet been updated by conntrackd with the information from FW1.
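To make the race concrete, here is a minimal toy simulation. It is only a sketch: the `Firewall` class, the `simulate` function and the delay values are made up for illustration and do not model conntrackd's real behaviour or timings.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    """Toy stateful firewall: a name plus a set of known flows."""
    name: str
    table: set = field(default_factory=set)

    def forward_new(self, flow):
        # A new outbound flow (e.g. TCP SYN) creates local state.
        self.table.add(flow)

    def learn_from_peer(self, flow):
        # State received from the other firewall via sync.
        self.table.add(flow)

    def filter_reply(self, flow) -> str:
        # Stateful policy: only replies to known flows are accepted.
        return "accept" if flow in self.table else "drop"


def simulate(sync_delay_ms: float, reply_delay_ms: float) -> str:
    """Return FW2's verdict on the SYN+ACK of a flow that started on FW1."""
    fw1, fw2 = Firewall("FW1"), Firewall("FW2")
    flow = ("A", 40000, "B", 80)  # hypothetical flow identifier
    fw1.forward_new(flow)

    # Replay the two events at FW2 in time order (sync wins a tie).
    events = sorted([(sync_delay_ms, 0, "sync"), (reply_delay_ms, 1, "reply")])
    verdict = "drop"
    for _, _, kind in events:
        if kind == "sync":
            fw2.learn_from_peer(flow)
        else:
            verdict = fw2.filter_reply(flow)
    return verdict


if __name__ == "__main__":
    print(simulate(sync_delay_ms=0.5, reply_delay_ms=20.0))  # sync arrives first
    print(simulate(sync_delay_ms=5.0, reply_delay_ms=1.0))   # reply arrives first
```

The only thing that decides the verdict is which event reaches FW2 first, which is why the problem shows up mostly when the return path is very short compared to the sync path.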

The solution is either to do stateless firewalling (basically how an ACL in a switch/router works, where you only look at the interface, protocol and port number and not at the TCP flags) or, as Palo Alto Networks does, to let a few packets through until the conntrack table has synced and then drop any unwanted packets (some unwanted packets will still get through).

To go stateless you can use notrack in nftables, but you must then also open the ports in both directions.

That is (example):

A->B
TCP srcport:>1023, dstport:80

B->A
TCP srcport:80, dstport:>1023

In other words, your firewall will no longer be an SPI (stateful packet inspection) firewall or, in the terms used by Palo Alto and others, an NGFW (Next Generation Firewall), but rather “just” a screening router.
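The two rules from the example can be sketched as a stateless matcher. This is an illustration only: the `Packet` type and `stateless_accept` function are hypothetical names, and a real ruleset would also match on interface and addresses.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    proto: str
    sport: int
    dport: int


def stateless_accept(pkt: Packet) -> bool:
    """Accept on protocol and ports only; no connection table is consulted."""
    if pkt.proto != "tcp":
        return False
    a_to_b = pkt.sport > 1023 and pkt.dport == 80  # A->B, request direction
    b_to_a = pkt.sport == 80 and pkt.dport > 1023  # B->A, reply direction
    return a_to_b or b_to_a


if __name__ == "__main__":
    print(stateless_accept(Packet("tcp", 40000, 80)))  # request from A
    print(stateless_accept(Packet("tcp", 80, 40000)))  # reply from B
    print(stateless_accept(Packet("tcp", 40000, 22)))  # unrelated port
```

Note that the B->A rule accepts any TCP packet with source port 80, whether or not A ever sent a request. That is the price of dropping connection tracking.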

The definitions are:

Screening router: Looks strictly at interface, protocol and port number(s).

SPI firewall: As above, but adds sessions, i.e. which direction initiated the traffic (this includes UDP and ICMP, which get virtual sessions, or rather “connection tracking”, even though they don't have “sessions” per se).

Proxy-based firewall: Like an SPI firewall, but also enforces application-layer protocols. For example, you won't be able to send SMTP through an HTTP proxy (unless you first transform that SMTP into an HTTP request/response). It also terminates the sessions, so you have one session on one interface and a different session on another. Basically, it breaks up session and content on one side and reconstructs them on the other.

NGFW: Like an SPI firewall, but also includes application identification, SSL termination, URL filtering, etc. Unlike a proxy-based firewall, if a packet contains something malicious that the NGFW didn't detect, the packet is forwarded to the destination in its original form. That is, the NGFW won't reconstruct the session. The exception is SSL: the NGFW either has to terminate it as an SSL proxy or, if it protects a server that doesn't use Perfect Forward Secrecy (such as Diffie-Hellman), it can decrypt the traffic towards this server on the fly and inspect it without reconstructing the SSL session in both directions.

Apachez · September 3, 2023, 4:12pm

This is a really old post, but it claims that the DisableExternalCache option is needed to do asymmetric firewalling with conntrackd:

OpenWRT also points to this setting in its active-active example:

https://oldwiki.archive.openwrt.org/doc/recipes/high-availability

Described in the conntrackd manual:

https://conntrack-tools.netfilter.org/manual.html#sync-disable-external

https://manpages.debian.org/bookworm/conntrackd/conntrackd.conf.5.en.html#DisableExternalCache

I'm not sure whether VyOS has the above enabled by default, or what the result would be if it were (if the results are bad, then perhaps add this as an option to “set conntrack”?).
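As I read the manual, with the external cache enabled the states received from the peer are kept in user space and only committed to the kernel table on failover, whereas with DisableExternalCache they are injected directly into the kernel conntrack table. A toy model of that difference (the `Replica` class is made up for illustration and is not how the daemon is implemented):

```python
class Replica:
    """Simplified model of a conntrackd replica receiving peer states."""

    def __init__(self, disable_external_cache: bool):
        self.disable_external_cache = disable_external_cache
        self.kernel_table = set()    # what the local firewall ruleset can match
        self.external_cache = set()  # user-space copy of the peer's states

    def receive_sync(self, flow):
        if self.disable_external_cache:
            self.kernel_table.add(flow)    # injected directly into the kernel
        else:
            self.external_cache.add(flow)  # held until committed

    def commit(self):
        """Failover step: push cached peer states into the kernel table."""
        self.kernel_table |= self.external_cache
        self.external_cache.clear()

    def is_known(self, flow) -> bool:
        return flow in self.kernel_table


if __name__ == "__main__":
    flow = ("A", 40000, "B", 80, "tcp")  # hypothetical flow identifier
    cached = Replica(disable_external_cache=False)
    direct = Replica(disable_external_cache=True)
    for replica in (cached, direct):
        replica.receive_sync(flow)
    print(cached.is_known(flow), direct.is_known(flow))
```

In this model only the replica without the external cache can match the peer's flow before a failover, which is what an asymmetric active/active setup needs. The manual notes the trade-off: backup entries then consume slots in the kernel conntrack table and cost more CPU.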

Apachez · September 3, 2023, 4:21pm

According to the template, at least in 1.4-rolling there is an option to set disable_external_cache through the VyOS config:

vyos/vyos-1x/blob/current/data/templates/conntrackd/conntrackd.conf.j2

# autogenerated by conntrack_sync.py

# Synchronizer settings
Sync {
    Mode FTFW {
        DisableExternalCache {{ 'on' if disable_external_cache is vyos_defined else 'off' }}
    }
{% for iface, iface_config in interface.items() %}
{%     if iface_config.peer is vyos_defined %}
    UDP {
{%         if listen_address is vyos_defined %}
{%             for address in listen_address %}
        IPv4_address {{ address }}
{%             endfor %}
{%         endif %}
        IPv4_Destination_Address {{ iface_config.peer }}
        Port {{ iface_config.port if iface_config.port is vyos_defined else '3780' }}
        Interface {{ iface }}
        SndSocketBuffer {{ sync_queue_size | int *1024 *1024 }}
        RcvSocketBuffer {{ sync_queue_size | int *1024 *1024 }}

This file has been truncated.

set service conntrack-sync disable-external-cache
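A rough Python equivalent of the conditional in the template, to show what the CLI node changes in the generated config. This is a sketch under two assumptions of mine: that `is vyos_defined` means "present and not None", and that the valueless CLI node arrives as a key with an empty dict. The `render_mode_block` function is hypothetical, not part of VyOS.

```python
def render_mode_block(config: dict) -> str:
    """Emulate the 'Mode FTFW' block of the template for a given config dict."""
    # Assumption: 'is vyos_defined' is true when the key exists and is not None.
    enabled = config.get("disable_external_cache") is not None
    value = "on" if enabled else "off"
    return f"Mode FTFW {{\n    DisableExternalCache {value}\n}}"


if __name__ == "__main__":
    # Without the CLI node the key is absent.
    print(render_mode_block({}))
    # With 'set service conntrack-sync disable-external-cache' (assumed shape).
    print(render_mode_block({"disable_external_cache": {}}))
```

So unless the node is set, the generated conntrackd.conf ends up with DisableExternalCache off.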

It also seems to exist for VyOS 1.3:

vyos/vyos-1x/blob/equuleus/data/templates/conntrackd/conntrackd.conf.tmpl

# autogenerated by conntrack_sync.py

# Synchronizer settings
Sync {
    Mode FTFW {
        DisableExternalCache {{ 'on' if disable_external_cache is defined else 'off' }}
    }
{% for iface, iface_config in interface.items() %}
{%   if loop.first %}
{%     if iface_config.peer is defined and iface_config.peer is not none %}
    UDP {
{%       if listen_address is defined and listen_address is not none %}
{%           for address in listen_address %}
        IPv4_address {{ address }}
{%           endfor %}
{%       endif %}
        IPv4_Destination_Address {{ iface_config.peer }}
        Port {{ iface_config.port if iface_config.port is defined else '3780' }}
{%     else %}
{%       set ip_address = iface | get_ipv4 %}

This file has been truncated.

daniel_n · September 3, 2023, 6:13pm

Hey, many thanks for the detailed explanation. With “disable-external-cache” it seems to work. I will wait and see if it works reliably.

Apachez · September 6, 2023, 2:28pm

How has that worked so far?