IPv6 network connections through two VRRP pairs of routers are intermittent

I have two networks, each with a VRRP pair of routers, connected via an OSPF network. I have struggled to get both pairs working in a fully converged network: connections between the host networks behind each pair intermittently fail.

I was able to work around this by disabling OSPF on the BACKUP router of each pair, which ensures that traffic only flows between two routers. I only really need VRRP to expose a public IP and to break a segment in a bridge to prevent an L2 loop (since PVST is not supported); however, the VRRP transition scripts also enable/disable OSPF.
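For context, the VRRP/OSPF coupling is done with transition scripts on the VRRP group. A minimal sketch of that wiring, assuming the current `high-availability vrrp` syntax; the group name, interface, addresses, and script paths are placeholders, not my exact config:

```shell
# Hypothetical VRRP group; names, addresses, and paths are illustrative only
set high-availability vrrp group LAN vrid 10
set high-availability vrrp group LAN interface eth1
set high-availability vrrp group LAN virtual-address 192.0.2.1/24
# Scripts run on state change; these are what enable/disable OSPF
set high-availability vrrp group LAN transition-script master /config/scripts/ospf-enable.sh
set high-availability vrrp group LAN transition-script backup /config/scripts/ospf-disable.sh
```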

What I think is happening: a connection from a host in network A hits either router 1 or 2, is routed to router 1 or 2 in network B, and is then received by the remote host. The reply from the host in network B then hits either router 1 or 2, which may not be the router the original packet went out through, and is routed to router 1 or 2 in network A before being received by the original host.
Since the SYN packet opens a TCP session only on the router it traversed, a reply that is load-balanced to the other router in the pair arrives as an unexpected packet and is dropped, which causes my issue.
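If that theory is right, it should be visible in the connection tables: the router that saw the SYN has an entry for the flow, and its peer does not. Something like this on both routers of a pair (op-mode; the 2001:db8 prefix is a documentation placeholder for the real host prefix):

```shell
# Only the router that saw the SYN should show an entry for the flow;
# on the peer the reply will be classified as invalid and dropped
show conntrack table ipv6 | match 2001:db8
```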

What I would like to know is whether there is a feature I’m missing, or whether I need to keep troubleshooting my current setup for a subtle misconfiguration.

Some details:

  • Zone-based firewall is used on all 4 routers between all network zones
  • Ping and TCP connections are affected
  • All traffic in this example scenario is in the same “ADMIN” zone, including the OSPF router network and host networks
  • Layer 4 hashing is enabled on all 4 routers; turning it off does not improve anything
  • All hosts use IPv6 autoconf from router advertisements; setting static IPv6 gateways or using a VRRP virtual IP on hosts is not preferred
  • The MASTER and BACKUP VRRP routers both have their IPv6 RA default preference set to high and are listed as equal in hosts’ “ip -6 route” output
  • Setting the BACKUP VRRP router to low default preference does not help, even when “ip -6 route” shows MASTER as high and BACKUP as low
  • conntrack-sync is configured using the VRRP sync group; I’m not entirely sure this is working
  • IPv4 connectivity is not an issue because it has to use the VRRP virtual IPs on each router
  • IPv6 ping from any router directly to a remote host is successful; only host-to-host connectivity is affected
  • Both router pairs have an OSPF connection to each other in area 0 with a cost of 1, and connect to the routed switch network in area 0 with a cost of 100
  • Setting the cost to 1000 on the BACKUP router’s link to the routing network does not help; the MASTER router still sees an equal-cost path through both the BACKUP and MASTER routers of the other network
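For reference, the OSPFv3 costs above are set per interface. A sketch of the relevant nodes as they look on a 1.4 rolling build (interface names are placeholders for the actual links):

```shell
# Inter-router link between the two pairs: cost 1
set protocols ospfv3 interface eth2 area 0
set protocols ospfv3 interface eth2 cost 1
# Link toward the routed core network: cost 100
# (raising this to 1000 on the BACKUP router did not help)
set protocols ospfv3 interface eth3 area 0
set protocols ospfv3 interface eth3 cost 100
```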

Currently on a VyOS nightly 20220218 build on all 4 routers.

It has turned out that the firewall is in fact receiving packets with an invalid state. Adding a rule to allow invalid state between zones lets traffic flow in the preferred converged network.
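The workaround rule looks something like this, using the pre-refactor firewall syntax these builds use; the ruleset name is a placeholder for whichever IPv6 ruleset the zone pair references:

```shell
# Workaround only: accept packets conntrack classifies as invalid
set firewall ipv6-name ADMIN-TO-ADMIN rule 10 action accept
set firewall ipv6-name ADMIN-TO-ADMIN rule 10 state invalid enable
```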

However, this is not ideal. I would like conntrack-sync to ensure each router always knows the other’s states, but I believe the states are only imported on VRRP failover. Am I mistaken? If the states from both firewalls are supposed to be synced continuously, then I just need to troubleshoot conntrack and remove these temporary invalid-state rules.
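For reference, my conntrack-sync configuration is roughly the following; eth6 is the sync link from the statistics below, and the sync-group name is a placeholder:

```shell
# Replicate conntrack state over the dedicated sync link
set service conntrack-sync interface eth6
# Tie replication to the VRRP sync group so failover triggers a commit
set service conntrack-sync failover-mechanism vrrp sync-group SYNC
```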

Any help would be appreciated.

Can you attach the output of the following command from the VRRP master and slave?
show conntrack-sync statistics

I’ll keep things simple with just the network A routers, since these are the only routers where I needed to allow invalid state to get traffic flowing.

Router 1 network A

Main Table Statistics:

cache internal:
current active connections:             1415
connections created:                   53594    failed:            0
connections updated:                  183035    failed:            0
connections destroyed:                 52179    failed:            0

cache external:
current active connections:             1158
connections created:                   20714    failed:            0
connections updated:                   37636    failed:            0
connections destroyed:                 19556    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

multicast traffic (active device=eth6):
            22819844 Bytes sent              6498976 Bytes recv
              275498 Pckts sent                64161 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

Expect Table Statistics:

cache internal:
current active connections:                0
connections created:                       0    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                     0    failed:            0

cache external:
current active connections:                0
connections created:                       0    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                     0    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

multicast traffic (active device=eth6):
            22819844 Bytes sent              6498976 Bytes recv
              275498 Pckts sent                64161 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                    0 Lost msgs

Router 2 Network A

Main Table Statistics:

cache internal:
current active connections:              352
connections created:                   19247    failed:            0
connections updated:                   37276    failed:            0
connections destroyed:                 18895    failed:            0

cache external:
current active connections:             1578
connections created:                   52749    failed:            0
connections updated:                  178704    failed:            0
connections destroyed:                 51171    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

multicast traffic (active device=eth6):
             6171564 Bytes sent             22621664 Bytes recv
               60463 Pckts sent               272576 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                   13 Lost msgs

Expect Table Statistics:

cache internal:
current active connections:                0
connections created:                       0    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                     0    failed:            0

cache external:
current active connections:                0
connections created:                       0    failed:            0
connections updated:                       0    failed:            0
connections destroyed:                     0    failed:            0

traffic processed:
                   0 Bytes                         0 Pckts

multicast traffic (active device=eth6):
             6171564 Bytes sent             22621664 Bytes recv
               60463 Pckts sent               272576 Pckts recv
                   0 Error send                    0 Error recv

message tracking:
                   0 Malformed msgs                   13 Lost msgs

Some more detail: the real culprit was “set firewall state-policy invalid action drop”.

Turning this back on breaks traffic; turning it off restores traffic. The individual zone policy rules don’t change anything, as expected with the global rule in place.
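In other words, the toggle that breaks/restores traffic is just the global state policy:

```shell
# Global policy that drops the asymmetric replies (breaks traffic here)
set firewall state-policy invalid action drop
# Removing the node (or setting the action to accept) restores traffic
delete firewall state-policy invalid
```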

Obviously, conntrackd sharing states “live” would be ideal so that genuinely invalid traffic is still dropped. If that isn’t possible, can anyone suggest whether it’s acceptable to allow invalid packets from “semi-trusted” zones into trusted zones? I see a problem with invalid traffic from a DMZ making it into the ADMIN zone; however, the ADMIN zone should be able to connect into the DMZ, which is how my zone rules are configured.

Turning off the conntrack external cache fixes this issue without having to allow invalid states.

This has been a configuration node since 1.2.2, intended for asymmetric routing, if anyone is searching.
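If I have identified the right node, it is the one below; worth verifying against the documentation for your version:

```shell
# With the external cache disabled, replicated states are injected
# straight into the kernel conntrack table instead of being held in
# userspace until a VRRP failover, so asymmetric replies match a state
set service conntrack-sync disable-external-cache
```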

