Unable to ping self or other hosts in VRF networks, OSPFv2/v3 neighbors not forming, but traffic is visible in tcpdump

Currently running 1.4 20220226 nightly

I have configured a set of routers with identical VRF configs (only the router-id differs) to separate the internet failsafe default route from the ADMIN network. The ADMIN network should reach the internet via the OSPF default-information route learned from the upstream routers, not via the default VRF's 0.0.0.0/0 next hop.

After configuring this, I am not able to establish a neighbor relationship, ping each router's own IP on a VRF interface, or ping the other routers or other hosts on br1.2 or br1.1023. Am I missing something, or is this a bug with VRF and bridge VLANs?

ping 10.255.2.2 vrf ADMIN gets no response.
If I run tcpdump on either interface, I can see OSPFv2/v3 hellos from all 4 routers as well as the ping requests, but no replies.

set interfaces bridge br1 vif 2 vrf 'ADMIN'
set interfaces bridge br1 vif 1023 vrf 'ADMIN'
set protocols static route 0.0.0.0/0 next-hop 1xx.8xxx vrf 'default'
set vrf bind-to-all
set vrf name ADMIN protocols ospf area 0.0.0.0 network '10.255.0.0/24'
set vrf name ADMIN protocols ospf area 0.0.0.0 network '10.255.2.0/24'
set vrf name ADMIN protocols ospf area 0.0.0.4 network '10.255.0.8/30'
set vrf name ADMIN protocols ospf interface br1.2 cost '1'
set vrf name ADMIN protocols ospf interface br1.2 dead-interval '6'
set vrf name ADMIN protocols ospf interface br1.2 hello-interval '1'
set vrf name ADMIN protocols ospf interface br1.2 passive disable
set vrf name ADMIN protocols ospf interface br1.1023 cost '100'
set vrf name ADMIN protocols ospf interface br1.1023 dead-interval '6'
set vrf name ADMIN protocols ospf interface br1.1023 hello-interval '1'
set vrf name ADMIN protocols ospf interface br1.1023 passive disable
set vrf name ADMIN protocols ospf parameters router-id '0.0.0.255'
set vrf name ADMIN protocols ospf redistribute connected route-map 'ospf-connected'
set vrf name ADMIN protocols ospfv3 interface br1.2 area '0.0.0.0'
set vrf name ADMIN protocols ospfv3 interface br1.2 cost '1'
set vrf name ADMIN protocols ospfv3 interface br1.2 dead-interval '6'
set vrf name ADMIN protocols ospfv3 interface br1.2 hello-interval '1'
set vrf name ADMIN protocols ospfv3 interface br1.1023 area '0.0.0.0'
set vrf name ADMIN protocols ospfv3 interface br1.1023 cost '100'
set vrf name ADMIN protocols ospfv3 interface br1.1023 dead-interval '6'
set vrf name ADMIN protocols ospfv3 interface br1.1023 hello-interval '1'
set vrf name ADMIN protocols ospfv3 parameters router-id '0.0.0.255'
set vrf name ADMIN protocols ospfv3 redistribute connected route-map 'ospfv3-connected'
set vrf name ADMIN table '200'

I also have firewall zone-policy set for LOCAL, WAN, and ADMIN. WAN to ADMIN is dropped by default, but LOCAL to ADMIN and ADMIN to LOCAL are allowed by default. (br1.2 and br1.1023 are both in the ADMIN zone.)
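Since a zone-policy is in play, it may be worth ruling the firewall out before blaming the VRF. As I understand the Linux VRF documentation, once an interface is enslaved to a VRF, rules on the input path can see the VRF device (ADMIN here) as the incoming interface rather than br1.2 or br1.1023; whether that applies to this particular build is an assumption on my part. The sketch below assumes the firewall is nftables-based, and the helper name `fw_rules_matching` is made up for illustration.

```shell
# Hypothetical helper: list firewall rules that mention a given interface or VRF.
# Assumes an nftables-based firewall. The argument is an extended regular expression.
fw_rules_matching() {
    pattern="$1"
    # Dump the full ruleset and keep only matching lines, with line numbers.
    sudo nft list ruleset | grep -n -E "$pattern"
}
```

For example, `fw_rules_matching 'ADMIN|br1\.2|br1\.1023'` shows whether the zone rules match on the bridge VLAN interfaces, on the VRF device, or on both.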

The end result I'm looking for is that traffic can only reach the server ADMIN network (on br1.2) if it comes in from br1.1023, or if I SSH in directly from a router in a disaster recovery scenario.

I don't understand why you use a bridge. Do you have a switch? I think it may be easier to use tagged subinterfaces (if both are routers).

There are two switches in between these routers for Layer 2 redundancy between each other and the upstream virtual routers, where all the Layer 4 security is. The bridge is necessary in this colocation.

It's a bit messy but it works. These edge routers provide routing for the servers and the management interfaces of the switches, plus a bridge br0 between VLAN br1.1024 and eth0 that fails over between the edge routers via VRRP (to stop a broadcast storm, since STP isn't desired). This gives the virtual routers two physical paths to the internet should one of these routers fail.

This all also gives me a back door to fix any issue because these two routers are rarely changed.

I understand the idea, but with a switch in the middle there are a lot of things we need to verify: that STP is not blocking a path, and that IGMP is enabled (many vendors require it to pass multicast, which OSPF uses). I suggest reviewing the following link:

https://docs.vyos.io/en/latest/configuration/interfaces/ethernet.html#regular-vlans-802-1q

It may be useful for you.

I'm looking at the doc, but the configuration was working before I introduced the VRF on the routers. Originally the internet default route and the OSPF routes all resolved with a standard configuration. I do have the bridge VLAN subinterfaces added to the VRF as described.

set int br br1 vif 1023 vrf ADMIN

Things broke down as soon as I removed the config and converted it in a text editor to use the set vrf name ADMIN protocols form. I can't even ping router to router within the same 10.255.2.0/24 network, which shouldn't be an issue. STP is also not blocking any of the ports (br1 does use STP, br0 does not): on br1.1023 I can run tcpdump proto ospf and see the hellos from all 4 routers. I can also see ICMP echo requests go out br1.1023, but the replies aren't making it into the VRF domain, so to speak.
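One way to narrow down where the replies get lost is to capture at two points: on the bridge VLAN interface and on the VRF device itself. This is only a sketch; the helper name `capture_vrf_path` and the packet count of 20 are made up for illustration. It filters on IP protocol 89 (OSPFv2/v3) and ICMP. As far as I know the kernel does not push multicast through the taps on the VRF device, so the second capture is mostly useful for the unicast echo replies.

```shell
# Hypothetical helper: capture OSPF and ICMP on a VRF member, then on the VRF device.
# Usage: capture_vrf_path <member-interface> <vrf-device>
capture_vrf_path() {
    member="$1"
    vrf="$2"
    # OSPFv2 and OSPFv3 both use IP protocol number 89; icmp covers the ping test.
    filter='ip proto 89 or ip6 proto 89 or icmp'

    # 1. Packets as seen on the bridge VLAN interface (stops after 20 packets).
    sudo tcpdump -n -c 20 -i "$member" "$filter"

    # 2. Packets as seen on the VRF device (stops after 20 packets).
    sudo tcpdump -n -c 20 -i "$vrf" "$filter"
}
```

Run it as `capture_vrf_path br1.1023 ADMIN` while a ping is going from another router. Echo replies that show up in the first capture but never in the second would point at the hand-off into the VRF rather than at the switches.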

In the upstream virtual routers (which have no VRF configuration), I see these edge routers as neighbors in the Init state, which means they also see the OSPF hellos. But since I'm missing something on the edge routers, the edge routers aren't acknowledging the relationship.

Thanks for looking with me

I've found a few facts that seem interesting: I can ping the IPv6 link-local address of each remote router on both br1.2 and br1.1023. I also see a valid ARP cache for the remote routers, as well as IPv6 neighbor discovery entries. VRRP is also working across br1.2.

So this suggests to me that multicast is being delivered within the VRF correctly, but most unicast is not being routed into the VRF by the kernel. In other words, unicast bits don't even make it to the ethernet interface, let alone the bridge VIF.
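To check the kernel side of that theory, the VRF state can be inspected directly with iproute2. This is a rough sketch, not a definitive procedure; the helper name `vrf_diag` is made up, and it takes the VRF name and a target address as arguments.

```shell
# Hypothetical helper: dump the kernel's view of a VRF and test reachability in it.
# Usage: vrf_diag <vrf-name> <target-address>
vrf_diag() {
    vrf="$1"
    target="$2"

    # Interfaces enslaved to the VRF device (br1.2 and br1.1023 are expected here).
    ip link show vrf "$vrf"

    # Routes in the VRF's table (table 200 in the config above),
    # including the connected subnets.
    ip route show vrf "$vrf"

    # Policy rules; an l3mdev rule is normally present once a VRF exists.
    ip rule show

    # Route and source address the kernel would choose for the target inside the VRF.
    ip route get "$target" vrf "$vrf"

    # Whether sockets not bound to a VRF accept traffic arriving in a VRF.
    sysctl net.ipv4.tcp_l3mdev_accept net.ipv4.udp_l3mdev_accept net.ipv4.raw_l3mdev_accept

    # Ping with the process bound to the VRF.
    sudo ip vrf exec "$vrf" ping -c 3 "$target"
}
```

Calling `vrf_diag ADMIN 10.255.2.2` should show, among other things, whether 10.255.2.0/24 appears as a connected route in the VRF table and which source address is picked for the ping.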

Also, hello from your neighbors in Alaska!