L2TPv3 mostly working, but one VLAN is not

I am using the 1.5 nightly (Circinus) and creating a bridge over a WAN using L2TPv3. I am trying to trunk multiple VLANs across it and have success with every VLAN except one. I have reviewed the configuration of this VLAN (VLAN 30), compared it to the working ones, and cannot find any reason for the discrepancy. The only thing that might be different is that most of the devices on VLAN 30 are connected to a switch other than the one VyOS is plugged into. However, when I did a packet capture on one of the devices, I saw the ARP request and reply but no ICMP packets.

A few other things to check to rule out issues outside of VyOS:

  • Can you see correctly learned MACs from the other side of the L2 tunnel in switch management?
  • Do the switches have any MAC filters/learning restrictions or limits applied to the interswitch trunks or the VyOS trunks?
  • Is there any difference between VLAN 30 devices on the directly connected switch vs the other? Can they see each other, or can one set see through the cross-WAN tunnel and the other not?

Since things are mostly working, I assume the usual culprits with VMs (forged-transmit permissions and promiscuous-mode interfaces) are not an issue here.

First, thank you for taking the time to offer some solutions.

I am going to reference the image below in the hopes that it makes my answers clearer:

  • The VyOS devices are bare metal, not virtualized.

  • If I ping 30.18 or 30.1 in site A from the laptop in site B, I get no response. The L3 switch in site A correctly lists the laptop's IP and MAC addresses in its ARP table.

    • Site A also shows the laptop's MAC address on the same port as the VyOS device.

    • The laptop's ARP table in site B lists neither 30.1 nor 30.18.

  • If I start a ping from the site A L3 switch to 10.50.30.40, no packets are received, and the ARP tables show the same results.

    • Running Wireshark on the laptop during the ping, I see no ARP requests for 10.50.30.40, likely because the L3 switch already has the MAC in its cache.

    • The capture does show several other ARP requests from site A. Site B has only the single laptop on the VLAN that is not working; the devices on the other VLANs in site B work correctly.

  • Adding a static ARP entry on the laptop for .30.18 lets me ping 30.18 from the laptop, but pings from the site A L3 switch still fail.

The layer 3 switches are Arista, and they are pretty much at factory defaults apart from the VLAN configuration for the lab. The only difference is that I have an IP helper address configured so the workstations can get DHCP. Could this be interfering?

I can’t imagine what I’m missing at this point.

Site A VyOS config
set interfaces bridge br0 address 'xx.xx.16.30/24'
set interfaces bridge br0 member interface eth0
set interfaces bridge br0 member interface l2tpeth1
set interfaces dummy dum0 address '192.168.0.1/32'
set interfaces l2tpv3 l2tpeth1 destination-port '5000'
set interfaces l2tpv3 l2tpeth1 encapsulation 'ip'
set interfaces l2tpv3 l2tpeth1 peer-session-id '1'
set interfaces l2tpv3 l2tpeth1 peer-tunnel-id '1'
set interfaces l2tpv3 l2tpeth1 remote '192.168.0.2'
set interfaces l2tpv3 l2tpeth1 session-id '1'
set interfaces l2tpv3 l2tpeth1 source-address '192.168.0.1'
set interfaces l2tpv3 l2tpeth1 source-port '5000'
set interfaces l2tpv3 l2tpeth1 tunnel-id '1'

Site B VyOS config
set interfaces bridge br0 address 'xx.xx.16.31/24'
set interfaces bridge br0 member interface eth0
set interfaces bridge br0 member interface l2tpeth1
set interfaces dummy dum0 address '192.168.0.2/32'
set interfaces l2tpv3 l2tpeth1 destination-port '5000'
set interfaces l2tpv3 l2tpeth1 encapsulation 'ip'
set interfaces l2tpv3 l2tpeth1 peer-session-id '1'
set interfaces l2tpv3 l2tpeth1 peer-tunnel-id '1'
set interfaces l2tpv3 l2tpeth1 remote '192.168.0.1'
set interfaces l2tpv3 l2tpeth1 session-id '1'
set interfaces l2tpv3 l2tpeth1 source-address '192.168.0.2'
set interfaces l2tpv3 l2tpeth1 source-port '5000'
set interfaces l2tpv3 l2tpeth1 tunnel-id '1'
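Since every other VLAN crosses the same session, the tunnel parameters are probably fine, but a quick way to confirm the two configs really are mirror images is to compare them mechanically. This is a minimal sketch, assuming the configs are pasted as `set` commands like the ones above; `l2tpv3_params` and `mirror_problems` are hypothetical helper names for illustration, not VyOS tools. It checks that each side's session-id, tunnel-id and source-address equal the other side's peer-session-id, peer-tunnel-id and remote, and compares ports only for UDP encapsulation (IP encapsulation has no UDP header).

```python
import shlex


def l2tpv3_params(config: str, ifname: str = "l2tpeth1") -> dict[str, str]:
    """Collect 'set interfaces l2tpv3 <ifname> <key> <value>' lines into a dict."""
    params: dict[str, str] = {}
    for raw in config.splitlines():
        # Forum software often turns ' into typographic quotes; undo that first.
        line = raw.replace("\u2018", "'").replace("\u2019", "'").strip()
        if not line.startswith("set "):
            continue
        try:
            words = shlex.split(line)
        except ValueError:  # unbalanced quotes: not a line we can parse
            continue
        if words[:4] == ["set", "interfaces", "l2tpv3", ifname] and len(words) == 6:
            params[words[4]] = words[5]
    return params


def mirror_problems(a: dict[str, str], b: dict[str, str]) -> list[str]:
    """List every setting on one site that does not mirror the peer's setting."""
    # (local setting, setting on the other site that must hold the same value)
    pairs = [
        ("session-id", "peer-session-id"),
        ("tunnel-id", "peer-tunnel-id"),
        ("source-address", "remote"),
    ]
    # Ports only matter for UDP encapsulation.
    if "udp" in (a.get("encapsulation"), b.get("encapsulation")):
        pairs.append(("source-port", "destination-port"))

    problems = []
    if a.get("encapsulation") != b.get("encapsulation"):
        problems.append("encapsulation differs between sites")
    for local_key, peer_key in pairs:
        for here, there, x, y in (("A", "B", a, b), ("B", "A", b, a)):
            if x.get(local_key) != y.get(peer_key):
                problems.append(
                    f"site {here} {local_key}={x.get(local_key)} but "
                    f"site {there} {peer_key}={y.get(peer_key)}"
                )
    return problems


if __name__ == "__main__":
    site_a = """
set interfaces l2tpv3 l2tpeth1 encapsulation 'ip'
set interfaces l2tpv3 l2tpeth1 peer-session-id '1'
set interfaces l2tpv3 l2tpeth1 peer-tunnel-id '1'
set interfaces l2tpv3 l2tpeth1 remote '192.168.0.2'
set interfaces l2tpv3 l2tpeth1 session-id '1'
set interfaces l2tpv3 l2tpeth1 source-address '192.168.0.1'
set interfaces l2tpv3 l2tpeth1 tunnel-id '1'
"""
    site_b = """
set interfaces l2tpv3 l2tpeth1 encapsulation 'ip'
set interfaces l2tpv3 l2tpeth1 peer-session-id '1'
set interfaces l2tpv3 l2tpeth1 peer-tunnel-id '1'
set interfaces l2tpv3 l2tpeth1 remote '192.168.0.1'
set interfaces l2tpv3 l2tpeth1 session-id '1'
set interfaces l2tpv3 l2tpeth1 source-address '192.168.0.2'
set interfaces l2tpv3 l2tpeth1 tunnel-id '1'
"""
    print(mirror_problems(l2tpv3_params(site_a), l2tpv3_params(site_b)) or "mirror OK")
```

Run against the l2tpv3 lines posted above, it reports no mismatches, which fits with the other VLANs crossing the tunnel fine.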

I did a test: I removed the IP helper address from the VLAN, and then it worked as expected. So, next question: is there any way to keep an IP helper address on this VLAN?

That makes things a bit more interesting. I’ve not had much to do with Arista; I’m more of a Cisco/ProCurve/Aruba man.

Again, just trying to narrow it down a bit:

  • Would the switches (or suitably configured lab devices) work when directly connected, or at least behave differently?
  • Are there IP helpers on other SVIs that do work?
  • MTU usually isn’t a concern for diagnostic pings and DHCP, but is your tunnel MTU-clear?
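To put a rough number on "MTU-clear": with IP encapsulation each bridged frame gains an outer IPv4 header and an L2TPv3 session header, and frames from trunked VLANs also keep their 802.1Q tag inside the tunnel. This is a minimal sketch, assuming a 20-byte outer IPv4 header without options, a 4-byte session ID, and no cookie or L2-specific sublayer unless you pass them in; those defaults are assumptions, so check what your sessions actually use. `max_inner_ip_mtu` is a hypothetical helper name.

```python
# Rough MTU budget for an Ethernet pseudowire carried over L2TPv3-over-IP.
# Assumed overheads (check against your own setup): IPv4 outer header with
# no options, 4-byte session ID, optional cookie and L2-specific sublayer.
OUTER_IPV4 = 20       # outer IPv4 header, no options
SESSION_ID = 4        # L2TPv3-over-IP session ID field
ETHERNET_HEADER = 14  # inner dst MAC + src MAC + EtherType (FCS is not carried)
DOT1Q_TAG = 4         # 802.1Q tag that trunked VLAN frames keep inside the tunnel


def max_inner_ip_mtu(path_mtu: int, tagged: bool = True,
                     cookie: int = 0, l2_sublayer: int = 0) -> int:
    """Largest inner IP packet that fits in one outer packet of size path_mtu."""
    if cookie not in (0, 4, 8):
        raise ValueError("an L2TPv3 cookie is 0, 4 or 8 bytes")
    overhead = OUTER_IPV4 + SESSION_ID + cookie + l2_sublayer + ETHERNET_HEADER
    if tagged:
        overhead += DOT1Q_TAG
    return path_mtu - overhead


if __name__ == "__main__":
    for tagged in (False, True):
        print(f"path MTU 1500, tagged={tagged}: inner MTU {max_inner_ip_mtu(1500, tagged)}")
```

Under those assumptions a 1500-byte underlay leaves 1458 bytes for a tagged inner packet, so small pings and DHCP fit easily while full-size 1500-byte packets would not.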

And for clarification:

  • The IP helper on VLAN 30 was in site A and appeared to kill site B talking to site A devices?
  • eth0 on VyOS is taking an 802.1Q trunk from the Arista and spanning it between sites?
  • There’s no special native VLAN config for the trunk ports - no special VLANs and untagged traffic is VLAN 1?
  • Aside from VLAN 30, there are other definitely tagged VLANs which do work, not just default VLAN 1?
  • VyOS isn’t exposed to L3 on the carried networks beyond that 16.x subnet?

It’s already sounding somewhat like a switch firmware bug, and it definitely is one if VyOS can be ruled out. Looking around online, there are some broadcast-related problems with Arista DHCP relay, but I can’t see anything obviously related to the issue as described so far.

I’ve not tried stuffing 802.1Q-tagged frames into a tunnel myself. There might be an interaction there, but I can’t see how it would produce what you’re describing; I’d expect everything tagged to fail if that were the problem.
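If you want to check whether VLAN 30 frames actually cross the tunnel still tagged, the 802.1Q header is easy to read from raw frame bytes: a TPID of 0x8100 at offset 12, followed by a 16-bit TCI whose low 12 bits are the VLAN ID. This is a minimal sketch, assuming you feed it raw Ethernet frame bytes (for example, frames exported from a capture on l2tpeth1); `vlan_id` is a hypothetical helper name, and it only recognises a single 0x8100 tag, not QinQ.

```python
import struct
from typing import Optional

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def vlan_id(frame: bytes) -> Optional[int]:
    """Return the 802.1Q VLAN ID of a raw Ethernet frame, or None if untagged.

    Layout: dst MAC (6) | src MAC (6) | TPID (2) | TCI (2) | EtherType (2) ...
    Only a single 0x8100 tag is recognised; QinQ (0x88A8) is not handled.
    """
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    (tpid,) = struct.unpack_from("!H", frame, 12)
    if tpid != TPID_8021Q:
        return None
    if len(frame) < 18:
        raise ValueError("truncated 802.1Q header")
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF  # low 12 bits = VID; top 3 bits are PCP, then 1 DEI bit


if __name__ == "__main__":
    # Made-up ARP frame tagged for VLAN 30 with priority 5, plus an untagged one.
    tagged = bytes(12) + struct.pack("!HHH", TPID_8021Q, (5 << 13) | 30, 0x0806) + bytes(28)
    untagged = bytes(12) + struct.pack("!H", 0x0806) + bytes(28)
    print(vlan_id(tagged), vlan_id(untagged))
```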

VyOS does have a DHCP relay service, but it won’t be the appropriate place to handle relay if it isn’t involved in L3 for those VLANs.

  • Any SVI with an IP helper fails to work; I added one to some other SVIs to test that theory as well.

  • No issues with MTU.

  • You are correct: the helper is in site A, and it kills site B.

  • Yes, VyOS is taking eth0 and spanning the trunk across sites.

  • Untagged traffic is VLAN 1.

  • Several of the trunk VLANs are working perfectly. It takes a while after the tunnel first comes up for devices to be seen, but once it starts, everything seems to work great so far.

  • VyOS is only on that 16.x subnet; I wanted it isolated.

I may open a ticket with Arista to see what they say, but manufacturers tend not to be interested in diagnosing things like this. That said, Arista’s support has been outstanding in all three of my interactions with them over the last 8 years.

I used to prefer Cisco until I got to play with an Arista switch, and I’ve been very pleased with them, apart from their lack of a feature comparable to Cisco’s RSPAN.