EVPN-MH Implementation Incomplete?

I was testing EVPN-MH on VyOS earlier (on 1.4.0-epa2 and 1.5-rolling). The control plane seems to be implemented completely, so there’s no problem there (proper EVPN routes are generated and propagated). The issue I’m encountering is there doesn’t appear to be any mechanism to push the es-system-mac to the member interfaces to allow for a proper multi-chassis etherchannel (MEC). This may just be me not being able to find the appropriate command to enable it, but I did a fair bit of digging and didn’t see it either in the CLI or in the source.

In Linux, LACP uses the MAC address of the first interface added to the bundle as the system-mac. You can’t add a member to the bond if it has a MAC set in the config, so the only way to set the value is to change the hw-id. That allows the MEC to come up properly, but it breaks VyOS after a reboot: the interface you configured the hw-id on will be missing.

Without this, any downstream customer will effectively be in active-backup rather than active-active. The port-channel will place every interface except the first one it receives a PDU from into a suspended state (or fall back to independent mode if the downstream device allows it, which still breaks active-active).

Is there a randomly placed command that will allow for this, or is it something that needs to be added?

If there isn’t currently a mechanism, maybe there can be something at the interface level to accommodate this. Something like one of these:

set interfaces ethernet eth1 lacp-id aa:bb:cc:dd:ee:f0
set interfaces ethernet eth1 evpn-mh-id aa:bb:cc:dd:ee:f0
set interfaces ethernet eth1 es-system-mac aa:bb:cc:dd:ee:f0

Another option could be to update any member interfaces to the es-system-mac under the bond/evpn config (if present), but that would be less explicit for the user.

Hi @L0crian, I’ve been doing some testing, but it’s not finished yet. I agree with the idea of adding MEC support, or of setting the es-system-mac on the member interfaces under the bond. If you want, message me on Slack and we can share tests and knowledge.

Hi,

I’m encountering a similar issue to what was discussed in this thread about EVPN-MH on VyOS. I’m seeing duplicate packets, which seems to be related to the use of VPC (Virtual Port Channel) in my setup.
I have tried setting system-mac on the bond interface, and also the bond0 MAC, but nothing seems to work.

Has anyone here found a solution or workaround for this issue? Did you manage to resolve it with a specific command or configuration change? Or is there an update or patch available that addresses this problem?

I would greatly appreciate any insights or solutions you might have!

Thanks in advance!

What version are you running?
What is your setup (topology and device types)?
Can you post your config?
What kind of traffic is being duplicated?

The changes I mentioned in the original post have been added, so really the only thing that is missing from the solution is being able to protodown an interface in the case of node or link failure. That wouldn’t cause your issue.

If the traffic that is being duplicated is BUM traffic, then you may have an issue with your DF. You need to have your VxLAN carrying interfaces be configured as MH uplinks for the DF filters to work correctly, so make sure you have that configured.
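As a minimal sketch of that setting, assuming eth6 and eth7 are the interfaces that carry VxLAN traffic towards the other PEs (the interface names are placeholders, not taken from your setup):

```
# eth6 and eth7 are placeholder names for the VxLAN-carrying interfaces
set interfaces ethernet eth6 evpn uplink
set interfaces ethernet eth7 evpn uplink
```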

Are you running LACP?

For reference, I’ve tested this with both VyOS as the client, and a Cisco Nexus as the client (both in a bridged and routed configuration). Both work without duplicate packets.

Hi,

What version are you running?
Built on: Mon 12 Aug 2024 02:55 UTC

What is your setup (topology and device types)?
Two Cisco Nexus switches in a VPC, 3 routers connected to the VPC, and another access switch connected to each Nexus over the VPC. The server is connected to the access switch.

What kind of traffic is being duplicated?
I see all traffic from the server being duplicated, and I do not know why. In the test lab everything works, but here, if I have two routers up the traffic is doubled, and if I have 3 routers up it is tripled. All traffic (TCP and ICMP) is duplicated, not just ICMP.

Can you post your config?
I can share my configuration with you, but it’s quite complex. How can I contact you to send it over?

What I can see is that on the Nexus the ports for some routers are in a suspended state, and I do not know why; all routers have the same configuration.

UPDATE:
This was the config for the bridge:
set interfaces bridge br0 member interface bond0.10 native-vlan '10'
set interfaces bridge br0 member interface vxlan10 native-vlan '10'
set interfaces bridge br0 vif 10 address '10.0.0.1/24'

I have removed vxlan10 from the bridge and it seems that I no longer receive duplicate packets, but I do not know if this is the fix because I still see the ports in a suspended state on the Nexus.
Question:
If I have multiple VLANs, do I need to create a bridge for each VLAN? In my configuration I put all the VLANs on the same bridge.

For example:

set interfaces bridge br0 member interface bond0.10 native-vlan '10'
set interfaces bridge br0 member interface bond0.11 native-vlan '11'
set interfaces bridge br0 member interface vxlan10 native-vlan '10'
set interfaces bridge br0 member interface vxlan11 native-vlan '11'
set interfaces bridge br0 vif 10 address '10.0.0.1/24'
set interfaces bridge br0 vif 11 address '11.0.0.1/24'

Does it need to be:
set interfaces bridge br0 member interface bond0.10 native-vlan '10'
set interfaces bridge br1 member interface bond0.11 native-vlan '11'
set interfaces bridge br0 member interface vxlan10 native-vlan '10'
set interfaces bridge br1 member interface vxlan11 native-vlan '11'
set interfaces bridge br0 vif 10 address '10.0.0.1/24'
set interfaces bridge br1 vif 11 address '11.0.0.1/24'

Thank you!

You can put the config directly in here. It’s best to put it in a preformatted block by encapsulating the config in backticks like this:
```
Some Config
```
That would create a block that looks like this:

Some Config

You can also click the cog in the text window and select “Preformatted text”, and it’ll drop a block into your message that you can paste into.


  • Do you have EVPN Configured?
    • If you do, have you defined an es-sys-mac? This is needed to create the necessary Type4 EVPN routes for DF election.
  • Do you have BGP configured?
    • If you do, do you have all of the Type2 and Type4 EVPN routes you expect on each node?
  • What knobs do you have configured under the VxLAN interface if any (e.g. nolearning, neighbor-suppress, etc…)?

The main things that are needed to see are the config for BGP, the bond, the bridge, and the VxLAN interface.

The reason you likely have suspended interfaces on the Nexus is that each router presents a different LACP system ID. You need to define the same system-mac for each bond so the Nexus thinks they’re all the same device. You can verify this with show lacp neighbor on each Nexus and check whether the partner IDs differ. They should all be the same.
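As a rough sketch of that change, assuming the bond is named bond0 and using a placeholder MAC (both are illustrative, not taken from your config), every router sharing the port-channel would carry the same line:

```
# Same value on every router in the port-channel; the MAC is a placeholder
set interfaces bonding bond0 system-mac '02:aa:bb:cc:dd:01'
```

After committing this on each router, the partner system ID reported by the Nexus should be identical for all member ports.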

One last thing, if you want the 10.0.0.1 IP to serve as the gateway for clients, and to work on all of your nodes, you need to define that as an anycast gateway. Each node should have that same IP, and you need to define them with the same MAC address, so you don’t experience MAC flapping.
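A minimal sketch of what that could look like, assuming a VLAN-aware bridge named br0 with the gateway on vif 10; the bridge name and the MAC are placeholders, and only the 10.0.0.1 address comes from your post:

```
# Apply identically on every node so clients always see one gateway IP and MAC
set interfaces bridge br0 vif 10 address '10.0.0.1/24'
set interfaces bridge br0 vif 10 mac '02:00:00:00:00:10'
```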

So for the VPC to the nexus I use this:
set interfaces bonding bond0 description 'MH-VPC'
set interfaces bonding bond0 evpn es-df-pref '101'
set interfaces bonding bond0 evpn es-id '100'
set interfaces bonding bond0 evpn es-sys-mac '11:22:33:44:55:66'
set interfaces bonding bond0 lacp-rate 'fast'
set interfaces bonding bond0 member interface 'eth4'
set interfaces bonding bond0 member interface 'eth5'
set interfaces bonding bond0 min-links '1'
set interfaces bonding bond0 mode '802.3ad'
set interfaces bonding bond0 system-mac '11:22:33:44:55:77'
set interfaces bonding bond0 vif 10 description 'VLAN10'
set interfaces bonding bond0 vif 11 description 'VLAN11'

I see the correct MAC on the Nexus with show lacp, which is 11:22:33:44:55:77.
Each router has a different es-df-pref (101, 102, 103), and the same MAC address is configured on all routers.

As I said before my config for the bridge was this, all the same on all routers:
set interfaces bridge br0 member interface bond0.10 native-vlan '10'
set interfaces bridge br0 member interface bond0.11 native-vlan '11'
set interfaces bridge br0 member interface vxlan10 native-vlan '10'
set interfaces bridge br0 member interface vxlan11 native-vlan '11'
set interfaces bridge br0 vif 10 address '10.0.0.1/24'
set interfaces bridge br0 vif 11 address '11.0.0.1/24'
set interfaces bridge br0 vif 10 ip enable-arp-accept
set interfaces bridge br0 vif 10 mac '11:22:33:44:55:10'
set interfaces bridge br0 vif 11 ip enable-arp-accept
set interfaces bridge br0 vif 11 mac '11:22:33:44:55:11'

For underlay I am using ospf and the vxlan is setup like this:

set interfaces vxlan vxlan10 parameters nolearning
set interfaces vxlan vxlan10 port '4789'
set interfaces vxlan vxlan10 source-address 'router1'
set interfaces vxlan vxlan10 vni '10'

set interfaces vxlan vxlan11 parameters nolearning
set interfaces vxlan vxlan11 port '4789'
set interfaces vxlan vxlan11 source-address 'router1'
set interfaces vxlan vxlan11 vni '11'

For bgp I am using:
set protocols bgp address-family l2vpn-evpn advertise-all-vni
set protocols bgp address-family l2vpn-evpn vni 10
set protocols bgp address-family l2vpn-evpn vni 11
set protocols bgp neighbor router1 address-family l2vpn-evpn
set protocols bgp neighbor router1 remote-as 'myas'
set protocols bgp neighbor router2 address-family l2vpn-evpn
set protocols bgp neighbor router2 remote-as 'myas'
set protocols bgp system-as 'myas'

And the interfaces between routers are set as uplink:

set interfaces ethernet eth6 evpn uplink
set interfaces ethernet eth7 evpn uplink

If I set es-sys-mac '11:22:33:44:55:66', all the ports on the Nexus are connected; if I remove it, some get suspended because they have a different MAC. When all the ports are connected, I cannot ping the server…
I do not know what I am missing; in my lab with just one switch everything works.

Thank you

Is the server in a different subnet from the one you’re trying to ping from?

With your current configuration, do you still experience duplicate packets?

If only one router is active, everything works. As soon as the second one comes up, ping stops working. If I remove es-sys-mac from the bond, the port of one router becomes suspended on the Nexus and I get duplicate packets for everything; if I remove vxlan11 from the bridge interface, I do not get duplicates. I have not been able to get all the ports connected on the Nexus and have a working connection at the same time: if I apply es-sys-mac, show lacp on the Nexus shows the correct MAC, but ping does not work at all until I stop the second router.

The reason I was asking if you’re pinging outside of the subnet is this could be an ARP issue. Since you’re using an anycast gateway, ARP replies from a client will always return to the same router (unless there’s a topology change that changes hashing). So if the return traffic is coming back on a different router, the traffic will fail.

You need a way to ensure routers know of the client’s MAC. You could do static entries, but the better way is to enable neighbor-suppression on the VxLAN interface. This will map the ARP table to the Type2 routes learned via BGP.
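A minimal sketch, assuming the VxLAN interfaces are named vxlan10 and vxlan11 as in your config:

```
# Repeat on every router, for every VxLAN interface that is a bridge member
set interfaces vxlan vxlan10 parameters neighbor-suppress
set interfaces vxlan vxlan11 parameters neighbor-suppress
```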

Configure that on all routers and report back.

I have enabled the following command:

set interfaces vxlan vxlan10 parameters neighbor-suppress

Currently, I have two routers up. One of them has both interfaces to the Nexus VPC disabled, but after enabling neighbor-suppress, I am still receiving duplicate packets. If I shut down the router that has its interfaces to the Nexus VPC disabled, the duplicate packets stop.

However, if I bring up the interface on the second router to the VPC, the traffic does not work at all.

On the router with the disabled ports, if I run:

show evpn arp-cache vni 10

I see the MAC addresses of the servers coming from the router that has ports enabled on the VPC, which is good. However, as soon as I enable the interface to the VPC on this router and run the same command, I see nothing.

On this router, a lot of ARP requests are also being sent for the IP of the server I am trying to ping.

Do you have a diagram and expected packet flow?

You’re likely going to need to follow the bouncing ball with tcpdump on VyOS to see where the packets are going and where they’re failing.
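As a starting point, something like the following could be run on each router. This is only a sketch: bond0 and eth6 are assumed to be the client-facing bond and one inter-router uplink, and 4789 is the VxLAN port from your config.

```
# Client-facing side: ARP and ICMP on the bond, printing MAC addresses (-e)
sudo tcpdump -eni bond0 'arp or icmp'

# Underlay side: VxLAN-encapsulated traffic on the link towards the other router
sudo tcpdump -ni eth6 'udp port 4789'
```

Running the first capture on both routers at the same time shows which one receives the ARP reply; the second shows whether the frame is ever encapsulated towards the peer.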

It’s possible you’re hitting a new bug since it’s been a while since I tested this on VyOS. If I find time over the next few days, I’ll spin it up in a lab with the same version you’re using.

Currently this is my topology with one router shut down:

In this topology, I am trying to ping the server from the internet. The ICMP packet is arriving at R1, which I can confirm using tcpdump. On R2, when I run show evpn arp-cache vni 10, I see the MAC address of the server, but this is not the case on R1.

R1 is sending ARP requests for the MAC address, and I can see these ARP requests on R2 through the VPC interfaces. If I reboot R2, everything starts to work correctly, and I can also see the MAC address when I run show evpn arp-cache vni 10 on R1. However, once the router has rebooted and is fully up, the traffic stops working again, and the MAC address is no longer visible on R1 when I run show evpn arp-cache vni 10.

If you’d like, I can share my screen so you can control all the devices over serial. We can perform any tests needed, as the devices are not in production.

R2 will also have an internet connection later, from a different provider.

QUESTION:
Do I need to create a VPC for each router? In my configuration, all 4 ports to the VPC are in the same port channel.

I’m thinking of the following scenario: if R1 receives a packet and doesn’t know the MAC address of the server, it will send an ARP request. If the reply from the Nexus is sent only to R2 and R2 does not forward it to R1 using VXLAN, R1 will not learn the MAC address.

Hi,

I have recreated a lab with a similar configuration. In the lab, everything works, but only with a single switch. So, I started by removing all the configuration and applying just what is in the lab. Now, I have discovered some very strange behavior: it seems that the LACP/bonding on the Nexus VPC does not work when both routers are up. If only one router is up, everything works fine, but as soon as the second router comes online and connects, the bonding stops working.

On the routers, I see STP packets coming from the Cisco Nexus. I am now considering the possibility that there might be a network loop, and the Nexus could be blocking the VLANs. For example, on one router, I have the bond interface with VLAN 10 on the bridge, and also VXLAN 10 on the same bridge. On the second router, I have the same VLAN and VXLAN on the bridge. I am thinking that if an STP packet comes to the bridge from VLAN 10 and then goes to the second router using VXLAN 10, it might end up on the same Cisco Nexus.

Could this be the issue in my topology? Do you have any recommendations for using two Nexus devices with VPC and multiple routers?

UPDATE:
I have disabled STP for the VLAN, and the ping started to work.

This is not a fix, just a way to confirm that this is the issue. Do you have any suggestions for my case?

Hi,

It seems that this is the issue that I have: EVPN-MH Split Horizon Filters Not Functional · Issue #15400 · FRRouting/frr · GitHub

How were you able to set up EVPN-MH without this issue?

Regards

I remember reading this where it said split-horizon rules were not applied for DHCP for whatever reason: Multihoming does not work on tagged interfaces · FRRouting/frr · Discussion #11487 · GitHub

I’ve never tried to run DHCP over an EVPN-MH deployment with VyOS, so I’m not sure whether it would have worked when I tested it. You can manually do the rules by tagging the packet with a DSCP value as it comes into a PE, and denying it back out towards the port-channel on the other PE.

This is my topology:

And this is what happens:


I will explain here:

From the PC with IP address 59.10, I send a ping to the PC with IP address 57.10.
Router R1 knows about 59.10 locally, but it learns about 57.10 from R2, as you can see it has a remote VTEP.

Using Wireshark, I can see that the packets are arriving at the bond0 interface on R1, but they never go out to the VXLAN interface towards R2 or any other interface. I also do not see any ICMP packets inside R2 from R1 or from the bond0 interface. However, I do see other packets, such as BGP L2VPN and VXLAN packets on the link between the routers. Additionally, I can see STP packets on the bond0 interface of R2, so all the links are working.

If, for example, I reboot R2, R1 will directly learn the MAC address of 57.10, and the ping will start working.

Here is what I think is happening:

Because there is a bridge inside each router that connects the VXLAN interface and the bond0 interface, it seems to create a loop in the network. What comes to the bond0 interface of one router is forwarded by the bridge to the second router, which then forwards it back to the bond0 interface of the first router, causing the first router to receive the same packet again. Because STP (Spanning Tree Protocol) is active in my setup, it blocks the port between the routers to prevent the loop. This is why I am experiencing the issue. If I disable STP, R1 can send packets to R2, but it seems I then experience duplicate packets due to the loops. So, I am thinking there might be an issue with the EVPN implementation.

When I use a lab setup with just one switch without VPC, everything works with the same configuration. However, the switch does not know how to balance the traffic using source-destination hashing, so it sends the traffic to the same router each time. This means that the same router knows both the source and destination MAC addresses, and it works fine even if there is a blocked port between the routers.

In my case, when the source MAC is learned by one router and the destination MAC by another, the forwarding does not seem to work, even though the MAC addresses are learned correctly.

Could you please confirm if my understanding is correct, or let me know if I might have missed something in my configuration that could be causing this behavior? Is it normal for there to be a loop when the bridge interface and VXLAN interface are added to the bridge in each router? Do you have a similar setup, or could it be that my implementation is incorrect?