I’m experiencing an issue which I don’t really understand.
My setup looks as follows:
- 2 hosts in the “LAN” network, connected to a 10Gbit/s switch
- 1 of the hosts has two interfaces (10Gb/s each) bonded with 802.3ad
- The VyOS host is also connected to the switch, also with 802.3ad (2x 10Gbit/s). Its NIC is a Mellanox ConnectX-2, the same as in the other 802.3ad host, which also runs Linux (Debian) and doesn’t seem to have this issue
The 802.3ad ports are configured on the switch (Mikrotik CRS317-1G-16S+), so that shouldn’t be the issue.
When I run iperf between the two (non-router) hosts, I get ~10Gb/s regardless of which host is the server and which is the client. With the router involved, I get 10Gb/s to either host as long as the router is the iperf client. If the router runs the iperf server, I only get 4-5Gb/s, which I don’t really understand. Both interfaces in the bond are running at 10Gb/s, full duplex, according to ethtool.
I also tried removing one of the interfaces from the bond and running iperf over just that interface, to check whether the issue is related to 802.3ad, but the behavior is the same.
I do get some error messages when the bond comes up (e.g., after changing some settings on the switch side), although the bond itself seems fine:
[  893.292330] mlx4_en: eth2: Link Down
[  893.292756] bond0: (slave eth2): speed changed to 0 on port 2
[  893.317749] bond0: (slave eth2): link status definitely down, disabling slave
[  893.317764] mlx4_core 0000:01:00.0: Failed to bond device: -95
[  893.323720] mlx4_en: eth2: Fail to bond device
[  893.342384] mlx4_en: eth3: Link Down
[  893.342864] bond0: (slave eth3): speed changed to 0 on port 1
[  893.397292] mlx4_en: eth3: Link Up
[  893.421767] bond0: (slave eth3): link status up again after 0 ms
[  893.421983] bond0: (slave eth3): link status definitely up, 10000 Mbps full duplex
[  893.422012] mlx4_core 0000:01:00.0: Failed to bond device: -95
[  893.428116] mlx4_en: eth3: Fail to bond device
[  893.547299] mlx4_en: eth2: Link Up
[  893.629975] bond0: (slave eth2): link status definitely up, 10000 Mbps full duplex
[  893.630003] mlx4_core 0000:01:00.0: Failed to bond device: -95
[  893.636169] mlx4_en: eth2: Fail to bond device
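As far as I can tell, -95 is -EOPNOTSUPP ("operation not supported") on Linux, so my guess is that the mlx4 driver is declining some hardware-side bonding feature rather than the software bond itself failing. To see how often this happens per interface, here is a small Python sketch that tallies these messages. The helper name `summarize` and the regexes are my own, and the sample is just the failure lines copied from the log above:

```python
import re
from collections import Counter

# Failure lines copied from the kernel log above.
LOG = """\
[  893.317764] mlx4_core 0000:01:00.0: Failed to bond device: -95
[  893.323720] mlx4_en: eth2: Fail to bond device
[  893.422012] mlx4_core 0000:01:00.0: Failed to bond device: -95
[  893.428116] mlx4_en: eth3: Fail to bond device
[  893.630003] mlx4_core 0000:01:00.0: Failed to bond device: -95
[  893.636169] mlx4_en: eth2: Fail to bond device
"""

# Per-interface message from mlx4_en, and the error code from mlx4_core.
FAIL_RE = re.compile(r"mlx4_en: (\w+): Fail to bond device")
CODE_RE = re.compile(r"Failed to bond device: (-?\d+)")


def summarize(log: str):
    """Count bond failures per interface and collect the distinct error codes."""
    per_iface = Counter(FAIL_RE.findall(log))
    codes = sorted({int(c) for c in CODE_RE.findall(log)})
    return per_iface, codes


per_iface, codes = summarize(LOG)
print(dict(per_iface), codes)
```

In practice the same function could be fed the full `dmesg` output instead of the pasted sample.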
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v5.10.17-amd64-vyos
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0
802.3ad info
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable
Slave Interface: eth3
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:02:c9:0f:da:59
Slave queue ID: 0
Aggregator ID: 5
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 3
Permanent HW addr: 00:02:c9:0f:da:58
Slave queue ID: 0
Aggregator ID: 5
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0
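To double-check the bond state without reading the output by eye, here is a minimal Python sketch that parses this format and confirms both slaves are in the same aggregator. The function name `parse_bonding` is my own, and the sample is a trimmed copy of the output above, so treat it as a rough helper rather than a complete parser. One thing it makes visible is the `layer2` transmit hash policy: as I understand it, that policy hashes on MAC addresses, so all traffic between one pair of hosts leaves on a single slave and a single iperf stream cannot exceed 10Gb/s in any case.

```python
def parse_bonding(text: str) -> dict:
    """Parse /proc/net/bonding/<bond> text into bond-level and per-slave fields."""
    bond, slaves, current = {}, [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # section headers such as "802.3ad info" have no colon
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            # Every following field belongs to this slave until the next one.
            current = {"name": value}
            slaves.append(current)
        elif current is not None:
            current[key] = value
        else:
            bond[key] = value
    return {"bond": bond, "slaves": slaves}


# Trimmed copy of the output above.
SAMPLE = """\
Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2 (0)
MII Status: up
802.3ad info
LACP rate: slow
Slave Interface: eth3
MII Status: up
Speed: 10000 Mbps
Aggregator ID: 5
Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Aggregator ID: 5
"""

info = parse_bonding(SAMPLE)
aggregators = {s["Aggregator ID"] for s in info["slaves"]}
print("hash policy:", info["bond"]["Transmit Hash Policy"])
print("slaves:", [s["name"] for s in info["slaves"]])
print("same aggregator:", len(aggregators) == 1)
```

On the router itself, the text could be read with `open("/proc/net/bonding/bond0").read()` instead of using the sample.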