Bond member interfaces down

I noticed that when VyOS runs in a VM on Proxmox, all member interfaces of an active-backup bond stay down unless I bring them up manually with ip link set dev ethX up. This doesn't seem to happen in a VM on VMware Workstation.
I was able to reproduce it with this minimal configuration:

Config
set interfaces bonding bond0 member interface 'eth2'
set interfaces bonding bond0 member interface 'eth1'
set interfaces bonding bond0 mode 'active-backup'
set interfaces bonding bond0 vif 100 address 'xxx.xxx.0.1/24'
set interfaces ethernet eth0
set interfaces loopback lo
set system config-management commit-revisions '100'
set system conntrack modules ftp
set system conntrack modules h323
set system conntrack modules nfs
set system conntrack modules pptp
set system conntrack modules sip
set system conntrack modules sqlnet
set system conntrack modules tftp
set system console device ttyS0 speed '115200'
set system host-name xxxxxx
set system login user xxxxxx authentication encrypted-password xxxxxx
set system login user xxxxxx authentication plaintext-password xxxxxx
set system ntp server xxxxx.tld
set system ntp server xxxxx.tld
set system ntp server xxxxx.tld
set system syslog global facility all level 'info'
set system syslog global facility protocols level 'debug'

According to ethtool, no link is detected on any of the bond members (Link detected: no). As soon as I remove them from the bond, that changes to yes and ip link shows the interface as up. I tried both virtio and E1000 as the NIC model, but that didn't make a difference.
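To check the members from a script rather than by eye, the "Link detected:" line that ethtool prints can be parsed. A minimal sketch, assuming ethtool is installed and prints the usual "Link detected: yes/no" line; the function names are hypothetical and not part of VyOS:

```python
import subprocess
from typing import Optional


def parse_link_detected(ethtool_output: str) -> Optional[bool]:
    """Return True/False from the 'Link detected:' line of `ethtool <iface>` output,
    or None if no such line is present (e.g. the device could not be queried)."""
    for line in ethtool_output.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "Link detected":
            return value.strip() == "yes"
    return None


def link_detected(iface: str) -> Optional[bool]:
    """Run ethtool for one interface; return None if ethtool is not available."""
    try:
        result = subprocess.run(
            ["ethtool", iface], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        return None
    return parse_link_detected(result.stdout)
```

Calling link_detected("eth1") and link_detected("eth2") before and after removing them from bond0 should reproduce the no/yes difference described above.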

Am I missing anything or is this indeed a bug?
Thanks!

Hi @DerEnderKeks, what device is on the other side of this bond? Is it also VyOS or something different? Which VyOS version are you running? Are there any logs that may point to the issue?

Also, can you share the configuration of the opposite side as well? For now, try manually specifying the active link for this bond and check whether that helps: Bond / Link Aggregation — VyOS 1.3.x (equuleus) documentation

The other side is just a virtual switch in Proxmox (based on Open vSwitch), which is basically just an unmanaged/dumb switch. (For active-backup that should be sufficient, right?)
I completely forgot to mention it: I'm on 1.4-rolling-202204040217, but the problem also occurred on 1.4-rolling-202203130618.
I couldn’t see anything interesting in journalctl. Are there any specific files I should check?

journalctl -b | grep eth
Apr 04 13:27:38 vyos kernel: virtio_net virtio2 e2: renamed from eth0
Apr 04 13:27:38 vyos kernel: virtio_net virtio4 e4: renamed from eth2
Apr 04 13:27:38 vyos kernel: virtio_net virtio3 e3: renamed from eth1
Apr 04 13:27:41 vyos kernel: virtio_net virtio3 eth1: renamed from e3
Apr 04 13:27:41 vyos kernel: virtio_net virtio2 eth0: renamed from e2
Apr 04 13:27:41 vyos kernel: virtio_net virtio4 eth2: renamed from e4
Apr 04 13:27:44 vyos sudo[1403]:     root : PWD=/ ; USER=root ; COMMAND=/usr/bin/sh -c /usr/sbin/vyshim VYOS_TAGNODE_VALUE='eth0' /usr/libexec/vyos/conf_mode/interfaces-ethernet.py
Apr 04 13:27:44 vyos vyos-configd[737]: Received message: {"type": "node", "data": "VYOS_TAGNODE_VALUE=eth0/usr/libexec/vyos/conf_mode/interfaces-ethernet.py"}
Apr 04 13:27:44 vyos netplugd[950]: eth0: state DOWN flags 0x00001002 BROADCAST,MULTICAST -> 0x00011043 UP,BROADCAST,RUNNING,MULTICAST,10000
Apr 04 13:27:44 vyos netplugd[1439]: /etc/netplug/netplug eth0 in -> pid 1439
Apr 04 13:27:44 vyos kernel: bond0: (slave eth1): making interface the new active one
Apr 04 13:27:44 vyos kernel: bond0: (slave eth1): Enslaving as an active interface with an up link
Apr 04 13:27:44 vyos netplugd[950]: eth1: state DOWN flags 0x00001002 BROADCAST,MULTICAST -> 0x00001843 UP,BROADCAST,RUNNING,SLAVE,MULTICAST
Apr 04 13:27:44 vyos netplugd[1474]: /etc/netplug/netplug eth1 in -> pid 1474
Apr 04 13:27:44 vyos netplugd[950]: eth1: state INNING flags 0x00011843 UP,BROADCAST,RUNNING,SLAVE,MULTICAST,10000 -> 0x00001802 BROADCAST,SLAVE,MULTICAST
Apr 04 13:27:44 vyos kernel: bond0: (slave eth2): Enslaving as a backup interface with an up link
Apr 04 13:27:44 vyos kernel: 8021q: adding VLAN 0 to HW filter on device eth0
Apr 04 13:27:45 vyos netplugd[1602]: /etc/netplug/netplug eth1 probe -> pid 1602
Apr 04 13:27:45 vyos netplugd[950]: eth2: state DOWN flags 0x00001002 BROADCAST,MULTICAST -> 0x00011843 UP,BROADCAST,RUNNING,SLAVE,MULTICAST,10000
Apr 04 13:27:45 vyos netplugd[950]: eth2: state INNING flags 0x00011843 UP,BROADCAST,RUNNING,SLAVE,MULTICAST,10000 -> 0x00001802 BROADCAST,SLAVE,MULTICAST
Apr 04 13:27:45 vyos ntpd[1606]: unable to create socket on eth0 (6) for fe80::c893:eeff:fe87:f7d1%2#123
Apr 04 13:27:46 vyos netplugd[1667]: /etc/netplug/netplug eth2 probe -> pid 1667
Apr 04 13:27:46 vyos netplugd[950]: eth2: state PROBING pid 1667 exited status 256
Apr 04 13:27:46 vyos netplugd[950]: Could not bring eth2 back up
Apr 04 13:27:48 vyos ntpd[1606]: Listen normally on 9 eth0 [fe80::xxx%2]:123
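The netplugd lines print the raw kernel net_device flags. Decoding them with the bit values from <linux/if.h> (0x10000 is IFF_LOWER_UP, which netplugd prints as a bare "10000") shows what changed for eth1 and eth2 right after they were enslaved: UP, RUNNING and LOWER_UP were cleared, which is consistent with the members being set administratively down. A small decoder, as a sketch:

```python
from __future__ import annotations

# Bit values of the net_device flags from <linux/if.h>, lowest bit first
IFF_FLAGS = {
    0x1: "UP",
    0x2: "BROADCAST",
    0x4: "DEBUG",
    0x8: "LOOPBACK",
    0x10: "POINTOPOINT",
    0x20: "NOTRAILERS",
    0x40: "RUNNING",
    0x80: "NOARP",
    0x100: "PROMISC",
    0x200: "ALLMULTI",
    0x400: "MASTER",
    0x800: "SLAVE",
    0x1000: "MULTICAST",
    0x2000: "PORTSEL",
    0x4000: "AUTOMEDIA",
    0x8000: "DYNAMIC",
    0x10000: "LOWER_UP",
    0x20000: "DORMANT",
    0x40000: "ECHO",
}


def decode_flags(value: int) -> list[str]:
    """Return the names of all flag bits set in `value`, lowest bit first."""
    return [name for bit, name in IFF_FLAGS.items() if value & bit]


def flag_transition(old: int, new: int) -> tuple[list[str], list[str]]:
    """Return (cleared, newly_set) flag names for a transition old -> new."""
    return decode_flags(old & ~new), decode_flags(new & ~old)


if __name__ == "__main__":
    # eth2 transition logged by netplugd at 13:27:45 in the journal above
    cleared, added = flag_transition(0x00011843, 0x00001802)
    print("cleared:", cleared)  # cleared: ['UP', 'RUNNING', 'LOWER_UP']
    print("set:", added)  # set: []
```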

I tried setting eth1 as the primary interface, but that didn't change anything. I noticed that if I remove the members from the bond and add them again, all of them actually come up. But that only lasts until the system is rebooted.
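Until the root cause is fixed, the manual ip link set dev ethX up step can be scripted. The sketch below reads the member list from the bonding driver's sysfs file /sys/class/net/<bond>/bonding/slaves, checks the IFF_UP bit in each member's /sys/class/net/<iface>/flags, and only brings up members that are down. It is a hypothetical helper, not part of VyOS, runs in dry-run mode by default, and, like the manual command, does not survive a reboot.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

SYSFS = Path("/sys/class/net")
IFF_UP = 0x1  # administrative "up" bit from <linux/if.h>


def bond_members(bond: str, sysfs: Path = SYSFS) -> list[str]:
    """Read member names from the bonding driver's sysfs 'slaves' file."""
    return (sysfs / bond / "bonding" / "slaves").read_text().split()


def is_admin_up(iface: str, sysfs: Path = SYSFS) -> bool:
    """Check the IFF_UP bit in <sysfs>/<iface>/flags (a hex string such as '0x1803')."""
    flags = int((sysfs / iface / "flags").read_text().strip(), 16)
    return bool(flags & IFF_UP)


def up_commands(bond: str, sysfs: Path = SYSFS) -> list[list[str]]:
    """Build one `ip link set dev <member> up` command per member that is down."""
    return [
        ["ip", "link", "set", "dev", member, "up"]
        for member in bond_members(bond, sysfs)
        if not is_admin_up(member, sysfs)
    ]


def bring_members_up(bond: str, dry_run: bool = True, sysfs: Path = SYSFS) -> list[list[str]]:
    """Return the commands needed; execute them only when dry_run is False (needs root)."""
    cmds = up_commands(bond, sysfs)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```

Running bring_members_up("bond0") first lists what would be done; bring_members_up("bond0", dry_run=False) applies it.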

I can confirm the same behavior with an even simpler config:

set interfaces bonding bond0 member interface 'eth0'
set interfaces bonding bond0 member interface 'eth1'
set interfaces bonding bond0 mode 'active-backup'

It seems that other bonding modes are affected as well.
This may be related to our configuration scripts and how they control the interfaces' admin state. For example, here are two logs:
From an affected Proxmox VM:

Apr 04 18:00:31 vyos python3[611]: set_admin_state(lo, up)
Apr 04 18:00:32 vyos python3[611]: set_admin_state(eth0, down)
Apr 04 18:00:32 vyos python3[611]: set_admin_state(eth1, down)
Apr 04 18:00:32 vyos python3[611]: set_admin_state(bond0, up)

From an unaffected QEMU VM:

Apr 04 17:59:44 vyos python3[696]: set_admin_state(lo, up)
Apr 04 17:59:45 vyos python3[696]: set_admin_state(eth0, up)
Apr 04 17:59:45 vyos python3[696]: set_admin_state(eth1, up)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(eth0, down)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(eth0, up)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(eth1, down)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(eth1, up)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(bond0, up)
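Reducing each excerpt to the last set_admin_state() call per interface makes the difference explicit: on Proxmox the members end up down, on QEMU they end up up. A sketch that parses log lines in the format shown above (the regular expression is an assumption based only on these two excerpts):

```python
from __future__ import annotations

import re

# Matches lines such as: "... python3[611]: set_admin_state(eth0, down)"
ADMIN_RE = re.compile(r"set_admin_state\((?P<iface>[^,]+),\s*(?P<state>up|down)\)")

PROXMOX = """\
Apr 04 18:00:31 vyos python3[611]: set_admin_state(lo, up)
Apr 04 18:00:32 vyos python3[611]: set_admin_state(eth0, down)
Apr 04 18:00:32 vyos python3[611]: set_admin_state(eth1, down)
Apr 04 18:00:32 vyos python3[611]: set_admin_state(bond0, up)
"""

QEMU = """\
Apr 04 17:59:44 vyos python3[696]: set_admin_state(lo, up)
Apr 04 17:59:45 vyos python3[696]: set_admin_state(eth0, up)
Apr 04 17:59:45 vyos python3[696]: set_admin_state(eth1, up)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(eth0, down)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(eth0, up)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(eth1, down)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(eth1, up)
Apr 04 17:59:46 vyos python3[696]: set_admin_state(bond0, up)
"""


def final_admin_states(log: str) -> dict[str, str]:
    """Return the last admin state requested for each interface, in first-seen order."""
    states: dict[str, str] = {}
    for match in ADMIN_RE.finditer(log):
        states[match["iface"].strip()] = match["state"]
    return states


if __name__ == "__main__":
    print(final_admin_states(PROXMOX))  # {'lo': 'up', 'eth0': 'down', 'eth1': 'down', 'bond0': 'up'}
    print(final_admin_states(QEMU))  # {'lo': 'up', 'eth0': 'up', 'eth1': 'up', 'bond0': 'up'}
```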

This behavior needs careful investigation to make sure there is no problem in the bonding configuration logic.
@DerEnderKeks could you create a bug-report in https://phabricator.vyos.net/?

Done: T4340 Bonding member interfaces down on Proxmox

Thanks a lot!

We will investigate this. Meanwhile, you can use a very rough workaround to force the member interfaces up: modify the file /usr/lib/python3/dist-packages/vyos/ifconfig/bond.py by adding the following line:

--- a/python/vyos/ifconfig/bond.py
+++ b/python/vyos/ifconfig/bond.py
@@ -438,6 +438,7 @@ class BondIf(Interface):
                 # any remaining ones
                 Interface(interface).flush_addrs()
                 self.add_port(interface)
+                Interface(interface).set_admin_state('up')
 
         # Primary device interface - must be set after 'mode'
         value = config.get('primary')

A reboot of the router is required afterwards.


Looks like you guys have chalked this up to a possible bug, but I'd like to remind the OP to make sure that autostart is enabled on those interfaces in Proxmox as well. I've run into this same issue. To Proxmox, the down/up transition in VyOS ends up looking like a cable being plugged in.

I’m not aware of any way to have an interface not auto start in Proxmox. What exactly do you mean?

Maybe this?

[Screenshot: network interface settings of the Proxmox host]

That’s just the settings menu for a network interface of the Proxmox host. In the menu for the interfaces of a VM there is no such option:

[Screenshot: network device settings of a Proxmox VM, with no autostart option]