Hi everyone,
I’m experimenting with the VPP dataplane on VyOS rolling (2025.12.14-0023) and I ran into a boot-time issue when assigning a physical interface to DPDK/VPP.
The system works correctly after manual intervention, but VPP fails during boot and gets restarted by systemd.
I’m trying to understand whether this is a configuration mistake or a potential bug in the rolling release.
Hardware
NIC: Intel E810-C QSFP, PCI address: 0000:11:00.0
Kernel driver after binding: vfio-pci
System boot parameters include:
- intel_iommu=on
- hugepages=2048
- pci=realloc
Configuration
Minimal configuration relevant to the interface:
set interfaces ethernet eth4
set vpp settings interface eth4 driver dpdk
Saved config confirms this:
/config/config.boot
interfaces {
ethernet eth4 {
hw-id [MAC_address]
}
}
vpp {
settings {
interface {
eth4 {
driver dpdk
}
}
}
}
Behaviour
After manual restart
If I start VPP manually:
systemctl restart vpp
Everything works as expected:
vppctl show interface
eth4 up
tap4096 up
local0 down
Traffic counters increase and the interface functions normally.
Boot behaviour
However, during boot the following sequence occurs.
VPP starts
“Started vector packet processing engine”
Then fails with:
vlib_pci_bind_to_uio: Skipping PCI device 0000:11:00.0:
device is bound to IOMMU group and vfio-pci driver is not loaded
Later the modules load:
Module 'vfio_iommu_type1' loaded successfully
Module 'vfio_pci' loaded successfully
But then VPP receives:
received SIGTERM from PID 1
and stops.
Another suspicious log
During config activation I see repeated errors:
/usr/libexec/vyos/activate/20-ethernet_offload.py:
Interface "eth4" does not exist
This appears multiple times in:
/config/vyos-activate.log
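To see how often this happens and whether other interfaces are affected, the log can be summarised with a few lines of Python. This is only a sketch: the regular expression matches the message text quoted above, and the assumption that the log is plain UTF-8 text with one message per line is mine, not something I have verified against the log writer.

```python
import re
from collections import Counter

# Matches the error text quoted above. The surrounding log layout is an assumption.
MISSING_IF = re.compile(r'Interface "([^"]+)" does not exist')


def count_missing_interfaces(log_text: str) -> Counter:
    """Count 'does not exist' errors per interface name in the given log text."""
    return Counter(MISSING_IF.findall(log_text))


if __name__ == "__main__":
    # Path taken from the post; reading it may require root privileges.
    with open("/config/vyos-activate.log", encoding="utf-8", errors="replace") as fh:
        counts = count_missing_interfaces(fh.read())
    for ifname, hits in counts.most_common():
        print(f"{ifname}: {hits}")
```

If only the VPP-assigned port shows up in the output, that would support the idea that the activation script is tripping over the handed-over interface rather than over a general naming problem.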
Observations
- vfio-pci is correctly loaded eventually
- the NIC is successfully bound to vfio
- VPP works perfectly after boot if restarted manually
- the failure appears related to boot ordering / activation scripts
It looks like either:
- VPP starts before VFIO modules are ready, or
- the ethernet activation scripts attempt to manipulate eth4 even though it is being assigned to the VPP dataplane.
Systemd service
Relevant portion of the VPP unit:
ExecStartPre=-/sbin/modprobe uio_pci_generic
ExecStart=/usr/bin/vpp -c /run/vpp/vpp.conf
Note that vfio-pci is not loaded here, but later by vpp.py.
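To confirm the ordering problem, it helps to check at a given moment which driver the NIC is bound to and whether the vfio modules are present. Below is a minimal diagnostic sketch that reads the standard sysfs locations (the `driver` symlink under `/sys/bus/pci/devices/<addr>/` and the directories under `/sys/module/`). The function names and the `sysfs` parameter are my own, added so the checks can be pointed at a test directory. It only reports state and does not fix anything.

```python
import os
from typing import Optional


def pci_driver(pci_addr: str, sysfs: str = "/sys") -> Optional[str]:
    """Return the kernel driver bound to a PCI device, or None if unbound."""
    link = os.path.join(sysfs, "bus/pci/devices", pci_addr, "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends in the driver name, e.g. ".../drivers/vfio-pci".
    return os.path.basename(os.readlink(link))


def module_loaded(name: str, sysfs: str = "/sys") -> bool:
    """True if the module has a directory under /sys/module (underscored name)."""
    return os.path.isdir(os.path.join(sysfs, "module", name))


if __name__ == "__main__":
    addr = "0000:11:00.0"  # PCI address from the post
    print("driver:", pci_driver(addr))
    for mod in ("vfio_iommu_type1", "vfio_pci"):
        print(f"{mod} loaded:", module_loaded(mod))
```

Running this once early in boot (for example from a temporary debug hook) and again after the manual restart should show whether `vfio_pci` is really missing at the moment VPP first starts.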
============================================
Questions
- Is it expected that interfaces ethernet ethX must exist even when the interface is owned by VPP/DPDK?
- Should Linux activation scripts (like 20-ethernet_offload.py) be skipped for VPP dataplane interfaces?
- Is the boot order between vyos-configd, the vfio modules, and vpp.service currently guaranteed?
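For the ordering question, the declared systemd dependencies can at least be inspected. The sketch below wraps `systemctl show <unit> --property=After --property=Before` and parses its `Key=value value ...` output. The helper names are hypothetical, and `vyos-configd.service` as the exact unit name is an assumption on my part. Note that this only shows unit-level ordering; since the vfio modules are loaded by vpp.py rather than by a unit, they will not appear here.

```python
import subprocess
from typing import Dict, Set


def parse_ordering(show_output: str) -> Dict[str, Set[str]]:
    """Parse 'After=a b c' style lines from `systemctl show` into sets of unit names."""
    result: Dict[str, Set[str]] = {}
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            result[key] = set(value.split())
    return result


def unit_ordering(unit: str) -> Dict[str, Set[str]]:
    """Query systemd for the After=/Before= ordering of a unit."""
    out = subprocess.run(
        ["systemctl", "show", unit, "--property=After", "--property=Before"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_ordering(out)


if __name__ == "__main__":
    deps = unit_ordering("vpp.service")
    after = deps.get("After", set())
    # Assumed unit name; adjust if the config daemon is named differently.
    print("ordered after vyos-configd.service:", "vyos-configd.service" in after)
    print("After=", sorted(after))
```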
Current workaround
Manually restarting VPP works:
systemctl restart vpp
After that the interface comes up normally.
====================================================
My theory and a proposed potential solution
I believe the boot-time failure may be caused by 20-ethernet_offload.py assuming that every interface under interfaces ethernet is a valid Linux netdev at activation time. The current script iterates over all Ethernet interfaces and immediately instantiates Ethtool(ifname) without first checking whether the interface actually exists in /sys/class/net.
20-ethernet_offload.py
That is fine for normal Linux-owned interfaces, but it becomes problematic for interfaces that are used as metadata anchors for VPP/DPDK. In my case, eth4 has to remain under interfaces ethernet eth4 so the VPP commit logic can reference it, but once VPP takes ownership of that port, the normal Linux activation path may no longer see it as an ordinary kernel netdev. The current script then still tries to query GRO, GSO, LRO, SG, TSO, speed/duplex, and flow-control through Ethtool, which can fail if the interface is no longer present in the normal Linux path.
So the likely issue is not the offload logic itself, but the lack of a guard such as:
- skip interfaces that do not exist in /sys/class/net
- optionally also skip interfaces that are configured under vpp settings interface <ifname>
A simple existence check would make the script more robust for VPP-owned ports, transient boot ordering, and rename timing. In my opinion this is the safest fix because it preserves current behavior for normal Ethernet interfaces while avoiding activation failures on interfaces that are no longer Linux-managed.
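Isolated from the offload logic, the guard itself is tiny. Here is a sketch of it as a standalone helper so it can be tested without a VyOS system. `linux_owned`, `vpp_ifnames` and `sysfs_net` are hypothetical names introduced only for illustration; they are not part of the existing script or of the vyos Python library.

```python
import os
from typing import Iterable, List


def linux_owned(ifnames: Iterable[str],
                vpp_ifnames: Iterable[str] = (),
                sysfs_net: str = "/sys/class/net") -> List[str]:
    """Return the interfaces that exist as Linux netdevs and are not assigned to VPP.

    ifnames     -- names configured under 'interfaces ethernet'
    vpp_ifnames -- names configured under 'vpp settings interface' (optional skip)
    sysfs_net   -- directory holding one entry per kernel netdev
    """
    skip = set(vpp_ifnames)
    return [
        name for name in ifnames
        if name not in skip and os.path.exists(os.path.join(sysfs_net, name))
    ]


if __name__ == "__main__":
    # Example: eth4 is assigned to VPP, so it is filtered out even if present.
    print(linux_owned(["eth0", "eth4"], vpp_ifnames=["eth4"]))
```

The existence check alone covers the case where the netdev has already disappeared; the optional VPP skip additionally covers the window where the port is still visible to Linux but is about to be handed over.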
My proposal:
#!/usr/bin/env python3

from os.path import exists

from vyos.ethtool import Ethtool
from vyos.configtree import ConfigTree
from vyos.system.image import is_live_boot


def activate(config: ConfigTree):
    base = ['interfaces', 'ethernet']

    if not config.exists(base):
        return

    for ifname in config.list_nodes(base):
        # Skip interfaces that are not present as Linux netdevs.
        # This can happen for ports handed over to VPP/DPDK.
        if not exists(f'/sys/class/net/{ifname}'):
            continue

        # Optional extra protection: skip interfaces explicitly assigned to VPP
        # if config.exists(['vpp', 'settings', 'interface', ifname]):
        #     continue

        eth = Ethtool(ifname)

        # If GRO is enabled by the Kernel - reflect this on CLI.
        # If GRO is configured but unsupported by NIC - remove it from CLI.
        configured = config.exists(base + [ifname, 'offload', 'gro'])
        enabled, fixed = eth.get_generic_receive_offload()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'gro'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'gro'])

        # GSO
        configured = config.exists(base + [ifname, 'offload', 'gso'])
        enabled, fixed = eth.get_generic_segmentation_offload()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'gso'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'gso'])

        # LRO
        configured = config.exists(base + [ifname, 'offload', 'lro'])
        enabled, fixed = eth.get_large_receive_offload()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'lro'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'lro'])

        # SG
        configured = config.exists(base + [ifname, 'offload', 'sg'])
        enabled, fixed = eth.get_scatter_gather()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'sg'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'sg'])

        # TSO
        configured = config.exists(base + [ifname, 'offload', 'tso'])
        enabled, fixed = eth.get_tcp_segmentation_offload()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'tso'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'tso'])

        # Remove deprecated UFO option
        if config.exists(base + [ifname, 'offload', 'ufo']):
            config.delete(base + [ifname, 'offload', 'ufo'])

        # Validate speed/duplex
        speed_path = base + [ifname, 'speed']
        duplex_path = base + [ifname, 'duplex']
        if config.exists(speed_path) and config.exists(duplex_path):
            speed = config.return_value(speed_path)
            duplex = config.return_value(duplex_path)
            if speed != 'auto' and duplex != 'auto':
                if not eth.check_speed_duplex(speed, duplex):
                    config.delete(speed_path)
                    config.delete(duplex_path)

        # Validate flow-control
        flow_control_path = base + [ifname, 'disable-flow-control']
        if config.exists(flow_control_path):
            if not eth.check_flow_control():
                config.delete(flow_control_path)