VPP + DPDK interface causes boot-time failure (eth4 / Intel E810 / vfio-pci)

Hi everyone,

I’m experimenting with the VPP dataplane on VyOS rolling (2025.12.14-0023) and I ran into a boot-time issue when assigning a physical interface to DPDK/VPP.

The system works correctly after manual intervention, but VPP fails during boot and gets restarted by systemd.

I’m trying to understand whether this is a configuration mistake or a potential bug in the rolling release.

Hardware

NIC: Intel E810-C QSFP, PCI address: 0000:11:00.0

Kernel driver after binding: vfio-pci

System boot parameters include:

  • intel_iommu=on
  • hugepages=2048
  • pci=realloc

Configuration

Minimal configuration relevant to the interface:

set interfaces ethernet eth4
set vpp settings interface eth4 driver dpdk

Saved config confirms this:

/config/config.boot

interfaces {
    ethernet eth4 {
        hw-id [MAC_address]
    }
}

vpp {
    settings {
        interface {
            eth4 {
                driver dpdk
            }
        }
    }
}

Behaviour

After manual restart

If I start VPP manually:

systemctl restart vpp

Everything works as expected:

vppctl show interface

eth4 up
tap4096 up
local0 down

Traffic counters increase and the interface functions normally.

Boot behaviour

However, during boot the following sequence occurs.

VPP starts

“Started vector packet processing engine”

Then fails with:

vlib_pci_bind_to_uio: Skipping PCI device 0000:11:00.0:
device is bound to IOMMU group and vfio-pci driver is not loaded

Later the modules load:

Module 'vfio_iommu_type1' loaded successfully
Module 'vfio_pci' loaded successfully

But then VPP receives:

received SIGTERM from PID 1

and stops.

Another suspicious log

During config activation I see repeated errors:

/usr/libexec/vyos/activate/20-ethernet_offload.py:
Interface "eth4" does not exist

This appears multiple times in:

/config/vyos-activate.log
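
To quantify how often this shows up, the log can be tallied per interface. This is only a rough sketch: it assumes the message text is exactly as quoted above, and the helper name count_missing is hypothetical, not part of VyOS.

```python
#!/usr/bin/env python3
# Sketch: count 'Interface "X" does not exist' messages per interface.
# Assumes the message wording quoted in this post; not a VyOS tool.
import re
from collections import Counter

MISSING_RE = re.compile(r'Interface "([^"]+)" does not exist')


def count_missing(log_text: str) -> Counter:
    # Each match captures the interface name between the double quotes.
    return Counter(MISSING_RE.findall(log_text))


if __name__ == '__main__':
    try:
        with open('/config/vyos-activate.log') as f:
            text = f.read()
    except OSError:
        text = ''  # log not present on this machine
    for ifname, hits in count_missing(text).most_common():
        print(f'{ifname}: {hits}')
```

If only VPP-assigned interfaces appear in the tally, that would support the idea that the activation script is touching ports that are not Linux netdevs at that point.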

Observations

  1. vfio-pci is correctly loaded eventually

  2. the NIC is successfully bound to vfio

  3. VPP works perfectly after boot if restarted manually

  4. the failure appears related to boot ordering / activation scripts

It looks like either:

  • VPP starts before VFIO modules are ready, or

  • the ethernet activation scripts attempt to manipulate eth4 even though it is being assigned to the VPP dataplane.

Systemd service

Relevant portion of the VPP unit:

ExecStartPre=-/sbin/modprobe uio_pci_generic
ExecStart=/usr/bin/vpp -c /run/vpp/vpp.conf

Note that vfio-pci is not loaded here, but later by vpp.py.
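
To check the ordering theory, the module and binding state can be read from sysfs at any point during boot. The following is a minimal diagnostic sketch, not something shipped with VyOS or VPP: the function names are hypothetical, and it only relies on the standard Linux sysfs layout (/sys/module and /sys/bus/pci/devices/<addr>/driver).

```python
#!/usr/bin/env python3
# Diagnostic sketch: report whether vfio-pci is loaded and which kernel
# driver a PCI device is bound to. Helper names are hypothetical.
import os


def module_loaded(name: str, sysfs: str = '/sys') -> bool:
    # Loaded modules appear under /sys/module with '-' replaced by '_'.
    return os.path.isdir(os.path.join(sysfs, 'module', name.replace('-', '_')))


def bound_driver(pci_addr: str, sysfs: str = '/sys'):
    # 'driver' is a symlink to the bound driver; it is absent if unbound.
    link = os.path.join(sysfs, 'bus', 'pci', 'devices', pci_addr, 'driver')
    if not os.path.islink(link):
        return None
    return os.path.basename(os.readlink(link))


if __name__ == '__main__':
    addr = '0000:11:00.0'  # PCI address from this post
    print('vfio_pci loaded:', module_loaded('vfio-pci'))
    print(f'{addr} driver:', bound_driver(addr))
```

Running this just before VPP starts (for example from a temporary ExecStartPre line) and again after the manual restart should show whether vfio_pci is missing at the first attempt.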

============================================

Questions

  1. Is it expected that interfaces ethernet ethX must exist even when the interface is owned by VPP/DPDK?

  2. Should Linux activation scripts (like 20-ethernet_offload.py) be skipped for VPP dataplane interfaces?

  3. Is the boot order between:

    • vyos-configd

    • vfio modules

    • vpp.service

currently guaranteed?

Current workaround

Manually restarting VPP works:

systemctl restart vpp

After that the interface comes up normally.

====================================================

My theory and a proposed potential solution

I believe the boot-time failure may be caused by 20-ethernet_offload.py assuming that every interface under interfaces ethernet is a valid Linux netdev at activation time. The current script iterates over all Ethernet interfaces and immediately instantiates Ethtool(ifname) without first checking whether the interface exists in /sys/class/net.

20-ethernet_offload.py

That is fine for normal Linux-owned interfaces, but it becomes problematic for interfaces that are used as metadata anchors for VPP/DPDK. In my case, eth4 has to remain under interfaces ethernet eth4 so the VPP commit logic can reference it, but once VPP takes ownership of that port, the normal Linux activation path may no longer see it as an ordinary kernel netdev. The current script then still tries to query GRO, GSO, LRO, SG, TSO, speed/duplex, and flow-control through Ethtool, which can fail if the interface is no longer present in the normal Linux path.

So the likely issue is not the offload logic itself, but the lack of a guard such as:

  • skip interfaces that do not exist in /sys/class/net

  • optionally also skip interfaces that are configured under vpp settings interface <ifname>

A simple existence check would make the script more robust for VPP-owned ports, transient boot ordering, and rename timing. In my opinion this is the safest fix because it preserves current behavior for normal Ethernet interfaces while avoiding activation failures on interfaces that are no longer Linux-managed.
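
The guard can also be looked at in isolation. The sketch below is only an illustration of the filtering idea: linux_managed is a hypothetical helper, and the sysfs directory is a parameter so the logic can be tried against a fake tree without real hardware.

```python
#!/usr/bin/env python3
# Sketch of the guard only: keep interfaces that exist as Linux netdevs
# and are not assigned to VPP. linux_managed is a hypothetical name.
import os
import tempfile


def linux_managed(ifnames, vpp_ifnames=(), sysfs_net='/sys/class/net'):
    keep = []
    for ifname in ifnames:
        # Optional extra protection: explicitly VPP-assigned ports.
        if ifname in vpp_ifnames:
            continue
        # Core guard: no sysfs entry means no kernel netdev to query.
        if not os.path.exists(os.path.join(sysfs_net, ifname)):
            continue
        keep.append(ifname)
    return keep


if __name__ == '__main__':
    # Fake sysfs tree: eth0 and eth1 exist, eth4 has been handed to DPDK.
    fake = tempfile.mkdtemp()
    for name in ('eth0', 'eth1'):
        os.mkdir(os.path.join(fake, name))
    print(linux_managed(['eth0', 'eth1', 'eth4'], sysfs_net=fake))
```

With the existence check alone, eth4 is dropped as soon as it has no entry in /sys/class/net; the vpp_ifnames argument covers the case where the netdev is still visible but already assigned to VPP in the configuration.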

My proposal:

#!/usr/bin/env python3

from os.path import exists

from vyos.ethtool import Ethtool
from vyos.configtree import ConfigTree
from vyos.system.image import is_live_boot


def activate(config: ConfigTree):
    base = ['interfaces', 'ethernet']

    if not config.exists(base):
        return

    for ifname in config.list_nodes(base):
        # Skip interfaces that are not present as Linux netdevs.
        # This can happen for ports handed over to VPP/DPDK.
        if not exists(f'/sys/class/net/{ifname}'):
            continue

        # Optional extra protection: skip interfaces explicitly assigned to VPP
        # if config.exists(['vpp', 'settings', 'interface', ifname]):
        #     continue

        eth = Ethtool(ifname)

        # If GRO is enabled by the Kernel - reflect this on CLI.
        # If GRO is configured but unsupported by NIC - remove it from CLI.
        configured = config.exists(base + [ifname, 'offload', 'gro'])
        enabled, fixed = eth.get_generic_receive_offload()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'gro'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'gro'])

        # GSO
        configured = config.exists(base + [ifname, 'offload', 'gso'])
        enabled, fixed = eth.get_generic_segmentation_offload()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'gso'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'gso'])

        # LRO
        configured = config.exists(base + [ifname, 'offload', 'lro'])
        enabled, fixed = eth.get_large_receive_offload()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'lro'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'lro'])

        # SG
        configured = config.exists(base + [ifname, 'offload', 'sg'])
        enabled, fixed = eth.get_scatter_gather()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'sg'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'sg'])

        # TSO
        configured = config.exists(base + [ifname, 'offload', 'tso'])
        enabled, fixed = eth.get_tcp_segmentation_offload()
        if configured and fixed:
            config.delete(base + [ifname, 'offload', 'tso'])
        elif is_live_boot() and enabled and not fixed:
            config.set(base + [ifname, 'offload', 'tso'])

        # Remove deprecated UFO option
        if config.exists(base + [ifname, 'offload', 'ufo']):
            config.delete(base + [ifname, 'offload', 'ufo'])

        # Validate speed/duplex
        speed_path = base + [ifname, 'speed']
        duplex_path = base + [ifname, 'duplex']
        if config.exists(speed_path) and config.exists(duplex_path):
            speed = config.return_value(speed_path)
            duplex = config.return_value(duplex_path)
            if speed != 'auto' and duplex != 'auto':
                if not eth.check_speed_duplex(speed, duplex):
                    config.delete(speed_path)
                    config.delete(duplex_path)

        # Validate flow-control
        flow_control_path = base + [ifname, 'disable-flow-control']
        if config.exists(flow_control_path):
            if not eth.check_flow_control():
                config.delete(flow_control_path)

Try the latest rolling release; a lot of VPP bugs have been fixed since 2025.12.14.

I updated to the latest version and still see the same issue. The key problematic line is still there, unchanged:

eth = Ethtool(ifname)

It still happens immediately inside:

for ifname in config.list_nodes(base):

with no guard to check whether the interface actually exists as a Linux netdev first.