VyOS appears to have re-enabled tso, gso, gro by default in latest VyOS 1.5-rolling-202406181053; this causes regressions

While diagnosing an issue where I would lose network connectivity whenever I uploaded data, I observed the following configuration:

show interfaces ethernet eth0
 address dhcp
 description OUTSIDE
 hw-id 04:0e:3c:8b:a6:bf
 offload {
     gro
     gso
     sg
     tso
 }

Previously, VyOS left these off (see the thread "How come offload settings isnt enabled by default?"), which makes sense because i210 and i225 devices (the most common NICs in the world right now) are buggy with them turned on.

These reappeared and caused me significant toil.

Version:          VyOS 1.5-rolling-202406181053
Release train:    current
Release flavor:   generic

Built by:         autobuild@vyos.net
Built on:         Tue 18 Jun 2024 13:50 UTC
Build UUID:       d2c3d495-0be8-455c-83e7-7960a7578ea8
Build commit ID:  2b3d1167850b85

Architecture:     x86_64
Boot via:         installed image
System type:      bare metal

Hardware vendor:  HP
Hardware model:   HP EliteDesk 800 G5 Desktop Mini
Hardware S/N:     MXL95025NY
Hardware UUID:    800b5dc3-e6c8-ba65-0bcb-dc6bfdfbccb2

Copyright:        VyOS maintainers and contributors

If this is a bug report, it would help to state which version you were on previously and how you arrived at the version you’re on now, and also whether you can reproduce this in a test environment.

I noticed the same issue when updating from 1.5-rolling-202405310019 to 1.5-rolling-202406190020. I’m not sure if the offloading is what caused my connectivity issues, but recovering from this upgrade was quite difficult.

After performing this upgrade, all of my ethernet interfaces had a new offload section added with several types enabled by default. My WAN connection became completely unavailable, and I was unable to SSH into VyOS from any device on the LAN. DNS queries sent to VyOS were also failing, even for queries that had static mappings that could have been answered without using an upstream resolver. It seems like all communication to VyOS itself was being dropped, despite having firewall rules that should have allowed that traffic.

Using a console connection, I tried deleting the offload sections and then I committed and saved, but that didn’t seem to help. I tried rebooting, but that didn’t seem to help either. I’m pretty sure the offload sections were actually being re-added after reboots, but I’m unsure whether that was a side-effect of me trying to switch back and forth between the new system image and an older one. Either way, I wasn’t able to find a way to get things working at all on the newer version of VyOS.

Downgrading was actually pretty weird too. At one point while on the newer version, I power cycled my ISP’s fiber jack, and that restored Internet connectivity, but not SSH or DNS. I then downgraded to the older system image, but that broke the Internet while fixing SSH and DNS. I probably should have restarted my ISP’s fiber jack at that point as well, since that may have resolved the remaining issues.

Instead, in the end, I had to do a combination of switching back to the older system image, rebooting the VyOS VM, restarting the VyOS host machine, fully removing power from the host machine instead of just rebooting it (thinking the NICs might have gotten into a bad state that restarts wouldn’t fix), and power cycling my ISP’s fiber jack. After doing those things in a variety of different orders, I eventually got things working again on the older system image.

My suspicion is that offloading broke MAC address spoofing on my WAN interface, which could have caused a majority of my issues. It could also just be buggy offloading implementations on my NICs though.
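
For anyone wanting to test that suspicion, here is a rough sketch of checking and temporarily toggling the offloads at runtime with ethtool. It assumes the affected interface is eth0 (substitute your own) and that you have console access in case connectivity drops. Changes made this way do not survive a reboot and are not written to the VyOS configuration.

```shell
# Show the current state of the offloads discussed in this thread.
sudo ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload|scatter-gather:'

# Temporarily turn them off (not persistent across reboots).
sudo ethtool -K eth0 tso off gso off gro off sg off
```

If connectivity recovers with the offloads off, that at least implicates the offload settings rather than some other change in the upgrade.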

If I can find some time, I’ll see if I can reproduce these issues in a more controlled test environment instead of in my home network… :)

Well, if anyone is searching for catastrophic connectivity or LAN issues on the latest VyOS rolling releases: delete the offload section from your interface configuration to resolve the problem.
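
On the VyOS CLI that would look roughly like the following, assuming the interface is eth0 as in the config at the top of this thread; repeat for each affected interface.

```shell
configure
delete interfaces ethernet eth0 offload
commit
save
exit
```

Note that another poster above reported that deleting the section and rebooting did not help in their case, so this may not be sufficient on every setup.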

Regarding which version I was on and how I arrived at the current one: I am not sure why I am blocked by Phabricator from making bug reports, but I observed this issue going from 1.5-rolling-202404141045 to 1.5-rolling-202406181053.