GRUB Menu fails to load on Serial-only devices (with no KVM)

Problem: boot fails; the GRUB menu doesn’t load beyond a “Welcome to GRUB!” message in my serial terminal.

Hardware: white-box x86 router platform with no KVM support at all – serial-terminal only

Observations:

  • As far as I can tell, this issue popped up with the new Boot management software as part of the larger “Migrate System Image Utilities to Python” project (T4516), introduced, I believe (for 1.4 Sagitta anyway) in this commit. @zsdc @jestabro
  • If I perform an upgrade-install from an older image using the old boot system, the new image containing the newer boot design boots just fine. (Perhaps because it’s still using the “old” boot setup / install process from the old image?)
  • This particular boot-fails “no GRUB Menu issue” doesn’t seem to be a problem for hardware that does support KVM. (I tested/confirmed two other hardware platforms that have KVM.)
  • Therefore, I’m speculating that the root cause is some reference in the GRUB config to a [KVM] console that doesn’t exist on this serial-only hardware. I’m still digging, though.
  • Posting all this in case anyone else runs into the issue or has further insight/recommendations on GRUB diagnostics. Suggestions are welcome! (I’m not a GRUB expert!)

Tested with:

  • Private build 1.4-20231221-amd64 (containing latest Sagitta code as of Dec 20, 2023)
  • Latest vyos-1.5-rolling-202312191154-amd64 downloaded from VyOS nightly builds

I encountered this the other day when installing this week’s rolling update. I realized my machine was booting in UEFI mode; it worked once I set my boot to BIOS mode.

Try BIOS mode for booting your drive?

The question is trivial but still. During installation, what type of console did you select?

What console should be used by default? (K: KVM, S: Serial, U: USB-Serial)? (Default: K)

And what is your console number (ttyS0, ttyS2, etc.)?

By default, ttyS0 is initialized, just as it was for 1.2-1.3. Maybe you need a different number?

Thanks for the input everyone.
@GuybrushThreepwood as far as I know, this particular hardware I’m working with is UEFI-only. (Oh, and welcome to the community!)
@zsdc I selected S at install, and yes, ttyS0 is correct for me.

I did make some more discoveries and have a hacky workaround for the moment.

Here’s what I found: the system actually was booting. However, because Serial wasn’t working and (at the time) I didn’t have networking configured, I had no way to reach it and find that out. Today, I configured networking manually by booting another distro from a USB stick, mounting the filesystem and editing /config/config.boot directly. This (in conjunction with my tinkering detailed below) let me confirm that the system still boots even when nothing displays correctly.

It turns out the issue (for me, at least) is needing to specify the speed for the Serial port for both GRUB and the OS itself. For my hardware, I need to specify 115200 in two places for things to work.

  1. To get GRUB to load properly in my serial terminal, I needed to modify /boot/grub/grub.cfg. I added these lines (obtained from another system running an older image):
serial --unit=0 --speed=115200
terminal_output --append serial
terminal_input serial console
  2. And to get the OS to work properly with Serial post-GRUB-menu, I needed to modify the image-specific GRUB file; in my case /boot/grub/grub.cfg.d/vyos-versions/1.4-20231221.cfg, and append the Serial speed.

Line 10 (most important; normal boot):

set boot_opts="${boot_opts} console=${console_type}${console_num},115200,"

Line 8 (required for recovery):

set boot_opts="${boot_opts} console=${console_type}${console_num},115200 init=/usr/bin/busybox init"

By making both of those GRUB file modifications, Serial again works as expected.
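For anyone who wants to script the two edits above instead of making them by hand, here is a minimal Python sketch. It works on the text of the config files only. The function names are hypothetical. It assumes the unpatched console argument looks exactly like the lines quoted above without the speed. It also assumes that adding the serial lines at the end of grub.cfg is acceptable, since the right position in the file was not established in this thread. This is not how the VyOS image tools do it.

```python
import re

# Kernel console argument as quoted in the per-image file above, matched only
# when it is not already followed by a comma (i.e. no speed has been set yet).
CONSOLE_ARG = re.compile(r"(console=\$\{console_type\}\$\{console_num\})(?![,\w])")


def add_console_speed(cfg_text: str, speed: int = 115200) -> str:
    """Append ',<speed>' to console= arguments that carry no speed yet."""
    return CONSOLE_ARG.sub(lambda m: f"{m.group(1)},{speed}", cfg_text)


def ensure_grub_serial(cfg_text: str, unit: int = 0, speed: int = 115200) -> str:
    """Add the serial terminal setup lines unless a 'serial' command exists."""
    if re.search(r"^\s*serial\s", cfg_text, flags=re.MULTILINE):
        return cfg_text  # already configured; leave the file untouched
    lines = [
        f"serial --unit={unit} --speed={speed}",
        "terminal_output --append serial",
        "terminal_input serial console",
    ]
    # Assumption: appending at the end of the file is good enough here.
    prefix = cfg_text if (not cfg_text or cfg_text.endswith("\n")) else cfg_text + "\n"
    return prefix + "\n".join(lines) + "\n"
```

Both functions are idempotent, so running them twice on the same text changes nothing the second time. Back up the original files before writing anything back.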

I’m not sure what the “best” systemic fix would be; perhaps the new GRUB configuration gadgetry would permit the setting of speed via a variable, like the other parameters? If so, it would be essential that it be referenced for both the GRUB menu loading AND the OS boot itself.
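To illustrate the "one variable, two consumers" idea, here is a rough Python sketch in which a single speed setting produces both the GRUB terminal commands and the kernel console argument. The class and method names are made up for illustration and do not reflect the actual VyOS boot-management code. It also assumes that GRUB serial unit N corresponds to the kernel's ttySN, which may not hold on every platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SerialConsole:
    """Hypothetical single source of truth for the serial console settings."""

    unit: int = 0        # assumed to map GRUB unit N to kernel ttySN
    speed: int = 115200  # baud rate shared by GRUB and the kernel

    def grub_commands(self) -> list:
        """Lines for grub.cfg so the GRUB menu itself uses the serial port."""
        return [
            f"serial --unit={self.unit} --speed={self.speed}",
            "terminal_output --append serial",
            "terminal_input serial console",
        ]

    def kernel_arg(self) -> str:
        """Kernel command-line argument so the OS uses the same port and speed."""
        return f"console=ttyS{self.unit},{self.speed}"


if __name__ == "__main__":
    console = SerialConsole()
    print("\n".join(console.grub_commands()))
    print(console.kernel_arg())
```

Because both outputs come from the same object, the GRUB menu and the OS boot cannot end up with different speeds, which is the mismatch described above.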

Let me know if you need more information, or if there’s another way I can contribute toward getting a fix baked-in to the code.

Oh, I faced the baud rate issue as well. Last night I compared 1.3.3 LTS, 1.4.0rc1 and 1.5-rolling and realised that while 115200 is supposed to be the VyOS default baud rate for the serial console, it only takes effect in 1.3, not in 1.4 or 1.5.

It may have regressed or, probably more accurately, never taken effect, since 1.4/1.5 introduced an approach to grub.cfg that is entirely different from 1.3.

I had to cut my investigation short to leave for my Christmas vacation, but I may look into this further, possibly also to help patch the functionality back into 1.4/1.5.

By the way @marvin, what appliance are you installing onto? I’m currently trying to get VyOS working on my Dell Edge 640 appliance, which I just found out needs a newer out-of-tree Intel ixgbe driver to work with unsupported SFP transceivers -.-"

We’re experiencing the same problems on a Dell VEP4600: both the unsupported transceivers (fortunately we can reprogram our optics) and the GRUB issue. If we install 1.4 we can get it to boot, but after a fresh install of 1.5 rolling we just get “Welcome to Grub”.

Thank you for the fix, my PC Engines APU2 & 3 are also affected.

See also T5910: Grub problem(?) Serial Console no longer working.
