Weird behavior with cloud-init

Hi everyone!

We’ve just migrated our full config to be deployed via cloud-init (instead of staging cloud-init and Ansible) to simplify our setup. So far so good, we’re creating beautiful and long cloud-init user-data files, that are syntactically correct, but still fail to load:

[   37.018227] vyos-router[1337]: Mounting VyOS Config...done.
[   39.309720] vyos-router[1337]:  migrate activate configure.
[   40.178320] vyos-config[1343]: Configuration error

We’re also not getting any prompt after this. The vyos-config-debug kernel cmd only partially helps, as with most combinations of commands, we still not get a prompt. We’ve determined some (not sure yet how many) firewall rules and the haproxy load balancer commands to be causing the failure. Interestingly enough, with a partial haproxy config, it still complains about a failure, but at least gets us to a prompt, allowing to login as one of the just created users. In that case the log output under /tmp lists haproxy as failed to load, but apparently it managed to pull through with other config parts. The actually loaded config appears correct except the haproxy section existing but being empty.

My overall question here is, how am I supposed to debug this, as generating different cloud-init seed.iso with one line changes is quite a time consuming task (looking at the amount of firewall rules we have). Also, when we start up a VyOS instance with minimal config and apply our set commands via config on the running router, it accepts the whole file, not complaining about a single line. So it seems like in the scope of cloud-init something else might be causing these issues. Are there any known but not documented limitations or might there be a bug (I’d be willing to investigate, given some pointers about where to focus first).

Almost forgot this: we’re running self build nightlies (due to adding some prometheus exporters). Currently we are on 1.5-rolling-202604291347, so a build from the 29. April 2026. We’re in the process of creating a new build to get native copy.fail and dirtyfrag mitigations. Gotta check upstream first though.

Best regards!

The best way to debug this would be to wait until you receive the configuration error message, then reboot the host into recovery mode, grab the config.boot file from the host, and try to load it on another running host using the load /path/to/config.boot command.

This will help you see which parts of the configuration are incorrect and cannot be applied. You can then either fix them or raise a bug report if needed.

I was unaware of the recovery mode. Don’t I need credentials to login though? Are there any if applying the config failed to begin with?

Also, regarding the idea of moving the resulting config file to another host to load it there: Is there something like the config → commands filter in the opposite direction? That would shorten that path by simply letting me translate the set commands into a config, which I can try to load directly. Or even advise cloud-init to load such a file straight away, instead of doing a list of set commands.

There is no need to enter any credentials for recovery mode. It is for recovery purposes, including situations when credentials are broken.

You also need a config.boot file, not commands. The reason is that the system loads during boot the config.boot file, it doesn’t load commands. To properly mimic behavior, you need to try to load exactly the same config file as was produced by cloud-init on the affected instance.

There is no need to enter any credentials for recovery mode. It is for recovery purposes, including situations when credentials are broken.

That is really good to know!

You also need a config.boot file, not commands. The reason is that the system loads during boot the config.boot file, it doesn’t load commands. To properly mimic behavior, you need to try to load exactly the same config file as was produced by cloud-init on the affected instance.

I think there is a misunderstanding. I’m aware of the config file and the difference from that to actual set commands. I was just asking if there is a way to translate from commands to the resulting config without executing them on an actual instance. Just like it is possible to translate an existing config into commands via dedicated python (?) filter/script. The thought behind this was twofold:

  1. catch obvious errors right away during conversion
  2. skip the command-by-command applying on initial startup (i.e. skip cloud-init, accelerate the boot process; somewhat masking the issue I face but being a better solution overall)

Anyway, I managed to fix the issue without more sophisticated debugging, as I already reduced my overall config to a few lines that were constantly failing. Turns out it was just missing quotes, mostly around IP addresses. That is kind of weird, as I’d expect the same command syntax (and escapes) to work the same be it in cloud-init or regular config mode. Though I understand the commands running through cloud-init might have to go through another layer of shell, which would explain this.