I have a weird case of multiple instances of VyOS losing their public IP addresses on a VMware vCenter infrastructure.
There have been network events on the provider side, and I think it’s linked, but I’m not sure.
When connecting to the VyOS instances, a simple set interfaces ethernet... command put the IP back and all was working fine again.
But I’d like to know what happened and make sure it doesn’t happen again. I need some help to investigate. What could have been the issue here, and how can I check it on the VyOS side?
Thanks for your help
Do you use static or dynamic IP address?
After you set the command did you do “commit” followed by “save”?
The IP address was static, and it was configured with cloud-init. It worked, untouched, for weeks before the issue happened. What’s also weird is that it happened to multiple instances of VyOS at the same time. When it happened, our cloud provider was making changes on the public network side. I believe this is what caused the issue, but what I don’t understand is why the IP was totally unset (by unset I mean the ip eth0 command showed no IP on the interface, even though this interface was configured).
This is the state of the eth0 interface when the issue happened:
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:50:56:redacted brd ff:ff:ff:ff:ff:ff
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fea8:9d02/64 scope link
valid_lft forever preferred_lft forever
But when running show configuration commands, this was the result:
set interfaces ethernet eth0 address 'redacted/32'
set interfaces ethernet eth0 description 'WAN'
set interfaces ethernet eth0 hw-id '00:50:56:redacted'
In order to get the IP address set again, a reboot wasn’t enough.
Also, a simple set interfaces ethernet... wasn’t enough, and gave me this result:
Configuration path: [interfaces ethernet eth0 address redacted/32] already exists
I had to delete the IP configuration, then set it again to make it work.
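For reference, the workaround described above looks roughly like this in configuration mode. This is only a sketch: 203.0.113.10/32 is a placeholder for the redacted address, and the final save is an assumption added so the change persists, not something confirmed in this thread.

```
configure
# Placeholder address (documentation range), standing in for the redacted one
delete interfaces ethernet eth0 address '203.0.113.10/32'
set interfaces ethernet eth0 address '203.0.113.10/32'
commit
# Assumption: write the running config to /config/config.boot
save
exit
```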
Try taking a look at which revisions of the config you have, in case your provider did something through cloud-init.
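A possible way to check this, as a sketch: list the commit history in operational mode, then compare two revisions in configuration mode. The revision numbers 0 and 1 are only examples; use the ones shown on your system.

```
# Operational mode: list config revisions with timestamp, user and origin
show system commit

# Configuration mode: diff two revisions (0 and 1 are example numbers)
configure
compare 0 1
exit
```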
Is the box where VyOS runs virtually owned by you, or is it somebody else’s computer?
Actually I am the one who created the cloud-init configuration using Terraform, and they have been deployed once with no revision since.
We are hosted at OVH private cloud. They host the hypervisors and we run them. They also control all the networking, especially the public network.
Another thing I should tell is only the interface connected to the public network lost the IP configuration. The other interfaces were still configured and running.
I don’t have much experience with cloud-init, but as I understand it, it is a way to set the initial configuration; once a config is set, the config in /config/config.boot (in the VyOS case) is used.
That is after initial startup set the config through:
and then remove the cloud-init settings provided by the VM host and reboot; the image should then be able to survive a reboot without losing its configuration.
A guess is that OVH, after rearranging their public Internet side, perhaps configured only an IPv6 address for you, leaving the IPv4 fields empty?
I think maybe I’m not clear enough, I’ll try to elaborate a bit.
I don’t believe cloud-init is the issue here. The VyOS has been configured through it and has been rebooted a few times without losing its configuration.
Also, the IPv4 configuration was still there and OK; it’s the interface that lost its IP.
And after a reboot in this state the IP was still not set, even though the configuration was OK. A “delete”, “set” and then “commit” made the interface take the IP configuration again.
And the IPv4 network from OVH is OK. I’m not sure what changes they made; what I am sure of is that the ethernet interface lost its IP when they made them.
OVH has no control over the VyOS configuration, they control the routers (virtual or physical) connected to the vmware ESX infrastructure.
Finally, all is working fine now already. What I would like to understand is how an ethernet interface can lose its IP without a configuration change, and not take it back until we delete and set the configuration again.
Yeah, but since they control the VM hosts, they are also in control of the cloud-init parameters, so if those got changed then of course funny things can occur within VyOS. For example, cloud-init could still be enabled while the IPv4 fields were cleared after OVH made their changes.
Another possibility is that OVH somehow changed the IP address/subnet mask to something that isn’t valid for the backend. You would still see it in the config, but if there were a “,” instead of a “.”, or the IP collided with the network address or broadcast address of the subnet (CIDR) in use, the backend would fail to configure the IP address properly.
VyOS config mode (and op mode) is basically just a frontend for things done directly in the Linux kernel through common userland tools such as ip, zebra (FRR), etc.
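Because of that, one way to spot this kind of mismatch is to ask the kernel directly and compare with the configured address. A minimal sketch, assuming the iproute2 one-line output format (`ip -o`); the helper name addr_present, the interface eth0, and the address 203.0.113.10/32 are all placeholders, not anything from this thread.

```shell
#!/bin/sh
# Sketch: check whether a configured CIDR is really present in the kernel.
# addr_present (hypothetical helper) reads the output of
# `ip -o -4 addr show dev IFACE` on stdin and succeeds only if one of
# the listed IPv4 addresses is exactly the CIDR passed as $1.
addr_present() {
    # In `ip -o` output, field 3 is the family and field 4 is ADDRESS/PREFIX
    awk '$3 == "inet" {print $4}' | grep -Fxq -- "$1"
}

# Usage on the router (interface and address are placeholders):
#   ip -o -4 addr show dev eth0 | addr_present 203.0.113.10/32 \
#       && echo "address present" || echo "address missing from kernel"
```

If the check fails while show configuration commands still lists the address, the config and the kernel state have diverged, which matches the symptom described in this thread.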
Do you perhaps have some logs remaining from this period, so we could look through them for why the IP address went on vacation?
They control the hosts, in a way that they set them up, manage them, but they have no control over the VMs installed on the hosts. We have full control of the VMs, we create them (manually or via Terraform) and manage them.
So when we create a VyOS instance we have full control of it. We connect the interface to a VM Network controlled by OVH, and we configure the VyOS with the public IP. ARP does the rest (or so I think).
The changes they made were related to the elements behind this “VM Network”.
And I repeat, deleting the configuration and configuring the very same IP address on the VyOS made it work again, as if nothing changed (but only this action did the trick).
Where can I find the kernel logs? I checked /var/log/messages, but apart from multiple SSH brute-force attempts, I noticed nothing really relevant there.
Do you know if the virtual MAC changed, or if you temporarily had a new interface?
This might be what you’re looking for as well.
show log kernel
Hi giga, thanks for the answer.
The MAC address didn’t change; I can see that the configuration and the real MAC address still match. I also don’t think it had a new interface, even temporarily.
I was on holiday, so I missed the chance to check the logs, and now they’re gone. But if it ever happens again I’ll try that.
I’m closing the topic since we can’t go further in looking for the cause, I’ll reopen it in case it happens again and I find something new
Thank you both, though, this community is really great
Edit: it seems I can’t change the topic name.
Hi @Mrik, We’re really glad you’re finding value in our community! Your positive feedback means a lot to us and fuels our commitment to maintaining this supportive and collaborative space. Our members are truly great, and their willingness to help others is what makes this community so special.
As for the topic name, I can assist you in updating it. Just let me know what you’d like it to be, and I’ll take care of the rest. You can send me a private message.
If you have any more questions, thoughts, or if you encounter new developments, don’t hesitate to share. We’re here to help.