VyOS hostname configuration failure triggers a vyos-router restart

Hi,

There are actually two problems here, but they are related.

This is the log from when I configured the hostname:

[edit]
vyos@test-1# set system host-name test-2
[edit]
vyos@test-1# commit

Message from root@test-1 on (none) at 13:41 ...
Active configuration has been changed by user 'root' on '?'.
Please make sure you do not have conflicting changes. You can also discard
the current changes by issuing 'exit discard'.
EOF
[ system host-name test-2 ]
Could not connect to vyos-hostsd

[[]] failed
Commit failed
Failed to generate committed config
[edit]
vyos@test-1# commit
No configuration changes to commit
[edit]
vyos@test-1# set system host-name test-2
DEBUG vexit_internal: calling getCompletionEnv() without config session
DEBUG vexit_internal: calling getCompletionEnv() without config session
DEBUG vexit_internal: calling getCompletionEnv() without config session
[edit]
vyos@test-1#

The hostname configuration failure is caused by vyos-hostsd: something in it failed and triggered a restart:

Dec 17 13:41:05 test-1 systemd[1]: vyos-hostsd.service: main process exited, code=exited, status=1/FAILURE
Dec 17 13:41:05 test-1 systemd[1]: Unit vyos-hostsd.service entered failed state.
Dec 17 13:41:05 test-1 systemd[1]: vyos-hostsd.service holdoff time over, scheduling restart.
Dec 17 13:41:05 test-1 systemd[1]: Stopping VyOS DNS configuration keeper...
Dec 17 13:41:05 test-1 systemd[1]: Starting VyOS DNS configuration keeper...
Dec 17 13:41:05 test-1 systemd[1]: Started VyOS DNS configuration keeper.

All I can identify is that the line reply_msg = self.__socket.recv().decode() in the _communicate function of /usr/lib/python3/dist-packages/vyos/hostsd_client.py triggers the problem. Maybe something is wrong in the socket communication between client and server. I'm not familiar with that area, so please help identify the root cause.

But it is not only a restart of vyos-hostsd, as the log above might suggest. There is a second, consequent problem: at the same time, the vyos-router service is restarted, and this is actually triggered by the restart of vyos-hostsd. As a result, the whole configuration is reloaded at that moment.

Dec 17 13:41:05 test-1 systemd[1]: vyos-router.service: control process exited, code=exited status=127
Dec 17 13:41:05 test-1 systemd[1]: Unit vyos-router.service entered failed state.
Dec 17 13:41:05 test-1 systemd[1]: Starting VyOS Router...
Dec 17 13:41:05 test-1 systemd[1]: Started VyOS Router.
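To make the correlation between the two excerpts explicit, here is a minimal sketch using a hypothetical helper (not part of VyOS) that parses syslog-style lines like the ones above and groups the units that "entered failed state" by timestamp. It assumes the classic "Mon DD HH:MM:SS host process: message" format shown in this thread. Failures in the same second only suggest a cascade; they do not prove which unit caused the other.

```python
import re
from collections import defaultdict

# Hypothetical pattern for lines such as:
# "Dec 17 13:41:05 test-1 systemd[1]: Unit vyos-router.service entered failed state."
FAILED_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ systemd\[1\]: "
    r"Unit (?P<unit>\S+) entered failed state\.$"
)


def failed_units_by_time(lines):
    """Group units that entered the failed state by their syslog timestamp."""
    groups = defaultdict(list)
    for line in lines:
        match = FAILED_RE.match(line.strip())
        if match:
            groups[match.group("ts")].append(match.group("unit"))
    return dict(groups)


# Lines copied from the two log excerpts in this post.
LOG = """\
Dec 17 13:41:05 test-1 systemd[1]: vyos-hostsd.service: main process exited, code=exited, status=1/FAILURE
Dec 17 13:41:05 test-1 systemd[1]: Unit vyos-hostsd.service entered failed state.
Dec 17 13:41:05 test-1 systemd[1]: vyos-hostsd.service holdoff time over, scheduling restart.
Dec 17 13:41:05 test-1 systemd[1]: vyos-router.service: control process exited, code=exited status=127
Dec 17 13:41:05 test-1 systemd[1]: Unit vyos-router.service entered failed state.
Dec 17 13:41:05 test-1 systemd[1]: Starting VyOS Router...
"""

if __name__ == "__main__":
    for ts, units in failed_units_by_time(LOG.splitlines()).items():
        # Several units failing within one second hints at a dependency cascade.
        print(ts, ", ".join(units))
```

On the excerpts above, both vyos-hostsd.service and vyos-router.service land under the same timestamp, Dec 17 13:41:05.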

I checked /lib/systemd/system/vyos-hostsd.service and found a dependency between the two services in the unit definition. I don't know whether it's the cause or whether it's intentional. But I don't think it's reasonable for a restart of vyos-hostsd to trigger a restart of vyos-router, which leads to many things being reconfigured unintentionally. The impact is very big.

[Install]
RequiredBy=cloud-init-local.service vyos-router.service
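As background on that [Install] line: according to the systemd.unit(5) documentation, RequiredBy= makes "systemctl enable" create a link in each listed unit's .requires/ directory, which adds a Requires= dependency from that unit to vyos-hostsd.service (WantedBy= does the same with Wants=). With Requires=, an explicit stop or restart of the required unit is propagated to the requiring unit; with Wants=, it is not. The sketch below (a hypothetical helper, stdlib only) just reads the quoted [Install] section and reports the implied dependency. It does not prove that this dependency is what restarted vyos-router in the log above.

```python
import configparser

# [Install] section as quoted from /lib/systemd/system/vyos-hostsd.service.
UNIT_TEXT = """\
[Install]
RequiredBy=cloud-init-local.service vyos-router.service
"""

# Per systemd.unit(5): RequiredBy=/WantedBy= add Requires=/Wants= from each
# listed unit to this unit when it is enabled.
INSTALL_KEYS = {"RequiredBy": "Requires", "WantedBy": "Wants"}

# Requires= propagates an explicit stop/restart to the dependent unit; Wants= does not.
PROPAGATES_RESTART = {"Requires": True, "Wants": False}


def implied_dependencies(unit_text):
    """Return {dependent_unit: (dependency_type, propagates_restart)}.

    Simplification: with strict=False a repeated key keeps only its last
    value, whereas systemd accumulates repeated RequiredBy=/WantedBy= lines.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep systemd's case-sensitive key names
    parser.read_string(unit_text)
    result = {}
    if not parser.has_section("Install"):
        return result
    for key, dep_type in INSTALL_KEYS.items():
        for dependent in parser["Install"].get(key, "").split():
            result[dependent] = (dep_type, PROPAGATES_RESTART[dep_type])
    return result


if __name__ == "__main__":
    for unit, (dep, propagates) in implied_dependencies(UNIT_TEXT).items():
        print(f"{unit}: {dep}=vyos-hostsd.service, restart propagates: {propagates}")
```

For the quoted section, both cloud-init-local.service and vyos-router.service get a Requires= dependency on vyos-hostsd.service.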

Best regards.

Hi @MapleWang,
What version of VyOS are you using?

@Viacheslav It's VyOS 1.2.3. I actually found the same problem in the latest rolling version as well.

I don't see this problem in VyOS 1.2-rolling-201912170217.
I'll try other versions.

vyos@1.2-roll-ns# set system host-name test-2
[edit]
vyos@1.2-roll-ns# commit
[edit]
vyos@1.2-roll-ns# set system host-name test-1
[edit]
vyos@1.2-roll-ns# commit
[edit]
vyos@1.2-roll-ns# set system host-name test-2
[edit]
vyos@1.2-roll-ns# commit
[edit]
vyos@1.2-roll-ns#
vyos@1.2-roll-ns# show system host-name 
 host-name test-2
[edit]

I can’t reproduce this bug.

vyos@test-1# set system host-name test-2
[edit]
vyos@test-1# commit
[edit]
vyos@test-1#
vyos@test-1# run show version
Version: VyOS 1.2.3

@Viacheslav, I actually filed a bug, T1885 "vyos hostname configuration failure and it triggers vyos-router restarted", but it seems faster to get a response on the forum.

Here is an update on the current status of this problem:

For the first problem, why is vyos-hostsd restarted? After investigation, it appears to be related to an Ansible script. We use Ansible to deploy things to VyOS, with the following task:

    - name: update apt repository cache
      apt:
        update_cache: yes
      become: yes
      become_user: root

It turned out to trigger the following problem, which forced journald to restart:

systemd-journald assertion 'n + 20 + (object_pid ? 11 : 0)'   function dispatch_message_real()

How is this related to vyos-hostsd? My guess is that journald and vyos-hostsd both use ZeroMQ, so every time systemd-journald is restarted, the ZeroMQ connection used by vyos-hostsd gets corrupted. I have no solid proof, but it's very easy to reproduce:

sudo systemctl restart systemd-journald
# then try to configure your hostname

Anyway, it seems Ansible 2.7.2 fixed the problem between the apt module and systemd-journald, but that's not my point, since there are many ways to trigger a restart of journald.

That brings me to the second problem: why does a restart of vyos-hostsd trigger a restart of vyos-router? This problem concerns me more, since it has a big impact on the system. So, as I asked above, is this intentional or a mistake?

Best regards.

Hi @Viacheslav,

I think you are not fully getting my point. The failure to set the host name is not a big problem for me. But I accidentally found that a restart of vyos-hostsd triggers a restart of vyos-router, which is potentially destructive outside of boot time, since it reloads the configuration and does other critical things.

I understand that vyos-router and vyos-hostsd may need a dependency during the boot phase, but this relationship should be cut during the running phase to avoid restarting vyos-router, unless you are sure vyos-hostsd will never be restarted in that phase. It turns out, however, that this daemon is a little fragile in the current version.

Anyway, I hope I have expressed myself clearly. And again, thanks very much for your support.

Best regards.