Losing DHCP default route during boot

Hi all

I am currently running 1.4-rolling-202102141111 with no issues. However, i tried updating a few weeks ago, ran into this problem. I just retried and still having this issue on upgrade to 1.4-rolling-202104202252.

The internet interface (DHCP from ISP Spectrum) during boot gets its default route, for approx 4 seconds, then it is removed. I know it must as i can start a ping to 8.8.8.8 during the reboot, nothing, nothing, 3-4 pings, back to nothing, wraps up the boot bam nothing.

Once booted, all interfaces are up have their IP’s, a couple of static routes that i have configured are there, the network routes for the interfaces are there, but missing the default route.

Downing the eth0 interface, then bringing it back up, leaves me in a working state, until the next reboot.
set interfaces ethernet eth0 disable
commit
delete interfaces ethernet eth0 disable
commit

Also, just reverting to the previous image works fine too.
1: 1.4-rolling-202104202252 (default boot) (running image)
2: 1.4-rolling-202102141111

I am including logs of the initial boot (matching dhclient) and the same matching logs from downing and up’ing the eth0 interface after boot.

I have the full set of logs, just a bit big.

Snippet on logs matching dhclient of initial boot…

Apr 22 06:32:23 router sudo[5215]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl stop dhclient@lo.service
Apr 22 06:32:23 router sudo[5252]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl stop dhclient@eth2.service
Apr 22 06:32:23 router sudo[5295]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl stop dhclient@eth1.service
Apr 22 06:32:24 router sudo[5348]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl stop dhclient@eth0.service
Apr 22 06:32:24 router sudo[5350]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl start dhclient@eth0.service
Apr 22 06:32:24 router systemd[1]: Created slice system-dhclient.slice.
Apr 22 06:32:24 router dhclient-script-vyos[5367]: Current dhclient PID: 5358, Parent PID: 5352, IP version: 4, All dhclients for interface eth0: 5352 5358
Apr 22 06:32:24 router dhclient-script-vyos[5367]: Passing command to /usr/sbin/ip: "link set dev eth0 up"
Apr 22 06:32:24 router dhclient-script-vyos[5367]: No changes to apply via vyos-hostsd-client
Apr 22 06:32:24 router dhclient[5358]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
Apr 22 06:32:24 router sudo[5418]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl stop dhclient@eth3.service
Apr 22 06:32:24 router dhclient[5358]: DHCPOFFER of XX.XX.5.136 from XX.XX.0.1
Apr 22 06:32:24 router dhclient[5358]: DHCPREQUEST for XX.XX.5.136 on eth0 to 255.255.255.255 port 67
Apr 22 06:32:24 router dhclient[5358]: DHCPACK of XX.XX.5.136 from XX.XX.0.1
Apr 22 06:32:24 router dhclient-script-vyos[5513]: Current dhclient PID: 5358, Parent PID: 1, IP version: 4, All dhclients for interface eth0: 5358
Apr 22 06:32:24 router dhclient-script-vyos[5513]: Passing command to /usr/sbin/ip: "-4 addr add XX.XX.5.136/255.255.224.0 broadcast 255.255.255.255 valid_lft 3600 preferred_lft 3600 dev eth0 lab
Apr 22 06:32:24 router dhclient-script-vyos[5513]: Passing command to /usr/sbin/ip: "link set dev eth0 mtu 1500"
Apr 22 06:32:24 router dhclient-script-vyos[5513]: FRR status: running
Apr 22 06:32:24 router dhclient-script-vyos[5513]: Checking if the route presented in kernel: default via XX.XX.0.1 dev eth0 metric 210
Apr 22 06:32:24 router dhclient-script-vyos[5513]: Converted vtysh command: "ip route 0.0.0.0/0 XX.XX.0.1 eth0 tag 210 210 "
Apr 22 06:32:24 router dhclient-script-vyos[5513]: Sending command to vtysh
Apr 22 06:32:24 router dhclient-script-vyos[5513]: Deleting nameservers with tag "dhcp-eth0" via vyos-hostsd-client
Apr 22 06:32:24 router dhclient-script-vyos[5513]: Adding nameservers "XX.XX.XXX.1 XX.XX.XXX.2" with tag "dhcp-eth0" via vyos-hostsd-client
Apr 22 06:32:25 router dhclient-script-vyos[5513]: Applying changes via vyos-hostsd-client
Apr 22 06:32:25 router dhclient-script-vyos[5513]: No changes to apply via vyos-hostsd-client
Apr 22 06:32:25 router dhclient[5358]: bound to XX.XX.5.136 -- renewal in 1405 seconds.

Disable/Enable commands yield the following logs matching dhclient

Apr 22 06:42:31 router sudo[6783]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl stop dhclient@eth0.service
Apr 22 06:42:31 router dhclient[6785]: Killed old client process
Apr 22 06:42:31 router dhclient[6785]: Killed old client process
Apr 22 06:42:31 router systemd[1]: dhclient@eth0.service: Main process exited, code=killed, status=15/TERM
Apr 22 06:42:32 router dhclient[6785]: DHCPRELEASE of XX.XX.5.136 on eth0 to XX.XX.0.1 port 67
Apr 22 06:42:32 router dhclient[6785]: send_packet: Network is unreachable
Apr 22 06:42:32 router dhclient[6785]: send_packet: please consult README file regarding broadcast address.
Apr 22 06:42:32 router dhclient[6785]: dhclient.c:2878: Failed to send 300 byte long packet over fallback interface.
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Current dhclient PID: 6785, Parent PID: 1, IP version: 4, All dhclients for interface eth0: 6785
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Passing command to /usr/sbin/ip: "-4 addr flush dev eth0 label eth0"
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Deleting search domains with tag "dhcp-eth0" via vyos-hostsd-client
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Deleting nameservers with tag "dhcp-eth0" via vyos-hostsd-client
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Deleting default route: via XX.XX.0.1 dev eth0 metric 210
Apr 22 06:42:32 router dhclient-script-vyos[6787]: FRR status: running
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Checking if the route presented in kernel: default via XX.XX.0.1 dev eth0 metric 210
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Converted vtysh command: "no ip route 0.0.0.0/0 XX.XX.0.1 eth0 tag 210 210 "
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Sending command to vtysh
Apr 22 06:42:32 router dhclient-script-vyos[6787]: Applying changes via vyos-hostsd-client
Apr 22 06:42:33 router systemd[1]: dhclient@eth0.service: Succeeded.
Apr 22 06:42:40 router sudo[6929]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl stop dhclient@eth0.service
Apr 22 06:42:40 router sudo[6931]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/systemctl start dhclient@eth0.service
Apr 22 06:42:40 router dhclient-script-vyos[6935]: Current dhclient PID: 6934, Parent PID: 6933, IP version: 4, All dhclients for interface eth0: 6933 6934
Apr 22 06:42:40 router dhclient-script-vyos[6935]: Passing command to /usr/sbin/ip: "link set dev eth0 up"
Apr 22 06:42:40 router dhclient-script-vyos[6935]: No changes to apply via vyos-hostsd-client
Apr 22 06:42:41 router dhclient[6934]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 8
Apr 22 06:42:41 router dhclient[6934]: DHCPOFFER of XX.XX.5.136 from XX.XX.0.1
Apr 22 06:42:41 router dhclient[6934]: DHCPREQUEST for XX.XX.5.136 on eth0 to 255.255.255.255 port 67
Apr 22 06:42:41 router sudo[6986]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/dhclient -pf /var/lib/dhcp/dhclient_eth0.pid -x eth0
Apr 22 06:42:41 router dhclient[6988]: Killed old client process
Apr 22 06:42:41 router systemd[1]: dhclient@eth0.service: Main process exited, code=killed, status=15/TERM
Apr 22 06:42:41 router systemd[1]: dhclient@eth0.service: Succeeded.
Apr 22 06:42:41 router systemd[1]: dhclient@eth0.service: Service RestartSec=100ms expired, scheduling restart.
Apr 22 06:42:41 router systemd[1]: dhclient@eth0.service: Scheduled restart job, restart counter is at 1.
Apr 22 06:42:41 router dhclient-script-vyos[7002]: Current dhclient PID: 7000, Parent PID: 6999, IP version: 4, All dhclients for interface eth0: 6973 6987 6988 6999 7000
Apr 22 06:42:41 router dhclient-script-vyos[7002]: Stopping dhclient with PID: 6973, PID file:
Apr 22 06:42:41 router dhclient-script-vyos[7002]: PID file  does not exists, killing dhclient with SIGTERM signal
Apr 22 06:42:41 router dhclient-script-vyos[7002]: Stopping dhclient with PID: 6987, PID file: /var/lib/dhcp/dhclient_eth0.pid
Apr 22 06:42:41 router dhclient-script-vyos[7002]: PID file /var/lib/dhcp/dhclient_eth0.pid does not exists, killing dhclient with SIGTERM signal
Apr 22 06:42:41 router sudo[7030]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/usr/bin/rm -f /var/lib/dhcp/dhclient_eth0.pid
Apr 22 06:42:41 router sudo[7033]:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/sbin/dhclient -q -nw -cf /var/lib/dhcp/dhclient_eth0.conf -pf /var/lib/dhcp/dhclient_eth0.pid -lf /var/lib
Apr 22 06:42:41 router dhclient-script-vyos[7002]: Stopping dhclient with PID: 6988, PID file: /var/lib/dhcp/dhclient_eth0.pid
Apr 22 06:42:41 router dhclient-script-vyos[7002]: PID file /var/lib/dhcp/dhclient_eth0.pid does not exists, killing dhclient with SIGTERM signal
Apr 22 06:42:41 router dhclient-script-vyos[7036]: Current dhclient PID: 7035, Parent PID: 7034, IP version: , All dhclients for interface eth0: 6999 7000 7034 7035
Apr 22 06:42:41 router dhclient-script-vyos[7036]: Stopping dhclient with PID: 6999, PID file: /var/lib/dhcp/dhclient_eth0.pid
Apr 22 06:42:41 router dhclient-script-vyos[7036]: PID file /var/lib/dhcp/dhclient_eth0.pid does not exists, killing dhclient with SIGTERM signal
Apr 22 06:42:41 router systemd[1]: dhclient@eth0.service: Main process exited, code=killed, status=15/TERM
Apr 22 06:42:41 router dhclient-script-vyos[7002]: Passing command to /usr/sbin/ip: "link set dev eth0 up"
Apr 22 06:42:41 router dhclient-script-vyos[7002]: No changes to apply via vyos-hostsd-client
Apr 22 06:42:41 router dhclient-script-vyos[7036]: Stopping dhclient with PID: 7000, PID file: /var/lib/dhcp/dhclient_eth0.pid
Apr 22 06:42:41 router dhclient-script-vyos[7036]: PID file /var/lib/dhcp/dhclient_eth0.pid does not exists, killing dhclient with SIGTERM signal
Apr 22 06:42:41 router dhclient-script-vyos[7036]: Passing command to /usr/sbin/ip: "link set dev eth0 up"
Apr 22 06:42:41 router dhclient-script-vyos[7036]: No changes to apply via vyos-hostsd-client
Apr 22 06:42:41 router dhclient[7035]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 5
Apr 22 06:42:41 router systemd[1]: dhclient@eth0.service: Succeeded.
Apr 22 06:42:41 router dhclient[7035]: DHCPOFFER of XX.XX.5.136 from XX.XX.0.1
Apr 22 06:42:41 router dhclient[7035]: DHCPREQUEST for XX.XX.5.136 on eth0 to 255.255.255.255 port 67
Apr 22 06:42:41 router dhclient[7035]: DHCPACK of XX.XX.5.136 from XX.XX.0.1
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Current dhclient PID: 7035, Parent PID: 1, IP version: , All dhclients for interface eth0: 7035
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Passing command to /usr/sbin/ip: "-4 addr add XX.XX.5.136/255.255.224.0 broadcast 255.255.255.255 valid_lft 2983 preferred_lft 2983 dev eth0 lab
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Passing command to /usr/sbin/ip: "link set dev eth0 mtu 1500"
Apr 22 06:42:41 router dhclient-script-vyos[7091]: FRR status: running
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Checking if the route presented in kernel: default via XX.XX.0.1 dev eth0 metric 210
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Converted vtysh command: "ip route 0.0.0.0/0 XX.XX.0.1 eth0 tag 210 210 "
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Sending command to vtysh
Apr 22 06:42:41 router systemd[1]: dhclient@eth0.service: Service RestartSec=100ms expired, scheduling restart.
Apr 22 06:42:41 router systemd[1]: dhclient@eth0.service: Scheduled restart job, restart counter is at 2.
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Deleting nameservers with tag "dhcp-eth0" via vyos-hostsd-client
Apr 22 06:42:41 router dhclient-script-vyos[7150]: Current dhclient PID: 7149, Parent PID: 7148, IP version: 4, All dhclients for interface eth0: 7035 7148 7149
Apr 22 06:42:41 router dhclient-script-vyos[7150]: Stopping dhclient with PID: 7035, PID file: /var/lib/dhcp/dhclient_eth0.pid
Apr 22 06:42:41 router dhclient-script-vyos[7150]: PID file /var/lib/dhcp/dhclient_eth0.pid does not exists, killing dhclient with SIGTERM signal
Apr 22 06:42:41 router dhclient-script-vyos[7150]: Passing command to /usr/sbin/ip: "link set dev eth0 up"
Apr 22 06:42:41 router dhclient-script-vyos[7150]: No changes to apply via vyos-hostsd-client
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Adding nameservers "XX.XX.XXX.1 XX.XX.XXX.2" with tag "dhcp-eth0" via vyos-hostsd-client
Apr 22 06:42:41 router dhclient[7149]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4
Apr 22 06:42:41 router dhclient[7149]: DHCPOFFER of XX.XX.5.136 from XX.XX.0.1
Apr 22 06:42:41 router dhclient[7149]: DHCPREQUEST for XX.XX.5.136 on eth0 to 255.255.255.255 port 67
Apr 22 06:42:41 router dhclient[7149]: DHCPACK of XX.XX.5.136 from XX.XX.0.1
Apr 22 06:42:41 router dhclient-script-vyos[7091]: Applying changes via vyos-hostsd-client
Apr 22 06:42:41 router dhclient-script-vyos[7189]: Current dhclient PID: 7149, Parent PID: 1, IP version: 4, All dhclients for interface eth0: 7149
Apr 22 06:42:41 router dhclient-script-vyos[7091]: No changes to apply via vyos-hostsd-client
Apr 22 06:42:41 router dhclient-script-vyos[7189]: Passing command to /usr/sbin/ip: "-4 addr add XX.XX.5.136/255.255.224.0 broadcast 255.255.255.255 valid_lft 2982 preferred_lft 2982 dev eth0 lab
Apr 22 06:42:41 router dhclient-script-vyos[7189]: Passing command to /usr/sbin/ip: "link set dev eth0 mtu 1500"
Apr 22 06:42:41 router dhclient-script-vyos[7189]: FRR status: running
Apr 22 06:42:41 router dhclient-script-vyos[7189]: Checking if the route presented in kernel: default via XX.XX.0.1 dev eth0 metric 210
Apr 22 06:42:41 router dhclient-script-vyos[7189]: Converted vtysh command: "ip route 0.0.0.0/0 XX.XX.0.1 eth0 tag 210 210 "
Apr 22 06:42:41 router dhclient-script-vyos[7189]: Sending command to vtysh
Apr 22 06:42:42 router dhclient-script-vyos[7189]: Deleting nameservers with tag "dhcp-eth0" via vyos-hostsd-client
Apr 22 06:42:42 router dhclient-script-vyos[7189]: Adding nameservers "XX.XX.XXX.1 XX.XX.XXX.2" with tag "dhcp-eth0" via vyos-hostsd-client
Apr 22 06:42:42 router dhclient-script-vyos[7189]: Applying changes via vyos-hostsd-client
Apr 22 06:42:42 router dhclient-script-vyos[7189]: No changes to apply via vyos-hostsd-client
Apr 22 06:42:42 router dhclient[7149]: bound to XX.XX.5.136 -- renewal in 1194 seconds.

Thank you so much… Currently staying on my working image but can try anything, once the wife is done with her netflix :frowning:

Chris

Hi @newfdaddy!
Can you share the full config and log (not only DHCP parts) from this router?

Hi @zsdc !

Sure thing and thank you!
vyos-dhcp-issue-failed-boot-stripped.txt (1.2 MB) vyos-config-stripped.txt (17.8 KB)

Thank you, Chris!
This is very interesting: according to the provided logs, the default route should be presented in the system. I must ask you for additional details at the moment when the problem occurs:

show ip route
sudo ip r
sudo vtysh -c 'show running-config'

Yes, it is presented (at some point) during the configuration at boot time… As I said, if I start a ping to 8.8.8.8 it starts working during the boot for a few seconds, then stops working. Unfortunately, all of this happens before the login prompt is provided at the console.

I can get to a serial console (virtual) and see if I can run these commands there, or maybe ssh, not sure if ssh is started at that point.

Thank you, to be honest I figured someone was gonna find something dumb I did and give me the old virtual smack in the head… LOL

Everything is possible, including bugs in the system. I just cannot find any messages that may indicate problems with routes in the logs. At least, not that what is known for me. Therefore, I would like to see a bit deeper, to understand what is actually going on - maybe the problem exists but not in DHCP, etc.

So… I did find an interesting log /var/log/frr/frr-reload.log… I am not sure how I can execute those commands right when the issue happens as ssh isnt up, the console hasnt gotten me to a login prompt, and the serial console, same deal.

Also, i did record a screen session for the pre, upgrade routine, post upgrade, what i see, booting while pinging out. I am not 100% certain if it helpful or not… I uploaded it to youtube…

root@router:/var/log/frr# cat frr-reload.log 
2021-04-27 16:06:47,316  INFO: Called via "Namespace(bindir='/usr/bin', confdir='/etc/frr', daemon='zebra', debug=False, filename='/tmp/tmp8u_2nrfv', input=None, log_level='info', overwrite=False, pathspace=None, reload=True, rundir='/var/run/frr', stdout=False, test=False, vty_socket=None)"
2021-04-27 16:06:47,316  INFO: Loading Config object from file /tmp/tmp8u_2nrfv
2021-04-27 16:06:47,468  INFO: Loading Config object from vtysh show running
2021-04-27 16:06:47,590  INFO: Loading Config object from vtysh show running
2021-04-27 16:06:47,784  INFO: Called via "Namespace(bindir='/usr/bin', confdir='/etc/frr', daemon='staticd', debug=False, filename='/tmp/tmp1fnd3sk7', input=None, log_level='info', overwrite=False, pathspace=None, reload=True, rundir='/var/run/frr', stdout=False, test=False, vty_socket=None)"
2021-04-27 16:06:47,784  INFO: Loading Config object from file /tmp/tmp1fnd3sk7
2021-04-27 16:06:47,935  INFO: Loading Config object from vtysh show running
2021-04-27 16:06:48,063  INFO: /var/run/frr/reload-NJNAR8.txt content
['no ip route 0.0.0.0/0 96.35.0.1 eth0 tag 210 210\n',
 'ip route 10.96.0.0/12 192.168.2.17\n',
 'ip route 192.168.200.0/24 192.168.2.6\n']
2021-04-27 16:06:48,142  INFO: Loading Config object from vtysh show running
2021-04-27 16:06:48,264  INFO: /var/run/frr/reload-6J2EYV.txt content
['ip route 10.96.0.0/12 192.168.2.17\n',
 'ip route 192.168.200.0/24 192.168.2.6\n']
root@router:/var/log/frr# exit

I can also confirm when you just add a static route on the latest 1.4 rollings, it drops the default route received via DHCP

Thank you, @newfdaddy!
I believe this is it. It looks like a race condition between DHCP and static configuration in FRR during boot. We will check this.

FYI, here is the task in the bug tracker for this problem. All technical questions move there: ⚓ T3505 Commits do not respect changes in FRR that are not stored in a config

1 Like

Thank you @zsdc @Dmitry

I’ll keep an eye on this. If there is anything I can do (testing etc), reach out. Again, thank you for all the time and effort that goes into making this such a great platform.

I figured I’d go ahead, with this current thought. I performed the following:

  1. Removed the 2 static routes I have (they arent used that frequently, no big deal)

    Interestingly, removing the 2 static routes also caused the DHCP provided default route to be dropped.

  2. Upgraded to the latest rolling

Same end result, no default route, no static routes (I removed them beforehand), no internet.

The /var/log/frr/frr-reload.log file shows the same removal of the default route, but no addition of the 2 static routes (to be expected).

So, perhaps the static routes arent causing this (at least in my case)

The problem is triggered by any changes in the staticd config, including static routes. It is deeper that may look at the beginning, and requires changes in FRR daemons config design.