Static routes and default route randomly lost with watchfrr errors

I have a VyOS instance (VyOS 1.4-rolling-202103210217) running inside a virtual machine as my WAN router.

The configuration is quite simple:

  • IPv4 & IPv6 Internet access via a single PPPoE connection
  • DHCP and DNS services running on VyOS, serving LAN clients
  • Basic SNAT (masquerade, for Internet) and some DNAT rules applied
  • Not using any dynamic routing features

Strangely, Internet access for the whole LAN is sometimes interrupted unexpectedly.

After some basic troubleshooting, I found these symptoms:

  • The PPPoE connection was still alive (interface pppoe0 was in the connected state)
  • The default route was missing from the kernel routing table (checked with $ ip route)
  • Some error messages showed up in the system journal that I can’t understand
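
The second symptom can be checked automatically. The following is a minimal Python sketch, assuming Python 3.7+ and the `ip` command are available on the router. The helper names `has_default_route` and `read_kernel_routes` are hypothetical and are not part of VyOS. It only looks at the IPv4 main table, the same view that a plain `ip route` gives.

```python
import subprocess


def has_default_route(ip_route_output: str) -> bool:
    """Return True if any line of `ip route` output starts with 'default'."""
    for line in ip_route_output.splitlines():
        fields = line.split()
        # iproute2 prints the 0.0.0.0/0 route with the keyword "default".
        if fields and fields[0] == "default":
            return True
    return False


def read_kernel_routes() -> str:
    """Run `ip route` and return its text output (hypothetical helper)."""
    result = subprocess.run(
        ["ip", "route"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On the router this could be called as `has_default_route(read_kernel_routes())`. Fed the `ip route` output pasted at the end of this post, it returns False, because no line there begins with `default`.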

I tried these actions to get my Internet access back:

  • Rebooting the system. Unsurprisingly, this works.
  • Running “disconnect interface pppoe0” followed by “connect interface pppoe0” in operational mode. This also works.

The problem does not occur very often, but it is quite painful when it does.

Could anyone help me figure out what is happening?

Some log output is attached:

May 08 22:29:44 zzstudio-vyos snmpd[2359]: truncating integer value > 32 bits
May 08 22:29:44 zzstudio-vyos ripd[1002]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
May 08 22:29:44 zzstudio-vyos ripd[1002]: [EC 100663313] SLOW THREAD: task agentx_timeout (7f3804e279d0) ran for 526867ms (cpu time 0ms)
May 08 22:29:44 zzstudio-vyos ripd[1002]: Terminating on signal
May 08 22:29:44 zzstudio-vyos zebra[984]: [EC 4043309122] Client 'rip' encountered an error and is shutting down.
May 08 22:29:44 zzstudio-vyos watchfrr[944]: [EC 268435457] ripd state -> down : unexpected read error: Connection reset by peer
May 08 22:29:44 zzstudio-vyos zebra[984]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
May 08 22:29:44 zzstudio-vyos zebra[984]: [EC 100663313] SLOW THREAD: task agentx_timeout (7f5ccbf729d0) ran for 531954ms (cpu time 0ms)
May 08 22:29:44 zzstudio-vyos zebra[984]: Terminating on signal
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 31 disconnected 0 bgp routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 34 disconnected 0 vnc routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 41 disconnected 0 rip routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 46 disconnected 0 ripng routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 51 disconnected 0 ospf routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 56 disconnected 0 ospf6 routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 61 disconnected 0 isis routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 66 disconnected 0 ldp routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 71 disconnected 0 ldp routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 76 disconnected 5 static routes removed from the rib
May 08 22:29:44 zzstudio-vyos zebra[984]: release_daemon_table_chunks: Released 0 table chunks
May 08 22:29:44 zzstudio-vyos zebra[984]: client 81 disconnected 0 bfd routes removed from the rib
May 08 22:29:44 zzstudio-vyos watchfrr[944]: zebra: slow echo response finally received after 531.822632 seconds
May 08 22:29:44 zzstudio-vyos zebra[984]: Zebra final shutdown
May 08 22:29:44 zzstudio-vyos watchfrr[944]: [EC 268435457] zebra state -> down : read returned EOF
May 08 22:29:44 zzstudio-vyos bgpd[993]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
May 08 22:29:44 zzstudio-vyos bgpd[993]: [EC 100663313] SLOW THREAD: task agentx_timeout (7f5316ca39d0) ran for 541861ms (cpu time 1ms)
May 08 22:29:44 zzstudio-vyos bgpd[993]: Terminating on signal
May 08 22:29:44 zzstudio-vyos ospf6d[1017]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
May 08 22:29:44 zzstudio-vyos ospf6d[1017]: [EC 100663313] SLOW THREAD: task agentx_timeout (7f1db30409d0) ran for 540304ms (cpu time 1ms)
May 08 22:29:44 zzstudio-vyos ospf6d[1017]: Terminating on signal SIGINT
May 08 22:29:44 zzstudio-vyos watchfrr[944]: [EC 268435457] ospf6d state -> down : unexpected read error: Connection reset by peer
May 08 22:29:44 zzstudio-vyos ospfd[1012]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
May 08 22:29:44 zzstudio-vyos ospfd[1012]: [EC 100663313] SLOW THREAD: task agentx_timeout (7f767ca179d0) ran for 531957ms (cpu time 2ms)
May 08 22:29:44 zzstudio-vyos ospfd[1012]: Terminating on signal
May 08 22:29:44 zzstudio-vyos watchfrr[944]: [EC 268435457] bgpd state -> down : unexpected read error: Connection reset by peer
May 08 22:29:44 zzstudio-vyos watchfrr[944]: [EC 268435457] ospfd state -> down : unexpected read error: Connection reset by peer
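
To make the log above easier to read, here is a minimal Python sketch that condenses it into three things: which daemons watchfrr reported as down, which daemons logged a SLOW THREAD warning and for how long, and how many routes zebra removed per protocol. The function name `summarize` is hypothetical, and the regular expressions are written only against the message formats visible in this log, so they are an assumption rather than a general FRR log parser.

```python
import re

# Patterns based solely on the log lines quoted in this post.
DOWN_RE = re.compile(r"watchfrr\[\d+\]: \[EC \d+\] (\S+) state -> down : (.+)$")
SLOW_RE = re.compile(
    r"(\w+)\[\d+\]: \[EC \d+\] SLOW THREAD: task (\S+) .* ran for (\d+)ms"
)
REMOVED_RE = re.compile(
    r"zebra\[\d+\]: client \d+ disconnected (\d+) (\S+) routes removed from the rib"
)


def summarize(log_text):
    """Collect daemon-down events, slow tasks, and non-zero route removals."""
    summary = {"down": [], "slow": [], "removed": {}}
    for line in log_text.splitlines():
        m = DOWN_RE.search(line)
        if m:
            # (daemon name, reason reported by watchfrr)
            summary["down"].append((m.group(1), m.group(2)))
            continue
        m = SLOW_RE.search(line)
        if m:
            # (daemon name, task name, wall-clock duration in milliseconds)
            summary["slow"].append((m.group(1), m.group(2), int(m.group(3))))
            continue
        m = REMOVED_RE.search(line)
        if m and int(m.group(1)) > 0:
            proto = m.group(2)
            count = int(m.group(1))
            summary["removed"][proto] = summary["removed"].get(proto, 0) + count
    return summary
```

Applied to the log above, the only non-zero removal it would report is the 5 static routes dropped when zebra shut down, and every SLOW THREAD entry names the `agentx_timeout` task with a duration of roughly 527 to 542 seconds.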

Output while the problem is occurring:

root@zzstudio-vyos:/home/helixzz# ping 114.114.114.114
connect: Network is unreachable
root@zzstudio-vyos:/home/helixzz# ip route
58.39.42.1 dev pppoe0 proto kernel scope link src 58.39.43.219
172.16.28.0/24 dev eth0 proto kernel scope link src 172.16.28.10
172.16.29.0/24 dev eth2 proto kernel scope link src 172.16.29.1
192.168.2.0/24 dev eth1 proto kernel scope link src 192.168.2.2
192.168.254.0/24 dev wg0 proto kernel scope link src 192.168.254.9