Hi,
I recently ran into a problem with FRR on VyOS. One day my VyOS router ran out of memory, which led watchfrr to restart the other FRR processes, as shown in the log below.
Apr 12 11:34:45 eu-hub zebra[1013]: [EC 4043309090] Unknown netlink nlmsg_type RTM_GETNEIGH(30) vrf 0
Apr 12 11:34:45 eu-hub watchfrr[983]: bgpd: slow echo response finally received after 690.958289 seconds
Apr 12 11:34:45 eu-hub zebra[1013]: [EC 4043309090] Unknown netlink nlmsg_type RTM_GETNEIGH(30) vrf 0
Apr 12 11:34:45 eu-hub watchfrr[983]: ospfd: slow echo response finally received after 691.008029 seconds
Apr 12 11:34:45 eu-hub zebra[1013]: [EC 4043309090] Unknown netlink nlmsg_type RTM_GETNEIGH(30) vrf 0
Apr 12 11:34:45 eu-hub watchfrr[983]: ospf6d: slow echo response finally received after 649.294706 seconds
Apr 12 11:34:45 eu-hub zebra[1013]: [EC 4043309090] Unknown netlink nlmsg_type RTM_GETNEIGH(30) vrf 0
Apr 12 11:34:45 eu-hub watchfrr[983]: zebra: slow echo response finally received after 766.866493 seconds
Apr 12 11:34:45 eu-hub zebra[1013]: [EC 4043309090] Unknown netlink nlmsg_type RTM_GETNEIGH(30) vrf 0
Apr 12 11:34:45 eu-hub zebra[1013]: message repeated 10 times: [ [EC 4043309090] Unknown netlink nlmsg_type RTM_GETNEIGH(30) vrf 0]
Apr 12 11:34:47 eu-hub zebra[1013]: Terminating on signal
Apr 12 11:34:47 eu-hub bgpd[1018]: Terminating on signal
Apr 12 11:34:47 eu-hub ripd[1034]: Terminating on signal
Apr 12 11:34:47 eu-hub ospfd[1042]: Terminating on signal
Apr 12 11:34:47 eu-hub ripngd[1038]: Terminating on signal
Apr 12 11:34:47 eu-hub ospf6d[1046]: Terminating on signal SIGINT
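To make the delays in the log above easier to read, here is a minimal Python sketch that pulls the "slow echo response" time for each daemon out of syslog lines like these. The function name `slow_echo_delays` and the regular expression are my own and only match the message format shown above; other FRR versions may word the message differently.

```python
import re

# Matches watchfrr lines in the format seen in the log above, e.g.
# "... watchfrr[983]: bgpd: slow echo response finally received after 690.958289 seconds"
# (assumption: the message wording is the same as in this log excerpt)
SLOW_ECHO = re.compile(
    r"watchfrr\[\d+\]: (?P<daemon>\w+): slow echo response "
    r"finally received after (?P<secs>\d+(?:\.\d+)?) seconds"
)


def slow_echo_delays(lines):
    """Return {daemon: longest reported echo delay in seconds}."""
    delays = {}
    for line in lines:
        match = SLOW_ECHO.search(line)
        if match:
            daemon = match.group("daemon")
            secs = float(match.group("secs"))
            # Keep the worst delay if a daemon is reported more than once.
            delays[daemon] = max(delays.get(daemon, 0.0), secs)
    return delays
```

Feeding it the lines above (for example via `slow_echo_delays(open(path))`, where `path` is wherever the syslog is stored) shows that every daemon was unresponsive for more than ten minutes before the restart.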
It’s fine that watchfrr tries to restore the FRR processes, but after all of them were restored, all my routes were gone, including static and BGP routes. What is the point of watchfrr restoring the processes if it does not also restore the routes that carry the traffic?