Applying any rolling image after 1.3-rolling-202012230217 breaks my config. I see no errors on boot, and I can "load" and "commit" with no errors, but I get no internet access.
I can ping some external IP addresses but not all, and pings to any DNS server simply fail.
Nothing changed on the config side, and booting back into 1.3-rolling-202012230217 gets me up and running again.
Let me know what extra information I can provide to help track down the issue.
Hello @phillipmcmahon, I think this issue comes from updating FRR to the latest stable version. What's interesting is that in the FRR config I see the routes in the main table:
vyos@vyos# sudo vtysh -c "show run" | grep wg
ip route 0.0.0.0/0 wg1
ip route 0.0.0.0/0 wg0
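One way to confirm the leak outside of FRR is to query the kernel tables directly; if this is what's happening, the wg0/wg1 default routes will show up in the main table instead of (or in addition to) the non-main tables. The table numbers 100 and 110 are taken from the static-table config shown further down in the thread:
vyos@vyos# sudo ip route show table main
vyos@vyos# sudo ip route show table 100
vyos@vyos# sudo ip route show table 110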
Note: this happens only right after the router boots; immediately after boot the route looks like this:
vyos@vyos# run show ip route 0.0.0.0
Routing entry for 0.0.0.0/0
Known via "static", distance 1, metric 0, best
Last update 00:00:32 ago
* directly connected, wg1, weight 1
* directly connected, wg0, weight 1
* 172.16.0.254, via eth0, weight 1
vyos@vyos# delete protocols static table
vyos@vyos# commit
vyos@vyos# set protocols static table 100 interface-route 0.0.0.0/0 next-hop-interface wg0
vyos@vyos# set protocols static table 100 route 0.0.0.0/0 blackhole distance '255'
vyos@vyos# set protocols static table 110 interface-route 0.0.0.0/0 next-hop-interface wg1
vyos@vyos# set protocols static table 110 route 0.0.0.0/0 blackhole distance '255'
vyos@vyos# commit
vyos@vyos# run show ip route 0.0.0.0
Routing entry for 0.0.0.0/0
Known via "static", distance 1, metric 0, best
Last update 00:01:46 ago
* 172.16.0.254, via eth0, weight 1
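To double-check that the routes landed back in the right place after the re-commit, you can also ask zebra for the non-main tables directly (same table numbers as in the commands above; exact output will vary by FRR version):
vyos@vyos# sudo vtysh -c "show ip route table 100"
vyos@vyos# sudo vtysh -c "show ip route table 110"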
I totally get that it's a bug, but what's stopping me from running those commands in vyos-postconfig-bootup.script to delete and re-add the routes on reboot/upgrade as a temporary workaround until the underlying bug is fixed?
I originated that Phabricator bug, so I can explain why it appeared to fix itself: I deleted and re-added the configuration after boot. Unfortunately, that was entirely by accident, and depending on the complexity of the config, it might be tough to replicate.
With that said, there's nothing stopping you from using that approach as a workaround if you can find an order of commands that works for you.
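For reference, here is a minimal sketch of what such a script could look like. The path /config/scripts/vyos-postconfig-bootup.script is the standard VyOS post-config boot hook, and the commands are the same delete/re-add sequence from earlier in the thread; depending on the image you may need to run it under the vyattacfg group for the commit to succeed:
#!/bin/vbash
# Temporary workaround for the table 100/110 default routes leaking into
# the main table at boot: delete and re-apply them once the system is up.
# Remove this once the underlying FRR/VyOS bug is fixed.
source /opt/vyatta/etc/functions/script-template
configure
delete protocols static table
set protocols static table 100 interface-route 0.0.0.0/0 next-hop-interface wg0
set protocols static table 100 route 0.0.0.0/0 blackhole distance '255'
set protocols static table 110 interface-route 0.0.0.0/0 next-hop-interface wg1
set protocols static table 110 route 0.0.0.0/0 blackhole distance '255'
commit
exit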
I wonder if you're seeing the same thing we did on a recent rolling build (30th December 2020), where after a configure and commit we ended up in this situation:
Jan 3 21:30:47 dekker watchfrr[1141]: [EC 268435457] zebra state -> down : read returned EOF
Jan 3 21:30:47 dekker ospfd[1205]: [EC 134217741] Packet[DD]: Neighbor 46.227.201.3 MTU 2000 is larger than [eth7.1401:46.227.200.237]'s MTU 1500
Jan 3 21:30:47 dekker watchfrr[1141]: [EC 268435457] ripd state -> down : read returned EOF
Jan 3 21:30:48 dekker watchfrr[1141]: [EC 268435457] ripngd state -> down : read returned EOF
Jan 3 21:30:48 dekker watchfrr[1141]: [EC 268435457] ospfd state -> down : read returned EOF
Jan 3 21:30:48 dekker watchfrr[1141]: ospfd state -> up : connect succeeded
Jan 3 21:30:48 dekker watchfrr[1141]: [EC 268435457] ospfd state -> down : unexpected read error: Connection reset by peer
Jan 3 21:30:48 dekker watchfrr[1141]: [EC 268435457] ospf6d state -> down : read returned EOF
Jan 3 21:30:48 dekker watchfrr[1141]: [EC 268435457] ldpd state -> down : read returned EOF
Jan 3 21:30:48 dekker watchfrr[1141]: [EC 268435457] bgpd state -> down : unexpected read error: Connection reset by peer
Jan 3 21:30:48 dekker watchfrr[1141]: [EC 268435457] isisd state -> down : read returned EOF
Jan 3 21:30:48 dekker watchfrr[1141]: [EC 268435457] bfdd state -> down : read returned EOF
Jan 3 21:30:52 dekker watchfrr[1141]: Forked background command [pid 41571]: /usr/lib/frr/watchfrr.sh restart all
Jan 3 21:30:52 dekker staticd[1228]: Terminating on signal
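If you want to check whether you're hitting the same daemon flapping, two quick checks after boot are watchfrr's own status view and the FRR service state (the first depends on the FRR version shipped in the image):
vyos@vyos# sudo vtysh -c "show watchfrr"
vyos@vyos# sudo systemctl status frr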