@phasma As far as I can see, both wg interfaces have the same configuration. Does this issue apply to both, or does the policy route for wg1 work fine after the reboot? Also, please provide the requested logs whenever possible; they may help us understand why the issue happens. Thanks.
We tried to reproduce your issue, but after a reboot everything works as expected in our lab.
Some things to consider:
How did you get to version 1.3-rc6? Did you upgrade from a previous version, or was it a fresh install?
Could you reboot again and, before applying the commit that restores functionality, type “compare” and share the result? In the logs provided, we couldn’t find when policy POL-ROUTE-ETH1 was loaded.
Just to add to the conversation, I’m running 1.3 RC 6 with policy based routing with multiple WireGuard instances and interface defined routes and have not experienced any issues after rebooting. I am running a site to site, Mullvad client, and server (road warrior) and everything has come back up as expected after multiple reboots.
Tracing route to www.pandora.com [208.85.40.158]
over a maximum of 3 hops:
1 <1 ms <1 ms <1 ms vyos.xxxxxxxxxxx[192.168.83.254]
2 4 ms 4 ms 4 ms vt1.cor2.lond1.ptn.zen.net.uk [51.148.72.22]
3 4 ms 4 ms 4 ms lag-9.p1.thn-lon.zen.net.uk [51.148.73.160]
Deleting the interface routes, followed by a traceroute showing no change:
vyos@vyos# compare
[edit protocols static table 20]
-interface-route 0.0.0.0/0 {
- next-hop-interface wg2 {
- }
-}
[edit protocols static table 30]
-interface-route 0.0.0.0/0 {
- next-hop-interface wg1 {
- }
-}
[edit protocols static]
Tracing route to www.pandora.com [208.85.40.158]
over a maximum of 3 hops:
1 <1 ms <1 ms <1 ms vyos.xxxxxxxxxxxxxxxx [192.168.83.254]
2 4 ms 4 ms 4 ms vt1.cor2.lond1.ptn.zen.net.uk [51.148.72.22]
3 4 ms 4 ms 4 ms lag-9.p1.thn-lon.zen.net.uk [51.148.73.160]
Re-adding the WireGuard interface routes, followed by a traceroute showing it working:
vyos@vyos# compare
[edit protocols static table 20]
+interface-route 0.0.0.0/0 {
+ next-hop-interface wg2 {
+ }
+}
[edit protocols static table 30]
+interface-route 0.0.0.0/0 {
+ next-hop-interface wg1 {
+ }
+}
[edit protocols static]
Tracing route to www.pandora.com [208.85.40.158]
over a maximum of 3 hops:
1 <1 ms <1 ms <1 ms vyos.xxxxxxxxx [192.168.83.254]
2 77 ms 76 ms 76 ms 10.13.0.1
3 77 ms 77 ms 77 ms te0-7-0-19.rcr22.b001362-2.jfk01.atlas.cogentco.com [38.142.116.241]
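For anyone who needs to repeat the workaround shown above, here is a minimal sketch that prints the configuration commands for one table. The function name pbr_readd_commands is made up for illustration; the command syntax is taken from the compare output in this thread, and the table/interface pairs (20/wg2, 30/wg1) are the ones from that config.

```shell
# Sketch only: pbr_readd_commands is a hypothetical helper, not part of VyOS.
# It prints the configure-mode commands that remove and then re-add the
# default interface route of one policy table.
# $1 = routing table number, $2 = WireGuard interface name.
pbr_readd_commands() {
    local table="$1" iface="$2"
    printf 'delete protocols static table %s interface-route 0.0.0.0/0\n' "$table"
    printf 'commit\n'
    printf 'set protocols static table %s interface-route 0.0.0.0/0 next-hop-interface %s\n' "$table" "$iface"
    printf 'commit\n'
}
```

Calling `pbr_readd_commands 20 wg2` and `pbr_readd_commands 30 wg1` prints the lines to paste in configure mode. It only generates text, so nothing is changed until you run the commands yourself.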
To collect the necessary debug information, please share the output of the following commands, captured once while PBR works and once while it does not:
sudo ip rule show
sudo ip r show table 20
sudo ip r show table 30
sudo ip r get [DST_ADDR] mark [PBR_MARK]
sudo nft list table ip mangle
where:
[DST_ADDR] - a destination address whose traffic should be routed via the wg interfaces
[PBR_MARK] - a mark from the sudo ip rule show output. There should be two of them.
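The commands above can be wrapped in a small helper so that the working and the broken capture are taken the same way. This is only a sketch: the names extract_pbr_rules and collect_pbr_debug are made up for illustration, and it assumes that `ip rule show` prints mark-based rules in the usual `fwmark MARK lookup TABLE` layout.

```shell
# Sketch only: both helper names are hypothetical, not part of VyOS.

# Read `ip rule show` output on stdin and print "MARK TABLE" for every
# rule that matches on a firewall mark.
extract_pbr_rules() {
    awk '{
        mark = ""; table = ""
        for (i = 1; i < NF; i++) {
            if ($i == "fwmark") mark = $(i + 1)
            if ($i == "lookup") table = $(i + 1)
        }
        if (mark != "") print mark, table
    }'
}

# Run the requested commands; $1 is the destination address to test.
collect_pbr_debug() {
    local dst="$1" mark table
    sudo ip rule show
    sudo ip r show table 20
    sudo ip r show table 30
    # One route lookup per mark found in the rule list.
    sudo ip rule show | extract_pbr_rules | while read -r mark table; do
        # Drop a "/mask" suffix in case the rule prints one.
        sudo ip r get "$dst" mark "${mark%%/*}" < /dev/null
    done
    sudo nft list table ip mangle
}
```

For example, `collect_pbr_debug 208.85.40.158 > pbr-broken.txt 2>&1` right after a reboot, and the same again into a second file once the routes have been re-added, gives two outputs that can be diffed.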
vyos:[~] $ sudo vtysh -c 'show running-config' | tee
Building configuration...
Current configuration:
!
frr version 7.5.1-20210801-00-g8bed329e4
frr defaults traditional
hostname vyos
log syslog
log facility local7
service integrated-vtysh-config
!
ip route 0.0.0.0/0 pppoe0
!
line vty
!
end
vyos:[~] $ sudo journalctl -b /usr/lib/frr/staticd | tee
-- Logs begin at Thu 2021-08-26 17:02:42 UTC, end at Thu 2021-08-26 17:03:31 UTC. --
-- No entries --
Here is the output once the fix has been applied:
Building configuration...
Current configuration:
!
frr version 7.5.1-20210801-00-g8bed329e4
frr defaults traditional
hostname vyos
log syslog
log facility local7
service integrated-vtysh-config
!
ip route 0.0.0.0/0 wg1 table 30
ip route 0.0.0.0/0 wg2 table 20
ip route 0.0.0.0/0 pppoe0
!
line vty
!
end
vyos:[~] $ sudo journalctl -b /usr/lib/frr/staticd | tee
-- Logs begin at Thu 2021-08-26 15:14:48 UTC, end at Thu 2021-08-26 17:02:06 UTC. --
-- No entries --
Does some script run when the wg interface comes up? It might only handle the main routing table.
Create a dummy route on the wg interface (for example, to 1.1.1.1/32) and see whether it is present after start-up.
The problem is clear: routing tables 20 and 30 are not present in FRR. The question is: why?
I would expect to see at least something in the logs that shows the reason, but they are empty.
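A quick way to check this after each boot is to filter the FRR running configuration for static routes bound to a table. A minimal sketch follows; the function name frr_table_routes is made up for illustration, and it assumes the `ip route PREFIX IFACE table N` line format seen in the vtysh output in this thread.

```shell
# Sketch only: frr_table_routes is a hypothetical helper.
# Read an FRR running configuration on stdin and print the static routes
# that are bound to a specific routing table.
frr_table_routes() {
    awk '$1 == "ip" && $2 == "route" {
        for (i = 3; i < NF; i++)
            if ($i == "table") { print; break }
    }'
}
```

Usage: `sudo vtysh -c 'show running-config' | frr_table_routes`. With the configuration posted from the broken boot this prints nothing, and with the one posted after the fix it prints the wg1 and wg2 lines.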
set protocols static interface-route 0.0.0.1/32 next-hop-interface wg0
set protocols static table 2 interface-route 0.0.0.2/32 next-hop-interface wg0
After reboot:
vyos@vyos:~$ sudo ip r s t 2
0.0.0.2 dev wg0 proto static metric 20
vyos@vyos:~$ sudo ip r s
default via 10.31.76.249 dev eth0 proto ospf metric 20
0.0.0.1 dev wg0 proto static metric 20
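The same check can be scripted so it is easy to rerun after every reboot. This is a sketch under assumptions: the function name has_route is made up for illustration, and it relies on `ip route` printing routes as `PREFIX dev IFACE ...`, as in the output above.

```shell
# Sketch only: has_route is a hypothetical helper.
# Read `ip route show` output on stdin; succeed if a route for the given
# prefix points at the given interface.
# $1 = prefix as ip prints it (e.g. 0.0.0.2), $2 = interface name.
has_route() {
    awk -v prefix="$1" -v dev="$2" '
        $1 == prefix {
            for (i = 2; i < NF; i++)
                if ($i == "dev" && $(i + 1) == dev) found = 1
        }
        END { exit(found ? 0 : 1) }'
}
```

Usage: `sudo ip r s t 2 | has_route 0.0.0.2 wg0 && echo "table 2 route is present"`, and likewise `sudo ip r s | has_route 0.0.0.1 wg0` for the main table.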