I’ve had VyOS running in ESXi 6.7 for over a year now, and I’ve recently run into a problem where VyOS is sending traffic out the wrong interface. I’ve found that there is a mismatch between VyOS’s route table and the output of the OS’s ip route command.
Apologies for the images - I can’t SSH in, so I’m stuck using the ESXi web console, which doesn’t support copy/paste.
Interface config (eth0 is the DHCP-connected WAN, eth1 is a LAN /30 that peers via BGP with a Cisco 3750X, though right now there is no connectivity over it for BGP to even come up): https://imgur.com/y6LDV4k
Route output (I have some VPNs with VTIs that are also misbehaving and going out eth0 instead of getting into the VTI, but I suspect it’s the same issue): https://imgur.com/3DM8BeA
Issue Overview:
Traffic destined for 10.1.0.14 (which is the other end of the eth1 /30) is going out eth0 instead. I can reproduce this with ping, but since I can’t have two windows open at once at the moment due to the router being disconnected, here’s a repro with BGP:
However, we can see that the VyOS route table shows the proper route, while ip route shows eth0: https://imgur.com/60N2DVo
Thoughts on why this is happening? I’ve rebooted a few times, and over time, this ends up happening. My other VyOS router (running in a primary/failover model via BGP) is just fine with about the exact same config. Anything I can check or pull to debug this?
As I can see in the image, the route used for destination address 10.1.0.14 comes from table 220.
Was no extra configuration related to routing tables made on the router?
Nothing outstanding. Few static interface routes (none related to 10.1.0.14).
Also to note, route table 220, which I believe is supposed to have zero prefixes, only has the one 0.0.0.0/0 kernel route: https://imgur.com/oTpJNxu
I believe this is overriding the table 0 routes and causing my issue.
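To illustrate why a single default route in table 220 could win over a more specific connected route, here is a minimal Python sketch of how Linux policy routing picks a route. It assumes table 220 is consulted by a rule with a lower priority number than the main table (I haven't confirmed that on this box; `ip rule show` would tell). The rule priorities, table contents and the `lookup` helper are all hypothetical, modelled on the symptoms in this thread rather than read from the router.

```python
import ipaddress

# Hypothetical data modelled on the thread: rule priorities and routes are
# assumptions, not values read from the router.
RULES = [            # (priority, table name); lower priority is consulted first
    (220, "220"),
    (32766, "main"),
]
TABLES = {
    "220": [("0.0.0.0/0", "eth0")],                            # the stray default route
    "main": [("10.1.0.12/30", "eth1"), ("0.0.0.0/0", "eth0")],  # connected /30 + WAN default
}


def lookup(dst, rules=RULES, tables=TABLES):
    """Return (table, prefix, dev) from the first table that has any match."""
    addr = ipaddress.ip_address(dst)
    for _prio, name in sorted(rules):
        matches = [
            (ipaddress.ip_network(prefix), dev)
            for prefix, dev in tables.get(name, [])
            if addr in ipaddress.ip_network(prefix)
        ]
        if matches:
            # Longest prefix wins, but only within this one table.
            net, dev = max(matches, key=lambda m: m[0].prefixlen)
            return name, str(net), dev
    return None


print(lookup("10.1.0.14"))
# Same lookup with table 220 emptied, as on the sister router.
print(lookup("10.1.0.14", tables={**TABLES, "220": []}))
```

The point of the sketch is that longest-prefix matching only happens inside one table at a time: once an earlier table returns any match, even 0.0.0.0/0, the /30 in the main table is never looked at.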
On its sister router (same config, different /30 eth1 LAN prefix), route table 220 has no routes and works great. Is there something that’d cause that route to be added to table 220?
No differences in config between the two routers, aside from some IPs and a BGP thing or two (an AS-path route map added, etc.). Nothing under protocols static, etc.
Here are all my static routes; they only exist for VPN VTIs:
set protocols static interface-route 10.2.1.1/32 next-hop-interface vti2
set protocols static interface-route 10.3.1.1/32 next-hop-interface vti3
set protocols static interface-route 10.4.1.1/32 next-hop-interface vti4
set protocols static interface-route 10.100.255.4/32 next-hop-interface vti0
set protocols static interface-route 10.100.255.5/32 next-hop-interface vti1
set protocols static interface-route 172.16.0.1/32 next-hop-interface vti11
set protocols static interface-route 192.168.1.1/32 next-hop-interface vti10
Ok, there was no reference in my config to 10.1.0.12 when I grepped the output of show conf commands. However, when I removed the VPN config entirely (just nuked it via delete vpn ipsec), everything came back up.
I then re-added the VPN config, and everything stayed up. Table 220 is now empty, and communication works as expected. Will report back if anything changes.
Update: After 8ish hours (when VPN re-keyed), I now have route table 220 back, with a default route overriding the main route table, sending all traffic out eth0 instead of eth1.
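Since the route only comes back hours later, something like the following could be run from the router's shell (by hand or from a scheduled job) to catch the moment it reappears. This is a rough sketch, assuming python3 and the `ip` command are available on the box; the helper names `default_routes` and `read_table` are my own. It only reads the table and changes nothing.

```python
import subprocess


def default_routes(ip_route_output):
    """Return the default-route lines found in `ip route show` output."""
    return [
        line.strip()
        for line in ip_route_output.splitlines()
        if line.split()[:1] in (["default"], ["0.0.0.0/0"])
    ]


def read_table(table="220"):
    """Run `ip route show table <table>`; return "" if the command fails."""
    try:
        return subprocess.check_output(
            ["ip", "route", "show", "table", str(table)],
            universal_newlines=True,
            stderr=subprocess.DEVNULL,
        )
    except (OSError, subprocess.CalledProcessError):
        # `ip` is missing or the table does not exist: treat as empty.
        return ""


if __name__ == "__main__":
    found = default_routes(read_table("220"))
    for line in found:
        print("table 220 default route:", line)
    if not found:
        print("table 220 has no default route")
```

Comparing the time the message first changes against the VPN re-key time would help confirm or rule out the re-key as the trigger.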
Going to start fresh on 1.2.7 with a fresh config to see if starting clean sorts things out.
Yes, I use VTIs for all IPSec VPN tunnels. I vastly prefer it for my scenario (have since the EdgeOS 1.x days before I moved to VyOS). Could this be causing the issue?
Will test this and report back in a few days if this resolves it. My other router (with the same VTI configs) doesn’t seem to have this issue, though the 2018 blog post about this seems to indicate that it should.
Thanks @Viacheslav for your help thus far, will let you know how it goes.