We have been running 1.1.8 for quite some time. Over the last ~2 weeks we have suddenly started experiencing crashes. The routers in question run as a redundant VRRP pair, and both have now crashed. The crashes also seem to be getting more frequent, so I can’t help but wonder if there is something nasty going around?
I tried updating to the latest rolling build, but it couldn’t see all of our interfaces, so I had to revert.
I see you have problems with IPsec that break your BGP sessions.
My best advice is to test the 1.2 version and move to it.
We do not plan to fix anything in 1.1.x.
Do you think this is due to the old IPsec config? I can clean it up, but that’s been going on for a while and this issue is new.
I tried upgrading to the latest rolling 1.2 version, but it didn’t see all the NICs. One of the ports on a 4-port NIC totally disappeared. Intel, I believe. Any ideas?
I don’t think it is related to your config; most likely it is the software on the remote side. I have seen that in the past already. 1.1.x is becoming really obsolete and causes issues with some newer implementations.
Over the last 11 days I have taken that router out of service and run exhaustive hardware tests. All pass. The router runs as part of a VRRP pair for redundancy, with each router maintaining many BGP peering connections. All in all, it’s a BGP confederation with 8 VyOS routers. The config has been working flawlessly for months with no changes, and then suddenly this one keeps crashing.
So I have reloaded it with the latest rolling build. While the router was out, the other router in the pair did the same thing. That doesn’t sound like hardware to me.
I replaced the disk and loaded the latest nightly build as I said, but last night the router crashed again.
I can’t help but wonder if this is some sort of exploit? Anyone got any ideas?
I just stumbled across this discussion, and I must say that we have sometimes seen similar things.
In our case I was able to narrow the problem down a bit further to a Quagga problem with additional attributes that were sent by some peers.
Right now I cannot remember exactly whether it came from the 32-bit community attributes or other BGP data.
But in our case remote BGP data was definitely the reason.
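For anyone trying to narrow this down the same way, here is a rough sketch in Python (not taken from Quagga or VyOS) that walks the path attribute block of a captured BGP UPDATE and lists which attribute types a peer sent. The function name list_path_attributes and the sample bytes are made up for illustration. The layout assumed is the standard BGP attribute header: one flags byte, one type code byte, then a 1-byte length, or a 2-byte length when the Extended Length flag is set.

```python
import struct

# Common path attribute type codes; anything else is reported as UNKNOWN.
ATTR_NAMES = {
    1: "ORIGIN", 2: "AS_PATH", 3: "NEXT_HOP", 4: "MULTI_EXIT_DISC",
    5: "LOCAL_PREF", 6: "ATOMIC_AGGREGATE", 7: "AGGREGATOR",
    8: "COMMUNITIES", 9: "ORIGINATOR_ID", 10: "CLUSTER_LIST",
    14: "MP_REACH_NLRI", 15: "MP_UNREACH_NLRI",
    16: "EXTENDED_COMMUNITIES", 17: "AS4_PATH", 18: "AS4_AGGREGATOR",
    32: "LARGE_COMMUNITY",
}

EXTENDED_LENGTH = 0x10  # flag bit: the length field is 2 bytes instead of 1


def list_path_attributes(data: bytes):
    """Return (type_code, name, value_length) for each attribute in data.

    data is assumed to be only the path attribute block of an UPDATE,
    not the whole message. Raises ValueError if the block is truncated.
    """
    attrs = []
    offset = 0
    while offset < len(data):
        if offset + 3 > len(data):
            raise ValueError("truncated attribute header")
        flags = data[offset]
        type_code = data[offset + 1]
        if flags & EXTENDED_LENGTH:
            if offset + 4 > len(data):
                raise ValueError("truncated extended length")
            (length,) = struct.unpack_from("!H", data, offset + 2)
            header = 4
        else:
            length = data[offset + 2]
            header = 3
        end = offset + header + length
        if end > len(data):
            raise ValueError("attribute value runs past end of block")
        attrs.append((type_code, ATTR_NAMES.get(type_code, "UNKNOWN"), length))
        offset = end
    return attrs


if __name__ == "__main__":
    # Hypothetical sample: ORIGIN, one standard community, one large community.
    sample = bytes.fromhex("40010100" "c00804fde80064" "d020000c") + bytes(12)
    for code, name, length in list_path_attributes(sample):
        print(code, name, length)
```

Running this over the attributes received from each peer and comparing the type codes between peers is one way to spot which sessions carry less common attributes. It only lists them and says nothing about whether a given attribute is what triggers a crash.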