BGP instance not found

Hi Guys,

I have 2 servers with the same configuration and the same version of VyOS.


image

After approximately 1 month, each of them stops running the BGP process with the message
“%bgp instance not found”
The issue appears at 21:10 min.
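
For anyone who wants to detect this state automatically, here is a minimal sketch of a check that looks for the error text quoted above in the output of `vtysh`. The command `vtysh -c "show ip bgp summary"` and the function names are assumptions for illustration, and the exact wording of the error may differ between FRR versions, so the match is deliberately loose.

```python
import re
import subprocess

# Loose match for the error quoted in this thread. Case and spacing are
# relaxed because the exact wording may vary by FRR version (assumption).
_NOT_FOUND = re.compile(r"%\s*bgp instance not found", re.IGNORECASE)


def bgp_instance_missing(vtysh_output):
    """Return True if the text contains the 'BGP instance not found' error."""
    return bool(_NOT_FOUND.search(vtysh_output))


def check_bgp(command=("vtysh", "-c", "show ip bgp summary")):
    """Run the (assumed) vtysh command; return True if BGP looks present.

    Hypothetical helper: adjust the command to whatever works on your router.
    """
    result = subprocess.run(command, capture_output=True, text=True, timeout=30)
    return not bgp_instance_missing(result.stdout + result.stderr)
```

Calling `check_bgp()` from cron or a monitoring agent on the router would give an early alert. It only detects the condition and does not fix or explain it.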

Also here you can find some statistics.



image

Please let me know what I can do.
I know about this defect:


https://phabricator.vyos.net/T1705

But I can't test this because I can't afford any more downtime; it's a production system.

Thanks,
Alin.

Same problem for me 2-3 months ago :expressionless: on VyOS 1.3

Nobody?! :open_mouth: I can't believe it, the same problem again today.

Any update here? :slight_smile: I really want to find a solution.

I am not sure if this is relevant or not, but one of my EBGP routers (running 1.2.6) has had its BGP crash around 13:00 Monday and 06:20 Tuesday (both times UTC). On both occasions the router was rebooted.

Two other EBGP routers (one running 1.2.6 & the other 1.2.4) connected to other providers showed no problems.

There did not seem to be anything very interesting in the logs, except that watchfrr could not restart it.

In case it relates to a particular update, I am capturing EBGP messages with tcpdump should it fail again.

All of my routers are monitored for BGP/OSPF via SNMP.

Maybe you take a full view and pull full statistics via SNMP.
Which parameters do you poll via SNMP?

The only BGP parameters polled are the status of each BGP neighbor and its uptime. The router does take a full BGP feed.
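
To illustrate what polling neighbor status can look like, here is a rough sketch that parses numeric `snmpwalk -On` style output for the `bgpPeerState` column of the standard BGP4-MIB (RFC 4273). The post does not say which OIDs or tools are actually used, so the MIB choice, the output format, and the function names are all assumptions.

```python
import re

# bgpPeerState values as defined in BGP4-MIB (RFC 4273).
PEER_STATES = {
    1: "idle", 2: "connect", 3: "active",
    4: "opensent", 5: "openconfirm", 6: "established",
}

# bgpPeerState column; the peer's IPv4 address is appended as the index.
BGP_PEER_STATE_OID = "1.3.6.1.2.1.15.3.1.2"

# Accepts "INTEGER: 6" as well as "INTEGER: established(6)".
_LINE = re.compile(
    r"^\.?" + re.escape(BGP_PEER_STATE_OID)
    + r"\.(\d+\.\d+\.\d+\.\d+) = INTEGER: (?:\w+\()?(\d+)\)?$"
)


def parse_peer_states(snmpwalk_output):
    """Map peer IP -> state name from numeric snmpwalk output (assumed format)."""
    states = {}
    for line in snmpwalk_output.splitlines():
        match = _LINE.match(line.strip())
        if match:
            states[match.group(1)] = PEER_STATES.get(int(match.group(2)), "unknown")
    return states


def peers_down(states):
    """Return the peers that are not in the 'established' state, sorted."""
    return sorted(peer for peer, state in states.items() if state != "established")
```

Light polling like this touches only one row per neighbor, which is quite different from walking full routing statistics on a router that carries a full feed.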

Hello,

Are the servers' hardware configurations the same? CPU, RAM … ?

My router exhibited the same behaviour yesterday evening. There was high CPU followed by BGP stopping. The relevant FRR logs are attached. I wonder whether the BGP stopping is a symptom of something else. Neither of my other ebgp routers (with similar config) showed any issues.

I have not yet rebooted the router and have not been through the logs in detail, but do have PCAPs of the EBGP traffic. It will need to be rebooted by the end of today.

Before rebooting it, is there anything else which the VyOS folks would like to be captured? If it might assist, I could provide remote access for them to take a look.

210103-fault3-frrextract.txt (4.4 KB)

Can you please send me your server configuration, in particular the CPU? I think I have found the problem in my current tests.

Sanitised config, CPU and memory details are in: 120114-info1.txt (15.9 KB)

It is a virtual machine hosted at Vultr. Similar VMs in my own network are not exhibiting the same problem.

This is strange, the CPU is fine.

For me it went like this.

I used a PowerEdge R610 server with 2 x Intel X5670 2.93 GHz CPUs and 32 GB RAM for 6 months and never had any problems on the same version.

I replaced that server with a smaller one to reduce power usage, a Dell PowerEdge R320 with 1 x six-core Xeon E5-2430 2.2 GHz, and the problem started. I thought something was wrong with it, so I installed the same config on another Dell PowerEdge R320 with 1 x six-core Xeon E5-2430 2.2 GHz: same problem.

So I switched back to my R610 with 2 x Intel X5670 2.93 GHz and 32 GB RAM, and so far I have not had this problem again.

I truly don't understand whether this is related to the server or not, because I haven't found any fix for it.