Memory Leak on VyOS 1.2 (20180921) consumes 6GB in less than 7 days

BarrySDCA · January 31, 2019, 7:05pm

I have had VyOS 1.2.0-rolling+201811170337 running for over 60 days with no issues so far. Could it be something you’re running and I’m not?

I have:

bgp
ospf
ipv6
vpn

That’s off the top of my head. What are you running?

top - 19:04:24 up 61 days, 50 min, 1 user, load average: 0.46, 0.53, 0.56
Tasks: 202 total, 2 running, 100 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.4 sy, 0.0 ni, 46.7 id, 0.0 wa, 0.0 hi, 52.8 si, 0.0 st
KiB Mem: 65948916 total, 3745816 used, 62203100 free, 168016 buffers
KiB Swap: 0 total, 0 used, 0 free. 1539420 cached Mem

tjh · February 15, 2019, 5:28pm

Could this have been related to the BGP Experiments that were being run and discussed on NANOG?

There was a HUGE discussion about it: BGP Experiment

hagbard · February 16, 2019, 4:23pm

I don’t think it’s related. We have received a few reports about that, while others with similar configs have no issues at all. As far as I know, direct contact has already been established between the FRR team and VyOS.
I tried to reproduce it a few times but wasn’t successful, so it is very hard to analyze.

Mitchell · June 11, 2019, 8:48pm

I would like to report that we are currently experiencing this problem on build VyOS 1.2.0-rolling+201906010337.

We have already tested with 1.2.1 EPA3 and LTS (git 04/2019).

We have 2 BGP IPv4 full routing tables and a couple of non-full IPv6 peers, 15 peers in total.

After the server comes up, everything is fine: BGP is OK, traffic is good, latency is great. But minutes after that, free memory starts to decrease until it reaches 0 and the server crashes. It takes about 3 days to “leak” all the memory from the server.

Dell R240 8GB Intel X520
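
For anyone who wants to put a number on a decline like this, here is a minimal Python sketch. It is not part of VyOS; it assumes a Linux-style /proc/meminfo and a roughly linear drop between two samples, and the function names are made up for illustration.

```python
import re


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    values = {}
    for line in text.splitlines():
        match = re.match(r"^([^:\s]+):\s+(\d+)", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


def hours_until_exhausted(free_then_kb, free_now_kb, hours_between):
    """Linearly extrapolate when free memory reaches 0.

    Returns None if free memory did not shrink between the two samples.
    """
    lost_kb = free_then_kb - free_now_kb
    if lost_kb <= 0 or hours_between <= 0:
        return None
    # remaining memory divided by the loss rate (lost_kb / hours_between)
    return free_now_kb * hours_between / lost_kb
```

Feeding it the MemFree (or MemAvailable) value from two readings of /proc/meminfo taken a known number of hours apart gives a rough estimate of how long the box has left. Page cache and buffers also move these numbers, so treat the result as an indication only.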

luisk · June 13, 2019, 5:16pm

With these versions it’s OK:

1.2.0-rolling+201807230337 (OK 148 days uptime)
ProLiant DL360 G5

1.2.0-rc9 (OK 121 days uptime)
ProLiant DL380 G5

1.2.0 (OK 93 days uptime)
PowerEdge 2950

1.2.0-rolling+201809210337 (Memory only OK for less than 100 days)
ProLiant DL380 G5
This ticket was reported with this server/version, and after installing atop the leak slowed down. Now I need to restart the server every 3 months; before, it was every 3 days.
I was waiting for version 1.2.1 LTS to upgrade this server, but I don’t know where to download it as a contributor (I only have 1.2.0).

hagbard · June 18, 2019, 10:29pm

https://support.vyos.io/en but you need an account. If you registered as a contributor, you should have received an email quite a while ago.

luisk · June 20, 2019, 7:54pm

Thanks for your reply hagbard.

Yes, I received this e-mail 5 or 6 months ago. I have the contributor badge (at youracclaim), but something seems to be wrong! At the link you sent, only the empty “VyOS 1.1.x” folder appears.

hagbard · June 20, 2019, 8:53pm

Do you see downloads? I have only 1.2.x in that directory.

luisk · June 21, 2019, 6:33pm

Yes, but only the 1.1.x folder (empty).

hagbard · June 24, 2019, 3:52pm

Weird. I’d suggest opening a support ticket and asking (via “contact us”).

luisk · June 27, 2019, 11:27am

“Contact us” opens the “Feedback” tab, but it’s empty.
I can’t create a ticket. (Is there an e-mail address I can send the problem to and get support?)

Everything is empty… Under “All items” I think there should be some items…

hagbard · October 10, 2019, 4:19pm

Can you please try the workaround in T1705 (“High CPU usage by bgpd when snmp is active”)?

josueconti · April 17, 2020, 11:15pm

Hi Luis, sorry for reviving this old post, but I have the same problem. How and where did you change this?
“interval to 300s (OID .1.3.6.1.4.1.2021.4.11.0)”
With Best Regards

Josue

luisk · April 20, 2020, 10:50pm

Hi,
LTS version 1.2.0 is running well on that hardware.
On some other hardware, version 1.2.3 is fine too, but on other hardware it is not.

josueconti · April 20, 2020, 11:52pm

Hi Luiz, how are you?
Thank you for your reply. I can’t find VyOS 1.2.0 or any 1.2.x version to download in order to try an upgrade.
My current version is 1.1.8, on Dell hardware, model R710.
You said: “Until yesterday morning 10AM I was using memory SNMP check every 1.800s, then I changed interval to 300s (OID .1.3.6.1.4.1.2021.4.11.0). This change can be seen with ‘dots’ on the next image.” But where did you make this change?

With Best Regards

Josue

darconada · July 7, 2020, 7:57am

On VyOS 1.2.5, the issue of too much memory being consumed because of the BGP process with a lot of routes in the routing table is solved. The difference is how the snmpd process is executed:

affected versions: /usr/sbin/snmpd -LSed -u snmp -g snmp -p /run/snmpd.pid

vyos 1.2.5 (fixed): /usr/sbin/snmpd -LSed -u snmp -g snmp -I -ipCidrRouteTable inetCidrRouteTable -p /run/snmpd.pid

You can check it with: ps -axf | grep snmpd
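
The same check can be scripted. The Python sketch below is only an illustration: it takes a command line like the two shown above and reports whether snmpd was started with an -I option that excludes ipCidrRouteTable. The function name is hypothetical, and the parsing assumes the flags appear exactly as written in this post.

```python
import shlex


def excludes_route_table(cmdline: str) -> bool:
    """Return True if the snmpd command line disables ipCidrRouteTable via -I."""
    tokens = shlex.split(cmdline)
    for i, token in enumerate(tokens[:-1]):
        if token == "-I":
            modules = tokens[i + 1]
            # In net-snmp, a module list starting with "-" names the
            # modules that should NOT be initialised at startup.
            if modules.startswith("-") and "ipCidrRouteTable" in modules:
                return True
    return False
```

Passing in the command column from the ps output for the snmpd process returns False for the affected invocation and True for the 1.2.5 one.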

luisk · July 7, 2020, 12:10pm

This change was made in Zabbix. It only affected the memory graph; the memory decrease was the same.

luisk · July 7, 2020, 12:14pm

Even after applying this change manually in versions 1.2.0, 1.2.1 and 1.2.3, the memory problem persisted. The change did help to reduce CPU consumption by the SNMP process. In versions 1.2.4 and 1.2.5 the drop really seems to have stabilized (memory is not decreasing so much).

josueconti · July 7, 2020, 12:37pm

Thank you so much!

With Best Regards

Josue