Memory Leak on VyOS 1.2 (20180921) consumes 6GB in less than 7 days

I have VyOS 1.2.0-rolling+201811170337 running for over 60 days. So far no issues. Could it be something you’re running and I’m not?

I have:

bgp
ospf
ipv6
vpn

off the top of my head. What are you running?

top - 19:04:24 up 61 days, 50 min, 1 user, load average: 0.46, 0.53, 0.56
Tasks: 202 total, 2 running, 100 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.4 sy, 0.0 ni, 46.7 id, 0.0 wa, 0.0 hi, 52.8 si, 0.0 st
KiB Mem: 65948916 total, 3745816 used, 62203100 free, 168016 buffers
KiB Swap: 0 total, 0 used, 0 free. 1539420 cached Mem
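For anyone who wants to compare memory figures between boxes, here is a minimal Python sketch that turns a `KiB Mem:` line like the one above into a usage percentage. The function name `parse_top_mem_line` is just something made up for this example, and it assumes the older top layout shown here (total, used, free, buffers on one line).

```python
import re


def parse_top_mem_line(line):
    """Parse a 'KiB Mem:' line from top into a dict of integer KiB values."""
    fields = {}
    # Each value is printed as "<number> <label>", e.g. "3745816 used".
    for value, name in re.findall(r"(\d+)\s+(total|used|free|buffers)", line):
        fields[name] = int(value)
    return fields


# The line pasted in the post above.
line = "KiB Mem: 65948916 total, 3745816 used, 62203100 free, 168016 buffers"
mem = parse_top_mem_line(line)

used_pct = 100.0 * mem["used"] / mem["total"]
total_gib = mem["total"] / 1024 / 1024
print(f"used: {used_pct:.1f}% of {total_gib:.1f} GiB")
```

Logging this value once an hour is usually enough to tell a steady leak apart from normal cache growth.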

Could this have been related to the BGP Experiments that were being run and discussed on NANOG?

There was a HUGE discussion about it: BGP Experiment

I don’t think it’s related. We have received a few reports about that, while others with similar configs have no issues at all. As far as I know, direct contact has already been established between the FRR team and VyOS.
I tried to reproduce it a few times but wasn’t successful, so it is very hard to analyze.

I would like to report that we are currently experiencing this problem on build VyOS 1.2.0-rolling+201906010337.

We have already tested with 1.2.1 EPA3 and LTS (git 04/2019).

We have 2 BGP IPv4 full routing tables and a couple of non-full IPv6 peers, 15 peers in total.

After the server comes up, everything is fine: BGP is OK, traffic is good, latency is great. But minutes later the free memory starts to decrease until it reaches 0 and the server crashes. It takes about 3 days to “leak” all the memory on the server.
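To put a number on a leak like this, one option is to fit a straight line through a few free-memory readings and extrapolate to zero. The Python below is only a rough sketch: the function name `estimate_exhaustion` and the sample readings are hypothetical (not measurements from this server), and it assumes the decrease is roughly linear.

```python
def estimate_exhaustion(samples):
    """Fit free memory against time with least squares.

    samples: list of (hours_since_boot, free_kib) tuples.
    Returns (slope_kib_per_hour, hours_left), where hours_left is measured
    from the last sample and is None if free memory is not decreasing.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_f = sum(f for _, f in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must be taken at different times")
    cov = sum((t - mean_t) * (f - mean_f) for t, f in samples)
    slope = cov / var_t
    intercept = mean_f - slope * mean_t
    if slope >= 0:
        return slope, None
    t_zero = -intercept / slope  # time at which the fitted line hits 0
    return slope, max(0.0, t_zero - samples[-1][0])


# Hypothetical readings: (hours since boot, free KiB) on an 8 GB box.
samples = [(0, 7_000_000), (12, 5_800_000), (24, 4_600_000), (36, 3_400_000)]
slope, hours_left = estimate_exhaustion(samples)
print(f"losing {-slope:.0f} KiB/hour, about {hours_left:.0f} hours left")
```

With these made-up readings the fitted line reaches zero roughly 70 hours after boot, which is in the same ballpark as the 3 days described above.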

Dell R240 8GB Intel X520

With these versions it’s OK:

1.2.0-rolling+201807230337 (OK 148 days uptime)
ProLiant DL360 G5

1.2.0-rc9 (OK 121 days uptime)
ProLiant DL380 G5

1.2.0 (OK 93 days uptime)
PowerEdge 2950

1.2.0-rolling+201809210337 (Memory only OK for less than 100 days)
ProLiant DL380 G5
This ticket was reported with this server/version, and after installing atop the leak slowed down. Now I need to restart the server every 3 months; before, it was every 3 days.
I was waiting for version 1.2.1 LTS to upgrade this server, but I don’t know where to download it as a contributor (I only have 1.2.0).

https://support.vyos.io/en, but you need an account. If you registered as a contributor, you should have received an email quite a while ago.

Thanks for your reply hagbard.

Yes, I received this e-mail 5 or 6 months ago. I have the contributor badge (at youracclaim), but something seems to be wrong! At the link you sent, only the empty “VyOS 1.1.x” folder appears.

Do you see downloads? I have only 1.2.x in that directory.

Yes, but only the 1.1.x folder (empty).

Weird. I’d suggest opening a support ticket and asking (via contact us).

“Contact us” opens the “Feedback” tab, but it’s empty.
I can’t create a ticket. (Is there an e-mail address I can send the problem to and get support?)

Everything is empty… Under “All items” I think there should be some items…

Can you please try the workaround in T1705 High CPU usage by bgpd when snmp is active?

Hi Luis, sorry for reviving this old post, but I have the same problem. How and where did you change this?
“interval to 300s (OID .1.3.6.1.4.1.2021.4.11.0)”
With Best Regards

Josue

Hi,
LTS version 1.2.0 is running well on that hardware.
On some other hardware, version 1.2.3 is fine too, but on other hardware it is not.

Hi Luiz, how are you?
Thank you for your reply. I can’t find VyOS 1.2.0 or any 1.2.x version to download and try an upgrade.
My current version is 1.1.8, on Dell hardware, model R710.
You said: “Until yesterday morning at 10 AM I was checking memory via SNMP every 1800s, then I changed the interval to 300s (OID .1.3.6.1.4.1.2021.4.11.0). This change can be seen with ‘dots’ on the next image.” But where did you make this change?

With Best Regards

Josue

On VyOS 1.2.5, the issue of excessive memory consumption caused by the BGP process with a lot of routes in the routing table is solved. The difference is how the snmpd process is executed:

affected versions: /usr/sbin/snmpd -LSed -u snmp -g snmp -p /run/snmpd.pid

vyos 1.2.5 (fixed): /usr/sbin/snmpd -LSed -u snmp -g snmp -I -ipCidrRouteTable inetCidrRouteTable -p /run/snmpd.pid

You can check it with ps -axf | grep snmpd
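If you have several routers to check, the comparison can be scripted. This is a small Python sketch that looks at an snmpd command line (for example, a line copied from the ps output) and reports which of the two route-table names appear after `-I`. The helper name `excluded_route_tables` is made up for this example, and the only assumption is that the fixed invocation looks like the one quoted above. As far as I know, a leading `-` in the `-I` list tells Net-SNMP not to initialize those modules, but treat that as my reading rather than a reference.

```python
import shlex

# Module names taken from the fixed command line quoted above.
ROUTE_TABLE_MODULES = ("ipCidrRouteTable", "inetCidrRouteTable")


def excluded_route_tables(cmdline):
    """Return the route-table module names listed right after snmpd's -I option."""
    tokens = shlex.split(cmdline)
    found = set()
    if "-I" not in tokens:
        return found
    start = tokens.index("-I") + 1
    for token in tokens[start:]:
        # Accept both space-separated and comma-separated lists.
        names = [name.lstrip("-") for name in token.split(",")]
        hits = [name for name in names if name in ROUTE_TABLE_MODULES]
        if not hits:
            break  # reached the next option, e.g. -p
        found.update(hits)
    return found


affected = "/usr/sbin/snmpd -LSed -u snmp -g snmp -p /run/snmpd.pid"
fixed = ("/usr/sbin/snmpd -LSed -u snmp -g snmp "
         "-I -ipCidrRouteTable inetCidrRouteTable -p /run/snmpd.pid")

for label, cmd in (("affected", affected), ("fixed", fixed)):
    tables = excluded_route_tables(cmd)
    status = "route tables listed after -I" if tables else "no -I route-table list"
    print(f"{label}: {status} {sorted(tables)}")
```

An empty result means snmpd is running with the older command line.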

This change was made in Zabbix and only affected the memory graph; the decrease was the same.

Even after applying this change manually on versions 1.2.0, 1.2.1 and 1.2.3, the memory problem persisted. The change did help reduce CPU consumption by the SNMP process. In versions 1.2.4 and 1.2.5 the drop really seems to have stabilized (memory is not decreasing as much).

Thank you so much!

With Best Regards

Josue