VyOS runs out of memory


#1

Hi everyone,

I’m running 1.2.0-rolling+201902250337 currently. I have a couple of instances running in AWS. One in particular, the hub in a hub-and-spoke VPN topology (IPsec IKEv2), seems to run out of memory after 1-2 weeks. This was not an issue on a much earlier version of 1.2.0 (not sure which version).

Attached are screenshots of top and the log output, which shows the OOM killer killing off processes.

systemd-journal seems to keep going up in usage? I don’t know if that is the culprit. Also, excuse my lack of Linux knowledge, but in the top screenshot, which lists processes by descending memory usage, should the sum of RES + cached + buffers not equal ‘used’?

messages.log (215.6 KB)


#2

Hello, @kav!
In theory, systemd-journald memory usage should be limited to about 10% of the /run partition.
Please send the whole contents of the /var/log/atop/ directory the next time you catch the memory leak.
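
For anyone who wants to check the 10% figure on their own box, here is a minimal sketch. It assumes the volatile journal lives under /run/log/journal and that /run is its own tmpfs. The function names are made up for illustration, and the actual limit journald applies can differ if journald.conf overrides the defaults.

```shell
#!/bin/sh
# Hypothetical helpers, not part of VyOS or systemd.

# 10% of a size given in KiB (integer division).
ten_percent_kb() {
    echo $(( $1 / 10 ))
}

# Total size of the /run filesystem in KiB (second column of POSIX df output).
run_size_kb() {
    df -Pk /run | awk 'NR == 2 { print $2 }'
}

# Space used by the volatile journal in KiB, or 0 if the directory is absent.
journal_used_kb() {
    if [ -d /run/log/journal ]; then
        du -sk /run/log/journal | awk '{ print $1 }'
    else
        echo 0
    fi
}

# Print the /run size, the 10% budget, and the current journal usage.
report_journal_budget() {
    size_kb=$(run_size_kb)
    echo "/run size:       ${size_kb} KiB"
    echo "10% budget:      $(ten_percent_kb "$size_kb") KiB"
    echo "journal on /run: $(journal_used_kb) KiB"
}
```

Run `report_journal_budget` as root (du may not be able to read the journal directory otherwise) and compare the last two lines.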


#3

Hello @zsdc,

OK, will do. I do have another instance where systemd-journal has hit 20%:

atop.log (27.3 MB)

I have attached the contents of /var/log/atop/ from this instance. Note that I had to change the file extension from zip to log to allow the attachment (it’s a bunch of files).


#4

Thank you, @kav!

I see that systemd-journald is using too much memory in your case. However, I can’t reproduce this over a short period, even by flooding the log.
Please also send me the output of the following commands:

sudo journalctl -m | wc -l
sudo journalctl --disk-usage
sudo df -h
sudo du -h -d 3 /run/

Then restart journald:

sudo systemctl restart systemd-journald

Then run all the commands again and check the memory usage of systemd-journald.

The problem looks like a memory leak in systemd, but we need to be sure that it is not caused by a misconfiguration on our side or by something else.
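
To put a number on journald's memory before and after the restart, something like the following could be used. It is only a sketch: it reads VmRSS from /proc, which should be the same figure top shows as RES, and the function names are hypothetical.

```shell
#!/bin/sh
# Print the VmRSS value (in kB) from a /proc/<pid>/status-style file.
vmrss_kb() {
    awk '/^VmRSS:/ { print $2 }' "$1"
}

# Print "<pid> <rss_kb>" for every running systemd-journald process.
# Prints nothing if journald is not running.
journald_rss() {
    for pid in $(pidof systemd-journald); do
        echo "$pid $(vmrss_kb "/proc/$pid/status")"
    done
}
```

Call `journald_rss` once before `sudo systemctl restart systemd-journald` and once after, then compare the two values.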


#5

Hello all,
I have a similar problem.
Until the end of October 2018 I used VyOS 1.8,
from November 2018 to February I used VyOS 1.2-RC4,
and since March I have been running a 1.2 crux image that I built in Docker (following the guide on the Wiki).
My VyOS is running on KVM.
Please look at the Zabbix monitoring:


On 1.8 there was no memory leak, but starting with 1.2-RC4 memory began to leak.
My configuration is only IPsec with a GRE tunnel and OpenVPN.

Sorry, I did not save this data on RC4, but on 1.2 crux I can prepare the same information if necessary (I estimate I have about 2 months until memory runs out).

Rafał


#6

OK, see the attached before and after output, and also a screenshot of memory usage after the restart.

after.txt (3.5 KB)
before.txt (3.4 KB)


#7

Thank you, @kav!
Try this workaround and check whether the memory stops leaking:

set system task-scheduler task logging-restart executable arguments 'restart systemd-journald'
set system task-scheduler task logging-restart executable path '/bin/systemctl'
set system task-scheduler task logging-restart interval '1d'

@Rafal, could you check exactly which process is using the memory in your case?


#8

I’ve been monitoring the memory consumption of 4 instances running the same version, and the odd thing I noticed is that after the first reboot (I think it’s the first since deployment), memory consumption is much lower and fairly stable.


#9

@zsdc The process is journald.
I applied the workaround today, and we will see.


#10

So I’ve had this (the workaround @zsdc posted) running for quite a few days, and it does indeed seem to keep the memory consumption of systemd-journal low. But overall memory consumption is still high and creeps up. See image: when I add up RES + cached, it does not come anywhere near the used amount.


#11

Also, this particular instance is running as the hub in a hub-and-spoke VPN topology. However, the spokes are all stable in memory usage and have over 150MB available in cached. Is it normal for a few extra tunnels to consume that much more memory on the hub?

EDIT: OK, so I dumped the data into Excel and added it up.
RES = 116MB
Buffers = 13MB
Cached = 62MB
Total = ~190MB

So ‘used’ should be roughly 190MB, right? But it’s showing as 482MB, off by 292MB!
When I do the same math on another Ubuntu server I have, the numbers add up as expected (off by 2-3%).

Now the same math on another VyOS instance that only has one tunnel and seems to be stable:
RES = 164MB
Buffers = 90MB
Cached = 149MB
Total = ~403MB

It’s showing 487MB used, so it’s off by ~84MB, nowhere near as far off as the ‘bad’ VyOS instance.
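
The arithmetic above can be redone with a small sketch like the one below. It takes the figures from this post as given. The exact sum for the hub is 191MB, so the gap comes out as 291MB rather than 292MB. One general point worth keeping in mind: RES only counts process memory, so anything the kernel itself holds (slab caches, page tables, kernel stacks) never shows up in any process row. The second helper lists those fields from /proc/meminfo as one possible place to look for the missing memory. Whether that explains this particular gap is an assumption, not something confirmed in this thread, and how top computes ‘used’ also depends on the procps version. The function names are hypothetical.

```shell
#!/bin/sh
# unaccounted_mb USED RES BUFFERS CACHED
# Prints USED - (RES + BUFFERS + CACHED); all values in MB.
unaccounted_mb() {
    echo $(( $1 - ($2 + $3 + $4) ))
}

unaccounted_mb 482 116 13 62     # hub instance, prints 291
unaccounted_mb 487 164 90 149    # single-tunnel instance, prints 84

# List kernel-side memory fields that are not part of any process RES.
# Reads /proc/meminfo unless another file is given as the first argument.
kernel_side_fields() {
    grep -E '^(Slab|SReclaimable|SUnreclaim|KernelStack|PageTables|VmallocUsed):' "${1:-/proc/meminfo}"
}
```

Running `kernel_side_fields` on both the hub and a stable spoke and comparing the Slab lines would show whether kernel memory accounts for the difference.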


#12

OK, so a quick update for those who might experience the same issue… I have upgraded to a newer rolling release: 1.2.0-rolling+201903250337

The issue seems resolved. Without any config change, the instance has way more free memory, and systemd-journal has never gone above 1% memory consumption.


closed #13

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.