VyOS runs out of memory


#1

Hi everyone,

I'm currently running 1.2.0-rolling+201902250337. I have a couple of instances running in AWS. One in particular, the hub in a hub-and-spoke VPN topology (IPsec IKEv2), runs out of memory after 1-2 weeks. This was not an issue on a much earlier build of 1.2.0 (I'm not sure which one).

Attached are screenshots of top, along with the log output, which shows the OOM killer terminating processes.

systemd-journal's memory usage seems to keep going up, but I don't know if that is the culprit. Also, excuse my lack of Linux knowledge, but in the top screenshot, which lists processes by descending memory usage, shouldn't the sum of RES + cached + buffers equal ‘used’?

messages.log (215.6 KB)


#2

Hello, @kav!
Theoretically, systemd-journald memory usage should be limited to about 10% of the /run partition.
Please send the whole content of the /var/log/atop/ directory the next time you catch the memory leak.
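For reference, that limit corresponds to journald's `RuntimeMaxUse=` setting, which journald.conf(5) documents as defaulting to 10% of the size of the filesystem holding the runtime journal, capped at 4G. Because /run is a tmpfs, the runtime journal occupies RAM. The helper below is a hypothetical sketch (the function name is made up, and it assumes journald is using volatile storage under /run/log/journal) that prints that default ceiling in KiB, so it can be compared with `journalctl --disk-usage`:

```shell
#!/bin/sh
# Hypothetical helper (not part of VyOS or systemd): print the default
# RuntimeMaxUse= ceiling, i.e. 10% of the filesystem backing a path,
# capped at 4G, in KiB. Assumes the runtime journal lives under /run.
journal_runtime_cap_kib() {
    dir="${1:-/run}"
    # df -P -k: POSIX output in 1024-byte blocks; line 2, field 2 = total size
    total_kib=$(df -P -k "$dir" | awk 'NR == 2 {print $2}')
    cap_kib=$((total_kib / 10))
    max_kib=$((4 * 1024 * 1024))  # 4G (systemd sizes are base 1024) in KiB
    if [ "$cap_kib" -gt "$max_kib" ]; then
        cap_kib=$max_kib
    fi
    echo "$cap_kib"
}
```

If the journal's disk usage sits near this value while journald's own memory keeps growing, the growth is likely outside the journal files themselves.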


#3

Hello @zsdc,

OK, will do. I have another instance on which systemd-journal has reached 20%:

atop.log (27.3 MB)

I have attached the contents of this instance's /var/log/atop/. Note that I had to change the file extension from .zip to .log to allow the attachment (it's an archive of several files).


#4

Thank you, @kav!

I see that systemd-journald uses too much memory in your case, but I can't reproduce this over a short period, even by flooding the log.
Please also send me the output of the following commands:

sudo journalctl -m | wc -l
sudo journalctl --disk-usage
sudo df -h
sudo du -h -d 3 /run/

Then restart journald:

sudo systemctl restart systemd-journald

Then run all the commands again and check systemd-journald's memory usage.
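The before/after procedure above can be scripted so both runs are captured identically. This is a minimal sketch: the function names and output file names are hypothetical, and it assumes `pidof` is available. Besides the requested command output, it records journald's resident memory (VmRSS) read from /proc:

```shell
#!/bin/sh
# Hypothetical helpers (not part of VyOS) for the before/after check.

# Print the resident memory (VmRSS, KiB) of one PID, read from /proc.
pid_rss_kib() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status"
}

# Write the requested diagnostics plus journald's RSS to a file.
# Needs root so journalctl and du can see everything.
journal_snapshot() {
    out="$1"
    {
        echo "== journalctl -m | wc -l"
        journalctl -m | wc -l
        echo "== journalctl --disk-usage"
        journalctl --disk-usage
        echo "== df -h"
        df -h
        echo "== du -h -d 3 /run/"
        du -h -d 3 /run/
        echo "== systemd-journald VmRSS (KiB)"
        for pid in $(pidof systemd-journald); do
            pid_rss_kib "$pid"
        done
    } > "$out" 2>&1
}
```

Saved as, say, `journal-diag.sh`, it could be used as `sudo sh -c '. ./journal-diag.sh && journal_snapshot before.txt'`, followed by `sudo systemctl restart systemd-journald`, the same call with `after.txt`, and finally `diff before.txt after.txt`.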

The problem looks like a memory leak in systemd, but we need to be sure that it is not caused by a misconfiguration on our side or by something else.


#5

Hello all,
I have a similar problem:
until the end of October 2018 I used VyOS 1.8,
from November 2018 to February, VyOS 1.2-RC4,
and since March, version 1.2 crux, which I build in Docker (following the guide on the wiki).
My VyOS runs on KVM.
Please look at the Zabbix monitoring:


On 1.8 there was no memory leak, but from 1.2-RC4 onwards, memory started to leak.
My configuration is only IPsec with a GRE tunnel, plus OpenVPN.

Sorry, I did not save the same data on RC4, but on 1.2 crux I can prepare the same information if necessary (I estimate I have about 2 months before memory runs out).

Rafał


#6

OK, see the attached before and after output, plus a screenshot of memory usage afterwards.

after.txt (3.5 KB)
before.txt (3.4 KB)


#7

Thank you, @kav!
Try this workaround and check whether the memory stops leaking:

set system task-scheduler task logging-restart executable arguments 'restart systemd-journald'
set system task-scheduler task logging-restart executable path '/bin/systemctl'
set system task-scheduler task logging-restart interval '1d'
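To check whether memory actually stops leaking once the daily restart is in place, it can help to sample available memory over time rather than comparing top screenshots. A minimal sketch, assuming a kernel that exposes `MemAvailable` in /proc/meminfo (Linux 3.14 or later); the function names and the log path are hypothetical:

```shell
#!/bin/sh
# Hypothetical helpers (not part of VyOS) for tracking memory over days.

# Append one "epoch-seconds MemAvailable-KiB" sample to a log file.
log_mem_sample() {
    log="$1"
    avail_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
    echo "$(date +%s) $avail_kib" >> "$log"
}

# Print how much MemAvailable fell between the first and last samples (KiB).
# A value that keeps growing from day to day suggests the leak persists;
# a value near zero (or negative) suggests memory has stabilised.
mem_drop_kib() {
    awk 'NR == 1 {first = $2} {last = $2} END {print first - last}' "$1"
}
```

For example, `log_mem_sample /tmp/mem.log` could be run hourly and `mem_drop_kib /tmp/mem.log` checked every few days.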

@Rafal, could you check which process exactly is using the memory in your case?
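One way to answer this is to list processes by resident memory. With procps, `ps aux --sort=-rss | head` does that; the sketch below does the same by reading /proc directly, so it does not depend on a particular `ps` implementation. The function name is hypothetical, and kernel threads (which report no VmRSS) are skipped. Note that RSS includes pages shared between processes, so these figures should not be summed:

```shell
#!/bin/sh
# Hypothetical helper: print the N processes with the largest resident
# set size as "RSS-KiB name", largest first (default N = 10).
# RSS includes shared pages, so summing these values overcounts.
top_rss() {
    n="${1:-10}"
    for status in /proc/[0-9]*/status; do
        # A process may exit between the glob and the read; ignore errors.
        awk '/^Name:/  {sub(/^Name:[ \t]*/, ""); name = $0}
             /^VmRSS:/ {rss = $2}
             END       {if (rss != "") print rss, name}' "$status" 2>/dev/null
    done | sort -rn | head -n "$n"
}
```

Running `top_rss 5` a few days apart would show whether the same process keeps growing.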


#8

I've been monitoring the memory consumption of 4 instances running the same version, and the odd thing I noticed is that after the first reboot (I think it's the first since deployment), memory consumption is much lower and is now fairly stable.