Kernel Memory Leak on VyOS

I had to roll back to the previous version because of a high software-interrupt load.

@Harunaga, I checked it: e1000e really is broken. We will try to fix it.

root@R1:~# modprobe e1000e
modprobe: ERROR: could not insert 'e1000e': Exec format error

Do you see that RSS doesn’t work, or do you mean RPS?

RSS works. The interrupt load on all cores has increased.

I’ve updated the package lists (a change in vyos/vyos-world, the VyOS metapackage repository on GitHub, was mistakenly not pushed upstream after manual testing; sorry, my fault!).
The next build should have it right.

In the latest rolling release, I have a problem with high CPU usage:


# cat /proc/interrupts 
           CPU0       CPU1       CPU2       CPU3       
  0:          7          0          0          0   IO-APIC   2-edge      timer
  4:          0          0          0         20   IO-APIC   4-edge      ttyS0
  8:          0          0          1          0   IO-APIC   8-edge      rtc0
  9:          0          0          0          0   IO-APIC   9-fasteoi   acpi
 16:         66          0          0          0   IO-APIC  16-fasteoi   ehci_hcd:usb1
 18:          0          0          0          0   IO-APIC  18-fasteoi   i801_smbus
 23:          0          0         29          0   IO-APIC  23-fasteoi   ehci_hcd:usb2
 27:          0       4042          0          0   PCI-MSI 512000-edge      ahci[0000:00:1f.2]
 36:   64759745          0          0          0   PCI-MSI 1050624-edge      eth2-TxRx-0
 37:          0   65980121          0          0   PCI-MSI 1050625-edge      eth2-TxRx-1
 38:          0          0   66293362          0   PCI-MSI 1050626-edge      eth2-TxRx-2
 39:          0          0          0   66340412   PCI-MSI 1050627-edge      eth2-TxRx-3
 40:       1420          0          0          0   PCI-MSI 1050628-edge      eth2
NMI:         74         73         76         68   Non-maskable interrupts
LOC:     583535     582829     584423     583333   Local timer interrupts
SPU:          0          0          0          0   Spurious interrupts
PMI:         74         73         76         68   Performance monitoring interrupts
IWI:          0          0          0          0   IRQ work interrupts
RTR:          2          0          0          0   APIC ICR read retries
RES:     244378     265526     236929     276985   Rescheduling interrupts
CAL:       1365       1708       1590       1731   Function call interrupts
TLB:         39         18         36         39   TLB shootdowns
TRM:          0          0          0          0   Thermal event interrupts
THR:          0          0          0          0   Threshold APIC interrupts
DFR:          0          0          0          0   Deferred Error APIC interrupts
MCE:          0          0          0          0   Machine check exceptions
MCP:          8          9          9          9   Machine check polls
HYP:          0          0          0          0   Hypervisor callback interrupts
HRE:          0          0          0          0   Hyper-V reenlightenment interrupts
HVS:          0          0          0          0   Hyper-V stimer0 interrupts
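The interrupt counts above can be summed per CPU to check how evenly RSS spreads the eth2 queues across cores. A minimal sketch, assuming the standard /proc/interrupts layout (a header row of CPU names, then one row per IRQ with the IRQ name in the last column); `irq_totals` is a hypothetical helper name, not a VyOS command:

```sh
# Sum per-CPU interrupt counts for every /proc/interrupts row whose
# IRQ name (last field) matches an extended regex such as '^eth2-TxRx-'.
# Reads /proc/interrupts-formatted text on stdin.
irq_totals() {
    awk -v pat="$1" '
        NR == 1 {                        # header row: CPU0 CPU1 ...
            ncpu = NF
            for (i = 1; i <= NF; i++) cpu[i] = $i
            next
        }
        $NF ~ pat {                      # count columns are $2 .. $(ncpu+1)
            for (i = 1; i <= ncpu; i++) sum[i] += $(i + 1)
        }
        END {                            # %.0f avoids 32-bit %d overflow in some awks
            for (i = 1; i <= ncpu; i++) printf "%s %.0f\n", cpu[i], sum[i]
        }'
}
```

Usage: `irq_totals '^eth2-TxRx-' < /proc/interrupts`. The pattern deliberately skips the plain `eth2` row (the link/admin vector). Taking two readings a few seconds apart and subtracting gives a per-core rate instead of a total since boot.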


top - 15:11:58 up 42 min,  1 user,  load average: 0.22, 0.08, 0.06
Tasks: 113 total,   1 running,  67 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.3 sy,  0.0 ni, 69.5 id,  0.0 wa,  0.0 hi, 30.0 si,  0.0 st
KiB Mem:   8134128 total,  2628516 used,  5505612 free,    39020 buffers
KiB Swap:        0 total,        0 used,        0 free.   180368 cached Mem
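The `30.0 si` in the top output is the share of CPU time spent in software interrupts, which is where NIC receive processing (NET_RX) runs. A sketch that computes the same figure from two samples of the aggregate `cpu` line in /proc/stat, assuming the usual field order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice); `softirq_pct` is a hypothetical name:

```sh
# Percentage of CPU time spent in softirq between two samples of the
# aggregate "cpu" line from /proc/stat. guest/guest_nice are already
# included in user/nice, so only user..steal ($2..$9) are summed.
softirq_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            t[NR] = total
            si[NR] = $8                  # softirq column
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (si[2] - si[1]) / dt
        }'
}

# Usage sketch:
#   a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat)
#   softirq_pct "$a" "$b"
```

/proc/softirqs complements this with per-CPU counts for each softirq type; its NET_RX row shows how often receive processing runs on each core.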

Can you show the output of the following commands?

show interfaces ethernet eth2 statistics
sudo ethtool -g eth2
sudo ethtool -c eth2
show hardware pci

The good news is that memory doesn’t leak.

diag.txt (12.2 KB)

I think you need to increase the ring buffers, because the statistics show
rx_missed_errors: 750195
You can run sudo ethtool -G eth2 rx 4096 tx 4096 to increase the ring buffers, but be careful: the link may briefly go down and up. After increasing the ring buffers, the IRQ and CPU load might drop.
I also see rx_csum_offload_errors: 52804444 for some reason. I guess the new ixgbe driver detects these packets better.
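`rx_missed_errors` is a cumulative counter since the driver was loaded, so a single reading cannot show whether the larger ring buffers helped; what matters is whether it keeps increasing. A sketch, assuming the `name: value` lines that `ethtool -S` prints; `nic_stat` is a hypothetical helper:

```sh
# Print the value of one counter from `ethtool -S <iface>` output on stdin;
# exit non-zero if the counter is not present.
nic_stat() {
    awk -v name="$1" '
        $1 == (name ":") { print $2; found = 1 }
        END { exit !found }'
}

# Usage sketch (needs ethtool and the real interface):
#   before=$(sudo ethtool -S eth2 | nic_stat rx_missed_errors)
#   sleep 10
#   after=$(sudo ethtool -S eth2 | nic_stat rx_missed_errors)
#   echo "rx_missed_errors in the last 10 s: $((after - before))"
```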

I increased the ring buffer, but CPU load did not change.

$  sudo ethtool -g eth2
Ring parameters for eth2:
Pre-set maximums:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096
Current hardware settings:
RX:		4096
RX Mini:	0
RX Jumbo:	0
TX:		4096

Can you also try disabling all offloads and checking the CPU load? Before that, please show the output of sudo ethtool -k eth2.

Commands for disabling offloads:

set interfaces ethernet eth2 offload-options tcp-segmentation off
set interfaces ethernet eth2 offload-options generic-receive off
set interfaces ethernet eth2 offload-options generic-segmentation off
commit
sudo ethtool -K eth2 lro off

I was getting memory leaks with the rolling release…

I switched to LTS 1.2.0, and the server is now working properly.

Hardware: Dell R240 with Intel X520

I disabled all offloads earlier:

show interfaces ethernet eth2 offload-options 
 generic-receive off
 generic-segmentation off
 tcp-segmentation off
 udp-fragmentation off


ethtool -k eth2
Features for eth2:
rx-checksumming: on
tx-checksumming: on
	tx-checksum-ipv4: off [fixed]
	tx-checksum-ip-generic: on
	tx-checksum-ipv6: off [fixed]
	tx-checksum-fcoe-crc: on [fixed]
	tx-checksum-sctp: on
scatter-gather: off
	tx-scatter-gather: off
	tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
	tx-tcp-segmentation: off
	tx-tcp-ecn-segmentation: off [fixed]
	tx-tcp-mangleid-segmentation: off
	tx-tcp6-segmentation: off
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: off
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: on
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: on [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: on
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
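The VyOS `offload-options` cover GRO, GSO, TSO and UFO, but the listing above still shows other features enabled (checksumming, VLAN offload, receive hashing, the tunnel segmentation offloads). A sketch that lists every feature reported as `on` without `[fixed]`, assuming the `ethtool -k` output format shown above; `features_on` is a hypothetical name:

```sh
# From `ethtool -k <iface>` output on stdin, print features that are
# currently "on" and not marked "[fixed]", i.e. ones `ethtool -K` could toggle.
features_on() {
    awk '$2 == "on" && $3 != "[fixed]" { sub(/:$/, "", $1); print $1 }'
}

# Usage sketch:
#   sudo ethtool -k eth2 | features_on
```

Whether any of these contribute to the softirq load is not established here; the list is only a starting point for `ethtool -K` experiments.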

I cannot use the stable (LTS) version, since I use BFD.

@Harunaga, is this CPU load critical for you? Everything works without issues, doesn’t it? To track down this “high load”, I think we need the perf utilities.
The current kernel contains some debug features for detecting memory leaks; maybe they add a few percent of load.

Hello @Harunaga, I suggest updating VyOS to rolling version 1.2-rolling-201910100117. I can confirm better performance without kmemleak.

Hello!
After running the command:

echo off > /sys/kernel/debug/kmemleak

CPU load returned to normal
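The `echo off` above must run as root, and the control file exists only when the kernel is built with kmemleak and debugfs is mounted at /sys/kernel/debug. According to the kernel's kmemleak documentation, writing `off` cannot be undone until the next reboot; passing `kmemleak=off` on the kernel command line disables it from boot instead. A guarded sketch; `disable_kmemleak` is a hypothetical helper, and its optional argument exists only so the path can be overridden:

```sh
# Disable kmemleak through its debugfs control file (run as root).
# Writing "off" is irreversible until reboot, per the kernel docs.
# Optional $1 overrides the control-file path (hypothetical convenience;
# the default is the real debugfs location).
disable_kmemleak() {
    kmemleak_ctl=${1:-/sys/kernel/debug/kmemleak}
    if [ ! -e "$kmemleak_ctl" ]; then
        echo "not found: $kmemleak_ctl (no kmemleak, or debugfs not mounted)" >&2
        return 1
    fi
    echo off > "$kmemleak_ctl"
}

# Usage sketch:
#   sudo sh -c '. ./kmemleak.sh && disable_kmemleak'
```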

Hello, can you confirm that the cause of the memory leak is fixed with ixgbe 5.6.3?

Hello!
Yes, I confirm that the cause of the memory leak has been fixed.

Hello, I have the same or a similar problem…
However, my memory overflows!
It grows by 1 GB every 2 days on average, until it fills all 8 GB and the system crashes.
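For a slow leak like this one, periodic snapshots of /proc/meminfo help separate kernel-side growth from userspace: memory leaked through the kernel's slab allocator typically shows up as a steadily rising `SUnreclaim` while process memory stays flat. At about 1 GB every 2 days, the growth is roughly 20 MB per hour, so hourly samples are enough to see the trend. A sketch; `mem_snapshot` and the log path are hypothetical:

```sh
# Print MemAvailable, Slab and SUnreclaim (kB) from /proc/meminfo-style
# text on stdin, on one line suitable for appending to a log.
mem_snapshot() {
    awk '
        $1 == "MemAvailable:" { avail = $2 }
        $1 == "Slab:"         { slab = $2 }
        $1 == "SUnreclaim:"   { sunr = $2 }
        END { printf "MemAvailable=%s Slab=%s SUnreclaim=%s\n", avail, slab, sunr }'
}

# Usage sketch: one timestamped line per hour (log path is hypothetical).
#   while :; do
#       echo "$(date '+%F %T') $(mem_snapshot < /proc/meminfo)" >> /var/tmp/mem.log
#       sleep 3600
#   done
```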
