SNMP Down after more than 1000 PPPoE users connected

Hi,

I have a problem where snmp goes down after more than 1000 PPPoE users are connected, when I check via ‘top’ snmpd using cpu to 100%, is there a solution to overcome this problem because when snmp is down Vyos can’t be monitored.

najib@vyos:~$ show pppoe-server sessions | match active | count
1404
top - 08:48:34 up 13 days, 20:42,  5 users,  load average: 1.65, 1.56, 1.50
Tasks: 205 total,   2 running, 203 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.8 us,  7.6 sy,  0.0 ni, 65.0 id,  0.0 wa,  0.0 hi, 25.6 si,  0.0 st
MiB Mem :   7874.3 total,   3636.5 free,   3577.1 used,    660.7 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   3722.0 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 7323 Debian-+  20   0  398040 378240   9004 R 100.0   4.7   3312:02 snmpd
 5357 frr       20   0 1559668 526348   3980 S   9.0   6.5 123:56.25 zebra
  730 root      20   0  117104  80080   5860 S   6.6   1.0 357:40.85 systemd-journal
 7266 vyos      20   0   13284   7252   3028 S   5.0   0.1   2:33.13 bmon
 1936 ntp       20   0   76764   3884   2952 S   2.7   0.0  45:15.50 ntpd
  814 root      20   0    8080   5248   1564 S   1.7   0.1 230:02.80 haveged
31950 root      20   0  564820  20456   4324 S   1.7   0.3 127:56.75 accel-pppd
10636 root      20   0  225820   4996   2892 S   1.3   0.1  77:29.42 rsyslogd
   21 root      20   0       0      0      0 S   0.7   0.0  19:30.86 ksoftirqd/2
   26 root      20   0       0      0      0 S   0.7   0.0  20:01.64 ksoftirqd/3
   31 root      20   0       0      0      0 S   0.7   0.0  17:30.59 ksoftirqd/4
 3665 vyos      20   0    8564   4500   3212 S   0.7   0.1   3:23.33 htop
16553 najib     20   0   11088   3532   3092 R   0.7   0.0   0:00.04 top
    9 root      20   0       0      0      0 S   0.3   0.0  62:02.12 ksoftirqd/0
   10 root      20   0       0      0      0 I   0.3   0.0  38:56.80 rcu_sched
   16 root      20   0       0      0      0 S   0.3   0.0  17:30.09 ksoftirqd/1
   36 root      20   0       0      0      0 S   0.3   0.0   1:26.80 ksoftirqd/5
   41 root      20   0       0      0      0 S   0.3   0.0   1:31.46 ksoftirqd/6
   61 root      20   0       0      0      0 S   0.3   0.0  13:16.40 ksoftirqd/10
   66 root      20   0       0      0      0 S   0.3   0.0  11:44.25 ksoftirqd/11
 5362 frr       20   0  872216 771596   4996 S   0.3   9.6  49:39.09 bgpd
13224 root      20   0       0      0      0 I   0.3   0.0   0:04.44 kworker/5:2-events_power_efficient
13933 roni      20   0   11088   3472   3020 S   0.3   0.0   0:02.38 top

Thanks.

You could try restarting snmp?

sudo /etc/init.d/snmpd restart

hi @tjh , I’ve tried restarting but there is no effect on the cpu load, snmpd keeps using the cpu to 100%

It seems this snmp-issue-ppp
Do you collect snmp statistics from ppp interfaces?

no, i’m not collecting snmp statistics from pppoe interface, i’m just fetching stats from physical interface, cpu, and memory.

I checked the snmp-issue-ppp, the patch works on net-snmp version 5.8 while vyos 1.3.0 runs net-snmp version 5.7.3

Can you increase one parameter?

sudo nano -c +22 /etc/snmp/snmpd.conf
And replace:

monitor  -r 10 -e linkUpTrap   "Generate linkUp" ifOperStatus != 2
monitor  -r 10 -e linkDownTrap "Generate linkDown" ifOperStatus == 2

To -r 120 or -r 300

monitor  -r 120 -e linkUpTrap   "Generate linkUp" ifOperStatus != 2
monitor  -r 120 -e linkDownTrap "Generate linkDown" ifOperStatus == 2

And restart snmpd service
sudo systemctl restart snmpd

I’ve made changes according to the instructions above, but it doesn’t have any impact, snmpd is still using the cpu to 100%.

dear vyos team,

related to the problem above, it was resolved after I patched the net-snmp according to this patch. currently for snmpd is not using the cpu to 100%.

Thanks.

1 Like

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.