High CPU usage by bgpd when snmp is active

Hello everyone!

I’m facing a strange issue with bgpd when snmpv2 is active, where it uses 100% CPU all the time.

If snmp is enabled, this happens: (screenshot)
When I disable it (service snmpd stop), everything goes back to normal: (screenshot)

If it is relevant, I have 9 BGP sessions (5 v4 and 4 v6) with 3 full routing tables, and I’m using a self-compiled VyOS 1.2.1.

Are there any logs where I can get more information about what is causing this?

I’d be really glad if anyone could help me debug this issue.

Thank you!

Hey @aldemaro

Does this happen just by having snmp enabled, or is it tied to when something is polling the device?

Hello @garysteers!

I’ve disabled polling and the issue is still present. I’ve also tested with SNMPv3 and the same happens. SNMP only needs to be active for this to happen.

Looks like there is something wrong with frr+snmp integration, but I can’t figure out what. Are there any bgpd logs? I found frr.log, but nothing wrong there:

Aug 25 10:46:27 localhost zebra[1030]: snmp[info]: NET-SNMP version 5.7.2.1 AgentX subagent connected
Aug 25 10:46:27 localhost bgpd[1034]: snmp[info]: NET-SNMP version 5.7.2.1 AgentX subagent connected
Aug 25 10:46:27 localhost ospfd[1049]: snmp[info]: NET-SNMP version 5.7.2.1 AgentX subagent connected
Aug 25 10:46:27 localhost ospf6d[1053]: snmp[info]: NET-SNMP version 5.7.2.1 AgentX subagent connected
Aug 25 10:46:27 localhost ripd[1041]: snmp[info]: NET-SNMP version 5.7.2.1 AgentX subagent connected

There may be something in /var/log/messages

You can also use monitor snmp (which basically runs the above with a filter for snmpd messages)
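For anyone who wants to check /var/log/messages by hand, here is a minimal sketch of a filter that keeps only the lines logged by snmpd or bgpd. The function name filter_daemon_logs is made up for illustration, and it assumes the usual syslog "name[pid]:" tag seen in the frr.log excerpt above.

```shell
#!/bin/sh
# filter_daemon_logs FILE
# Hypothetical helper: print only lines tagged "snmpd[pid]:" or "bgpd[pid]:".
# Assumes syslog-style lines such as "... localhost bgpd[1034]: ...".
filter_daemon_logs() {
    grep -E '(snmpd|bgpd)\[[0-9]+\]:' "$1"
}
```

Something like `filter_daemon_logs /var/log/messages | tail -n 50` (run with enough privileges to read the file) should show the most recent entries from both daemons.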

I have the same problem. Every day between 7:00 pm and 7:10 pm, the bgpd process ends after snmpd reaches 100% cpu.

The problem only occurs when both bgpd and snmpd are running, even if no SNMP collection is being performed.

Occurs in both VM and HW.

Unfortunately there is no useful information:

snmpd started: (screenshot)
snmpd stopped: (screenshot)

Yes, we have the same problem on VyOS 1.2.2.

Maybe it’s time to report this bug in phabricator?

Does anyone know how to disable the integration between frr and snmpd? This should work as a workaround, and would let us monitor at least some aspects over snmp.

I might set up a test router to try to find out, but any help would be appreciated.

It seems like an FRRouting issue.

I think you can try editing /etc/frr/daemons (sudo nano /etc/frr/daemons) and deleting -M snmp from bgpd_options=.... Then run sudo killall bgpd and wait for the process to be restarted automatically.

There is no bgpd_options in this file. I’ll dig deeper and try to find it.

In the meantime the only workaround I found is to stop snmpd.

Edit: I found the actual file to edit to be /etc/frr/daemons.conf. Killing the bgpd process for some reason will not spawn a new process tho, so you will need to reboot the router to bring it back up.

After removing the “-M snmp” option and rebooting, the snmpd service continued to reach 100% CPU and stop responding to queries, but bgpd was unaffected and continued to function normally.
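The edit described above could be scripted roughly as follows. This is only a sketch: the path /etc/frr/daemons.conf comes from this thread, but the exact shape of the bgpd_options line is an assumption, and strip_bgpd_snmp is a made-up name. It keeps a .bak copy so the change can be reverted.

```shell
#!/bin/sh
# strip_bgpd_snmp FILE
# Hypothetical helper: remove the "-M snmp" module option from bgpd only.
# Assumes a line of the form: bgpd_options="... -M snmp ..."
# Other daemons' option lines are left untouched.
strip_bgpd_snmp() {
    cp "$1" "$1.bak" &&
    sed '/^bgpd_options=/ s/ *-M snmp//' "$1.bak" > "$1"
}
```

Run as root, this would be `strip_bgpd_snmp /etc/frr/daemons.conf`. As noted above, bgpd did not respawn after being killed, so a reboot was needed for the change to take effect.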

Hello @fvbrasileiro, how many interfaces does your router have? Can you provide the output of the top command?

5 interfaces.

Interface  S/L
---------  ---
eth0       u/u
eth1       u/u
eth1.111   u/u
eth2       u/u
eth3       A/D
eth4       u/u
eth4.444   u/u

top - 19:06:24 up 23:50,  1 user,  load average: 0.23, 0.12, 0.04
Tasks: 142 total,   2 running,  99 sleeping,   0 stopped,   0 zombie
%Cpu0  :  3.7 us,  0.3 sy,  0.0 ni, 96.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  1.4 us,  0.7 sy,  0.0 ni, 96.6 id,  0.3 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu2  :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 98.7 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem:   4040552 total,  2620536 used,  1420016 free,   111856 buffers
KiB Swap:        0 total,        0 used,        0 free.   244692 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3963 snmp      20   0  173256 119956   4504 R 100.0  3.0   2:05.66 snmpd
 4008 root      20   0  104248  20036  14980 S   4.0  0.5  50:36.23 uacctd
 4003 root      20   0  119708  33596  23456 S   1.3  0.8  26:15.33 uacctd
   17 root      20   0       0      0      0 S   0.3  0.0   1:14.61 ksoftirqd/1
 1008 root      20   0   12064   4692   1556 S   0.3  0.1   0:32.71 haveged
 4006 root      20   0  127988  42876  25736 S   0.3  1.1   6:45.53 uacctd
    1 root      20   0  110636   5124   3204 S   0.0  0.1   0:01.95 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
    3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
    5 root      20   0       0      0      0 I   0.0  0.0   0:00.39 kworker/0:0-cgr

@fvbrasileiro, exactly which VyOS version are you using? I think we need the perf utility to see which function the process is spending its time in.

You can choose:
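As a rough sketch of how perf could be pointed at the busy process: the helper below picks the PID with the highest CPU share from ps-style output. The name top_cpu_pid is made up for illustration, and whether perf is available on a given VyOS image is an assumption.

```shell
#!/bin/sh
# top_cpu_pid
# Hypothetical helper: read "PID %CPU COMMAND" rows on stdin (first row is a
# header, as printed by "ps -eo pid,pcpu,comm") and print the PID of the row
# with the highest %CPU. Prints nothing if every row is at 0.
top_cpu_pid() {
    awk 'NR > 1 && $2 + 0 > max { max = $2 + 0; pid = $1 }
         END { if (pid != "") print pid }'
}
```

With that, `pid=$(ps -eo pid,pcpu,comm | top_cpu_pid)` followed by `sudo perf top -p "$pid"` should show a live view of the hottest functions. Alternatively, `sudo perf record -g -p "$pid" -- sleep 30` and then `sudo perf report` captures a profile that can be attached to a bug report.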

me@BGP-2:~$ sh system image
The system currently has the following image(s) installed:

   1: 1.2.3-epa1 (default boot) (running image)
   2: 1.2-rolling-201908222129
   3: 1.2.0-rolling+201908050337
   4: 1.2.0-rolling+201907231917
   5: 1.2.2
   6: 1.2.0-rolling+201906251727
   7: 1.2.0-rolling+201906070337
   8: 1.2.0-rolling+201906030337
   9: 1.2.1

show version should give you the running image version.

The current version is VyOS 1.2.3-epa1.

But, I can also use a vyos-build version to perform the tests.

@fvbrasileiro, can you try some SNMP config changes?
Edit /etc/default/snmpd
and replace

SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -p /run/snmpd.pid'

with

SNMPDOPTS='-LSed -u snmp -g snmp -I -ipCidrRouteTable,inetCidrRouteTable -p /run/snmpd.pid'

Then restart snmp.
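For reference, net-snmp's -I option takes a list of MIB modules to initialize, and a leading "-" turns it into a list of modules to skip, so the new line stops snmpd from loading ipCidrRouteTable and inetCidrRouteTable. Presumably those are the tables that get expensive with several full routing tables installed. A minimal sketch of making the change with a backup (set_snmpd_opts is a made-up name, and it assumes the file has a single SNMPDOPTS= line):

```shell
#!/bin/sh
# set_snmpd_opts FILE
# Hypothetical helper: replace the SNMPDOPTS line with the options suggested
# in this thread, keeping a .bak copy of the original file.
set_snmpd_opts() {
    new="SNMPDOPTS='-LSed -u snmp -g snmp -I -ipCidrRouteTable,inetCidrRouteTable -p /run/snmpd.pid'"
    cp "$1" "$1.bak" &&
    sed "s|^SNMPDOPTS=.*|$new|" "$1.bak" > "$1"
}
```

Run as root, this would be `set_snmpd_opts /etc/default/snmpd`, followed by restarting snmpd.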

After adding “-I -ipCidrRouteTable,inetCidrRouteTable” and restarting, the high CPU problem is gone.

Good, I think we can add this by default. Can you create a task on Phabricator for your issue?