Maybe it’s time to report this bug on Phabricator?
Does anyone know how to disable the integration between FRR and snmpd? That should work as a workaround and would let us monitor at least some aspects over SNMP.
I might set up a test router to try to find out, but any help would be appreciated.
Dmitry
September 4, 2019, 10:35am
It seems like an FRRouting issue:
(issue opened 05:39PM - 16 Jun 18 UTC, closed 10:35PM - 04 Feb 19 UTC; labels: bug, performance)
Since upgrading to v5 (at this point I also enabled SNMP), I see FRR is using 10… 0% of one CPU core. Also I see the following errors in the log:
```
root@cr1:~# grep "Broken pipe" /var/log/frr/frr.log
2018/06/15 10:21:25 BGP: buffer_flush_available: write error on fd 110: Broken pipe
Jun 15 10:21:25 cr1 bgpd[12541]: buffer_flush_available: write error on fd 110: Broken pipe
2018/06/15 10:21:25 BGP: buffer_flush_available: write error on fd 136: Broken pipe
Jun 15 10:21:25 cr1 bgpd[12541]: buffer_flush_available: write error on fd 136: Broken pipe
2018/06/15 10:21:25 BGP: buffer_flush_available: write error on fd 110: Broken pipe
Jun 15 10:21:25 cr1 bgpd[12541]: buffer_flush_available: write error on fd 110: Broken pipe
2018/06/15 10:21:25 BGP: buffer_flush_available: write error on fd 136: Broken pipe
Jun 15 10:21:25 cr1 bgpd[12541]: buffer_flush_available: write error on fd 136: Broken pipe
2018/06/15 10:21:25 BGP: buffer_flush_available: write error on fd 110: Broken pipe
Jun 15 10:21:25 cr1 bgpd[12541]: buffer_flush_available: write error on fd 110: Broken pipe
2018/06/15 17:18:04 BGP: buffer_flush_available: write error on fd 136: Broken pipe
Jun 15 17:18:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 136: Broken pipe
2018/06/15 17:18:04 BGP: buffer_flush_available: write error on fd 141: Broken pipe
Jun 15 17:18:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 141: Broken pipe
2018/06/15 17:18:04 BGP: buffer_flush_available: write error on fd 136: Broken pipe
Jun 15 17:18:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 136: Broken pipe
2018/06/15 17:18:04 BGP: buffer_flush_available: write error on fd 141: Broken pipe
Jun 15 17:18:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 141: Broken pipe
2018/06/15 17:18:04 BGP: buffer_flush_available: write error on fd 136: Broken pipe
Jun 15 17:18:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 136: Broken pipe
2018/06/15 20:48:46 BGP: buffer_flush_available: write error on fd 40: Broken pipe
Jun 15 20:48:46 cr1 bgpd[12541]: buffer_flush_available: write error on fd 40: Broken pipe
2018/06/15 20:48:46 BGP: buffer_flush_available: write error on fd 119: Broken pipe
Jun 15 20:48:46 cr1 bgpd[12541]: buffer_flush_available: write error on fd 119: Broken pipe
2018/06/15 20:48:46 BGP: buffer_flush_available: write error on fd 40: Broken pipe
Jun 15 20:48:46 cr1 bgpd[12541]: buffer_flush_available: write error on fd 40: Broken pipe
2018/06/15 20:49:03 BGP: buffer_flush_available: write error on fd 40: Broken pipe
Jun 15 20:49:03 cr1 bgpd[12541]: buffer_flush_available: write error on fd 40: Broken pipe
2018/06/15 20:49:03 BGP: buffer_flush_available: write error on fd 97: Broken pipe
Jun 15 20:49:03 cr1 bgpd[12541]: buffer_flush_available: write error on fd 97: Broken pipe
2018/06/15 20:49:03 BGP: buffer_flush_available: write error on fd 104: Broken pipe
Jun 15 20:49:03 cr1 bgpd[12541]: buffer_flush_available: write error on fd 104: Broken pipe
2018/06/15 20:49:03 BGP: buffer_flush_available: write error on fd 65: Broken pipe
Jun 15 20:49:03 cr1 bgpd[12541]: buffer_flush_available: write error on fd 65: Broken pipe
2018/06/15 20:49:04 BGP: buffer_flush_available: write error on fd 104: Broken pipe
Jun 15 20:49:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 104: Broken pipe
2018/06/15 20:49:04 BGP: buffer_flush_available: write error on fd 40: Broken pipe
Jun 15 20:49:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 40: Broken pipe
2018/06/15 20:49:04 BGP: buffer_flush_available: write error on fd 65: Broken pipe
Jun 15 20:49:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 65: Broken pipe
2018/06/15 20:49:04 BGP: buffer_flush_available: write error on fd 40: Broken pipe
Jun 15 20:49:04 cr1 bgpd[12541]: buffer_flush_available: write error on fd 40: Broken pipe
2018/06/15 20:49:20 BGP: buffer_flush_available: write error on fd 65: Broken pipe
Jun 15 20:49:20 cr1 bgpd[12541]: buffer_flush_available: write error on fd 65: Broken pipe
2018/06/15 23:12:01 BGP: buffer_flush_available: write error on fd 65: Broken pipe
Jun 15 23:12:01 cr1 bgpd[12541]: buffer_flush_available: write error on fd 65: Broken pipe
2018/06/15 23:12:01 BGP: buffer_flush_available: write error on fd 104: Broken pipe
Jun 15 23:12:01 cr1 bgpd[12541]: buffer_flush_available: write error on fd 104: Broken pipe
```
The last entry matches the time when the load increased.
Also see https://gist.github.com/patrick7/4c47c3afa6815f25451d0fc4e33c0469
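As a rough way to check that last claim against a log, here is a minimal Python sketch that counts the `Broken pipe` errors per file descriptor and returns the timestamp of the most recent one. It assumes the `frr.log` line format shown above, and `summarize_broken_pipes` is a hypothetical helper name, not part of FRR:

```python
import re
from collections import Counter
from datetime import datetime

# Matches the frr.log format shown above, e.g.
# "2018/06/15 10:21:25 BGP: buffer_flush_available: write error on fd 110: Broken pipe"
# The syslog-style copies ("Jun 15 10:21:25 cr1 bgpd[...]: ...") are deliberately
# not matched, so each event is counted only once.
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) BGP: "
    r"buffer_flush_available: write error on fd (\d+): Broken pipe$"
)


def summarize_broken_pipes(lines):
    """Return (per-fd error counts, timestamp of the latest error or None)."""
    counts = Counter()
    last = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a bgpd broken-pipe line in frr.log format
        ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
        counts[int(m.group(2))] += 1
        if last is None or ts > last:
            last = ts
    return counts, last
```

Passing it an open handle on /var/log/frr/frr.log would give the per-fd counts plus the time of the last error, which can then be compared with the moment the CPU load jumped.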
I think you can try editing /etc/frr/daemons (e.g. `sudo nano /etc/frr/daemons`) and deleting `-M snmp` from `bgpd_options=...`. Then run `sudo killall bgpd` and wait for the process to be restarted automatically.
There is no bgpd_options in this file. I’ll dig deeper and try to find it.
In the meantime, the only workaround I found is to stop snmpd.
Edit: The actual file to edit is /etc/frr/daemons.conf. For some reason, though, killing the bgpd process does not spawn a new one, so you will need to reboot the router to bring it back up.
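The edit described above can be sketched in Python. This is only an illustration under assumptions: it assumes the option line in /etc/frr/daemons.conf looks something like `bgpd_options="--daemon -A 127.0.0.1 -M snmp"` (check the real file first), and `drop_snmp_module` is a hypothetical helper, not part of FRR or VyOS. It rewrites only the `bgpd_options` line and leaves the other daemons' options alone:

```python
import re

# Assumed line format (hypothetical example, check your own daemons.conf):
#   bgpd_options="--daemon -A 127.0.0.1 -M snmp"
OPT_LINE_RE = re.compile(r'^(\s*bgpd_options=)(["\']?)(.*?)\2\s*$')


def drop_snmp_module(text: str) -> str:
    """Remove '-M snmp' from the bgpd_options line only; other lines are untouched."""
    out = []
    for line in text.splitlines(keepends=True):
        m = OPT_LINE_RE.match(line.rstrip("\n"))
        if m:
            prefix, quote, opts = m.groups()
            # Drop every "-M snmp" pair, then collapse the leftover whitespace.
            opts = re.sub(r"(^|\s)-M\s+snmp(?=\s|$)", " ", opts)
            opts = " ".join(opts.split())
            newline = "\n" if line.endswith("\n") else ""
            line = f"{prefix}{quote}{opts}{quote}{newline}"
        out.append(line)
    return "".join(out)
```

After writing the result back, bgpd still has to be restarted; per the post above, on this setup that meant rebooting the router.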
Quoting Dmitry: “-M snmp”
After removing the “-M snmp” option and rebooting, the snmpd service still went to 100% CPU and stopped responding to queries, but bgpd was unaffected and kept working normally.
Dmitry
September 23, 2019, 2:15pm
Hello @fvbrasileiro, how many interfaces does your router have? Can you provide the output of the `top` command?
5 interfaces.
```
Interface    S/L
---------    ---
eth0         u/u
eth1         u/u
eth1.111     u/u
eth2         u/u
eth3         A/D
eth4         u/u
eth4.444     u/u
```
```
top - 19:06:24 up 23:50,  1 user,  load average: 0.23, 0.12, 0.04
Tasks: 142 total,   2 running,  99 sleeping,   0 stopped,   0 zombie
%Cpu0  :  3.7 us,  0.3 sy,  0.0 ni, 96.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu1  :  1.4 us,  0.7 sy,  0.0 ni, 96.6 id,  0.3 wa,  0.0 hi,  1.0 si,  0.0 st
%Cpu2  :  0.3 us,  0.3 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
%Cpu3  : 98.7 us,  1.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem:   4040552 total,  2620536 used,  1420016 free,   111856 buffers
KiB Swap:        0 total,        0 used,        0 free.   244692 cached Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 3963 snmp      20   0  173256 119956   4504 R 100.0  3.0   2:05.66 snmpd
 4008 root      20   0  104248  20036  14980 S   4.0  0.5  50:36.23 uacctd
 4003 root      20   0  119708  33596  23456 S   1.3  0.8  26:15.33 uacctd
   17 root      20   0       0      0      0 S   0.3  0.0   1:14.61 ksoftirqd/1
 1008 root      20   0   12064   4692   1556 S   0.3  0.1   0:32.71 haveged
 4006 root      20   0  127988  42876  25736 S   0.3  1.1   6:45.53 uacctd
    1 root      20   0  110636   5124   3204 S   0.0  0.1   0:01.95 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.01 kthreadd
    3 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_gp
    4 root       0 -20       0      0      0 I   0.0  0.0   0:00.00 rcu_par_gp
    5 root      20   0       0      0      0 I   0.0  0.0   0:00.39 kworker/0:0-cgr
```
Dmitry
September 24, 2019, 6:04pm
@fvbrasileiro, which exact VyOS version are you using? I think we need the `perf` utility to see which functions the process is spending its time in.
You can choose:
```
me@BGP-2:~$ sh system image
The system currently has the following image(s) installed:
1: 1.2.3-epa1 (default boot) (running image)
2: 1.2-rolling-201908222129
3: 1.2.0-rolling+201908050337
4: 1.2.0-rolling+201907231917
5: 1.2.2
6: 1.2.0-rolling+201906251727
7: 1.2.0-rolling+201906070337
8: 1.2.0-rolling+201906030337
9: 1.2.1
```
hagbard
September 24, 2019, 10:40pm
`show version` should give you the running image version.
The current version is VyOS 1.2.3-epa1.
But I can also use an image built with vyos-build to perform the tests.
Dmitry
September 26, 2019, 7:22pm
@fvbrasileiro, can you try some SNMP config changes?
Edit /etc/default/snmpd and replace
```
SNMPDOPTS='-Lsd -Lf /dev/null -u snmp -g snmp -p /run/snmpd.pid'
```
with
```
SNMPDOPTS='-LSed -u snmp -g snmp -I -ipCidrRouteTable,inetCidrRouteTable -p /run/snmpd.pid'
```
Then restart the SNMP service.
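To apply the same exclusion without retyping the whole line, a minimal Python sketch could insert it into the active `SNMPDOPTS` line. `add_route_table_exclusion` is a hypothetical helper; it assumes the value is wrapped in single quotes as in the lines above, skips commented-out lines, and leaves the line alone if it already has an `-I` option. Unlike the full replacement line above, it does not touch the logging flags (`-Lsd -Lf /dev/null` vs `-LSed`).

```python
import re
import shlex

# "-I" followed by a list prefixed with "-" tells snmpd NOT to initialise
# the listed MIB modules (here, the two route tables from the thread).
EXCLUDE = ["-I", "-ipCidrRouteTable,inetCidrRouteTable"]
# Only the active line matches; "# SNMPDOPTS=..." comments are skipped.
OPTS_RE = re.compile(r"^SNMPDOPTS='([^']*)'\s*$")


def add_route_table_exclusion(text: str) -> str:
    """Insert the -I exclusion unless SNMPDOPTS already has an -I option."""
    out = []
    for line in text.splitlines(keepends=True):
        m = OPTS_RE.match(line.rstrip("\n"))
        if m:
            args = shlex.split(m.group(1))
            if "-I" not in args:
                # Put it just before "-p <pidfile>", matching the layout above.
                pos = args.index("-p") if "-p" in args else len(args)
                args[pos:pos] = EXCLUDE
            newline = "\n" if line.endswith("\n") else ""
            line = "SNMPDOPTS='" + " ".join(args) + "'" + newline
        out.append(line)
    return "".join(out)
```

As reported later in this thread, VyOS regenerates /etc/default/snmpd from snmp.py on reboot, so a change made only to this file does not persist.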
After adding “-I -ipCidrRouteTable,inetCidrRouteTable” and restarting, the high CPU usage is gone.
Dmitry
September 27, 2019, 10:30pm
Good, I think we can add this by default. Can you create a task on Phabricator for your issue?
erojas
October 9, 2019, 3:27pm
I have the same problem, and the change worked after restarting the SNMP service. But when I reboot the router, the /etc/default/snmpd file reverts to the default. I noticed that it is generated by the snmp.py script, but there are multiple snmp.py files. Which file should I modify so that the router gets the right SNMP config after a reboot?
erojas
October 9, 2019, 3:29pm
These are the files I found:
/usr/libexec/vyos/conf_mode/snmp.py
/usr/libexec/vyos/op_mode/snmp.py
/lib/live/mount/rootfs/1.2.3.squashfs/usr/libexec/vyos/conf_mode/snmp.py
/lib/live/mount/rootfs/1.2.3.squashfs/usr/libexec/vyos/op_mode/snmp.py
Which one should I modify?
Dmitry
October 9, 2019, 3:53pm
You can try changing /usr/libexec/vyos/conf_mode/snmp.py, building your own vyos-1x with these changes, or waiting for this task: T1705 High CPU usage by bgpd when snmp is active
erojas
October 9, 2019, 4:05pm
Thanks a lot, I found it. I changed line 220, commenting out the old line, as seen below:
```
# SNMP template (/etc/default/snmpd) - be careful if you edit the template.
init_config_tmpl = """
### Autogenerated by snmp.py ###
# This file controls the activity of snmpd
# snmpd control (yes means start daemon).
SNMPDRUN=yes
# snmpd options (use syslog, close stdin/out/err).
# SNMPDOPTS='-LSed -u snmp -g snmp -p /run/snmpd.pid'
SNMPDOPTS='-LSed -u snmp -g snmp -I -ipCidrRouteTable,inetCidrRouteTable -p /run/snmpd.pid'
```
hagbard
October 10, 2019, 2:49pm
system
Closed
October 12, 2019, 2:49pm
This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.