I have two border routers running VyOS 1.3, which I built myself.
Both routers have similar configs with 16 IPv4 and 12 IPv6 BGP neighbors; the iBGP IPv4 and IPv6 links between the routers work correctly. I have OSPF and OSPFv3 on the downlinks, and also OSPF and OSPFv3 between the routers.
I have SNMPv3 on each router. Zabbix monitors each router with a BGP template.
One VyOS router works fine, but the other has an issue with CPU load from the zebra process: up to 50%-60%.
I have read the forums and docs, but haven't found any solution.
I tried disabling SNMP, but the problem remained.
I also have a strange problem on the router with the high CPU load: when VyOS boots and mounts the config, a network interface with vifs, a shaper and a redirect to ifb is missing. When I configure this interface by hand after boot, it works fine.
Can anyone suggest a possible cause of this problem?
For more than twelve hours the router worked fine, with zebra taking about 1%-2% CPU.
I did not change anything in the config before I turned the BGP neighbors on.
Now I have all 16 IPv4 and 12 IPv6 neighbors on, plus the iBGP link between the routers.
After one day, the zebra and bgpd processes periodically take up to 30% CPU.
It is the same problem as described above.
Nothing has been changed in the config or topology on either router.
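For anyone who wants to put numbers on the periodic spikes rather than watching top, here is a minimal Python sketch that computes a process's CPU usage from two readings of /proc/<pid>/stat. The function names (`cpu_ticks`, `cpu_percent`, `sample`) are made up for this illustration, and the PID has to be looked up separately (for example with `pidof zebra`).

```python
import os
import time

def cpu_ticks(stat_line):
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may contain spaces,
    # so split on the last ')' before reading the numeric fields.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is the state (field 3); utime and stime are fields 14 and 15.
    return int(fields[11]) + int(fields[12])

def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_second=None):
    """Convert a tick delta over `seconds` into a percentage of one CPU."""
    if ticks_per_second is None:
        ticks_per_second = os.sysconf("SC_CLK_TCK")
    return 100.0 * (ticks_after - ticks_before) / ticks_per_second / seconds

def sample(pid, seconds=10):
    """Read /proc/<pid>/stat twice, `seconds` apart, and return CPU percent."""
    path = f"/proc/{pid}/stat"
    with open(path) as fh:
        before = cpu_ticks(fh.read())
    time.sleep(seconds)
    with open(path) as fh:
        after = cpu_ticks(fh.read())
    return cpu_percent(before, after, seconds)
```

Calling `sample(pid)` in a loop and logging the result with a timestamp would show whether the spikes line up with other events in the logs.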
I also see many records like these in /var/log/messages:
Aug 16 10:56:46 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 10:56:46 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 10:57:01 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 11:07:46 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 11:07:46 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 11:08:01 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 11:19:17 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 11:19:17 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 11:19:32 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 11:30:46 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 11:30:46 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 11:31:01 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 11:31:46 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 11:31:46 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 11:32:01 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 11:38:47 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 11:38:47 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 11:39:02 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 11:52:16 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 11:52:16 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 11:52:31 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 12:09:16 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 12:09:16 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 12:09:31 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 12:21:16 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 12:21:16 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 12:21:31 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 12:35:16 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 12:35:16 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 12:35:31 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 12:39:46 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 12:39:46 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 12:40:01 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 12:43:17 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 12:43:17 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 12:43:32 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
Aug 16 13:05:46 vyos-1 bgpd[1392]: snmp[info]: AgentX master disconnected us, reconnecting in 15
Aug 16 13:05:46 vyos-1 bgpd[1392]: [EC 100663303] Failed to set snmp fd back to original settings: Bad file descriptor(9)
Aug 16 13:06:01 vyos-1 bgpd[1392]: snmp[info]: NET-SNMP version 5.7.3 AgentX subagent connected
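To see how often bgpd loses its AgentX connection, the log can be summarized with a short script. This is only a sketch: the regular expression is written against the message format shown above, the function names are invented for the example, and because syslog timestamps carry no year, the `year` argument is a placeholder used only to make the timestamps parseable.

```python
import re
from datetime import datetime

# Matches the syslog timestamp and the bgpd AgentX disconnect message.
DISCONNECT = re.compile(
    r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+bgpd\[\d+\]:"
    r".*AgentX master disconnected us"
)

def disconnect_times(lines, year=2000):
    """Return a datetime for every AgentX disconnect line.

    `year` is an assumption: the log lines do not include one.
    """
    times = []
    for line in lines:
        match = DISCONNECT.match(line)
        if match:
            stamp = f"{year} {match.group(1)}"
            times.append(datetime.strptime(stamp, "%Y %b %d %H:%M:%S"))
    return times

def gaps_seconds(times):
    """Return the number of seconds between consecutive disconnects."""
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# Example usage (path as in the post above):
#   with open("/var/log/messages", errors="replace") as fh:
#       print(gaps_seconds(disconnect_times(fh)))
```

Irregular gaps, as in the excerpt above, would suggest the disconnects are not tied to a fixed timer.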
I have changed the BIOS NUMA settings on both routers (HP DL360p G8): Performance Options > Advanced Performance Tuning Options > NUMA Group Size Optimization > Interleave > Enabled. Since then everything has worked fine for about 24 hours.
I think the high CPU load from zebra and bgpd depends on the NUMA mode. Possibly FRR does not support NUMA. That would explain why, now that I have made one NUMA node containing both sockets, zebra and bgpd take only 1%-3% CPU time.
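To confirm what the kernel actually sees after a BIOS change like this, the NUMA layout can be read from sysfs. The sketch below assumes the standard Linux path /sys/devices/system/node; the function names are made up for the example, and it only reports the layout, it does not prove that NUMA was the cause.

```python
from pathlib import Path

def numa_nodes(root="/sys/devices/system/node"):
    """Map each NUMA node name (e.g. 'node0') to its CPU list string."""
    nodes = {}
    for entry in sorted(Path(root).glob("node[0-9]*")):
        cpulist = entry / "cpulist"
        if cpulist.is_file():
            nodes[entry.name] = cpulist.read_text().strip()
    return nodes

def describe(nodes):
    """Return a one-line summary of the node count."""
    if not nodes:
        return "no NUMA information found"
    if len(nodes) == 1:
        return "single NUMA node visible"
    return f"{len(nodes)} NUMA nodes visible"

# Example usage:
#   layout = numa_nodes()
#   print(describe(layout), layout)
```

With interleaving enabled, the expectation is that the operating system reports a single node covering the CPUs of both sockets.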