Hi there
I am not sure whether this is a bug or just a misconfiguration of the node. On VyOS 1.4, the routing information disappears from FRR when a few thousand L2TP subscribers connect to the node. Pings to BGP neighbors report network unreachable, and accel-ppp starts consuming a lot of CPU. Nothing helps except restarting FRR or rebooting the system.
What I have done on the system so far:
excluded the L2TP interfaces from being tracked by SNMP
tuned systemd-udevd so that it does not track the L2TP interfaces either
set FRR to log to a file instead of syslog
The problem still occurs periodically.
I would appreciate advice on how to troubleshoot the issue.
VyOS receives L2TP subscribers from a mobile network and terminates the sessions into 100+ VRFs in an MPLS core.
I have attached an atop image captured during the issue.
This is what bgpd logged when the node stopped working correctly:
2023/02/21 01:41:24 BGP: [MJ4D6-VBJKV][EC 33554454] 10.5.72.2 [Error] bgp_read_packet error: Connection reset by peer
2023/02/21 01:41:25 BGP: [MJ4D6-VBJKV][EC 33554454] 10.5.72.1 [Error] bgp_read_packet error: Connection reset by peer
2023/02/21 01:41:47 BGP: [Q6EWH-J0SDA][EC 33554510] 10.5.72.2(Unknown) has not made any SendQ progress for 1 holdtime (15s), peer overloaded?
2023/02/21 01:41:47 BGP: [Q6EWH-J0SDA][EC 33554510] 10.5.72.1(Unknown) has not made any SendQ progress for 1 holdtime (15s), peer overloaded?
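To make the order of events per neighbor easier to see, the lines above can be grouped by peer with a small script. This is only a rough sketch: the regular expression is an assumption based on the four lines quoted here (other FRR messages may be formatted differently), and `summarize` is a hypothetical helper name, not part of FRR or VyOS.

```python
import re
from collections import defaultdict

# Pattern assumed from the four bgpd lines quoted above; it is not an
# official FRR log grammar and may not match other message types.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) BGP: "
    r"\[(?P<ref>[^\]]+)\]\[EC (?P<ec>\d+)\] "
    r"(?P<peer>\d+\.\d+\.\d+\.\d+)\S* (?P<msg>.*)$"
)


def summarize(lines):
    """Group (timestamp, message) pairs by peer address, keeping input order."""
    events = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:  # silently skip lines that do not fit the assumed pattern
            events[m.group("peer")].append((m.group("ts"), m.group("msg")))
    return dict(events)


# The four log lines from this post, used as sample input.
SAMPLE = [
    "2023/02/21 01:41:24 BGP: [MJ4D6-VBJKV][EC 33554454] 10.5.72.2 [Error] bgp_read_packet error: Connection reset by peer",
    "2023/02/21 01:41:25 BGP: [MJ4D6-VBJKV][EC 33554454] 10.5.72.1 [Error] bgp_read_packet error: Connection reset by peer",
    "2023/02/21 01:41:47 BGP: [Q6EWH-J0SDA][EC 33554510] 10.5.72.2(Unknown) has not made any SendQ progress for 1 holdtime (15s), peer overloaded?",
    "2023/02/21 01:41:47 BGP: [Q6EWH-J0SDA][EC 33554510] 10.5.72.1(Unknown) has not made any SendQ progress for 1 holdtime (15s), peer overloaded?",
]

if __name__ == "__main__":
    for peer, items in summarize(SAMPLE).items():
        print(peer)
        for ts, msg in items:
            print(f"  {ts}  {msg}")
```

Run against the full bgpd log file instead of `SAMPLE`, this should show whether both neighbors always reset within a second or so of each other, as they do in the excerpt.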
I am also attaching the output of accel-cmd show stat: it shows a constantly growing number of finishing sessions (control channel) until accel-ppp stopped responding altogether. accel-show-stat-21022023.txt (104.6 KB)