FRR loses routing info after 5-12k L2TP subscribers connect

Hi there,
I'm not sure this is a bug; it might just be a misconfiguration of the node. The routing information disappears from FRR when a few thousand L2TP subscribers connect to the node on VyOS 1.4. Pings to the BGP neighbors report the network as unreachable, and accel-ppp starts consuming a lot of CPU. Nothing helps except restarting FRR or rebooting the system.
What I have done on the system so far:

  • excluded L2TP interfaces from being tracked by SNMP
  • tuned systemd-udevd not to track L2TP interfaces either
  • set FRR to log to a file instead of syslog

But I still get the problem periodically.
I need advice on how to troubleshoot the issue.
VyOS receives L2TP subscribers from a mobile network and terminates the sessions into 100+ VRFs in the MPLS core.

Please check the atop files to see which process is using the CPU or memory.

ls -la /var/log/atop

sudo atop -r /var/log/atop.log_xxx


I've attached an atop screenshot taken during the issue.
This is what bgpd logged when the node stopped working correctly:

2023/02/21 01:41:24 BGP: [MJ4D6-VBJKV][EC 33554454] 10.5.72.2 [Error] bgp_read_packet error: Connection reset by peer
2023/02/21 01:41:25 BGP: [MJ4D6-VBJKV][EC 33554454] 10.5.72.1 [Error] bgp_read_packet error: Connection reset by peer
2023/02/21 01:41:47 BGP: [Q6EWH-J0SDA][EC 33554510] 10.5.72.2(Unknown) has not made any SendQ progress for 1 holdtime (15s), peer overloaded?
2023/02/21 01:41:47 BGP: [Q6EWH-J0SDA][EC 33554510] 10.5.72.1(Unknown) has not made any SendQ progress for 1 holdtime (15s), peer overloaded?
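
To see how often each peer hits these two conditions over a longer period, the log can be summarised per peer. This is only a rough sketch: the function name `bgp_event_summary` is made up for illustration, and it assumes every line has the same field layout as the four lines above (peer address in the sixth whitespace-separated field).

```shell
#!/bin/sh
# Summarise bgpd log lines read from stdin, per peer.
# Assumption: lines look like the excerpts above, so the peer address is
# field 6, optionally followed by "(Unknown)" or a similar suffix.
bgp_event_summary() {
    awk '
        /bgp_read_packet error: Connection reset by peer/ {
            reset[$6]++; peers[$6] = 1
        }
        /has not made any SendQ progress/ {
            p = $6
            sub(/\(.*$/, "", p)      # strip the "(Unknown)" suffix
            stall[p]++; peers[p] = 1
        }
        END {
            for (p in peers)
                printf "%s resets=%d sendq_stalls=%d\n", p, reset[p], stall[p]
        }
    ' | sort
}

# Hypothetical usage; the log path depends on where FRR file logging
# was pointed on this node:
#   bgp_event_summary < /path/to/frr.log
```

Comparing the timestamps of the first reset with the time accel-ppp CPU usage climbs in atop may help tell whether BGP drops first or only after the box is already overloaded.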

I'm also attaching the output of accel-cmd show stat: it shows a constantly growing number of finishing sessions (control channel) until accel-ppp stopped answering altogether.
accel-show-stat-21022023.txt (104.6 KB)
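
To catch the growth of the finishing counters before accel-ppp stops answering, they can be pulled out of the accel-cmd show stat output and logged at intervals. A minimal sketch follows; the helper name `finishing_counters` is hypothetical, and it assumes the output is laid out as indented "finishing: N" lines under section header lines that end with a colon, which may differ between accel-ppp versions.

```shell
#!/bin/sh
# Read "accel-cmd show stat"-style text on stdin and print each
# "finishing" counter together with the section it appears under.
# Assumption: section headers are lines ending in ":" and counters are
# lines of the form "finishing: N".
finishing_counters() {
    awk '
        /:[[:space:]]*$/ {                 # header line, e.g. "l2tp:"
            sub(/^[[:space:]]+/, "")
            sub(/:[[:space:]]*$/, "")
            section = $0
            next
        }
        /^[[:space:]]*finishing:/ { print section, $NF }
    '
}

# Hypothetical sampling loop (interval and output file are arbitrary):
#   while sleep 10; do
#       date '+%F %T'
#       accel-cmd show stat | finishing_counters
#   done >> /tmp/accel-finishing.log
```

The resulting timestamps can then be lined up with the bgpd log and the atop records for the same period.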