FRR loses routing info after 5-12k L2TP subscribers connect

aserkin · February 14, 2023, 8:58am

Hi there
I am not sure whether this is a bug or just a misconfiguration of the node. Routing information disappears from FRR when a few thousand L2TP subscribers connect to the node on VyOS 1.4. Pings to BGP neighbors return "network unreachable", and accel-ppp starts to consume a lot of CPU. Nothing helps except an FRR restart or a system reboot.
What I have done on the system so far:

excluded l2tp interfaces from SNMP tracking
tuned systemd-udevd not to track l2tp interfaces either
set FRR logging to a file instead of syslog
but the problem still occurs periodically.
I need advice on how to troubleshoot the issue.
VyOS receives L2TP subscribers from a mobile network and terminates the sessions into 100+ VRFs in an MPLS core.

Viacheslav · February 18, 2023, 11:30am

Please check the atop files to see which process is using the CPU or memory.

ls -la /var/log/atop

sudo atop -r /var/log/atop.log_xxx

aserkin · February 20, 2023, 10:45pm

Attached is the atop image taken during the issue.
This is what bgpd logged when the node stopped working correctly:

2023/02/21 01:41:24 BGP: [MJ4D6-VBJKV][EC 33554454] 10.5.72.2 [Error] bgp_read_packet error: Connection reset by peer
2023/02/21 01:41:25 BGP: [MJ4D6-VBJKV][EC 33554454] 10.5.72.1 [Error] bgp_read_packet error: Connection reset by peer
2023/02/21 01:41:47 BGP: [Q6EWH-J0SDA][EC 33554510] 10.5.72.2(Unknown) has not made any SendQ progress for 1 holdtime (15s), peer overloaded?
2023/02/21 01:41:47 BGP: [Q6EWH-J0SDA][EC 33554510] 10.5.72.1(Unknown) has not made any SendQ progress for 1 holdtime (15s), peer overloaded?

Also attaching the output of accel-cmd show stat: it shows a constantly growing number of finishing sessions (control channels) until accel-ppp stopped answering altogether.
accel-show-stat-21022023.txt (104.6 KB)
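
For anyone who wants to track that growth over time rather than compare snapshots, here is a minimal sketch. It assumes `accel-cmd show stat` prints section headings that end in a colon, followed by indented `finishing: N` counters; the function name `finishing_counters` and the 60-second interval are only illustrative.

```shell
#!/bin/sh
# Print every "finishing" counter from `accel-cmd show stat` output (stdin),
# prefixed with the nearest section heading above it.
# Assumption: headings are lines that end in ":" with no value after it,
# e.g. "sessions:" or "  sessions (control channels):".
finishing_counters() {
    awk '
        /:[ \t]*$/ {                      # heading line: remember its name
            sub(/^[ \t]+/, "")
            sub(/:[ \t]*$/, "")
            section = $0
            next
        }
        $1 == "finishing:" { print section ": " $2 }
    '
}

# Usage on the router: one timestamped sample per minute.
# while sleep 60; do
#     date +%FT%T
#     accel-cmd show stat | finishing_counters
# done
```

A counter that only ever rises between samples would confirm that sessions get stuck in the finishing state rather than being cleaned up.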

aserkin · May 17, 2023, 4:46pm

I made some more attempts to start the service with VyOS 1.4 rolling releases, up to yesterday's image. The behavior looks the same.
The node serves 10k+ L2TP subscribers for about an hour and a half until the routes suddenly disappear from the kernel.
Peering is organized like this:

IPv4 Unicast Summary (VRF default):
BGP router identifier 10.228.134.1, local AS number 64826 vrf-id 0
BGP table version 2001
RIB entries 1788, using 335 KiB of memory
Peers 4, using 2898 KiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.228.134.32 4 39374 16520 12763 0 0 0 17:43:25 111 2 BBR-x-1 vrf AAA
10.228.134.34 4 39374 16506 12763 0 0 0 17:43:25 111 2 BBR-x-2 vrf AAA
10.228.134.36 4 39374 16460 12763 0 0 0 17:43:25 219 2 BBR-x-1 vrf LNS
10.228.134.38 4 39374 16481 12763 0 0 0 17:43:25 219 2 BBR-x-2 vrf LNS

Total number of neighbors 4

IPv4 VPN Summary (VRF default):
BGP router identifier 10.228.134.1, local AS number 64826 vrf-id 0
BGP table version 0
RIB entries 2933, using 550 KiB of memory
Peers 2, using 1449 KiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.5.72.1 4 39374 4410101 13230 0 0 0 17:43:22 65270 232 BBR1 vpnv4
10.5.72.2 4 39374 4380640 13230 0 0 0 17:43:22 65269 232 BBR2 vpnv4

Total number of neighbors 2

IPv4 Labeled Unicast Summary (VRF default):
BGP router identifier 10.228.134.1, local AS number 64826 vrf-id 0
BGP table version 3
RIB entries 5, using 960 bytes of memory
Peers 2, using 1449 KiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.228.134.40 4 39374 1603 1067 0 0 0 17:43:25 635 2 BBR1 ipv4 LU
10.228.134.42 4 39374 1611 1067 0 0 0 17:43:25 635 2 BBR2 ipv4 LU

where the first group of four peers provides connectivity to the AAA servers and the L2TP access concentrators.
The last group of peers is used for labeled-unicast peering. From 10.228.134.40 and 10.228.134.42 we receive the routes to peers 10.5.72.1 and 10.5.72.2, which are used for vpnv4:

aserkin@lns:~$ show ip route 10.5.72.1
Routing entry for 10.5.72.1/32
Known via "bgp", distance 20, metric 0, best
Last update 17:59:00 ago

10.228.134.40, via eth3, label 48203, weight 1

10.228.134.42, via eth4, label 48403, weight 1

When the problem occurred, the routes received from the LU peers disappeared from the kernel for an unknown reason. After that, the peering with the vpnv4 nodes went down and all vpnv4 connectivity was lost:

IPv4 Unicast Summary (VRF default):
BGP router identifier 10.228.134.1, local AS number 64826 vrf-id 0
BGP table version 2512
RIB entries 1788, using 335 KiB of memory
Peers 4, using 2898 KiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.228.134.32 4 39374 2503 1891 0 0 0 02:37:23 111 2 BBR-x-1 vrf AAA
10.228.134.34 4 39374 2523 1891 0 0 0 02:37:23 111 2 BBR-x-2 vrf AAA
10.228.134.36 4 39374 2467 1891 0 0 0 02:37:23 219 2 BBR-x-1 vrf LNS
10.228.134.38 4 39374 2466 1891 0 0 0 02:37:23 219 2 BBR-x-2 vrf LNS

Total number of neighbors 4

IPv4 VPN Summary (VRF default):
BGP router identifier 10.228.134.1, local AS number 64826 vrf-id 0
BGP table version 0
RIB entries 2911, using 546 KiB of memory
Peers 2, using 1449 KiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
> 10.5.72.1 4 39374 493587 165618 0 0 0 00:15:47 Active 0 BBR1 vpnv4
> 10.5.72.2 4 39374 433783 165618 0 0 0 00:15:48 Active 0 BBR2 vpnv4

Total number of neighbors 2

IPv4 Labeled Unicast Summary (VRF default):
BGP router identifier 10.228.134.1, local AS number 64826 vrf-id 0
BGP table version 3
RIB entries 5, using 960 bytes of memory
Peers 2, using 1449 KiB of memory

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.228.134.40 4 39374 510 161 0 0 0 02:37:23 635 2 BBR1 ipv4 LU
10.228.134.42 4 39374 509 161 0 0 0 02:37:23 635 2 BBR2 ipv4 LU

Total number of neighbors 2

The problem can be fixed only by rebooting the node. Or I just do not know how to fix it otherwise)
Why would the routes received from FRR suddenly disappear? What could be the reason?

aserkin@lns:~$ show ip route 10.5.72.1
aserkin@lns:~$ show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure

C>* 10.5.28.10/32 is directly connected, l2tp6313, 00:37:57
C>* 10.228.134.0/32 is directly connected, dum0, 02:36:48
C>* 10.228.134.1/32 is directly connected, dum1, 02:36:29
C>* 10.228.134.2/32 is directly connected, dum2, 02:36:44
C>* 10.228.134.32/31 is directly connected, eth1.616, 02:36:06
C>* 10.228.134.34/31 is directly connected, eth2.617, 02:36:07
C>* 10.228.134.36/31 is directly connected, eth1.618, 02:36:06
C>* 10.228.134.38/31 is directly connected, eth2.619, 02:36:06
C>* 10.228.134.40/31 is directly connected, eth3, 02:36:07
C>* 10.228.134.42/31 is directly connected, eth4, 02:36:08

aserkin · July 26, 2023, 8:48am

Hi. I see the following messages in frr.log just before all BGP routes were wiped from the kernel for some reason:

`2023/07/26 10:26:29 ZEBRA: [HSYZM-HV7HF] Extended Error: Can not replace a nexthop with a nexthop group.`
`2023/07/26 10:26:29 ZEBRA: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWNEXTHOP(104), seq=100352658, pid=2475708348`
`2023/07/26 10:26:29 ZEBRA: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (1769049[19431932/19431933]) into the kernel`
`2023/07/26 10:26:30 ZEBRA: [HSYZM-HV7HF] Extended Error: Can not replace a nexthop with a nexthop group.`
`2023/07/26 10:26:30 ZEBRA: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWNEXTHOP(104), seq=100352820, pid=2475708348`
`2023/07/26 10:26:30 ZEBRA: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (1769049[19431932/19431933]) into the kernel`
`2023/07/26 10:26:30 ZEBRA: [HSYZM-HV7HF] Extended Error: Can not replace a nexthop with a nexthop group.`
`2023/07/26 10:26:30 ZEBRA: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWNEXTHOP(104), seq=100352827, pid=2475708348`
`2023/07/26 10:26:30 ZEBRA: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (1769049[19431932/19431933]) into the kernel`
`2023/07/26 10:26:30 ZEBRA: [HSYZM-HV7HF] Extended Error: Can not replace a nexthop with a nexthop group.`
`2023/07/26 10:26:30 ZEBRA: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWNEXTHOP(104), seq=100352851, pid=2475708348`
`2023/07/26 10:26:30 ZEBRA: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (1769049[19431932/19431933]) into the kernel`
`2023/07/26 10:26:34 ZEBRA: [SWQK6-6JY63][EC 4043309074] 132:1263:10.59.14.183/32: Failed to enqueue dataplane install`
`2023/07/26 10:27:10 BGP: [KTE2S-GTBDA][EC 100663301] INTERFACE_ADDRESS_DEL: Cannot find IF 492212 in VRF 196`
`2023/07/26 10:27:32 ZEBRA: [SWQK6-6JY63][EC 4043309074] 19:1468:10.115.5.246/32: Failed to enqueue dataplane install`
`2023/07/26 10:27:34 ZEBRA: [SWQK6-6JY63][EC 4043309074] 188:1411:10.11.4.9/32: Failed to enqueue dataplane install`
`2023/07/26 10:27:38 BGP: [KTE2S-GTBDA][EC 100663301] INTERFACE_ADDRESS_DEL: Cannot find IF 488260 in VRF 133`
`2023/07/26 10:27:42 ZEBRA: [SWQK6-6JY63][EC 4043309074] 188:1411:10.11.4.20/32: Failed to enqueue dataplane install`
`2023/07/26 10:27:43 BGP: [KTE2S-GTBDA][EC 100663301] INTERFACE_ADDRESS_DEL: Cannot find IF 493092 in VRF 0`
`2023/07/26 10:27:48 ZEBRA: [SWQK6-6JY63][EC 4043309074] 19:1468:10.115.1.161/32: Failed to enqueue dataplane install`
`2023/07/26 10:27:54 BGP: [KTE2S-GTBDA][EC 100663301] INTERFACE_ADDRESS_DEL: Cannot find IF 493260 in VRF 0`
`2023/07/26 10:27:54 BGP: [KTE2S-GTBDA][EC 100663301] INTERFACE_ADDRESS_DEL: Cannot find IF 493259 in VRF 0`
`2023/07/26 10:27:54 BGP: [KTE2S-GTBDA][EC 100663301] INTERFACE_ADDRESS_DEL: Cannot find IF 493261 in VRF 0`
`2023/07/26 10:27:54 BGP: [KTE2S-GTBDA][EC 100663301] INTERFACE_ADDRESS_DEL: Cannot find IF 493262 in VRF 0`
`2023/07/26 10:27:54 ZEBRA: [SWQK6-6JY63][EC 4043309074] 19:1468:10.115.6.44/32: Failed to enqueue dataplane install`
`2023/07/26 10:28:10 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 10.5.72.3 4/0 (Hold Timer Expired) 0 bytes`
`2023/07/26 10:28:10 BGP: [PXVXG-TFNNT] %ADJCHANGE: neighbor 10.5.72.3(Unknown) in vrf default Down BGP Notification send`
`2023/07/26 10:28:11 BGP: [HZN6M-XRM1G] %NOTIFICATION: sent to neighbor 10.5.72.4 4/0 (Hold Timer Expired) 0 bytes`
`2023/07/26 10:28:11 BGP: [PXVXG-TFNNT] %ADJCHANGE: neighbor 10.5.72.4(Unknown) in vrf default Down BGP Notification send`

After that, nothing repairs the system except a reboot.
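
To get an overview of which messages dominate such a log, the lines can be tallied per daemon and message ID. A rough sketch, assuming the log format shown above (date, time, daemon name, then a bracketed ID); `frr_log_summary` is a hypothetical helper name, and the log path in the usage line is an assumption that depends on where FRR logging was pointed.

```shell
#!/bin/sh
# Count FRR log lines per daemon and message ID, most frequent first.
# Reads log text on stdin and strips the backticks seen in the paste above.
frr_log_summary() {
    tr -d '`' |
    awk '
        # The first bracketed token such as [SWQK6-6JY63] is the message ID.
        match($0, /\[[A-Z0-9]+-[A-Z0-9]+\]/) {
            id = substr($0, RSTART, RLENGTH)
            daemon = $3                   # third field is "ZEBRA:" or "BGP:"
            sub(/:$/, "", daemon)
            count[daemon " " id]++
        }
        END { for (k in count) print count[k], k }
    ' |
    sort -rn
}

# Usage (path is an assumption):
# frr_log_summary < /var/log/frr/frr.log
```

Sorting by count makes it easier to see whether the nexthop install failures or the dataplane enqueue failures come first and which of them keeps repeating.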

Apachez · July 26, 2023, 10:12am

I'm wondering whether you are, for whatever reason, running out of conntrack table space?

One option, if you don't use the firewall settings, is to disable conntrack (I am still trying to figure out how to do that in VyOS 1.4). The other is to tweak the timers to something like the following (don't just copy the values; they depend on how much RAM you have etc., but it gives you a hint of which settings to dig into further):

system {
    conntrack {
        expect-table-size 10485760
        hash-size 10485760
        table-size 10485760
        timeout {
            icmp 10
            other 600
            tcp {
                close 10
                close-wait 30
                established 600
                fin-wait 30
                last-ack 30
                syn-recv 30
                syn-sent 30
                time-wait 30
            }
            udp {
                other 600
                stream 600
            }
        }
    }
    option {
        performance throughput
        reboot-on-panic
        startup-beep
    }
}

Ref: Bgpd 10Gbps nf_conntrack: table full, dropping packet - #4 by sidnei

aserkin · July 26, 2023, 7:30pm

I have
set system conntrack modules
in my config for some reason, but no firewall. I will try removing it, though I haven't seen any nf_conntrack messages in the log.
Thank you.

Apachez · July 26, 2023, 7:39pm

Unless you want to use the ALGs (a.k.a. firewall helpers), you can disable that section with “delete system conntrack modules”.

These are the modules available in 1.4 rolling; they can be enabled one by one:

vyos@vyos# set system conntrack modules 
Possible completions:
   ftp                  FTP connection tracking
   h323                 H.323 connection tracking
   nfs                  NFS connection tracking
   pptp                 PPTP connection tracking
   sip                  SIP connection tracking
   sqlnet               SQLnet connection tracking
   tftp                 TFTP connection tracking

My experience with these “helpers” from various vendors and models is that they break more than they help.

There is also “set system conntrack ignore”, which can define rules that bypass connection tracking. This might help in corner cases, but I still haven't figured out how to properly disable conntrack completely through the config (for example, if one wants to use VyOS as a strict router that pushes packets with no filtering of the traffic flowing through, other than by destination route).

aserkin · July 27, 2023, 11:40am

My concern is also the massive route updates that occur while L2TP subscribers connect and disconnect. The node works fine with 5-8k subscribers, but beyond 12-16k it becomes unpredictable despite low CPU utilization and negligible traffic under 20 Mbps. It can run for a week without a problem or fail the day after testing starts.

aserkin · July 28, 2023, 1:37pm

Well, I had a chance to investigate the conntrack table behavior during our last tests. It never went above the default nf_conntrack_max = 262144; it stayed far below the limit. The arrow shows the failure time.
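
For reference, the same numbers can be read directly from the kernel. A small sketch, assuming the nf_conntrack module is loaded so that the two files under /proc/sys/net/netfilter exist; `conntrack_usage` is a made-up helper name.

```shell
#!/bin/sh
# Print conntrack table usage as "count/max (pct%)".
# With no arguments the live kernel counters are read; passing two numbers
# (count, max) allows the calculation to be checked offline.
conntrack_usage() {
    count=${1:-$(cat /proc/sys/net/netfilter/nf_conntrack_count)}
    max=${2:-$(cat /proc/sys/net/netfilter/nf_conntrack_max)}
    # Integer percentage, rounded down.
    echo "$count/$max ($((count * 100 / max))%)"
}

# Usage on the router: sample once a minute.
# while sleep 60; do printf '%s %s\n' "$(date +%FT%T)" "$(conntrack_usage)"; done
```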

Apachez · July 29, 2023, 8:06pm

It looks like you should be safe as far as the number of conntrack entries goes, but it is still worrying that we see a steady increase over time without it leveling out.

Also, what happened a few hours earlier, where we see the drop from approx. 36k connections down to 31k (about 2 cm to the left of your red arrow)?

What do “sudo netstat -atunp” and “sudo netstat -atunp | wc -l” look like over time?

And the output of “sudo free”?

I have encountered something similar with an older Linux kernel and the Apache2 web server, where shutting it down sometimes failed because of far too many hung TCP sessions. Even when everything looked fine in netstat and the process had been killed, it was hard to start it again. The quick fix was simply to reboot the server, which fixed everything.

Adjusting the TCP timers fixes those issues to some extent.

Viacheslav · July 30, 2023, 7:25am

The problem is not with conntrack but with FRR.

Apachez · July 30, 2023, 8:42am

Most likely yes, but at the same time FRR is used by many around the world with few or no reports like the one in this thread.

Apachez · July 30, 2023, 8:47am

Could it be a ulimit issue?

Comparing VyOS with Ubuntu (ulimit -a):

VyOS 1.4-rolling-202307250317 (Linux 6.1.40-amd64-vyos #1 SMP PREEMPT_DYNAMIC Sun Jul 23 21:10:16 UTC 2023 x86_64 GNU/Linux)

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31706
max locked memory       (kbytes, -l) 1018836
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31706
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Ubuntu 22.10 (Linux 5.19.0-47-generic #49-Ubuntu SMP PREEMPT_DYNAMIC Sun Jun 18 20:38:50 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux)

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 63233
max locked memory           (kbytes, -l) 2033332
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 63233
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
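
Note that `ulimit -a` reports the limits of the shell it runs in; a daemon started by the init system may run with different values. Those can be read from /proc/<pid>/limits. A sketch, with `open_files_limit` as a hypothetical helper and assuming the FRR daemons run under the process names zebra and bgpd:

```shell
#!/bin/sh
# Print the soft "Max open files" limit from /proc/<pid>/limits text on stdin.
# The line looks like: "Max open files   1024   524288   files"
# (fields: name words, soft limit, hard limit, unit).
open_files_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Usage on the router (pidof -s returns a single PID per daemon):
# for d in zebra bgpd; do
#     printf '%s: ' "$d"
#     open_files_limit < "/proc/$(pidof -s "$d")/limits"
# done
```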

Apachez · July 30, 2023, 9:03am

It is unusual to get only 3 hits when searching on Google nowadays, but one of them was:

github.com/FRRouting/frr

IPv6 BGP / Zebra - Failed to enqueue dataplane install

opened 12:51PM - 01 Nov 22 UTC by SwimGeek · labels: bug, zebra

FRR: 8.3.1
OS: Debian 11.5
Kernel: 5.10.140

**Describe the bug**
- [x] Did you check if this is a duplicate issue?
- [ ] Did you test it on the latest FRRouting/frr master branch?

I think there is a bug in the way Zebra logs errors when it can't insert IPv6 prefixes.
'zebra[3869179]: [SWQK6-6JY63][EC 4043309074] 0:254:2c0f:f6d0:27::/48: Failed to enqueue dataplane install'
I think the '0:254' part before the route is wrong.

**To Reproduce**
Not easy to reproduce. We believe we received about 55k IPv6 prefixes from a peer at a peering point. This caused the router to partially fail. After noticing errors in the frr log, zebra could not insert routes, we restarted frr and routing returned to normal.

**Additional context**
Somebody noticed something similar recently, also contains the '0:254': https://github.com/FRRouting/frr/issues/10199
The log file does not give us enough information to trace the problem in detail, but maybe fixing the logging is a good start.

which at first glance seems to fit the description.

A workaround mentioned (would probably need adjustment for VyOS) is to use:

zebra nexthop-group keep 1

Question: Do you have ECMP configured? Do you get the same error with ECMP disabled?

According to the link above, there seems to be a race condition between zebra (FRR) and the kernel when ECMP is in use and one of the participating interfaces bounces, which results in the routes being lost.
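
A sketch of how the workaround might be applied and checked through vtysh. The helper names are made up, and I am assuming that changes entered directly in vtysh are not kept by VyOS across a reboot or a commit that regenerates the FRR config, so treat this as a test aid only.

```shell
#!/bin/sh
# Apply the workaround to the running FRR instance.
# The argument is the keep time; it defaults to 1 as in the workaround above.
apply_nhg_keep() {
    sudo vtysh -c 'configure terminal' -c "zebra nexthop-group keep ${1:-1}"
}

# Print the configured keep time from running-config text on stdin.
# Prints nothing when the line is absent, i.e. when FRR's default applies.
nhg_keep_value() {
    awk '$1 == "zebra" && $2 == "nexthop-group" && $3 == "keep" { print $4 }'
}

# Usage on the router:
#   apply_nhg_keep 1
#   sudo vtysh -c 'show running-config' | nhg_keep_value
```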

aserkin · July 30, 2023, 10:46am

Every LAC IP address has two ECMP next hops:

$ show ip route 10.220.1.216
Routing entry for 10.220.1.216/32
  Known via "bgp", distance 20, metric 0, best
  Last update 4d02h05m ago
  * 10.228.134.136, via eth1.618, weight 1
  * 10.228.134.138, via eth2.619, weight 1

This workaround is worth trying. Thank you.

aserkin · August 1, 2023, 8:05am

I can confirm the behavior described for IPv4 too. Last night we tried the workaround and it definitely helps. Without zebra nexthop-group keep 1 we saw quite unstable behavior when taking down various interfaces towards BGP peers: the node lost connectivity for a few minutes, dropped L2TP subscribers, and finally got stuck with the kernel free of any BGP routes.
After adding the zebra workaround I tried the same thing, taking interfaces down for a few seconds, and nothing happened. After the BGP session is restored, the nexthop group is recreated with two nexthops. That’s great, we probably have a stable system after half a year)
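
To check the last point without reading the full route table, the kernel's nexthop objects can be inspected. A minimal sketch, assuming the `ip nexthop show` output format `id N group A/B ...` for groups; `nhg_member_count` is a hypothetical helper and the group ID in the usage line is only an example.

```shell
#!/bin/sh
# Print the number of members of nexthop group <id>, given `ip nexthop show`
# output on stdin. Prints nothing if the ID is missing or is not a group.
nhg_member_count() {
    awk -v id="$1" '
        # Group lines look like: "id 17 group 15/16 proto zebra"
        $1 == "id" && $2 == id && $3 == "group" { print split($4, members, "/") }
    '
}

# Usage on the router (17 is an example group ID):
# ip nexthop show | nhg_member_count 17
```

For an ECMP route with both paths up, the expected result is 2.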

Thank you, @Apachez )

Apachez · August 1, 2023, 11:37pm

You’re welcome!

I have filed this as a bug (with a solution): T5424 Routes vanishes when using FRR with ECMP and one of the ECMP paths is no longer available
