VyOS 1.5-rolling-202401260023 FRR BGP daemon crash

We have a test environment with an MPLS setup built on VyOS. Recently we found that the FRR BGP daemon (bgpd) on VyOS crashes frequently. Any suggestions? Thanks.
LOG:
Jun 10 10:11:13 bgpd[2672750]: [N463T-4M950][EC 33554449] u1:s23 attributes too long, cannot send UPDATE
Jun 10 10:11:13 bgpd[2672750]: [N463T-4M950][EC 33554449] u1:s23 attributes too long, cannot send UPDATE
Jun 10 10:11:13 bgpd[2672750]: [N463T-4M950][EC 33554449] u1:s23 attributes too long, cannot send UPDATE
Jun 10 10:11:13 BGP[2672750]: Received signal 11 at 1717985473 (si_addr 0x30, PC 0x555a241f192c); aborting...
Jun 10 10:11:13 BGP[2672750]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6f) [0x7f0c64e8322f]
Jun 10 10:11:13 BGP[2672750]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f0c64e83435]
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: showing active allocations in memory group libfrr
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Buffer : 4 * 24
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Host config : 6 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Command Tokens : 13161 * 72
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Command Token Text : 9423 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Command Token Help : 9423 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Command Argument Name : 2295 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: RCU thread : 2 * 128
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: RCU sequence barrier : 1 * 32
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Scripting : 18 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: FRR POSIX Thread : 4 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: POSIX sync primitives : 4 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Graph : 42 * 8
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Graph Node : 15485 * 32
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Hash : 581 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Hash Bucket : 415306 * 32
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Hash Index : 291 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Interface : 7 * 272
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Connected : 9 * 48
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Link List : 62 * 40
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Link Node : 378 * 24
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Temporary memory : 2349 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Bitfield memory : 2 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Module loading name : 1 * 5
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Nexthop : 16 * 152
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Nexthop label : 16 * 8
Jun 10 10:11:13 BGP[2672750]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xf6001) [0x7f0c64eb8001]
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Northbound Node : 258 * 1192
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Northbound Configuration : 2 * 24
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Northbound Configuration Entry: 2 * 1032
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Prefix : 9 * 56
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Privilege information : 3 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Ring buffer : 22 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Skip List : 1192 * 56
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Skip Node : 3379 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Skiplist Counters : 1192 * 68
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Socket union : 18 * 112
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Stream : 7 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Stream FIFO : 22 * 64
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Route table : 105 * 56
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Thread : 47 * 160
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Thread master : 12 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Thread Poll Info : 6 * 8192
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Thread stats : 26 * 96
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Typed-hash bucket : 12 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Typed-heap array : 1 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Vector : 31055 * 24
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Vector index : 31055 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: VRF : 2 * 216
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: VRF bit-map : 3 * 8
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: VTY : 2 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: VTY server : 2 * 32
Jun 10 10:11:13 BGP[2672750]: /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f0c64ae6fd0]
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Work queue : 6 * 144
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Work queue name string : 6 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: YANG module : 5 * 48
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Zclient : 2 * 3144
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Redistribution instance IDs : 6 * 2
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: log thread-local buffer : 2 * 24608
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: showing active allocations in memory group logging subsystem
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: syslog target : 1 * 56
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: showing active allocations in memory group bgpd
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Peer KeepAlive Timer : 9 * 24
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP Peer pthread Conditional : 1 * 48
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP Peer pthread Mutex : 1 * 40
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Mac Hash Entry : 4 * 16
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Mac Hash Entry Intf String : 5 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP instance : 4 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP listen socket details : 2 * 144
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP peer : 12 * 20864
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP peer hostname : 22 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Peer group : 1 * 64
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP Peer group hostname : 1 * 9
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP peer af : 9 * 80
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP update group : 1 * 104
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP update subgroup : 1 * 240
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP packet : 1 * 56
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP attribute : 374243 * 320
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP aspath : 38517 * 40
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(bgp_advertise_clean_subgroup+0x1c) [0x555a241f192c]
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP aspath seg : 38516 * 24
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP aspath segment data : 38516 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP aspath str : 38517 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP table : 92 * 56
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP node : 2364 * 192
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP route : 4374 * 136
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP ancillary route info : 4374 * 432
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP connected : 3 * 4
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP adv attr : 1 * 24
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP adv : 1 * 64
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP synchronise : 1 * 48
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP adj out : 1208 * 96
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP multipath info : 1879 * 48
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: community : 13 * 40
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: community val : 13 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: community str : 13 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: extcommunity : 6 * 40
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: extcommunity val : 6 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: extcommunity str : 6 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: community-list handler : 1 * 120
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP nexthop : 9 * 184
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP peer update interface : 10 * 5
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP own address : 3 * 64
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP EVPN MH Information : 1 * 56
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Software Version : 9 * (variably sized)
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP Martian Addr Intf String : 3 * 5
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(subgroup_update_packet+0x583) [0x555a241f4163]
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP PBR Context : 1 * 32
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP interface context : 7 * 4
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP EVPN instance information : 1 * 56
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: showing active allocations in memory group rfapi
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: NVE Configuration : 1 * 2984
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: RFAPI Generic : 1 * 296
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: RFAPI Import Table : 1 * 208
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: RFAPI Monitor Encap : 2187 * 40
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: RFAPI IT Extra : 1197 * 40
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(bgp_generate_updgrp_packets+0x429) [0x555a241b19c9]
Jun 10 10:11:13 BGP[2672750]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(event_call+0x7d) [0x7f0c64eca07d]
Jun 10 10:11:13 BGP[2672750]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xc0) [0x7f0c64e7af00]
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(main+0x373) [0x555a241506b3]
Jun 10 10:11:13 BGP[2672750]: /lib/x86_64-linux-gnu/libc.so.6(+0x271ca) [0x7f0c64ad21ca]
Jun 10 10:11:13 BGP[2672750]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f0c64ad2285]
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(_start+0x21) [0x555a24152431]
Jun 10 10:11:13 BGP[2672750]: in thread (bgp_generate_updgrp_packets) scheduled from .../bgpd/bgp_fsm.c:1006 bgp_adjust_routeadv()
Jun 10 10:11:13 watchfrr[1658]: [HD38Q-0HBRT][EC 268435457] bgpd state -> down : read returned EOF
Jun 10 10:11:13 zebra[1677]: [VXKFG-8SJRV][EC 4043309121] Client 'vnc' encountered an error and is shutting down.
Jun 10 10:11:13 zebra[1677]: [VXKFG-8SJRV][EC 4043309121] Client 'bgp' encountered an error and is shutting down.
Jun 10 10:11:13 zebra[1677]: [YDZ55-W3VM6] release_daemon_table_chunks: Released 0 table chunks
Jun 10 10:11:13 zebra[1677]: [JPSA8-5KYEA] client 31 disconnected 0 vnc routes removed from the rib
Jun 10 10:11:13 zebra[1677]: [S929C-NZR3N] client 31 disconnected 0 vnc nhgs removed from the rib
Jun 10 10:11:13 zebra[1677]: [YDZ55-W3VM6] release_daemon_table_chunks: Released 0 table chunks
Jun 10 10:11:13 zebra[1677]: [JPSA8-5KYEA] client 17 disconnected 0 bgp routes removed from the rib
Jun 10 10:11:13 zebra[1677]: [S929C-NZR3N] client 17 disconnected 0 bgp nhgs removed from the rib
Jun 10 10:11:18 systemd[1]: serial-getty@ttyS0.service: Deactivated successfully.
Jun 10 10:11:18 systemd[1]: serial-getty@ttyS0.service: Scheduled restart job, restart counter is at 921838.
Jun 10 10:11:18 systemd[1]: Stopped serial-getty@ttyS0.service - Serial Getty on ttyS0.
Jun 10 10:11:18 systemd[1]: Started serial-getty@ttyS0.service - Serial Getty on ttyS0.
Jun 10 10:11:18 agetty[2932135]: /dev/ttyS0: not a tty
Jun 10 10:11:18 watchfrr[1658]: [YFT0P-5Q5YX] Forked background command [pid 2932136]: /usr/lib/frr/watchfrr.sh restart bgpd
Jun 10 10:11:18 frrinit.sh[2932136]: Cannot stop bgpd: pid 2672750 not running
Jun 10 10:11:18 watchfrr.sh[2932142]: Cannot stop bgpd: pid 2672750 not running
Jun 10 10:11:18 zebra[1677]: [V98V0-MTWPF] client 17 says hello and bids fair to announce only bgp routes vrf=0
Jun 10 10:11:18 zebra[1677]: [V98V0-MTWPF] client 31 says hello and bids fair to announce only vnc routes vrf=0
Jun 10 10:11:19 bgpd[2932144]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jun 10 10:11:19 bgpd[2932144]: [G6NKK-8C6DV] end_config: VTY:0x55c2117b6430, pending SET-CFG: 0
Jun 10 10:11:19 watchfrr[1658]: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded
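
The backtrace frames in the log above are interleaved with the memstats dump, which makes the call stack hard to read. A minimal sketch for pulling out just the frames, in order, is shown here. The function name `extract_frames` and the regular expression are illustrative assumptions, not part of FRR or VyOS; the sample lines are copied from the log above.

```python
import re

# Matches backtrace lines such as:
#   "... BGP[2672750]: /usr/lib/frr/bgpd(subgroup_update_packet+0x583) [0x555a241f4163]"
# Groups: binary path, symbol+offset, absolute address.
FRAME_RE = re.compile(r"BGP\[\d+\]: ([^\s(]+)\(([^)]*)\) \[(0x[0-9a-f]+)\]")


def extract_frames(log_text):
    """Return (binary, symbol+offset, address) tuples in log order."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(match.groups())
    return frames


# A few lines taken from the log above; memstats lines are skipped.
sample = """\
Jun 10 10:11:13 BGP[2672750]: /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f0c64ae6fd0]
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: Work queue : 6 * 144
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(bgp_advertise_clean_subgroup+0x1c) [0x555a241f192c]
Jun 10 10:11:13 frrinit.sh[2672750]: core_handler: memstats: BGP PBR Context : 1 * 32
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(subgroup_update_packet+0x583) [0x555a241f4163]
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(bgp_generate_updgrp_packets+0x429) [0x555a241b19c9]
"""

for binary, symbol, address in extract_frames(sample):
    print(f"{address}  {symbol:40s} {binary}")
```

In the full log, the frame whose address equals the PC on the "Received signal 11" line is bgp_advertise_clean_subgroup+0x1c, called from subgroup_update_packet and bgp_generate_updgrp_packets.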

BGP Config:
protocols {
    bgp {
        address-family {
            ipv4-unicast {
                redistribute {
                    connected {
                    }
                    static {
                    }
                }
            }
        }
        neighbor eth1.2303 {
            interface {
                peer-group PE_IPv4
            }
        }
        neighbor vti4 {
            interface {
                peer-group CE_IPv4
            }
            update-source dum5
        }
        parameters {
            graceful-restart {
            }
            graceful-shutdown
            log-neighbor-changes
            router-id 169.254.100.4
        }
        peer-group CE_IPv4 {
            address-family {
                ipv4-unicast {
                    allowas-in {
                    }
                    as-override
                    soft-reconfiguration {
                    }
                }
            }
            ebgp-multihop 255
            remote-as 65533
        }
        peer-group PE_IPv4 {
            address-family {
                ipv4-unicast {
                    allowas-in {
                    }
                    as-override
                    soft-reconfiguration {
                    }
                }
            }
            ebgp-multihop 255
            remote-as 9936
            update-source eth1.2303
        }
        system-as 65511
    }
}
table 2303

Which FRR version corresponds to this VyOS version?

You can pull that up with vtysh and/or dpkg. For example, my 1.5-rolling-202406020021 install shows:

$ vtysh -c 'show ver'
FRRouting 9.1 (TEST-VYOS-LEFT) on Linux(6.6.32-amd64-vyos).
[...big text blob snipped...]
$ dpkg -l frr
[...snip...]
||/ Name           Version            Architecture Description
+++-==============-==================-============-=============================================================
ii  frr            9.1-172-g923799172 amd64        FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
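
If the version needs to be collected from several routers, the two outputs above can be parsed with a short script. This is only a sketch: the function names are made up for illustration, and reading the dpkg version as "upstream version, commits since tag, git hash" is an assumption based on how the string looks, not something confirmed in this thread.

```python
import re


def parse_frr_banner(show_ver_output):
    """Return the version from the 'FRRouting X.Y ...' banner line, or None."""
    match = re.search(r"^FRRouting (\S+)", show_ver_output, re.MULTILINE)
    return match.group(1) if match else None


def parse_frr_package_version(version):
    """Split a string like '9.1-172-g923799172' into its three parts.

    Assumption: the layout is <upstream>-<commits since tag>-g<git hash>.
    Returns None if the string does not have that shape.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)*)-(\d+)-g([0-9a-f]+)", version)
    if not match:
        return None
    upstream, commits, git_hash = match.groups()
    return upstream, int(commits), git_hash


# Strings copied from the output shown above.
banner = "FRRouting 9.1 (TEST-VYOS-LEFT) on Linux(6.6.32-amd64-vyos)."
print(parse_frr_banner(banner))
print(parse_frr_package_version("9.1-172-g923799172"))
```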

Received signal 11

A SIGSEGV is a serious fault that should probably be reported upstream if it hasn't already been resolved. If you can confirm it's still a problem with the most recent VyOS rolling (there have been changes to the FRR build since the date on your rolling), that would be a starting point.

If you want to check a point-in-time build process for the FRR package used, it's in the vyos-build repo at: vyos-build/packages/frr at current · vyos/vyos-build · GitHub

This is excellent advice and exactly what I was going to post.

Please do keep us updated though!

Since you are using VyOS 1.5-rolling-202401260023, I would start by trying the latest version, which currently is VyOS 1.5-rolling-202406060020, to see if the problem remains.

The above can be downloaded from: Releases · vyos/vyos-rolling-nightly-builds · GitHub

As for the config, the command format is preferred, since it makes it easier to copy the config elsewhere.

Like so:

show config commands | strip-private

You can leave out the "strip-private" part if you don't have anything sensitive in your config.

Info about the hardware, such as the CPU (vendor/model) and amount of RAM, would be helpful as well. Also which network cards and drivers you are using (output of lspci and lsmod).

If the crashes often occur at the same address, this is likely a bug. If they happen at various random addresses, hardware issues such as bad RAM are a likely cause. ECC RAM helps, but sadly not all hardware platforms support it. Try stressing the hardware with https://www.memtest.org/ for a few hours; the boot-time RAM test done by the BIOS alone is not sufficient. Long ago, a Linux kernel build was a good test, since GCC was good at exposing bad RAM due to its heavy use of pointers. BGP may be similar, as it manages big routing tables.
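
One way to check whether the crashes keep landing in the same place is to tally them from the logs. The sketch below groups crashes by signal, si_addr, and the symbol+offset of the frame whose address equals the reported PC. It compares symbol+offset rather than the raw PC on the assumption that the daemon may be loaded at a different base address on each restart. The function name `crash_signatures` is hypothetical, and the second crash in the sample is synthetic (the real one with its addresses rewritten), included only to show the grouping.

```python
import re
from collections import Counter

SIGNAL_RE = re.compile(
    r"Received signal (\d+) at \d+ \(si_addr (0x[0-9a-f]+), PC (0x[0-9a-f]+)\)"
)
FRAME_RE = re.compile(r"\(([^)]*)\) \[(0x[0-9a-f]+)\]")


def crash_signatures(log_text):
    """Count crashes by (signal, si_addr, symbol+offset at the faulting PC)."""
    counts = Counter()
    pending = None  # (signal, si_addr, pc) of a crash still waiting for its PC frame
    for line in log_text.splitlines():
        sig = SIGNAL_RE.search(line)
        if sig:
            if pending:  # previous crash never showed a frame matching its PC
                counts[(pending[0], pending[1], "unknown")] += 1
            pending = sig.groups()
            continue
        frame = FRAME_RE.search(line)
        if pending and frame and frame.group(2) == pending[2]:
            counts[(pending[0], pending[1], frame.group(1))] += 1
            pending = None
    if pending:
        counts[(pending[0], pending[1], "unknown")] += 1
    return counts


# Lines copied from the log earlier in this thread.
real_crash = """\
Jun 10 10:11:13 BGP[2672750]: Received signal 11 at 1717985473 (si_addr 0x30, PC 0x555a241f192c); aborting...
Jun 10 10:11:13 BGP[2672750]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f0c64e83435]
Jun 10 10:11:13 BGP[2672750]: /usr/lib/frr/bgpd(bgp_advertise_clean_subgroup+0x1c) [0x555a241f192c]
"""
# SYNTHETIC second crash, for illustration only: same symbol, different load address.
synthetic_crash = real_crash.replace("0x555a241f192c", "0x55d0c3a1192c")

for signature, count in crash_signatures(real_crash + synthetic_crash).items():
    print(count, signature)
```

A single dominant signature would point toward a software bug, while many unrelated signatures would fit the bad-hardware explanation. The si_addr of 0x30 in the posted log is a very low address, which would be consistent with a NULL pointer dereference, though that is an inference and not something confirmed in this thread.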