VyOS kills the bgpd process

Hi,

I use VyOS as a BGP router with full routing tables.
I tried running 2 BGP sessions with full routing plus 8 more peers, but the bgpd process exits and I need to reboot my VyOS.
If I use only one full-routing session, it works.

Does anyone have any idea what may be happening?

Thanks.

@sidnei What version? How much RAM?

Hi,
Version: VyOS 1.3-rolling-202003031151
Memory: 8 GB

I need to see the log files:
sudo cat /var/log/frr/frr.log
sudo dmesg -T
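
When the log is long, it can help to post only the lines around the incident. A minimal sketch, assuming the classic syslog timestamp layout seen in frr.log (`Mon D HH:MM:SS`); `log_window` is a hypothetical helper, not a VyOS or FRR command:

```shell
# Hypothetical helper: print syslog-style lines from one day between two times.
# $1 = month abbreviation (e.g. Apr), $2 = day of month,
# $3 / $4 = start / end as HH:MM:SS, $5 = log file (stdin if omitted).
# Zero-padded HH:MM:SS strings compare correctly as plain text.
log_window() {
    awk -v mon="$1" -v day="$2" -v from="$3" -v to="$4" \
        '$1 == mon && $2 + 0 == day + 0 && $3 >= from && $3 <= to' "${5:--}"
}
```

For example, `sudo cat /var/log/frr/frr.log | log_window Apr 6 09:00:00 09:10:00` would print only the entries from 09:00 to 09:10 on Apr 6. Reading from stdin keeps `sudo` on the `cat`, since a shell function cannot itself be run through `sudo`.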

And files from the directory
/var/log/atop

Indicate the date and time when the problem occurred.
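
The atop directory normally holds one raw file per day, and atop can replay a file for a chosen time window. A sketch, assuming the default `atop_YYYYMMDD` file naming; `atop_log_for` and `replay_window` are hypothetical helper names:

```shell
# Hypothetical helper: path of the atop raw log for a day given as YYYYMMDD.
atop_log_for() {
    printf '/var/log/atop/atop_%s\n' "$1"
}

# Replay that day's samples between two hh:mm times (interactive).
# -r reads a raw log file, -b / -e set the begin / end of the replay,
# -m shows memory-related output, useful if a daemon may be running out of RAM.
replay_window() {
    sudo atop -r "$(atop_log_for "$1")" -b "$2" -e "$3" -m
}
```

For example, `replay_window 20200406 09:00 09:10` would open the samples leading up to a 09:07 incident; inside the replay, `t` steps to the next sample and `T` to the previous one.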

Hi, below is part of frr.log:
Apr 6 07:04:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 07:08:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 07:26:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 07:30:27 BGP-DOMUS bgpd[1049]: [EC 33554465] 177.125.214.98 [FSM] Failure handling event BGP_Start in state Idle, prior events BGP_Stop, BGP_Stop, fd -1
Apr 6 07:31:26 BGP-DOMUS bgpd[1049]: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 177.125.214.98 in vrf default
Apr 6 07:32:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 07:33:04 BGP-DOMUS bgpd[1049]: bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 177.125.210.254 in vrf default
Apr 6 07:42:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 07:46:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 07:56:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 08:10:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 08:18:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 08:20:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 08:32:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 08:34:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 08:38:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 08:46:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 08:48:19 BGP-DOMUS bgpd[1049]: [EC 33554454] 172.16.1.2 [Error] bgp_read_packet error: Connection reset by peer
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: Received signal 11 at 1586164046 (si_addr 0x0, PC 0x7f2a91cfc85b); aborting…
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: Backtrace for 16 stack frames:
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x60) [0x7f2a91cf7100]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x10c) [0x7f2a91cf757c]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x72f74) [0x7f2a91d17f74]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f2a91b8b730]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x5785b) [0x7f2a91cfc85b]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/frr/zebra(+0x479c3) [0x559dbdea49c3]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/frr/zebra(+0x47909) [0x559dbdea4909]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/frr/zebra(zebra_nhg_rib_find+0x4d) [0x559dbdea464d]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/frr/zebra(nexthop_active_update+0x594) [0x559dbdea4fe4]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/frr/zebra(+0x5005f) [0x559dbdead05f]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(work_queue_run+0xc8) [0x7f2a91d2ed28]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x56) [0x7f2a91d256a6]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7f2a91cf53e8]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/frr/zebra(main+0x32e) [0x559dbde7b6fe]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7f2a919dc09b]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: /usr/lib/frr/zebra(_start+0x2a) [0x559dbde7bdfa]
Apr 6 09:07:26 BGP-DOMUS zebra[1020]: in thread work_queue_run scheduled from lib/workqueue.c:140
Apr 6 09:07:26 BGP-DOMUS watchfrr[990]: [EC 268435457] zebra state → down : read returned EOF
Apr 6 09:07:31 BGP-DOMUS watchfrr[990]: [EC 100663303] Forked background command [pid 7666]: /usr/lib/frr/watchfrr.sh restart all
Apr 6 09:07:31 BGP-DOMUS ospf6d[1069]: Terminating on signal SIGINT
Apr 6 09:07:31 BGP-DOMUS ripngd[1061]: Terminating on signal
Apr 6 09:07:31 BGP-DOMUS bgpd[1049]: Terminating on signal
Apr 6 09:07:31 BGP-DOMUS ospfd[1065]: Terminating on signal
Apr 6 09:07:31 BGP-DOMUS ripd[1057]: Terminating on signal
Apr 6 09:07:31 BGP-DOMUS watchfrr[990]: [EC 268435457] ripngd state → down : read returned EOF
Apr 6 09:07:31 BGP-DOMUS watchfrr[990]: [EC 268435457] staticd state → down : read returned EOF
Apr 6 09:07:31 BGP-DOMUS watchfrr[990]: [EC 268435457] bfdd state → down : read returned EOF
Apr 6 09:07:31 BGP-DOMUS watchfrr[990]: [EC 268435457] ospf6d state → down : read returned EOF
Apr 6 09:07:31 BGP-DOMUS watchfrr[990]: [EC 268435457] ripd state → down : read returned EOF
Apr 6 09:07:31 BGP-DOMUS watchfrr[990]: [EC 268435457] ospfd state → down : read returned EOF
Apr 6 09:07:32 BGP-DOMUS bgpd[1049]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed
Apr 6 09:07:34 BGP-DOMUS bgpd[1049]: message repeated 7 times: [ [EC 33554499] sendmsg_nexthop: zclient_send_message() failed]
Apr 6 09:07:34 BGP-DOMUS bgpd[1049]: [EC 33554499] sendmsg_nexthop: zclient_send_message() failed

The problem occurred on Apr 6 at 09:07.
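
Reading the log above, the first fatal event appears to be in zebra, not bgpd: zebra received signal 11 (SIGSEGV) with si_addr 0x0 inside `zebra_nhg_rib_find`, called from `nexthop_active_update`. watchfrr then ran `watchfrr.sh restart all`, which would explain why bgpd logs "Terminating on signal" a few seconds later. To pull just those lines out of a long frr.log, a grep filter like the following may help; it is a sketch, and the pattern list is an assumption based only on the messages shown in this thread:

```shell
# Hypothetical helper: keep only the crash-related lines of an FRR log.
# Matches the fatal signal, the backtrace header, backtrace frames that name a
# zebra_* or nexthop_* function, and every watchfrr message.
# Reads the given files, or stdin if none are given.
frr_crash_summary() {
    grep -E 'Received signal [0-9]+|Backtrace for|\((zebra|nexthop)_[a-z_]+\+|watchfrr\[' "$@"
}
```

Usage: `sudo cat /var/log/frr/frr.log | frr_crash_summary`.
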
The output of dmesg -T no longer goes back to that date:
sudo dmesg -T
[Wed Apr 8 09:47:03 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:03 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:04 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:04 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:04 2020] IPv4: martian source 192.168.205.7 from 172.16.0.129, on dev eth3.3053
[Wed Apr 8 09:47:04 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 01 74 00 08 00 …!.5.d.T.t…
[Wed Apr 8 09:47:05 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:05 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:06 2020] IPv4: martian source 192.168.205.7 from 172.16.0.129, on dev eth3.3053
[Wed Apr 8 09:47:06 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 01 74 00 08 00 …!.5.d.T.t…
[Wed Apr 8 09:47:06 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:06 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:07 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:07 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:08 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:08 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:09 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:09 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:10 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:10 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:10 2020] IPv4: martian source 192.168.205.7 from 172.16.0.129, on dev eth3.3053
[Wed Apr 8 09:47:10 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 01 74 00 08 00 …!.5.d.T.t…
[Wed Apr 8 09:47:11 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:11 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:11 2020] IPv4: martian source 192.168.205.7 from 172.16.0.129, on dev eth3.3053
[Wed Apr 8 09:47:11 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 01 74 00 08 00 …!.5.d.T.t…
[Wed Apr 8 09:47:12 2020] IPv4: martian source 192.168.205.7 from 172.16.0.129, on dev eth3.3053
[Wed Apr 8 09:47:12 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 01 74 00 08 00 …!.5.d.T.t…
[Wed Apr 8 09:47:12 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:12 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:13 2020] IPv4: martian source 192.168.205.7 from 172.16.0.129, on dev eth3.3053
[Wed Apr 8 09:47:13 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 01 74 00 08 00 …!.5.d.T.t…
[Wed Apr 8 09:47:13 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:13 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:14 2020] IPv4: martian source 10.64.64.63 from 172.16.0.129, on dev eth3.3003
[Wed Apr 8 09:47:14 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 c9 bd af 08 00 …!.5.d.T…
[Wed Apr 8 09:47:14 2020] IPv4: martian source 192.168.205.7 from 172.16.0.129, on dev eth3.3053
[Wed Apr 8 09:47:14 2020] ll header: 00000000: 00 1b 21 bc 35 af 64 d1 54 01 74 00 08 00 …!.5.d.T.t…
[Wed Apr 8 09:47:16 2020] net_ratelimit: 1 callbacks suppressed
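
The dmesg output above is from Apr 8 and consists of repeated "martian source" warnings, which the kernel logs (when `log_martians` is enabled) for packets that fail its source-address checks. In this message the first address is the destination and the address after "from" is the sender. Rather than reading the raw lines, the messages can be counted per destination, source and interface. A sketch; `martian_summary` is a hypothetical helper:

```shell
# Hypothetical helper: count "martian source" kernel messages per
# (destination, source, interface), most frequent first.
# Reads the given files, or stdin if none are given.
martian_summary() {
    grep -ohE 'martian source [0-9.]+ from [0-9.]+, on dev [^ ]+' "$@" \
        | sort | uniq -c | sort -rn
}
```

Usage: `sudo dmesg -T | martian_summary`.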

Thanks for help!

Sidnei

This is the listing of /var/log/atop:

ls -la /var/log/atop/
total 101768
drwxr-xr-x 1 root root 4096 Apr 8 00:00 .
drwxr-xr-x 1 root root 4096 Apr 8 09:56 …
-rw-r--r-- 1 root root 3373888 Mar 11 00:00 atop_20200310
-rw-r--r-- 1 root root 3255312 Mar 12 00:00 atop_20200311
-rw-r--r-- 1 root root 3176940 Mar 13 00:00 atop_20200312
-rw-r--r-- 1 root root 3191834 Mar 14 00:00 atop_20200313
-rw-r--r-- 1 root root 3125644 Mar 15 00:00 atop_20200314
-rw-r--r-- 1 root root 3102426 Mar 16 00:00 atop_20200315
-rw-r--r-- 1 root root 3121152 Mar 17 00:00 atop_20200316
-rw-r--r-- 1 root root 3208467 Mar 18 00:00 atop_20200317
-rw-r--r-- 1 root root 3218072 Mar 19 00:00 atop_20200318
-rw-r--r-- 1 root root 3728506 Mar 20 00:00 atop_20200319
-rw-r--r-- 1 root root 3346983 Mar 21 00:00 atop_20200320
-rw-r--r-- 1 root root 3154567 Mar 22 00:00 atop_20200321
-rw-r--r-- 1 root root 3264755 Mar 23 00:00 atop_20200322
-rw-r--r-- 1 root root 3601723 Mar 24 00:00 atop_20200323
-rw-r--r-- 1 root root 4185547 Mar 25 00:00 atop_20200324
-rw-r--r-- 1 root root 3637000 Mar 26 00:00 atop_20200325
-rw-r--r-- 1 root root 3477719 Mar 27 00:00 atop_20200326
-rw-r--r-- 1 root root 3195817 Mar 28 00:00 atop_20200327
-rw-r--r-- 1 root root 3291758 Mar 29 00:00 atop_20200328
-rw-r--r-- 1 root root 3737564 Mar 30 00:00 atop_20200329
-rw-r--r-- 1 root root 3864632 Mar 31 00:00 atop_20200330
-rw-r--r-- 1 root root 3717536 Apr 1 00:00 atop_20200331
-rw-r--r-- 1 root root 3821560 Apr 2 00:00 atop_20200401
-rw-r--r-- 1 root root 4004054 Apr 3 00:00 atop_20200402
-rw-r--r-- 1 root root 3929133 Apr 4 00:00 atop_20200403
-rw-r--r-- 1 root root 3820381 Apr 5 00:00 atop_20200404
-rw-r--r-- 1 root root 3724397 Apr 6 00:00 atop_20200405
-rw-r--r-- 1 root root 3977189 Apr 7 00:00 atop_20200406
-rw-r--r-- 1 root root 4210623 Apr 8 00:00 atop_20200407
-rw-r--r-- 1 root root 1556290 Apr 8 09:50 atop_20200408
-rw-r--r-- 1 root root 0 Apr 8 00:00 daily.log
-rw------- 1 root root 0 Apr 8 00:00 dummy_after
-rw------- 1 root root 0 Apr 8 00:00 dummy_before
[edit]

Sidnei