VyOS 1.3 rolling - zebra crashes

Hi,

We’re running 1.3-rolling-202001190217 with FRRouting v7.3-dev-20191226-00-gd7cce42cc and can reproduce a zebra crash that appears to occur while it is setting a nexthop.

I, perhaps mistakenly, opened an issue on FRRouting’s GitHub project, where I provided a relatively simple FRR configuration and debug logs. The developers there are asking some questions that I need answers to:

  • What is the latest commit signature of FRR used in VyOS 1.3 rolling?

  • How do I generate a core dump in VyOS?

  • How often is FRR in VyOS 1.3-rolling updated? I observe numerous commits in the last 30 days relating to nexthop logic

A question I have:

  • Shouldn’t VyOS restart crashed processes? I would imagine that it would cycle continually in a situation where the problem is reproducible, but what about ‘once in a blue moon’ scenarios?

Regards
David Herselman

You can get the latest FRR commit signature by running vtysh -c "show version"

FRR is not updated automatically. An update is deployed by re-running our CI job at https://ci.vyos.net/job/vyos-build-frr/job/master/, which we do from time to time until there is a new FRR release; then we pin the Git tag.

As VyOS is a regular Linux system, core dump generation needs to be enabled by setting:

ulimit -c unlimited

Many thanks!

To double-check: as VyOS 1.3-rolling-202001190217 reports the FRR version as 7.3-dev-20191226-00-gd7cce42cc, I presume it was built from the development branch with commits up to the 25th of December 2019, since I presume the builds happen shortly after midnight?
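For reference, a minimal shell sketch of how that version string can be split into its parts. It assumes the trailing g-prefixed field follows the git describe convention (the letter g plus an abbreviated commit hash) and that the third dash-separated field is a YYYYMMDD stamp; neither is confirmed in this thread, and the variable names are only illustrative.

```shell
# Version string as reported by VyOS 1.3-rolling-202001190217 (from this thread)
frr_version="7.3-dev-20191226-00-gd7cce42cc"

# Assumption: the last field is "g" + abbreviated commit hash (git describe style),
# so strip everything up to and including the final "-g"
frr_commit="${frr_version##*-g}"

# Assumption: the third dash-separated field is a YYYYMMDD date stamp
frr_date="$(printf '%s\n' "$frr_version" | cut -d- -f3)"

echo "commit: $frr_commit"   # commit: d7cce42cc
echo "date:   $frr_date"     # date:   20191226
```

The extracted hash could then be looked up in the FRRouting/frr repository to see exactly which commits the build contains.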

I presume the subscription release of VyOS uses FRR release versions rather than rolling builds, as the FRRouting/frr Releases page on GitHub only references ‘frr-7.3-dev’ with commit signature eef47e1 from the 6th of September 2019.

VyOS 1.3 rolling is not generating core dumps when I set ‘ulimit -c unlimited’, presumably because it only applies to the current session. I tried adding it to /usr/libexec/vyos/init/vyos-router before ‘/usr/lib/frr/frrinit.sh start’ in the ‘start ()’ function, but no core dump file was generated in /var/core.

Any suggestions on where I need to set this, to obtain a core dump when FRR dies?
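To illustrate the session-scope point, here is a small sketch showing that a ulimit change only affects the shell that makes it and anything started from that shell afterwards. It lowers the soft limit to 0 inside a subshell purely as a demonstration (lowering is always permitted, whereas raising can fail if the hard limit is lower); the variable names are illustrative.

```shell
# Soft core-size limit of this shell before anything changes
parent_before="$(ulimit -S -c)"

# Command substitution runs in a subshell: change the limit there only,
# then print the value that subshell now sees
child="$(ulimit -S -c 0; ulimit -S -c)"

# Back in the parent shell the limit is untouched
parent_after="$(ulimit -S -c)"

echo "parent before: $parent_before"
echo "subshell:      $child"
echo "parent after:  $parent_after"
```

By the same logic, a daemon that is already running, or that is started from a different process tree, would not pick up a limit set in an interactive login shell.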

PS: I presume watchfrr should be restarting crashed processes, but BGP remains non-functional whilst OSPF recovers. Running ‘vtysh -c "show running"’ shows everything except for BGP. Is this by design or another problem?

Apologies, I think I now have a better understanding of the automated build environment and really appreciate the granular detail Jenkins provides. It was great to see an updated build of FRR this afternoon and that the latest rolling release including it was already available.

The problem is gone but it would still be useful to know how to obtain a core dump in VyOS in future.

No core dumps in /var/core or /tmp after doing the following:

admin@testing:~$ sysctl -a -r core_pattern;
kernel.core_pattern = /var/core/core-%e-%p-%t

#sysctl -w kernel.core_pattern="/tmp/frr-core-%e-%p-%t"

pico /etc/security/limits.conf;
 *  soft  core  unlimited
init 6;

admin@testing:~$ ulimit -a | grep core;
core file size          (blocks, -c) unlimited
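Note that the ulimit output above describes the login shell, not the FRR daemons. A hedged sketch of two further checks that might help narrow this down: where the configured core_pattern would write, and which core-size limit a running zebra process actually has according to /proc. The helper names core_dir_of and core_limit_of are made up for this example, and it assumes pidof is available and that zebra is the process of interest.

```shell
# Directory a kernel.core_pattern value writes into. Prints nothing for piped
# patterns ("|/path/to/handler ..."), which hand the dump to a program instead.
core_dir_of() {
    case "$1" in
        "|"*) ;;
        */*)  dirname "$1" ;;
        *)    echo "." ;;   # relative pattern: dump lands in the process's cwd
    esac
}

# Core-size limit that applies to a given PID, read from Linux /proc
core_limit_of() {
    grep '^Max core file size' "/proc/$1/limits"
}

# Show the target directory and its permissions; the crashing process's user
# must be able to write there for a dump to be created
pattern="$(cat /proc/sys/kernel/core_pattern 2>/dev/null)"
dir="$(core_dir_of "$pattern")"
if [ -n "$dir" ]; then
    ls -ld "$dir" || echo "missing: $dir"
fi

# Show the limit for each running zebra process, if any
for pid in $(pidof zebra 2>/dev/null); do
    core_limit_of "$pid"
done
```

If the limit shown for zebra is 0 while the login shell reports unlimited, the setting is not reaching the daemon's process tree.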