VyOS Random rebooting

Hi There,

I am running into a very serious issue.

My core VyOS router is rebooting randomly.
I can see nothing relevant in the logs. Password authentication is disabled, so I doubt it has been hacked.
When I monitor the logs I see the following:

255.255.255.255. *

Not sure if any broadcast is happening.

I have been on this for a week with no resolution.

rituka@DC:~$ sh ver
Version: VyOS 1.1.8
Description: VyOS 1.1.8 (helium)
Copyright: 2017 VyOS maintainers and contributors
Built by: maintainers@vyos.net
Built on: Sat Nov 11 13:44:36 UTC 2017
Build ID: 1711111344-b483efc
System type: x86 64-bit
Boot via: image
HW model: Super Server
HW S/N: 1X20841912
HW UUID: 0080E78B-26EB-E911-8000-3CECEF4048A2
Uptime: 21:41:26 up 1:32, 2 users, load average: 0.05, 0.07, 0.12

Please HELP!!!

Regards
Rita

I personally recommend you upgrade to 1.2 or above. 1.1.8 is way too old.

Also determine whether there is a possible cyber attack against VyOS.

I have been running 1.1.8 and 1.1.7 throughout my infrastructure with no issues.
How can I check for a cyber attack?

I’m not on the official team, but maybe I can give you a sense of where to look for the problem. VyOS uses a Debian-based underlying system, and the restarts may be due to some special condition that causes the kernel to panic.

a) Check the system kernel log and analyze the network traffic to determine whether anything abnormal is going on.

b) Spend some time checking whether the reboots follow a regular pattern, to narrow down where the problem may occur.

This may take a lot of time. If conditions permit, you can test 1.2 and above in the same environment to see if the same problem will occur.

This debugging is based on a basic assumption: the kernel cannot restart automatically without reason.
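To check how regular the reboots are (point b above), one option is to compute the gaps between boot times. This is a minimal sketch, assuming you have already collected the boot times as Unix epoch seconds, one per line, oldest first. The function name `boot_intervals` and the sample timestamps are made up for illustration.

```shell
# Print the gap, in whole minutes, between consecutive boot times.
# Input on stdin: one Unix epoch timestamp per line, oldest first (assumed format).
boot_intervals() {
    awk 'NR > 1 { printf "%d\n", ($1 - prev) / 60 } { prev = $1 }'
}

# Example with made-up timestamps spaced two hours apart
printf '%s\n' 1616450000 1616457200 1616464400 | boot_intervals
```

A steady gap may point toward something periodic (a timer, a watchdog, a scheduled job), while irregular gaps would fit a traffic- or load-triggered crash better.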

Thanks a lot!

But the broadcast 255.255.255.255 *, is this normal?

Seeing the broadcast address 255.255.255.255 is a bit strange, but you haven’t provided enough logs to confirm that there is a problem.

Here is an explanation of the address from the Internet:

255.255.255.255 is the limited broadcast address. A host typically sends to it when it does not yet know its own IP address, for example when asking a DHCP server for an address at startup. Routers generally do not forward packets whose destination is the limited broadcast address.
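To see whether limited-broadcast traffic is actually arriving, and from which sender, a capture on the router can help. This is only a sketch, assuming `tcpdump` is available on the box and that `eth0` is the interface of interest (both are assumptions, so substitute your own interface name). The function name is made up for illustration.

```shell
# Capture packets addressed to the limited broadcast address.
# Usage: capture_limited_broadcast <interface> [packet-count]
capture_limited_broadcast() {
    iface="${1:-eth0}"    # assumed interface name
    count="${2:-50}"      # stop after this many packets
    # -n: no name resolution, -e: print MAC addresses to identify the sender
    tcpdump -n -e -i "$iface" -c "$count" 'dst host 255.255.255.255'
}
```

Run it as root, for example `capture_limited_broadcast eth0 100`. DHCP discovery (UDP ports 67/68) showing up here is ordinary; a high rate from a single MAC address would be more worth a closer look.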

Hi,

Please let me know what logs to provide.
I have been stuck on this issue real bad.

The VyOS router has been rebooting every 2 hours.

You won’t get any real support here for 1.1.8, I’m sorry; it’s just SO old.

You are way better off trying the latest 1.3 release candidate, or building your own 1.2 ISO (it’s not difficult) and testing with that.

If I had to guess, I’d suggest it’s a hardware fault causing your reboots (memory, overheating, etc.)

I can only provide one direction, and at present, I can’t be sure about your actual situation (at the same time, I’m not a network security or kernel developer yet).

Maybe the dmesg kernel logs and a traffic capture from the upstream routers or switches will help.

@tjh’s suggestion is worth considering.

Again, I need to reiterate my suggestion: I recommend that you use a newer VyOS release.

Hi,

I tried version 1.2 and the issue still persists.

Changed the hardware/power source.

Is my network or my WAN link causing the issue?

Hello @rituka, try to collect the logs on a remote host.
Did you check the output of show log after a reboot?
Does the RAM pass memtest?
Which NICs do you have on this hardware?
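Since the local logs show nothing useful, shipping syslog to another machine may preserve the last messages before a crash. This is a sketch in VyOS configuration mode, assuming the `set system syslog host` syntax of the 1.1/1.2 releases and using 192.0.2.10 as a placeholder for your own log server:

```shell
configure
# 192.0.2.10 is a placeholder address; replace it with your syslog server
set system syslog host 192.0.2.10 facility all level debug
commit
save
exit
```

The receiving host needs a syslog daemon that accepts remote messages; how to set that up depends on the daemon in use.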

Hi,

Yes, I did check the logs.
The logs just show the VPN/routing info.
The RAM is fine.
I am using the onboard NIC and an extra Intel 1G NIC.

Which NICs exactly?
I have heard that some NICs might be affected by the "packet of death".
Do you have a remote console for this router?
It looks like a hardware issue or a kernel panic.

@Dmitry: this is definitely a kernel panic.
But I removed the NIC as well, an hour ago, and the router is still rebooting!

Yes, I have access to the VyOS router.

I recommend disabling reboot on panic and collecting data to figure out what happened:

sudo su -l
echo 0 > /proc/sys/kernel/panic

Then just take a screenshot and let’s look
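For context on what that value means: per the kernel's sysctl documentation, `kernel.panic` is the number of seconds the kernel waits before rebooting after a panic, and 0 means it stays halted, which is what keeps the panic message on the console. Here is a small sketch that reads the current setting and describes it (the helper name `describe_panic_timeout` is just for illustration):

```shell
# Describe a kernel.panic value:
# 0 = stay halted, >0 = reboot after N seconds, <0 = reboot immediately.
describe_panic_timeout() {
    if [ "$1" -eq 0 ]; then
        echo "no automatic reboot: the kernel halts and the panic stays on the console"
    elif [ "$1" -gt 0 ]; then
        echo "automatic reboot $1 seconds after a panic"
    else
        echo "immediate reboot on panic"
    fi
}

# Show the setting currently in effect (Linux only)
if [ -r /proc/sys/kernel/panic ]; then
    describe_panic_timeout "$(cat /proc/sys/kernel/panic)"
fi
```

A value written with echo does not survive a reboot, so it has to be set again after each boot until the panic has been captured.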

rituka@DC:~$ sudo su -l

root@DC:~# echo 0 > /proc/sys/kernel/panic

root@DC:~#

Did this!!

Let’s monitor.
Btw, any other logs to share?

root@DC:~# echo 0 > /proc/sys/kernel/panic

this crashed my router.

These are logs, just before the reboot:

Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: RSDP 00000000000f05b0 000024 (v02 SUPERM)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: XSDT 000000006dd240b0 0000DC (v01 SUPERM SUPERM 01072009 AMI 00010013)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: FACP 000000006dd46970 00010C (v05 01072009 AMI 00010013)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI Error: Gpe0Block - 32-bit FADT register is too long (32 bytes, 256 bits) to convert to GAS struct - 255 bits max, truncating (20131115/tbfadt-202)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: DSDT 000000006dd24220 02274F (v02 SUPERM SMCI--MB 01072009 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: FACS 000000006dd8ff80 000040
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: APIC 000000006dd46a80 000084 (v03 01072009 AMI 00010013)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: FPDT 000000006dd46b08 000044 (v01 01072009 AMI 00010013)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: FIDT 000000006dd46b50 00009C (v01 01072009 AMI 00010013)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: SPMI 000000006dd46bf0 000041 (v05 SUPERM SMCI--MB 00000000 AMI. 00000000)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: MCFG 000000006dd46c38 00003C (v01 SUPERM SMCI--MB 01072009 MSFT 00000097)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: HPET 000000006dd46c78 000038 (v01 SUPERM SMCI--MB 01072009 AMI. 0005000B)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: LPIT 000000006dd46cb0 000094 (v01 INTEL GNLR 00000000 MSFT 0000005F)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: SSDT 000000006dd46d48 000248 (v02 INTEL sensrhub 00000000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: SSDT 000000006dd46f90 002BAE (v02 INTEL PtidDevc 00001000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: SSDT 000000006dd49b40 000BE3 (v02 INTEL Ther_Rvp 00001000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: DBGP 000000006dd4a728 000034 (v01 INTEL 00000000 MSFT 0000005F)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: DBG2 000000006dd4a760 000054 (v00 INTEL 00000000 MSFT 0000005F)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: SSDT 000000006dd4a7b8 000615 (v02 INTEL xh_Zumba 00000000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: PRAD 000000006dd4add0 0000CA (v02 PRADID PRADTID 00000001 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: SSDT 000000006dd4aea0 00547E (v02 SaSsdt SaSsdt 00003000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: UEFI 000000006dd50320 000042 (v01 00000000 00000000)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: SSDT 000000006dd50368 000E73 (v02 CpuRef CpuSsdt 00003000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: DMAR 000000006dd511e0 0000A8 (v01 INTEL SKL 00000001 INTL 00000001)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: EINJ 000000006dd51288 000130 (v01 AMI AMI.EINJ 00000000 AMI. 00000000)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: ERST 000000006dd513b8 000230 (v01 AMIER AMI.ERST 00000000 AMI. 00000000)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: BERT 000000006dd515e8 000030 (v01 AMI AMI.BERT 00000000 AMI. 00000000)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] ACPI: HEST 000000006dd51618 00027C (v01 AMI AMI.HEST 00000000 AMI. 00000000)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] Zone ranges:
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] DMA [mem 0x00001000-0x00ffffff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] DMA32 [mem 0x01000000-0xffffffff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] Normal [mem 0x100000000-0x2853fffff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] Movable zone start for each node
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] Early memory node ranges
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] node 0: [mem 0x00001000-0x00097fff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] node 0: [mem 0x00100000-0x68698fff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] node 0: [mem 0x686c4000-0x6d0d5fff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] node 0: [mem 0x6d433000-0x6d61dfff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] node 0: [mem 0x6ffff000-0x6fffffff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] node 0: [mem 0x100000000-0x2853fffff]
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 2009469
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] Policy zone: Normal
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] Kernel command line: BOOT_IMAGE=/boot/VyOS-1.1.8/vmlinuz boot=live quiet vyatta-union=/boot/VyOS-1.1.8 console=ttyS0,9600 console=tty0
Mar 23 01:15:00 SLURM-DC kernel: [ 0.000000] Memory: 7947072K/8165560K available (4044K kernel code, 594K rwdata, 1652K rodata, 976K init, 700K bss, 218488K reserved)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.010000] tsc: Fast TSC calibration failed
Mar 23 01:15:00 SLURM-DC kernel: [ 0.050000] tsc: Unable to calibrate against PIT
Mar 23 01:15:00 SLURM-DC kernel: [ 0.002344] ENERGY_PERF_BIAS: Set to ‘normal’, was ‘performance’
Mar 23 01:15:00 SLURM-DC kernel: [ 0.002344] ENERGY_PERF_BIAS: View and update with x86_energy_perf_policy(8)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.017939] ACPI: All ACPI Tables successfully acquired
Mar 23 01:15:00 SLURM-DC kernel: [ 0.243644] ACPI: Executed 24 blocks of module-level executable AML code
Mar 23 01:15:00 SLURM-DC kernel: [ 0.303808] [Firmware Bug]: ACPI: BIOS OSI(Linux) query ignored
Mar 23 01:15:00 SLURM-DC kernel: [ 0.380655] ACPI: Dynamic OEM Table Load:
Mar 23 01:15:00 SLURM-DC kernel: [ 0.380657] ACPI: PRAD (null) 0000CA (v02 PRADID PRADTID 00000001 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.409421] ACPI: SSDT 000000006d188c18 00037F (v02 PmRef Cpu0Cst 00003001 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.410116] ACPI: Dynamic OEM Table Load:
Mar 23 01:15:00 SLURM-DC kernel: [ 0.410118] ACPI: SSDT (null) 00037F (v02 PmRef Cpu0Cst 00003001 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.410227] ACPI: SSDT 000000006d189798 00070E (v02 PmRef Cpu0Ist 00003000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.410915] ACPI: Dynamic OEM Table Load:
Mar 23 01:15:00 SLURM-DC kernel: [ 0.410916] ACPI: SSDT (null) 00070E (v02 PmRef Cpu0Ist 00003000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.469917] ACPI: SSDT 000000006d188618 0005AA (v02 PmRef ApIst 00003000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.470673] ACPI: Dynamic OEM Table Load:
Mar 23 01:15:00 SLURM-DC kernel: [ 0.470674] ACPI: SSDT (null) 0005AA (v02 PmRef ApIst 00003000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.499417] ACPI: SSDT 000000006d1aec18 000119 (v02 PmRef ApCst 00003000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.500115] ACPI: Dynamic OEM Table Load:
Mar 23 01:15:00 SLURM-DC kernel: [ 0.500116] ACPI: SSDT (null) 000119 (v02 PmRef ApCst 00003000 INTL 20120913)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.530970] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [_S1_] (20131115/hwxface-580)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.530975] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [_S2_] (20131115/hwxface-580)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.530979] ACPI Exception: AE_NOT_FOUND, While evaluating Sleep State [_S3_] (20131115/hwxface-580)
Mar 23 01:15:00 SLURM-DC kernel: [ 0.531045] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 8.
Mar 23 01:15:00 SLURM-DC kernel: [ 0.531045] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 9.
Mar 23 01:15:00 SLURM-DC kernel: [ 0.531046] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 10.
Mar 23 01:15:00 SLURM-DC kernel: [ 0.531047] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 11.
Mar 23 01:15:00 SLURM-DC kernel: [ 0.531047] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 12.
Mar 23 01:15:00 SLURM-DC kernel: [ 0.531048] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 13.
Mar 23 01:15:00 SLURM-DC kernel: [ 0.531049] mce: [Firmware Bug]: Ignoring request to disable invalid MCA bank 14.
Mar 23 01:15:00 SLURM-DC kernel: [ 0.632399] ACPI: Enabled 6 GPEs in block 00 to 7F
Mar 23 01:15:00 SLURM-DC kernel: [ 0.632690] SCSI subsystem initialized
Mar 23 01:15:00 SLURM-DC kernel: [ 0.867172] type=2000 audit(1616462089.870:1): initialized
Mar 23 01:15:00 SLURM-DC kernel: [ 0.867815] bounce pool size: 64 pages
Mar 23 01:15:00 SLURM-DC kernel: [ 2.215494] i8042: No controller found
Mar 23 01:15:00 SLURM-DC kernel: [ 2.261746] Key type dns_resolver registered
Mar 23 01:15:00 SLURM-DC kernel: [ 2.812861] scsi 2:0:0:0: Direct-Access ATA WDC WDS120G2G0A- UE45 PQ: 0 ANSI: 5
Mar 23 01:15:00 SLURM-DC kernel: [ 2.813002] sd 2:0:0:0: Attached scsi generic sg0 type 0
Mar 23 01:15:00 SLURM-DC kernel: [ 2.813071] sd 2:0:0:0: [sda] 234455040 512-byte logical blocks: (120 GB/111 GiB)
Mar 23 01:15:00 SLURM-DC kernel: [ 2.813363] sd 2:0:0:0: [sda] Write Protect is off
Mar 23 01:15:00 SLURM-DC kernel: [ 2.813444] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn’t support DPO or FUA
Mar 23 01:15:00 SLURM-DC kernel: [ 2.814796] sd 2:0:0:0: [sda] Attached SCSI disk
Mar 23 01:15:00 SLURM-DC kernel: [ 3.162398] raid6: sse2x1 13235 MB/s
Mar 23 01:15:00 SLURM-DC kernel: [ 3.332363] raid6: sse2x2 16180 MB/s
Mar 23 01:15:00 SLURM-DC kernel: [ 3.502330] raid6: sse2x4 18565 MB/s
Mar 23 01:15:00 SLURM-DC kernel: [ 3.502331] raid6: using algorithm sse2x4 (18565 MB/s)
Mar 23 01:15:00 SLURM-DC kernel: [ 3.502331] raid6: using ssse3x2 recovery algorithm
Mar 23 01:15:00 SLURM-DC kernel: [ 9.023416] EXT4-fs (sda1): warning: maximal mount count reached, running e2fsck is recommended
Mar 23 01:15:00 SLURM-DC kernel: [ 9.606854] random: debconf-communi urandom read with 125 bits of entropy available
Mar 23 01:15:00 SLURM-DC kernel: [ 9.876195] random: nonblocking pool is initialized
Mar 23 01:15:00 SLURM-DC kernel: [ 10.169719] platform microcode: Direct firmware load failed with error -2
Mar 23 01:15:00 SLURM-DC kernel: [ 10.169721] platform microcode: Falling back to user helper
Mar 23 01:15:00 SLURM-DC kernel: [ 10.185887] platform microcode: Direct firmware load failed with error -2
Mar 23 01:15:00 SLURM-DC kernel: [ 10.185889] platform microcode: Falling back to user helper
Mar 23 01:15:00 SLURM-DC kernel: [ 10.187550] platform microcode: Direct firmware load failed with error -2
Mar 23 01:15:00 SLURM-DC kernel: [ 10.187552] platform microcode: Falling back to user helper
Mar 23 01:15:00 SLURM-DC kernel: [ 10.189018] platform microcode: Direct firmware load failed with error -2
Mar 23 01:15:00 SLURM-DC kernel: [ 10.189019] platform microcode: Falling back to user helper
Mar 23 01:15:00 SLURM-DC kernel: [ 10.481387] power_meter ACPI000D:00: Ignoring unsafe software power cap!
Mar 23 01:15:00 SLURM-DC kernel: [ 10.908804] usb 1-14.1: ep 0x81 - rounding interval to 64 microframes, ep desc says 80 microframes
Mar 23 01:15:00 SLURM-DC kernel: [ 10.908806] usb 1-14.1: ep 0x82 - rounding interval to 32 microframes, ep desc says 40 microframes
Mar 23 01:15:00 SLURM-DC zebra[2661]: Zebra 0.99.20.1 starting: vty@0
Mar 23 01:15:00 SLURM-DC ripd[2663]: RIPd 0.99.20.1 starting: vty@0
Mar 23 01:15:00 SLURM-DC ripngd[2665]: RIPNGd 0.99.20.1 starting: vty@0
Mar 23 01:15:00 SLURM-DC ospfd[2667]: OSPFd 0.99.20.1 starting: vty@0
Mar 23 01:15:00 SLURM-DC ospf6d[2669]: OSPF6d (Quagga-0.99.20.1 ospf6d-0.9.7r) starts: vty@0
Mar 23 01:15:00 SLURM-DC bgpd[2671]: BGPd 0.99.20.1 starting: vty@0, bgp@:179

A hard reboot brought the router back up.

I think it may be related to the following errors:

Mar 23 01:15:00 SLURM-DC kernel: [ 9.876195] random: nonblocking pool is initialized
Mar 23 01:15:00 SLURM-DC kernel: [ 10.169719] platform microcode: Direct firmware load failed with error -2
Mar 23 01:15:00 SLURM-DC kernel: [ 10.169721] platform microcode: Falling back to user helper
Mar 23 01:15:00 SLURM-DC kernel: [ 10.185887] platform microcode: Direct firmware load failed with error -2
Mar 23 01:15:00 SLURM-DC kernel: [ 10.185889] platform microcode: Falling back to user helper
Mar 23 01:15:00 SLURM-DC kernel: [ 10.187550] platform microcode: Direct firmware load failed with error -2
Mar 23 01:15:00 SLURM-DC kernel: [ 10.187552] platform microcode: Falling back to user helper
Mar 23 01:15:00 SLURM-DC kernel: [ 10.189018] platform microcode: Direct firmware load failed with error -2
Mar 23 01:15:00 SLURM-DC kernel: [ 10.189019] platform microcode: Falling back to user helper
Mar 23 01:15:00 SLURM-DC kernel: [ 10.481387] power_meter ACPI000D:00: Ignoring unsafe software power cap!
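For what it is worth, error -2 is ENOENT ("no such file or directory"), so the microcode lines most likely just mean that no microcode file was found to load. To look for messages tied more directly to a crash, a filter along these lines could be run over the saved logs. The pattern list and the function name `crash_hints` are my own guess at what is worth searching for, not an official list, and the two sample lines are made up.

```shell
# Print log lines that commonly accompany crashes or hardware faults.
# Reads the files given as arguments, or stdin if none are given.
crash_hints() {
    grep -i -E 'panic|oops|BUG:|machine check|mce: \[Hardware Error\]|watchdog|thermal|out of memory' "$@"
}

# Example with two made-up sample lines: only the second one matches
printf '%s\n' \
    'kernel: SCSI subsystem initialized' \
    'kernel: Kernel panic - not syncing: Fatal exception' \
    | crash_hints
```

On the router this could be pointed at a log file, for example `crash_hints /var/log/messages`, assuming that is where the messages are kept.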

@jack9603301 I could not find much about these errors.

Is this a hardware issue?