VyOS 1.1.8 and KVM Freeze


#1

Hi !

I’ve got a problem which (IMHO) is somehow related to KVM/bridge networking.
In short - the host OS (tried SuSE 42.3, SuSE Leap 15.0, Debian 9) freezes completely at random intervals. At first, I blamed the JetWay fanless PC with 10 built-in Ethernet interfaces. Recently I reinstalled the whole system on another PC with a 4-port PCI Ethernet card - and what a surprise - last night that machine froze as well.

I checked the QEMU logs and found only these suspicious records.

… VM log
warning: host doesn't support requested feature: CPUID.01H:ECX.pdcm [bit 15]
warning: host doesn't support requested feature: CPUID.01H:ECX.osxsave [bit 27]
main-loop: WARNING: I/O thread spun for 1000 iterations

… syslog
Jul 12 17:13:36 linrt libvirtd[1096]: 2018-07-12 14:13:36.116+0000: 1096: error : qemuMonitorIO:710 : internal error: End of file from qemu monitor
Jul 12 17:13:36 linrt virtlogd[1257]: 2018-07-12 14:13:36.318+0000: 1257: error : virNetSocketReadWire:1801 : End of file while reading data: Input/output error

Changing the network devices from virtio to rtl doesn’t change anything. CPU load is very low all the time. RAM consumed by both the host OS and VyOS is below 900 MB.

Is it possible that my bridge definition is missing something, and that this causes the host OS to freeze?

iface br2_DMZ inet manual
   bridge_ports enp18s0
   bridge_stp off
   bridge_waitpot 0
   bridge_fd 0.0
auto br2_INT
iface br2_INT inet manual
   bridge_ports enp19s0
   bridge_stp off
   bridge_waitpot 0
   bridge_fd 0.0

Thanks in advance for any help.


#2

Looks like a hardware issue. Does it run stable if you have other VMs running on it? VyOS 1.1.8 is based on a very old Debian; try 1.2 instead.
It has a newer kernel as well, so you can benefit from paravirtualization.


#3

Hmm. I didn’t have freezes on KVM, neither with 1.1.8 (which I only used for a short time) nor with 1.2.0.
So I’d suspect a root cause other than KVM in general.

What chipset have you configured in KVM for the VM?
Do other Debian-based VMs run fine, or do you have freezes there as well?


#4

Same issue on 2 completely different PCs.
These are routers, and the VyOS VM is the only one running on KVM.


#5

I suspect my bridge definition (in my first post), but I’m not sure.
The KVM chipset is i440FX.
It’s not a VM freeze, it’s a Debian KVM host freeze. Same issue with a SuSE KVM host.


#6

OK, I think with i440FX you should be fine. (I had issues with Q35 on stable because Q35 seems to require a newer KVM/QEMU version than the one in stable. After moving to testing it was fine.)

My bridge definition is almost identical to yours, except that I also have an IP for the host. So far I’ve had no issues with it and therefore doubt this is the problem. (I have bridge_fd 0 instead of 0.0.)
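
For comparison, a minimal sketch of what such a stanza can look like with a host IP. The bridge name, port name and addresses are placeholders for illustration (192.0.2.0/24 is a documentation range), not my real values. Note that the bridge-utils-interfaces man page spells the option bridge_waitport.

```
# Hypothetical example: bridge with a static IP for the host.
# br0, enp19s0 and the address below are placeholders, adjust to your setup.
auto br0
iface br0 inet static
   address 192.0.2.10
   netmask 255.255.255.0
   bridge_ports enp19s0
   bridge_stp off
   bridge_waitport 0
   bridge_fd 0
```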

Are you seeing anything suspect in dmesg?

Edit: Coincidence or not, the exact same thing just happened here…


#7

There are no widespread reports of issues on KVM,
so I would suspect a HW issue too.


#8

You mean you also got a KVM host freeze?


#9

Yes, actually twice yesterday, for the first time. That was with 1.2.0-rolling+201807120337. I have now upgraded to 1.2.0-rolling+201807150337 and it seems OK. But I’ll have to observe…


#10

Leave it for a few days with moderate traffic. It doesn’t seem to depend on the VyOS version, the host OS version or even the hardware; it looks like a problem with the bridge, as I pointed out in my first post.


#11

It was/is actually my network card; it suddenly happened again after being fine for days.


#12

Is the host OS OK, or frozen to death?


#13

The first two times it froze, yes. After that I only lost the NIC/bridge connectivity, but maybe it would have ended in a freeze as well…

It’s really a special situation I have here with my 10GE NIC with an Aquantia chip, which is not yet as battle-tested as other chipsets. I think it has nothing to do with VyOS, apart from the fact that the only VM I expose this bridge/NIC to is the VyOS VM.


#14

Hi, binaryanomaly,

Did you solve the problem with the freezes?


#15

Well… let’s just say it has not happened again yet. I’m quite confident it is related to my NIC; I probably forgot to turn off GRO/LRO, which can lead to this behavior in bridge mode with my NIC.
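
For anyone hitting the same thing, here is a sketch of how the offloads could be switched off each time the bridge comes up, from the host's /etc/network/interfaces. The bridge and port names are borrowed from the first post purely as placeholders. It assumes ethtool is installed on the host and that the NIC driver allows changing these settings, which is why the command is allowed to fail.

```
# Hypothetical example: disable GRO/LRO on the bridged port when the bridge comes up.
# Names are placeholders; "|| true" keeps ifup going if the driver refuses the change.
auto br2_INT
iface br2_INT inet manual
   bridge_ports enp19s0
   bridge_stp off
   bridge_fd 0
   post-up ethtool -K enp19s0 gro off lro off || true
```

Afterwards, `ethtool -k enp19s0` shows the current state of generic-receive-offload and large-receive-offload, so you can check whether the change took effect.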