Very slow performance on Celeron J3160 - configure->show takes over 20s

angelnu · May 27, 2022, 2:10am

After testing Vyos on 2 Proxmox VMs on my NUC I have moved my primary vyos instance to a dedicated HW: Nitrowall - Celeron J3160 with 8 GBit of RAM. It is the same vyos configuration and version: 1.4-rolling-202204300743

While the network traffic is working well, the vyos configuration commands do not:

vyos@vyos:~$ configure
[edit]
vyos@vyos# time $(show>/dev/null)
real    0m24.267s
user    0m2.332s
sys     0m5.149s

During that time I observe cli-shell-api and unionfs-fuse taking a high CPU usage:

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND       
  95543 vyos      20   0  224796   2332      0 S  52.9   0.0   0:07.04 unionfs-fuse                                                                                                                                                                       
  95605 vyos      20   0   25036  11828   7856 S  17.6   0.1   0:04.11 cli-shell-api

I debugged it a little further and found that the same command takes way longer when running in configuration mode vs the operation mode:

vyos@vyos:~$ time cli-shell-api showCfg>/dev/null

real    0m0.688s
user    0m0.309s
sys     0m0.374s
vyos@vyos:~$ configure
[edit]
vyos@vyos# time cli-shell-api showCfg>/dev/null

real    0m25.923s
user    0m2.311s
sys     0m5.493s
[edit]

Unfortunately this is as well killing the performance of the Terraform provider I am helping to develop as each Terraform refresh does multiple API calls whereeach takes 25 seconds. This was taking just a second before in my NUC.

Is this the expected performance of a Celeron J3160 ?
Is cli-shell-api going away as part of the rewrite python+XML?

I will open a proper bug in Fabricator and try to help debugging - waiting for my new id to be approved there.

zsdc · May 27, 2022, 1:18pm

We are missing here one very important thing - the configuration used on the device. I never saw such a slow work with common configs (up to 1000 lines, I would say). Can you post it? Based on unionfs CPU usage, it seems that it should be not the smallest one.

angelnu · May 27, 2022, 1:55pm

The configuration has 2054 lines - mostly DHCP static rules. I can upload it if it helps - I am looking for the helper script to remove keys and other personal info.

But what it surprised me is that the same configuration on the NUC takes less than a second to show: I moved the /config/config.boot from the NUC to the Nitrowall. This is what it makes me think something is really wrong.

Update:

# show configuration commands|grep -v static|wc
    226    1692   13689

Update 2: another example of the huge difference in elapsed time between the NUC and the Celeron J3160 - the same change (adding a wireguard peer&static rule - 5 commands in total):

NUC: 7+7 seconds
Celeron: 186 + 150 seconds

module.vyos.module.common_primary.vyos_config_block_tree.vpn_wireguard[0]: Modifying... [id=interfaces wireguard wg01]
module.vyos.module.common_secondary.vyos_config_block_tree.vpn_wireguard[0]: Modifying... [id=interfaces wireguard wg01]
module.vyos.module.common_secondary.vyos_config_block_tree.vpn_wireguard[0]: Modifications complete after 7s [id=interfaces wireguard wg01]
module.vyos.module.common_secondary.vyos_config_block_tree.static_routes: Modifying... [id=protocols static route]
module.vyos.module.common_primary.vyos_config_block_tree.vpn_wireguard[0]: Still modifying... [id=interfaces wireguard wg01, 10s elapsed]
module.vyos.module.common_secondary.vyos_config_block_tree.static_routes: Modifications complete after 7s [id=protocols static route]
module.vyos.module.common_primary.vyos_config_block_tree.vpn_wireguard[0]: Still modifying... [id=interfaces wireguard wg01, 20s elapsed]
...
module.vyos.module.common_primary.vyos_config_block_tree.vpn_wireguard[0]: Still modifying... [id=interfaces wireguard wg01, 3m0s elapsed]
module.vyos.module.common_primary.vyos_config_block_tree.vpn_wireguard[0]: Modifications complete after 3m6s [id=interfaces wireguard wg01]
module.vyos.module.common_primary.vyos_config_block_tree.static_routes: Modifying... [id=protocols static route]
module.vyos.module.common_primary.vyos_config_block_tree.static_routes: Still modifying... [id=protocols static route, 10s elapsed]
...
module.vyos.module.common_primary.vyos_config_block_tree.static_routes: Still modifying... [id=protocols static route, 2m30s elapsed]
module.vyos.module.common_primary.vyos_config_block_tree.static_routes: Modifications complete after 2m40s [id=protocols static route]

Nikolay · May 30, 2022, 5:33am

The command to strip private-info from config:

show config commands | strip-private

zsdc · May 30, 2022, 7:13pm

Can you share the cat /proc/cpuinfo output from the host with Celeron? Maybe with a list of CPU flags, it will be possible to create a VM with similar CPU features to see if something can affect the performance so critically.
For example, aes flag improves encryption operations a lot. Maybe we have here something similar that is unavailable in Celeron CPU.

angelnu · May 31, 2022, 12:17am

Processing: config-strip.txt…

Thanks @Nikolay - as I am not able to share configs yet here is my stripped configuration: slow_config - JustPaste.it

I am returning my Nitrowall as for very similar price I can get an 11th gen i5 which is what my NUC runs with (which has no problems with my config). @zsdc - the speed for connections, including wireguard was quite ok. The issue is (was) with the unionfs - is it using any special instructions?

Here is the output of cpuinfo before I send the Nitrowall back. It has 4 cpus like this:

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 76
model name : Intel(R) Celeron(R) CPU J3160 @ 1.60GHz
stepping : 4
microcode : 0x408
cpu MHz : 480.000
cache size : 1024 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes rdrand lahf_lm 3dnowprefetch ida arat epb dtherm tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms
bogomips : 3199.70
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management:

zsdc · May 31, 2022, 2:27pm

I created a VM with similar CPU (-cpu base,fpu=on,vme=on,de=on,pse=on,tsc=on,msr=on,pae=on,mce=on,cx8=on,apic=on,sep=on,mtrr=on,pge=on,mca=on,cmov=on,pat=on,pse36=on,clflush=on,acpi=on,mmx=on,fxsr=on,sse=on,sse2=on,ss=on,ht=on,tm=on,pbe=on,syscall=on,nx=on,rdtscp=on,lm=on,pni=on,pclmulqdq=on,dtes64=on,monitor=on,ds_cpl=on,vmx=on,est=on,tm2=on,ssse3=on,cx16=on,xtpr=on,pdcm=on,sse4_1=on,sse4_2=on,movbe=on,popcnt=on,aes=on,rdrand=on,lahf_lm=on,3dnowprefetch=on,arat=on,tsc_adjust=on,erms=on) but did not get even close results to what you got… The config attached here took ~25 seconds to commit and show commands are fast enough to not pay attention to the speed. Maybe there was something with storage that slowed-down I/O operations.

config_test.txt (60.1 KB)

angelnu · June 5, 2022, 12:54pm

As I do not longer have access to my nitrowall I will not be able to provide additional info @zsdc - what is weird is that the CPU consumption was so high - in case of IO overhead I would have expected the kernel time or wait time to be the largest contributors…