Unable to start flow-accounting for sub-interfaces - e.g. eth0.2 and eth1.10

Hi all,

Issue description:

  • The uacctd process crashes when flow-accounting is enabled on sub-interfaces such as eth0.2 and eth0.99.
  • The uacctd process does not crash if flow-accounting is configured only for base interfaces such as eth0 and eth1.
  • The uacctd process crashes if both base interfaces and sub-interfaces are configured (in any combination).

Error event from the /var/log/messages log:

Dec 15 23:18:31 snow-rtr vyos-configd[743]: Received message: {"type": "init"}
Dec 15 23:18:31 snow-rtr vyos-configd[743]: config session pid is 7841
Dec 15 23:18:31 snow-rtr vyos-configd[743]: Received message: {"type": "node", "data": "/usr/libexec/vyos/conf_mode/flow_accounting_conf.py"}
Dec 15 23:18:31 snow-rtr vyos-configd[743]: Sending response 8
Dec 15 23:18:33 snow-rtr systemd[1]: Starting ulog accounting daemon...
Dec 15 23:18:33 snow-rtr systemd[1]: Started ulog accounting daemon.
Dec 15 23:18:33 snow-rtr systemd[2746]: opt-vyatta-config-tmp-new_config_7841.mount: Succeeded.
Dec 15 23:18:33 snow-rtr systemd[1]: opt-vyatta-config-tmp-new_config_7841.mount: Succeeded.
Dec 15 23:18:34 snow-rtr commit: Successful change to active configuration by user vyos on /dev/pts/0
Dec 15 23:18:34 snow-rtr kernel: [ 5801.984313] uacctd[8220]: segfault at 6 ip 00007fe92aa9c77e sp 00007ffc0a0c5ef8 error 4 in libc-2.31.so[7fe92aa1a000+14b000]
Dec 15 23:18:34 snow-rtr kernel: [ 5801.984358] Code: 4c 8d 0c 16 4c 39 cf 0f 82 63 01 00 00 48 89 d1 f3 a4 c3 80 fa 08 73 12 80 fa 04 73 1e 80 fa 01 77 26 72 05 0f b6 0e 88 0f c3 <48> 8b 4c 16 f8 48 8b 36 48 89 4c 17 f8 48 89 37 c3 8b 4c 16 fc 8b
Dec 15 23:18:34 snow-rtr systemd[1]: uacctd.service: Main process exited, code=killed, status=11/SEGV
Dec 15 23:18:34 snow-rtr systemd[1]: uacctd.service: Failed with result 'signal'.
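
The kernel's segfault line itself narrows things down a little. The `error 4` value is the standard x86 page-fault error code the kernel prints; the following small helper (the function name is mine, the bit layout is the standard x86 one, nothing VyOS-specific) decodes it:

```python
def decode_segfault_error(code: int) -> dict:
    """Decode the x86 page-fault error code from a kernel 'segfault at ...' line."""
    return {
        "protection_fault": bool(code & 0x1),    # 0 = page was not present
        "write": bool(code & 0x2),               # 0 = read access
        "user_mode": bool(code & 0x4),           # 1 = fault raised in user mode
        "instruction_fetch": bool(code & 0x10),  # 1 = fault on instruction fetch
    }

# 'segfault at 6 ... error 4' decodes to a user-mode *read* of an unmapped
# page at address 0x6 - i.e. a dereference through a near-NULL pointer
# inside libc (the faulting IP is in libc-2.31.so).
print(decode_segfault_error(4))
```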

vyos build:

Version:          VyOS 1.4-rolling-202112150318
Release train:    sagitta

Built by:         [email protected]
Built on:         Wed 15 Dec 2021 03:18 UTC
Build UUID:       8549e513-cc55-41e0-afdc-b7aba3eb4a23
Build commit ID:  30422e3042965d

Architecture:     x86_64
Boot via:         installed image
System type:      VMware guest

Hardware vendor:  VMware, Inc.
Hardware model:   VMware Virtual Platform
Hardware S/N:     VMware-56 4d 85 75 f1 17 b8 96-30 ef 52 9e fb 38 fe 3a
Hardware UUID:    75854d56-17f1-96b8-30ef-529efb38fe3a

Copyright:        VyOS maintainers and contributors

Interface Configuration

set interfaces ethernet eth0 hw-id '00:0c:29:38:fe:3a'
set interfaces ethernet eth0 vif 2 address 'dhcp'
set interfaces ethernet eth0 vif 99 address 'x.x.x.x/x'
set interfaces ethernet eth0 vif 99 description 'WAN-x.x.x.x/x'
set interfaces ethernet eth1 hw-id '00:0c:29:38:fe:44'
set interfaces ethernet eth1 vif 10 address 'x.x.x.x/x'
set interfaces ethernet eth1 vif 10 description 'Mgmt-x.x.x.x/x'
set interfaces ethernet eth1 vif 15 address 'x.x.x.x/x'
set interfaces ethernet eth1 vif 15 description 'Home-x.x.x.x/x'
set interfaces ethernet eth1 vif 20 address 'x.x.x.x/x'
set interfaces ethernet eth1 vif 20 description 'Storage-A-x.x.x.x/x'
set interfaces ethernet eth1 vif 21 address 'x.x.x.x/x'
set interfaces ethernet eth1 vif 21 description 'Storage-B-x.x.x.x/x'
set interfaces ethernet eth2 hw-id '00:0c:29:38:fe:4e'
set interfaces ethernet eth2 vif 98 address 'x.x.x.x/x'
set interfaces ethernet eth2 vif 98 description 'DMZ-x.x.x.x/x'
set interfaces ethernet eth2 vif 100 address 'x.x.x.x/x'
set interfaces ethernet eth2 vif 100 description 'GameVM - x.x.x.x/x'

Flow-accounting configuration

set system flow-accounting buffer-size '256'
set system flow-accounting interface 'eth0.2'
set system flow-accounting netflow engine-id '100'
set system flow-accounting netflow max-flows '640000'
set system flow-accounting netflow sampling-rate '1000'
set system flow-accounting netflow server x.x.x.x port '2055'
set system flow-accounting netflow source-ip 'x.x.x.x'
set system flow-accounting netflow timeout expiry-interval '30'
set system flow-accounting netflow timeout flow-generic '3600'
set system flow-accounting netflow timeout icmp '300'
set system flow-accounting netflow timeout max-active-life '604800'
set system flow-accounting netflow timeout tcp-fin '300'
set system flow-accounting netflow timeout tcp-generic '3600'
set system flow-accounting netflow timeout tcp-rst '120'
set system flow-accounting netflow timeout udp '300'
set system flow-accounting netflow version '5'

/boot/rw/etc/pmacct/uacctd.conf

# Genereated from VyOS configuration    <<<<<<<<<<<<<<< btw, you have a typo here "GenerEated"
daemonize: true
promisc: false
pidfile: /var/run/uacctd.pid
uacctd_group: 2
uacctd_nl_size: 2097152
snaplen: 128
aggregate: in_iface,src_mac,dst_mac,vlan,src_host,dst_host,src_port,dst_port,proto,tos,flows
plugin_pipe_size: 268435456
plugin_buffer_size: 268435
imt_path: /tmp/uacctd.pipe
imt_mem_pools_number: 169
plugins: nfprobe[nf_x.x.x.x],memory
nfprobe_receiver[nf_x.x.x.x]: x.x.x.x:2055
nfprobe_version[nf_x.x.x.x]: 5
nfprobe_engine[nf_x.x.x.x]: 100:0
nfprobe_maxflows[nf_x.x.x.x]: 640000
sampling_rate[nf_x.x.x.x]: 1000
nfprobe_source_ip[nf_x.x.x.x]: 10.0.10.254
nfprobe_timeouts[nf_x.x.x.x]: expint=30:general=3600:icmp=300:maxlife=604800:tcp.fin=300:tcp=3600:tcp.rst=120:udp=3000
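
One detail worth flagging in the generated file: the `nfprobe_timeouts` string ends in `udp=3000`, while the CLI config above sets `timeout udp '300'`. A quick sketch (assuming only the colon-separated `key=value` format shown above; the function name is mine) makes such mismatches easy to spot:

```python
def parse_nfprobe_timeouts(spec: str) -> dict:
    """Parse a pmacct nfprobe_timeouts string like 'expint=30:general=3600:...'."""
    timeouts = {}
    for pair in spec.split(":"):
        key, _, value = pair.partition("=")
        timeouts[key] = int(value)
    return timeouts

generated = "expint=30:general=3600:icmp=300:maxlife=604800:tcp.fin=300:tcp=3600:tcp.rst=120:udp=3000"
print(parse_nfprobe_timeouts(generated)["udp"])  # 3000, not the configured 300
```

Whether or not this is related to the crash, the generated udp timeout does not match the committed configuration.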

‘Hardware’ Config

VMware VM, 4 vCPU, 2GB ram, 16GB+ SSD, vmxnet3 adapters.

vNIC info

13:00.0 Ethernet controller: VMware VMXNET3 Ethernet Controller (rev 01)
        DeviceName: Ethernet1
        Subsystem: VMware VMXNET3 Ethernet Controller
        Physical Slot: 224
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fd2fc000 (32-bit, non-prefetchable) [size=4K]
        Region 1: Memory at fd2fd000 (32-bit, non-prefetchable) [size=4K]
        Region 2: Memory at fd2fe000 (32-bit, non-prefetchable) [size=8K]
        Region 3: I/O ports at 6000 [size=16]
        Expansion ROM at fd200000 [virtual] [disabled] [size=64K]
        Capabilities: <access denied>
        Kernel driver in use: vmxnet3
        Kernel modules: vmxnet3

Storage space

Filesystem      Size  Used Avail Use% Mounted on
overlay          16G  1.3G   14G   9% /

Memory:

MemTotal:        2041776 kB
MemFree:          988476 kB
MemAvailable:    1102816 kB
Buffers:           13812 kB
Cached:           321508 kB
SwapCached:            0 kB
Active:           120112 kB
Inactive:         378636 kB
Active(anon):       2160 kB
Inactive(anon):   168132 kB
Active(file):     117952 kB
Inactive(file):   210504 kB
Unevictable:       10700 kB
Mlocked:           10700 kB
SwapTotal:             0 kB
SwapFree:              0 kB
Dirty:               468 kB
Writeback:             0 kB
AnonPages:        173432 kB
Mapped:            74024 kB
Shmem:              2868 kB
KReclaimable:      82924 kB
Slab:             152656 kB
SReclaimable:      82924 kB
SUnreclaim:        69732 kB
KernelStack:        4064 kB
PageTables:         3008 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     1020888 kB
Committed_AS:     530000 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       13048 kB
VmallocChunk:          0 kB
Percpu:             4544 kB
HardwareCorrupted:     0 kB
AnonHugePages:     53248 kB
ShmemHugePages:        0 kB
ShmemPmdMapped:        0 kB
FileHugePages:         0 kB
FilePmdMapped:         0 kB
DirectMap4k:      100224 kB
DirectMap2M:     1996800 kB

Troubleshooting:

  • Tried different vifs on different interfaces, with the same result.
  • Tried changing all the values - buffer sizes, sample sizes, disable-imt - nothing helped.
  • Rebooted the router, reinstalled fresh, tried manually restarting the services (if this were a physical server, I’d have probably thrown holy water at it, too). :slight_smile:
  • Cleared the config and started fresh - no change.
  • Checked the forums for similar errors - found a bug with VRFs, but that individual was able to start netflow for sub-interfaces not associated with a VRF (and I don’t have VRFs set). I considered finding out which version they were running and rolling back - but then again, I’d prefer to help fix a bug if this is one, and run on the latest build if possible.
  • Yes, I updated to the latest and greatest via add system image - but no cigar. :frowning:

I hope this info helps a bit - and thank you for helping out, it is much appreciated.

Kind regards,
Mate

Hello, @gustisok!

Thank you for such detailed feedback. Can I ask you to provide the config that triggers the problem, without masking any of the elements? The ip -s a output would also help - something that we can just load onto a freshly installed system to reproduce.
I tried this with a configuration filled with fake data and did not see any errors.

Actually, this seems very strange, because uacctd does not interact with interfaces directly and should not be affected by any changes there. Maybe it simply tries to bind to an invalid interface to send flows.

Hi @zsdc,

Thank you for such a prompt response.

Configuration:

set interfaces ethernet eth0 hw-id '00:0c:29:38:fe:3a'
set interfaces ethernet eth0 vif 2 address 'dhcp'
set interfaces ethernet eth0 vif 99 address '10.0.99.254/24'
set interfaces ethernet eth0 vif 99 description 'WAN-10.0.99.0/24'
set interfaces ethernet eth1 hw-id '00:0c:29:38:fe:44'
set interfaces ethernet eth1 vif 10 address '10.0.10.254/24'
set interfaces ethernet eth1 vif 10 description 'Mgmt-10.0.10.0/24'
set interfaces ethernet eth1 vif 15 address '10.0.15.254/24'
set interfaces ethernet eth1 vif 15 description 'Home-10.0.15.0/24'
set interfaces ethernet eth1 vif 20 address '10.0.20.254/24'
set interfaces ethernet eth1 vif 20 description 'Storage-A-10.0.20.0/24'
set interfaces ethernet eth1 vif 21 address '10.0.21.254/24'
set interfaces ethernet eth1 vif 21 description 'Storage-B-10.0.21.0/24'
set interfaces ethernet eth2 hw-id '00:0c:29:38:fe:4e'
set interfaces ethernet eth2 vif 98 address '10.0.98.254/24'
set interfaces ethernet eth2 vif 98 description 'DMZ-10.0.98.0/24'
set interfaces ethernet eth2 vif 100 address '10.0.100.254/24'
set interfaces ethernet eth2 vif 100 description 'GameVM - 10.0.100.0/24'
set interfaces loopback lo
set nat destination rule 100 destination port '3395'
set nat destination rule 100 inbound-interface 'eth0.2'
set nat destination rule 100 protocol 'tcp'
set nat destination rule 100 translation address '10.0.100.10'
set nat destination rule 110 destination port '30000'
set nat destination rule 110 inbound-interface 'eth0.2'
set nat destination rule 110 protocol 'tcp'
set nat destination rule 110 translation address '10.0.100.10'
set nat destination rule 115 destination port '25595'
set nat destination rule 115 inbound-interface 'eth0.2'
set nat destination rule 115 protocol 'tcp'
set nat destination rule 115 translation address '10.0.100.10'
set nat destination rule 200 destination port '65022'
set nat destination rule 200 inbound-interface 'eth0.2'
set nat destination rule 200 protocol 'tcp'
set nat destination rule 200 translation address '10.0.10.5'
set nat destination rule 200 translation port '22'
set nat destination rule 500 destination port '30001'
set nat destination rule 500 inbound-interface 'eth0.2'
set nat destination rule 500 protocol 'tcp'
set nat destination rule 500 translation address '10.0.100.20'
set nat destination rule 500 translation port '30000'
set nat destination rule 501 destination port '30002'
set nat destination rule 501 inbound-interface 'eth0.2'
set nat destination rule 501 protocol 'tcp'
set nat destination rule 501 translation address '10.0.100.20'
set nat destination rule 501 translation port '25595'
set nat source rule 10 outbound-interface 'eth0.2'
set nat source rule 10 source address '10.0.10.0/24'
set nat source rule 10 translation address 'masquerade'
set nat source rule 15 outbound-interface 'eth0.2'
set nat source rule 15 source address '10.0.15.0/24'
set nat source rule 15 translation address 'masquerade'
set nat source rule 98 outbound-interface 'eth0.2'
set nat source rule 98 source address '10.0.98.0/24'
set nat source rule 98 translation address 'masquerade'
set nat source rule 99 outbound-interface 'eth0.2'
set nat source rule 99 source address '10.0.99.0/24'
set nat source rule 99 translation address 'masquerade'
set nat source rule 100 outbound-interface 'eth0.2'
set nat source rule 100 source address '10.0.100.0/24'
set nat source rule 100 translation address 'masquerade'
set protocols static
set service dhcp-relay interface 'eth2.98'
set service dhcp-relay interface 'eth1.10'
set service dhcp-relay interface 'eth1.15'
set service dhcp-relay interface 'eth2.100'
set service dhcp-relay interface 'eth1.20'
set service dhcp-relay interface 'eth1.21'
set service dhcp-relay relay-options max-size '1400'
set service dhcp-relay server '10.0.10.1'
set service dns forwarding allow-from '192.168.1.0/24'
set service dns forwarding domain testing.lan server '10.0.10.1'
set service dns forwarding listen-address '192.168.1.254'
set service snmp community testingsnmpro authorization 'ro'
set service snmp community testingsnmpro client '10.0.10.8'
set service snmp community testingsnmpro client '10.0.10.252'
set service snmp community testsnmpro client '10.0.10.252'
set service snmp community testsnmpro client '10.0.10.8'
set service snmp listen-address 10.0.10.254
set service snmp protocol 'udp'
set service ssh listen-address '192.168.1.254'
set service ssh listen-address '10.0.10.254'
set system config-management commit-revisions '100'
set system conntrack modules ftp
set system conntrack modules h323
set system conntrack modules nfs
set system conntrack modules pptp
set system conntrack modules sip
set system conntrack modules sqlnet
set system conntrack modules tftp
set system console device ttyS0 speed '115200'
set system flow-accounting buffer-size '256'
set system flow-accounting interface 'eth0.2'
set system flow-accounting netflow engine-id '100'
set system flow-accounting netflow max-flows '640000'
set system flow-accounting netflow sampling-rate '1000'
set system flow-accounting netflow server 10.0.10.12 port '2055'
set system flow-accounting netflow source-ip '10.0.10.254'
set system flow-accounting netflow timeout expiry-interval '30'
set system flow-accounting netflow timeout flow-generic '3600'
set system flow-accounting netflow timeout icmp '300'
set system flow-accounting netflow timeout max-active-life '604800'
set system flow-accounting netflow timeout tcp-fin '300'
set system flow-accounting netflow timeout tcp-generic '3600'
set system flow-accounting netflow timeout tcp-rst '120'
set system flow-accounting netflow timeout udp '300'
set system flow-accounting netflow version '5'
set system host-name 'test-rtr.testing.lan'
set system login user testuser authentication plaintext-password testpassword123
set system name-server '10.0.10.1'
set system name-server '10.0.10.2'
set system ntp allow-clients address '10.0.0.0/16'
set system ntp server time1.vyos.net
set system ntp server time2.vyos.net
set system ntp server time3.vyos.net
set system syslog global facility all level 'info'
set system syslog global facility protocols level 'debug'

Please note I’ve made minor changes to the original configuration before posting here:

  • test username/password has been set
  • hostname has been changed
  • domain name has been changed
  • snmp community names have been changed
  • pki ca, cert and dh private bits have been removed
  • flow-accounting service crash can still be reproduced
  • everything else has been left as is

Please let me know if you would like me to export the whole VM as an image and share it with you - these are rather small and it might help :slight_smile:

Kind regards,
Mate

Just to add a bit of clarity around the 192.168.1.0/24 subnet that is referenced at a few points in the config (DNS forwarder, SSH, etc.):
That subnet was previously used on this VyOS router for testing WAN access, before it got attached to the ISP modem directly. In other words, there used to be another router upstream, connected to that ISP modem, which provided a static DHCP lease with IP 192.168.1.254 to this VyOS router via eth0.2.

I have now removed the bits of config referring to 192.168.1.x from the VyOS router, and the uacctd service still crashes, so it doesn’t seem to be related; the logs below were captured a few moments ago, when I committed the removal of those bits.

[email protected]# grep -B10 -A1 uacctd /var/log/messages
Dec 16 14:06:36 test-rtr systemd[1]: Started ulog accounting daemon.
Dec 16 14:06:36 test-rtr systemd[2746]: opt-vyatta-config-tmp-new_config_27515.mount: Succeeded.
Dec 16 14:06:36 test-rtr systemd[27297]: opt-vyatta-config-tmp-new_config_27515.mount: Succeeded.
Dec 16 14:06:36 test-rtr systemd[1]: opt-vyatta-config-tmp-new_config_27515.mount: Succeeded.
Dec 16 14:06:37 test-rtr commit: Successful change to active configuration by user testuser on /dev/pts/2
Dec 16 14:06:44 test-rtr systemd[1]: [email protected]: Succeeded.
Dec 16 14:06:45 test-rtr systemd[1]: [email protected]: Scheduled restart job, restart counter is at 5755.
Dec 16 14:06:45 test-rtr systemd[1]: Stopped Serial Getty on ttyS0.
Dec 16 14:06:45 test-rtr systemd[1]: Started Serial Getty on ttyS0.
Dec 16 14:06:45 test-rtr agetty[29325]: /dev/ttyS0: not a tty
Dec 16 14:06:49 test-rtr kernel: [59094.858229] uacctd[29269]: segfault at 6 ip 00007f15e9dd277e sp 00007ffdc483dd58 error 4 in libc-2.31.so[7f15e9d50000+14b000]
Dec 16 14:06:49 test-rtr kernel: [59094.858240] Code: 4c 8d 0c 16 4c 39 cf 0f 82 63 01 00 00 48 89 d1 f3 a4 c3 80 fa 08 73 12 80 fa 04 73 1e 80 fa 01 77 26 72 05 0f b6 0e 88 0f c3 <48> 8b 4c 16 f8 48 8b 36 48 89 4c 17 f8 48 89 37 c3 8b 4c 16 fc 8b
Dec 16 14:06:49 test-rtr systemd[1]: uacctd.service: Main process exited, code=killed, status=11/SEGV
Dec 16 14:06:49 test-rtr systemd[1]: uacctd.service: Failed with result 'signal'.
Dec 16 14:06:55 test-rtr systemd[1]: [email protected]: Succeeded.

[edit]
[email protected]# show system flow-accounting interface
 interface eth0.2

[edit]
[email protected]# show service dns
Configuration under specified path is empty

[edit]
[email protected]# show service ssh
 listen-address 10.0.10.254

The VM image would be very helpful because I cannot reproduce the problem even with the full config.

Where would you like me to upload it / how would you like me to share it with you?
It will take me a few minutes to export it. :slight_smile:

Cheers,
Mate

Google Drive, OneDrive, etc. - whichever is most convenient for you. You may share it via message here or by sending an email to [email protected].

Hi @zsdc ,

Download link and access password have been provided via support e-mail under the same subject as this thread.
Thank you for taking your time to look into this peculiar issue, it is truly appreciated.

Kind regards,
Mate

Thanks for the VM!
I tried to reproduce the problem multiple times with different VM configs, but every time everything worked as it should.
Most likely, the problem is somehow related to the environment. At this point I can only suspect memory-related issues in the hypervisor - for example, memory sharing or dynamic memory allocation being enabled.

Wow, that was a quick response - thank you for your prompt feedback!

With regards to memory point:

Ballooning and other memory page sharing, compression, etc. are not in use according to esxtop.
There is plenty of physical memory available, and no memory overcommitment is reported, as can be seen below (please note I have removed all other VMs from the list).

 8:13:43pm up 9 days  6:07, 993 worlds, 14 VMs, 43 vCPUs; MEM overcommit avg: 0.00, 0.00, 0.00
PMEM  /MB: 196595   total: 2552     vmk,126306 other, 67735 free
VMKMEM/MB: 196209 managed:  2576 minfree, 41796 rsvd, 154412 ursvd,  high state
NUMA  /MB: 98304 (17483), 98289 (49868)
PSHARE/MB:      0  shared,      0  common:       0 saving
SWAP  /MB:       0    curr,       0 rclmtgt:                 0.00 r/s,   0.00 w/s
ZIP   /MB:       0  zipped,       0   saved
MEMCTL/MB:       0    curr,       0  target,   87936 max

     GID NAME               MEMSZ    GRANT     CNSM    SZTGT     TCHD   TCHD_W    SWCUR    SWTGT   SWR/s   SWW/s  LLSWR/s  LLSWW/s   OVHDUW     OVHD  OVHDMAX
  128300 vyos             2119.86  2059.57  2048.00  2082.56   229.23    43.55     0.00     0.00    0.00    0.00     0.00     0.00     0.00    33.41    59.17

Considering this is a virtualized router, the VM’s memory has been fully reserved upfront to ensure the lowest latency - but it made no difference in behavior. I have also tried enabling/disabling the latency-sensitivity feature, but that did not help either - so this does not appear to be related to memory allocation.

(attached screenshot: vyosvm)

While slight vCPU overcommitment is reported by allocation, most cores are idling and the CSTOP value is 0.00 across all processes. Based on my experience, reserving 4 vCPUs for the VM would likely make no difference under the circumstances.
I would have attached the screenshot, but apparently I can only attach one object.

VM is configured to utilize cores on a single NUMA node - thus VM has been configured with 1 socket with 4 cores.

Additional checks/troubleshooting done

  • Host logs are clear, and so are the host’s physical NIC counters.
    VyOS’s own NIC counters also show no errors.
[email protected]# ethtool -S eth0 | grep -E "drop|fail|err|crc"
       pkts tx err: 0
       drv dropped tx total: 0
          hdr err: 0
       pkts tx err: 0
       drv dropped tx total: 0
          hdr err: 0
       pkts tx err: 0
       drv dropped tx total: 0
          hdr err: 0
       pkts tx err: 0
       drv dropped tx total: 0
          hdr err: 0
       pkts rx err: 0
       drv dropped rx total: 0
          err: 0
       rx buf alloc fail: 0
       pkts rx err: 0
       drv dropped rx total: 0
          err: 0
       rx buf alloc fail: 0
       pkts rx err: 0
       drv dropped rx total: 0
          err: 0
       rx buf alloc fail: 0
       pkts rx err: 0
       drv dropped rx total: 0
          err: 0
       rx buf alloc fail: 0
  • RSS on the physical NIC is disabled, so there is only one queue. Hardware LRO offloading was also turned off temporarily, but no change in flow-accounting behavior was observed (so I re-enabled it).
    vsish output for RSS queues:
    /net/pNics/> cat /net/pNics/vmnic0/rxqueues/queueCount
    1
    /net/pNics/> cat /net/pNics/vmnic1/rxqueues/queueCount
    1
    /net/pNics/> cat /net/pNics/vmnic2/rxqueues/queueCount
    1
    /net/pNics/> cat /net/pNics/vmnic3/rxqueues/queueCount
    1

  • Using non-trunked port groups to attach vNICs, i.e. one VLAN per interface (avoiding sub-interfaces), doesn’t appear to trigger the uacctd service crash.

  • I have tried port groups with both ephemeral binding and static binding on the distributed switch - both triggered the issue when enabling flow accounting on the sub-interface. Unfortunately I don’t have a standard switch configured on the ESXi host to test this on (at the moment, at least - I will see if I can do something about it).

Next step:

  • I will see if I can temporarily deploy another ESXi host and move this VM to a different physical host, to check whether behavior changes with a different physical NIC/driver/firmware - but so far I have had no issues with anything else on this server, so it might be a long shot.

Once again, thank you for helping out - if you have any other ideas you’d like me to try, please do let me know.

Kind regards,
Mate

Any progress? Were you able to run flow accounting on another hypervisor?

Meanwhile, I can suggest starting the uacctd daemon with any interface type that works and waiting. If it runs without problems, try adding a VLAN rule manually and see what happens:

sudo nft add rule ip raw VYATTA_CT_PREROUTING_HOOK iifname eth0.2 counter log snaplen 128 queue-threshold 100 group 2
sudo nft add rule ip6 raw VYATTA_CT_PREROUTING_HOOK iifname eth0.2 counter log snaplen 128 queue-threshold 100 group 2
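
A side note on those rules: the `group 2` in the `log ... group` statement must match the `uacctd_group` value from uacctd.conf, since that is the NFLOG group the daemon listens on. A minimal sketch (assuming only the `uacctd_group: 2` line format shown earlier in the thread; the function name is mine) to pull it out:

```python
def nflog_group(conf_text: str) -> int:
    """Extract uacctd's NFLOG group from uacctd.conf-style text."""
    for line in conf_text.splitlines():
        if line.startswith("uacctd_group:"):
            return int(line.split(":", 1)[1])
    raise ValueError("uacctd_group not found")

# The manual nft rules above must use this same group number.
print(nflog_group("promisc: false\nuacctd_group: 2\nsnaplen: 128"))
```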

I am almost sure that the problem is not the interface type. Maybe the daemon receives a magic packet that crashes it. Or it cannot match packets to incoming VLANs - but in that case, the problem would be reproducible in our lab too.

I am still waiting for hardware to become available for the new test hypervisor. It is holiday season, so it’s hard to get this sorted in a timely manner - but I expect to have it deployed and ready for testing at some point next week or the week after.

I will most certainly update you with testing results.