NAT + conntrack-sync = No buffer space available

Hi all!
We’re seeing some interesting behavior when NAT and conntrack-sync are enabled together. When we run benchmark tests (the TRex astf/video_stream.py test with TCP traffic), we see many errors like these in the logs:

Aug 08 20:12:54 conntrack-tools[2508]: failed to receive message: No buffer space available
Aug 08 20:12:54 conntrackd[2508]: [Tue Aug  8 20:12:54 2023] (pid=2508) [ERROR] failed to receive message: No buffer space available
Aug 08 20:12:54 conntrack-tools[2508]: failed to receive message: No buffer space available
Aug 08 20:12:54 conntrackd[2508]: [Tue Aug  8 20:12:54 2023] (pid=2508) [ERROR] failed to receive message: No buffer space available
Aug 08 20:12:54 conntrack-tools[2508]: failed to receive message: No buffer space available
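
To put a number on how often the error fires, log lines like the ones above can be bucketed per second. This is only a rough sketch: it assumes the syslog-style prefix shown in the excerpt, and it counts only the `conntrackd[` lines on the assumption that the `conntrack-tools[` lines are duplicates of the same events. The function name and the `tag` parameter are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Matches the syslog-style prefix seen in the excerpt, e.g. "Aug 08 20:12:54 ".
TIMESTAMP = re.compile(r"^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s")
ERROR_TEXT = "No buffer space available"


def errors_per_second(lines, tag="conntrackd["):
    """Return a Counter mapping a timestamp string to the number of errors.

    Only lines containing `tag` are counted, so that an event logged under
    two different identifiers is (assumed to be) counted once.
    """
    counts = Counter()
    for line in lines:
        if ERROR_TEXT not in line or tag not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, `errors_per_second(open("conntrackd.log"))` (file name hypothetical) returns one counter entry per second of log time, which makes it easier to compare runs with different settings.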

Our topology is pretty simple: TREX TX ----> eth3.3807 VYOS eth4.3808 -----> TREX RX
VyOS and TRex are virtualized and hosted on different VMware ESXi hosts.
The error messages disappear when either NAT or conntrack-sync is turned off.
Conntrack parameters:

set service conntrack-sync accept-protocol 'tcp'
set service conntrack-sync accept-protocol 'udp'
set service conntrack-sync accept-protocol 'icmp'
set service conntrack-sync disable-external-cache
set service conntrack-sync event-listen-queue-size '100000'
set service conntrack-sync expect-sync 'all'
set service conntrack-sync failover-mechanism vrrp sync-group 'MAIN'
set service conntrack-sync interface eth2 peer '10.33.106.4'
set service conntrack-sync sync-queue-size '100000'
set system conntrack expect-table-size '50000000'
set system conntrack hash-size '50000000'
set system conntrack log tcp
set system conntrack modules ftp
set system conntrack modules h323
set system conntrack modules nfs
set system conntrack modules pptp
set system conntrack modules sip
set system conntrack modules sqlnet
set system conntrack modules tftp
set system conntrack table-size '50000000'
set system sysctl parameter net.netfilter.nf_conntrack_buckets value '2097152'

NAT rules are basic:

set nat source rule 10 outbound-interface 'eth4.3808'
set nat source rule 10 source address '16.0.0.0/8'
set nat source rule 10 translation address 'masquerade'

Any thoughts on how to troubleshoot this more thoroughly? Maybe some kernel parameters need adjusting?
Our SW is VyOS 1.4-rolling-202307120317
Resources:
4 vCPU
8 GB vRAM
32 GB vHDD

Hi @Kefear,

To address the “No buffer space available” error in the context of NAT and conntrack-sync, you can try these steps:

  1. Increase the conntrack resources to handle the load generated by your benchmark tests. Consider increasing the values for your ‘expect-table-size’, ‘table-size’, and ‘hash-size’.

  2. Monitor memory usage: The “No buffer space available” error can sometimes indicate memory exhaustion. If memory is getting depleted, you might need to allocate more memory to your VyOS instance.

  3. Kernel Parameters: You can experiment with increasing specific kernel parameters related to connection tracking, such as net.netfilter.nf_conntrack_max and net.core.netdev_max_backlog.
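
Before raising the table sizes further, it may help to check how full the conntrack table actually is during the test. A minimal sketch, assuming the standard Linux sysctl files under /proc/sys/net/netfilter are readable on the box; the function names and the returned dictionary layout are illustrative:

```python
from pathlib import Path

# Standard location of the netfilter conntrack sysctls on Linux.
PROC_DIR = Path("/proc/sys/net/netfilter")


def read_int(path):
    """Read a single integer from a sysctl-style file."""
    return int(Path(path).read_text().strip())


def conntrack_usage(proc_dir=PROC_DIR):
    """Report current conntrack entries relative to the table and hash sizes."""
    proc_dir = Path(proc_dir)
    count = read_int(proc_dir / "nf_conntrack_count")
    maximum = read_int(proc_dir / "nf_conntrack_max")
    buckets = read_int(proc_dir / "nf_conntrack_buckets")
    return {
        "count": count,
        "max": maximum,
        "buckets": buckets,
        # Share of the table in use; values near 100 suggest the table is the limit.
        "table_used_pct": 100.0 * count / maximum if maximum else 0.0,
        # Average hash chain length; large values mean slower lookups.
        "entries_per_bucket": count / buckets if buckets else 0.0,
    }
```

If `table_used_pct` stays low while the errors appear, the table size is probably not the limiting resource and increasing it further is unlikely to help.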

I hope this helps!

Cheers :beer:,
Joe

Things to try:

  1. Update to the latest 1.4-rolling (to rule things out, but also because the latest 1.4-rolling contains the latest Linux kernel).

  2. Disable conntrack modules unless you really use them (if so only keep the ones you actually use):

delete system conntrack modules

  3. Remove that custom sysctl (net.netfilter.nf_conntrack_buckets), since it will conflict with “set system conntrack hash-size”.

  4. Try one or more of these values and see if the error returns:

set firewall state-policy established action 'accept'
set firewall state-policy invalid action 'drop'
set firewall state-policy related action 'accept'
set firewall syn-cookies 'enable'

set interfaces ethernet ethX offload gro
set interfaces ethernet ethX offload gso
set interfaces ethernet ethX offload lro
set interfaces ethernet ethX offload rfs
set interfaces ethernet ethX offload rps
set interfaces ethernet ethX offload sg
set interfaces ethernet ethX offload tso
set interfaces ethernet ethX ring-buffer rx '4096'
set interfaces ethernet ethX ring-buffer tx '4096'

set system conntrack expect-table-size '10485760'
set system conntrack hash-size '10485760'
set system conntrack log icmp new
set system conntrack log other new
set system conntrack log tcp new
set system conntrack log udp new
set system conntrack table-size '10485760'
set system conntrack timeout icmp '10'
set system conntrack timeout other '600'
set system conntrack timeout tcp close '10'
set system conntrack timeout tcp close-wait '30'
set system conntrack timeout tcp established '600'
set system conntrack timeout tcp fin-wait '30'
set system conntrack timeout tcp last-ack '30'
set system conntrack timeout tcp syn-recv '30'
set system conntrack timeout tcp syn-sent '30'
set system conntrack timeout tcp time-wait '30'
set system conntrack timeout udp other '600'
set system conntrack timeout udp stream '600'

set system option performance 'throughput'

  5. How long after boot does the “No buffer space available” error first appear?
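
The suggestion above about the custom sysctl conflicting with “set system conntrack hash-size” can be checked mechanically against an exported config. A small sketch, assuming the config is available as plain `set` commands like those posted in this thread; the helper names are hypothetical:

```python
HASH_PATH = "system conntrack hash-size"
SYSCTL_PATH = "system sysctl parameter net.netfilter.nf_conntrack_buckets value"


def find_value(config_text, path):
    """Return the value of the first 'set <path> <value>' line, or None."""
    prefix = "set " + path + " "
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            # Drop the surrounding single quotes used by VyOS 'set' output.
            return line[len(prefix):].strip().strip("'")
    return None


def hash_size_conflict(config_text):
    """Return (hash_size, sysctl_buckets) if both are set and differ, else None."""
    hash_size = find_value(config_text, HASH_PATH)
    buckets = find_value(config_text, SYSCTL_PATH)
    if hash_size is None or buckets is None:
        return None
    if int(hash_size) != int(buckets):
        return int(hash_size), int(buckets)
    return None
```

Run against the config from the first post, this reports the pair (50000000, 2097152), i.e. two different bucket counts being requested for the same hash table.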

@Apachez @JoeN, thanks for your input!
Unfortunately, I’ve already tried all of it, with no result.
My software image is VyOS 1.4-rolling-202307120317. It looks fairly fresh, but I’ll definitely try the newest one a bit later :slight_smile:
My HW resources are:
4 vCPU
16 GB vRAM
32 GB HDD
Should be fine
As for the test, we use this one:

From a traffic perspective, it’s TCP traffic at 19-20 CPs and 5.5 Kpps in total. We see the problem immediately after the test starts. If I turn either NAT or conntrack-sync off, the errors disappear and bandwidth is up to 5-7 times higher (300 Mbps to 3.5 Gbps).
Is there any way to troubleshoot conntrackd process events?

Config looks like:

set firewall all-ping 'enable'
set firewall broadcast-ping 'disable'
set firewall config-trap 'disable'
set firewall interface eth3.3807 in name 'INT-T1-IN'
set firewall interface eth4.3808 in name 'INT-T2-IN'
set firewall ip-src-route 'disable'
set firewall ipv6-receive-redirects 'disable'
set firewall ipv6-src-route 'disable'
set firewall log-martians 'enable'
set firewall name INT-T1-IN default-action 'drop'
set firewall name INT-T1-IN rule 1 action 'accept'
set firewall name INT-T2-IN default-action 'drop'
set firewall name INT-T2-IN rule 1 action 'accept'
set firewall receive-redirects 'disable'
set firewall send-redirects 'enable'
set firewall source-validation 'disable'
set firewall state-policy established action 'accept'
set firewall state-policy established log
set firewall state-policy invalid action 'drop'
set firewall state-policy invalid log enable
set firewall state-policy related action 'accept'
set firewall state-policy related log enable
set firewall syn-cookies 'enable'
set firewall twa-hazards-protection 'disable'
set interfaces ethernet eth3 hw-id '00:50:56:b3:a4:6b'
set interfaces ethernet eth3 offload gro
set interfaces ethernet eth3 offload gso
set interfaces ethernet eth3 offload lro
set interfaces ethernet eth3 offload rps
set interfaces ethernet eth3 offload sg
set interfaces ethernet eth3 offload tso
set interfaces ethernet eth3 ring-buffer rx '4096'
set interfaces ethernet eth3 ring-buffer tx '4096'
set interfaces ethernet eth3 vif 3807 address '10.33.77.1/24'
set interfaces ethernet eth4 hw-id '00:50:56:b3:08:66'
set interfaces ethernet eth4 offload gro
set interfaces ethernet eth4 offload gso
set interfaces ethernet eth4 offload lro
set interfaces ethernet eth4 offload rps
set interfaces ethernet eth4 offload sg
set interfaces ethernet eth4 offload tso
set interfaces ethernet eth4 ring-buffer rx '4096'
set interfaces ethernet eth4 ring-buffer tx '4096'
set interfaces ethernet eth4 vif 3808 address '10.33.72.1/24'
set interfaces loopback lo
set nat source rule 10 outbound-interface 'eth4.3808'
set nat source rule 10 source address '16.0.0.0/8'
set nat source rule 10 translation address 'masquerade'
set protocols static route 16.0.0.0/8 next-hop 10.33.77.10
set protocols static route 48.0.0.0/8 next-hop 10.33.72.10
set service conntrack-sync accept-protocol 'tcp'
set service conntrack-sync accept-protocol 'udp'
set service conntrack-sync accept-protocol 'icmp'
set service conntrack-sync disable-external-cache
set service conntrack-sync event-listen-queue-size '100000'
set service conntrack-sync expect-sync 'all'
set service conntrack-sync interface eth2 peer '10.33.106.4'
set service conntrack-sync sync-queue-size '100000'
set service ntp allow-client address '0.0.0.0/0'
set service ntp allow-client address '::/0'
set service ntp server time1.vyos.net
set service ntp server time2.vyos.net
set service ntp server time3.vyos.net
set service ssh port '22'
set service ssh vrf 'mgmt'
set system config-management commit-revisions '100'
set system conntrack expect-table-size '10485760'
set system conntrack hash-size '10485760'
set system conntrack log icmp new
set system conntrack log other new
set system conntrack log tcp new
set system conntrack log udp new
set system conntrack modules ftp
set system conntrack modules h323
set system conntrack modules nfs
set system conntrack modules pptp
set system conntrack modules sip
set system conntrack modules sqlnet
set system conntrack modules tftp
set system conntrack table-size '10485760'
set system conntrack timeout icmp '10'
set system conntrack timeout other '600'
set system conntrack timeout tcp close '10'
set system conntrack timeout tcp close-wait '30'
set system conntrack timeout tcp established '600'
set system conntrack timeout tcp fin-wait '30'
set system conntrack timeout tcp last-ack '30'
set system conntrack timeout tcp syn-recv '30'
set system conntrack timeout tcp syn-sent '30'
set system conntrack timeout tcp time-wait '30'
set system conntrack timeout udp other '600'
set system conntrack timeout udp stream '600'
set system console device ttyS0 speed '115200'
set system host-name 'vyos-3-lab'
set system login user vyos authentication plaintext-password ''
set system option performance 'throughput'
set system sysctl parameter net.core.netdev_max_backlog value '10000'
set system sysctl parameter net.netfilter.nf_conntrack_buckets value '2097152'
set system syslog global facility all level 'info'
set system syslog global facility local7 level 'debug'
set vrf bind-to-all
set vrf name mgmt protocols static route 0.0.0.0/0 next-hop 10.16.54.1
set vrf name mgmt table '154

Try running conntrack-sync over a dedicated interface (just to rule things out), that is, an interface that carries only conntrack-sync traffic.

Yep, I’ve already done that. I have a separate SYNC link for that purpose. No success.

Yet another interesting observation: even if I configure NAT with rules that don’t match the traffic I’m pushing through the box, the “No buffer” messages are still there. It looks like when conntrack-sync and NAT are both turned on and we send aggressive (video stream) traffic, the box runs short of some internal resource. The main question is where the bottleneck is in such cases.

Even if traffic isn’t hitting a NAT rule, new sessions will still be added to the conntrack table.
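
One way to see how much tracking work the test generates, regardless of which NAT rules match, is to total the kernel’s per-CPU conntrack counters as printed by `conntrack -S`. The sketch below only parses `key=value` text; the exact set of fields varies between kernel and conntrack-tools versions, so nothing here depends on specific field names, and the function name is illustrative.

```python
def parse_conntrack_stats(text):
    """Sum the per-CPU key=value counters from `conntrack -S` style output.

    Fields other than numeric key=value pairs are ignored, as is the
    `cpu=` index itself.
    """
    totals = {}
    for line in text.splitlines():
        for field in line.split():
            key, sep, value = field.partition("=")
            if not sep or key == "cpu" or not value.isdigit():
                continue
            totals[key] = totals.get(key, 0) + int(value)
    return totals
```

Taking one snapshot before the test and one during it (for example by capturing the output of `sudo conntrack -S` with `subprocess.run`) and comparing counters such as `insert_failed` or `drop`, where present, gives a rough indication of whether the kernel side is dropping entries or whether the loss happens only on the path to conntrackd.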