VyOS HTTP API becomes unresponsive (504 Gateway Timeout) during concurrent requests – need multi-threaded/concurrency support

Version: VyOS 1.5 Circinus (2025.11 – build Tue 11 Nov 2025)

Issue Summary: When multiple configuration requests (POST to /configure, /config-file, /commit, etc.) are sent simultaneously, the API server becomes blocked and nginx returns HTTP 504 Gateway Timeout. Read-only op-mode calls work fine in parallel, but any request that touches the configuration session locks the entire API process.

Example error (from creating a user in parallel with other ops):

client error: unable to create '[system login user vyosadmin]', got error: http error [504 Gateway Time-out]: <html>
<head><title>504 Gateway Time-out</title></head>
<body>
<center><h1>504 Gateway Time-out</h1></center>
<hr><center>nginx/1.22.1</center>
</body>
</html>
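
A minimal sketch for reproducing the timeout by firing several /configure requests at once. The router address, API key, and dummy-interface names are placeholders, and sending the `data` and `key` fields as a URL-encoded form (rather than multipart) is an assumption; adjust to match your setup.

```python
import json
import ssl
import urllib.error
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

API_URL = "https://192.0.2.1/configure"  # assumption: placeholder router address
API_KEY = "MY-API-KEY"                   # assumption: placeholder API key


def build_request(index: int) -> urllib.request.Request:
    """Build one POST to /configure that sets a description on a dummy interface."""
    command = {
        "op": "set",
        "path": ["interfaces", "dummy", f"dum{index}", "description", f"test-{index}"],
    }
    # Assumption: the API accepts the 'data' and 'key' fields as a URL-encoded form.
    body = urllib.parse.urlencode({"data": json.dumps(command), "key": API_KEY})
    return urllib.request.Request(API_URL, data=body.encode(), method="POST")


def send(index: int) -> int:
    """Send one request and return the HTTP status (504 when nginx gives up)."""
    context = ssl.create_default_context()
    context.check_hostname = False       # lab use only: self-signed certificate
    context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(build_request(index), context=context, timeout=120) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    # Eight requests in flight at the same time, one per worker thread.
    with ThreadPoolExecutor(max_workers=8) as pool:
        for index, status in enumerate(pool.map(send, range(8))):
            print(f"request {index}: HTTP {status}")
```

Running the same requests with `max_workers=1` should show whether the failures only appear under concurrency.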

Root Cause (observed): The vyos-http-api-server (FastAPI served by uvicorn) handles configuration endpoints in a blocking/single-threaded manner. Each request acquires a global lock on the running config (vyos.config), so only one configuration operation can run at a time. Concurrent requests queue behind that lock while nginx waits, and eventually time out.
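
To illustrate why a single global lock leads to timeouts, here is a small simulation. It is not the VyOS code. `config_lock`, `handle_request`, and `simulate` are hypothetical names, and the sleep stands in for session setup plus commit. Because requests run one after another, the last of N requests finishes after roughly N times the per-request duration, which can exceed the proxy timeout.

```python
import threading
import time

# Hypothetical stand-in for the single global configuration lock.
config_lock = threading.Lock()


def handle_request(commit_seconds: float, started: float, finished: list) -> None:
    """Simulate one configuration request that must hold the global lock."""
    with config_lock:
        time.sleep(commit_seconds)  # stands in for session setup + commit
    finished.append(time.monotonic() - started)


def simulate(requests: int, commit_seconds: float) -> list:
    """Start all requests at once and return each one's total latency, sorted."""
    finished = []
    started = time.monotonic()
    threads = [
        threading.Thread(target=handle_request, args=(commit_seconds, started, finished))
        for _ in range(requests)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sorted(finished)


if __name__ == "__main__":
    # Five simultaneous requests, each holding the lock for 0.2 seconds.
    for n, latency in enumerate(simulate(requests=5, commit_seconds=0.2), start=1):
        print(f"request {n} finished after {latency:.1f}s")
```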

Impact: This severely limits automation tools that run in parallel by default:

  • Ansible (multiple plays/tasks)
  • Terraform (parallel resource creation)
  • Custom scripts / orchestration

Related fixed issue: T6069 fixed segfaults/crashes under concurrent load, but the intentional blocking/locking behavior remains unchanged.

Feature Request: Make the HTTP API properly support concurrent configuration requests without 504 timeouts. Possible approaches:

  1. Run the API with multiple worker processes/threads (e.g., uWSGI --threads/--processes, Gunicorn workers).
  2. Implement fine-grained locking (only lock during actual commit/save, allow parallel session setup).
  3. Add a queuing mechanism or async handling for configuration sessions.
  4. Provide official support for running multiple API workers behind a load balancer (with shared config lock).
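
As a rough illustration of approach 3, the sketch below accepts jobs immediately, applies them one at a time in arrival order, and lets clients poll a job status instead of holding an HTTP connection open. `ConfigQueue`, `apply_commands`, and the status strings are all hypothetical; nothing here is an existing VyOS API.

```python
import asyncio
import itertools


async def apply_commands(commands: list) -> None:
    """Hypothetical stand-in for applying and committing a list of commands."""
    await asyncio.sleep(0.01)


class ConfigQueue:
    """Hypothetical job queue: accept requests at once, commit them serially."""

    def __init__(self) -> None:
        self._queue = asyncio.Queue()
        self._ids = itertools.count(1)
        self.status = {}  # job id -> "queued" | "running" | "done" | "failed"

    def submit(self, commands: list) -> int:
        """Enqueue a job and return its id without waiting for the commit."""
        job_id = next(self._ids)
        self.status[job_id] = "queued"
        self._queue.put_nowait((job_id, commands))
        return job_id

    async def worker(self) -> None:
        """Single consumer, so commits stay serialized in arrival order."""
        while True:
            job_id, commands = await self._queue.get()
            self.status[job_id] = "running"
            try:
                await apply_commands(commands)
                self.status[job_id] = "done"
            except Exception:
                self.status[job_id] = "failed"
            finally:
                self._queue.task_done()

    async def drain(self) -> None:
        """Wait until every queued job has been processed."""
        await self._queue.join()


async def demo(clients: int) -> dict:
    """Submit one job per client, wait for the queue to empty, return statuses."""
    jobs = ConfigQueue()
    worker = asyncio.create_task(jobs.worker())
    for i in range(clients):
        jobs.submit([["set", "interfaces", "dummy", f"dum{i}"]])
    await jobs.drain()
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return jobs.status


if __name__ == "__main__":
    print(asyncio.run(demo(3)))
```

With this shape, a submit endpoint could answer right away with the job id and a separate status endpoint could report progress, so nginx never waits on a long commit. The trade-off is that clients must poll or be notified when their job completes.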

This is a frequently requested capability for scalable/automated deployments.

Thank you!