Very slow commit performance on VyOS with large BGP configuration and VRF route-leak

Hello,

I am testing VyOS and recently migrated my configuration from Debian 12 + FRR to VyOS.

My current configuration is about 1600 commands and includes:

  • 51 route-map
  • 51 prefix-list
  • 9 as-path-list
  • 2 community-list
  • 8 extcommunity-list
  • 14 BGP sessions in total
    • 3 full-view IPv4 peers
    • 3 full-view IPv6 peers
    • the rest are regular peering sessions
  • 2 VRFs:
    • the default VRF, which receives the full routing table
    • another VRF used for peerings

The problem is that commit takes a very long time, around 35 seconds.

During this time, the following processes are active and keep CPU usage at 100%:

  • unionfs-fuse
  • my_commit
  • cli-shell-api
  • python3

After I configured route leaking between the two VRFs, the commit time increased to about 90 seconds.
At that point, bgpd and zebra also started consuming CPU at 100% for a long time.
What is interesting is that even creating a simple prefix-list that does not affect any BGP sessions, followed by a commit, still takes around 90 seconds.
For comparison, I have the same configuration running on plain FRRouting on Debian 12, and I do not see this problem there. On FRR, creating a prefix-list takes less than one second.

  1. Why is commit so slow on VyOS with this configuration?
  2. Is this expected behavior?
  3. Is there any way to improve or fix this?

Version: VyOS 2026.02
Build commit ID: e4c4eddad9b984
Hardware: Dell R730 + 2 x Intel E5-2697A v4

This is a known issue that was first reported about 3 years ago.

Hopefully someone from the VyOS team can enlighten us on the progress here?

When I last dug into this, the main issue seemed to be that every single line is validated on its own through Python, with subcall upon subcall. This makes both boot and commit times ridiculously long compared to what they could be.

For example, when an IPv4 address is to be validated, /bin/sh is started each time and the value is fed to ipaddrcheck, which then returns an exit code indicating whether it is valid or not:

And since IPv4 (and IPv6) addresses appear basically everywhere when you do static routing, dynamic routing through BGP, or just firewall rules, the above validator is fired a gazillion times, each time on its own, just to check whether 1.2.3.4 is an IPv4 address or not.
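To illustrate why spawning a process per value adds up, here is a minimal Python sketch. It is not VyOS code: `is_ipv4_subprocess` is a hypothetical stand-in that starts a fresh interpreter for every value (mimicking an external validator being launched once per address), while `is_ipv4_inprocess` checks the same value with the standard `ipaddress` module inside one long-running process. All function names are made up for this example.

```python
import ipaddress
import subprocess
import sys
import time


def is_ipv4_inprocess(value: str) -> bool:
    """Validate an IPv4 address without leaving the current process."""
    try:
        ipaddress.IPv4Address(value)
        return True
    except ValueError:
        return False


def is_ipv4_subprocess(value: str) -> bool:
    """Stand-in for an external validator: one new process per value."""
    code = "import ipaddress, sys; ipaddress.IPv4Address(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", code, value],
        stderr=subprocess.DEVNULL,
    )
    # Exit status 0 means the child accepted the value.
    return result.returncode == 0


def time_validator(validator, values):
    """Run a validator over all values and return (results, seconds)."""
    start = time.perf_counter()
    results = [validator(v) for v in values]
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Assumed sample data: 200 valid addresses.
    addresses = [f"10.0.{i // 256}.{i % 256}" for i in range(200)]
    _, fast = time_validator(is_ipv4_inprocess, addresses)
    _, slow = time_validator(is_ipv4_subprocess, addresses)
    print(f"in-process: {fast:.4f}s, one process per value: {slow:.4f}s")
```

The absolute numbers depend on the machine, but the gap between the two timings gives a feel for how much of the cost is process startup rather than the check itself.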

That is, optimizing Python on its own, either by running a different flavour or by keeping it running as a TCP/pipe process that handles the commands (to get rid of the overhead of starting Python for each line), will shrink the total time, but I'm guessing a total redesign is needed to avoid this subcall upon subcall to begin with.

For example, if you take the very same config and manually inject it into a text file loaded by FRR on the same installation, the load takes under 1 second, while doing it the regular way through the VyOS config takes over 300 seconds.

This applies not only to FRR-related stuff but also to large firewall rulesets. That is, a large ruleset will take several minutes to boot or commit, but the same ruleset in a static file loaded through the nft command gets loaded within a second or two.

I tried Grok to see what, if anything, a hallucinating LLM could bring to this topic.

A workaround would be to create a post-hooks script such as:

/config/scripts/commit/post-hooks.d/50-bulk-routes.sh

Which contains something like:

#!/bin/bash
# Generate full FRR config (or just static section) from your source of truth
cat > /run/frr/config/frr.conf << EOF
! bulk static routes
$(your_script_to_generate_1000_ip_route_lines)
EOF
# Or use vtysh batch / frr-reload
systemctl reload frr   # or /usr/lib/frr/frr-reload.py --reload /run/frr/config/frr.conf
# Same for firewall:
# your_script > /etc/nftables/ruleset.nft && nft -f /etc/nftables/ruleset.nft

Basically, this removes static and dynamic routing and firewalling from the VyOS config itself and offloads them to their own config files, which are loaded by a post-hooks script.

I'm guessing a plausible method would be to take your current config, dump the firewall rules using nft into nft.conf, and look at the generated frr.conf to get a baseline for the above script.

Then remove all of this from the VyOS config and manually edit nft.conf and frr.conf from then on to add or remove stuff.

The obvious drawback of the above is the lack of input validation: what happens when nft.conf or frr.conf fails to load completely?
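One way to soften that drawback, sketched here in Python under stated assumptions, is to dry-run the file before applying it. The sketch relies on nft's `-c` (check) flag, which validates commands without applying them. The function names, the injectable `runner` parameter, and the file path are hypothetical and only serve the example. This is not something VyOS ships.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes an argv list and returns the exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(argv: Sequence[str]) -> int:
    """Run a command and return its exit status."""
    return subprocess.run(list(argv)).returncode


def load_nft_ruleset(path: str, runner: Runner = run_command) -> bool:
    """Dry-run the ruleset first; apply it only if the check passes."""
    if runner(["nft", "-c", "-f", path]) != 0:
        # Check failed: do not touch the ruleset that is currently loaded.
        return False
    return runner(["nft", "-f", path]) == 0


if __name__ == "__main__":
    # Hypothetical location of the hand-maintained ruleset.
    ok = load_nft_ruleset("/config/user-data/nft.conf")
    print("ruleset loaded" if ok else "ruleset rejected, keeping the old one")
```

The runner is passed in so the logic can be exercised without nft installed. A similar check-then-apply step could be considered for frr.conf, but that part is left out here.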

Another workaround was to disable the validators:

find /opt/vyatta/share/vyatta-cfg/templates -name node.def -exec sed -i 's/^syntax:expression:/#syntax:expression:/g' {} +
find /opt/vyatta/share/vyatta-op/templates -name node.def -exec sed -i 's/^syntax:expression:/#syntax:expression:/g' {} +

But I would prefer the first suggestion, since it makes it more obvious that you are no longer “protected” by vyos-configd.

Other than the above workarounds, it suggested the following (I don't know whether it is sane or not):

Edit src/conf_mode/protocols_static.py (and firewall.py):

* In generate(): Build the full FRR section (or nft ruleset) once using efficient list comprehensions + '\n'.join() or Jinja2 (already used elsewhere in data/templates). Avoid any per-route string +=.

* In apply(): Replace any per-route vtysh/staticd calls or incremental adds with:

# Example for static (adapt from existing FRR helpers)
write_config_file(full_frr_conf)
run(["/usr/lib/frr/frr-reload.py", "--reload", config_file])  # or systemctl reload frr / frr-reload

* Same pattern for firewall: single nft -f on the generated ruleset.

* Rebuild/install the vyos-1x package (or test in a container). This eliminates the per-item apply overhead.

The apply phase in protocols_static.py already has some cost even with few routes — batching removes the scaling problem.
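As a rough illustration of the batching idea (not the actual code in protocols_static.py), the following Python sketch renders all static routes into one text block in a single pass and writes it out once, leaving a single reload as the only apply step. The function names, the route tuple layout, and the sample data are assumptions for the example.

```python
import os
import tempfile
from typing import Iterable, Tuple


def render_static_routes(routes: Iterable[Tuple[str, str]]) -> str:
    """Build the whole static-route section in one pass."""
    lines = ["! bulk static routes"]
    # One generator expression instead of per-route string concatenation.
    lines.extend(f"ip route {prefix} {next_hop}" for prefix, next_hop in routes)
    return "\n".join(lines) + "\n"


def write_config(path: str, text: str) -> None:
    """Write the rendered config with a single file operation."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


if __name__ == "__main__":
    # Assumed sample data: 1000 /24 prefixes via one next hop.
    routes = [(f"10.{i // 256}.{i % 256}.0/24", "192.0.2.1") for i in range(1000)]
    target = os.path.join(tempfile.gettempdir(), "static-routes.conf")
    write_config(target, render_static_routes(routes))
    print(f"wrote {len(routes)} routes to {target}")
```

With the file written once, the apply step shrinks to one reload call, however many routes there are.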

* For boot: Enable squashfs optimizations in kernel config (CONFIG_SQUASHFS_FILE_DIRECT, larger fragment cache) — small win but easy.

* Consider overlayfs instead of unionfs-fuse for config sessions (advanced, discussed in T5388).

And finally, as already mentioned :)

Long-term: The VyOS project needs to finish the config backend refactor (T6209) and enforce batching in all high-volume conf_mode scripts. You can contribute a PR against protocols_static.py + firewall.py + vyos-configd handling — the manual FRR/nft timings prove it's achievable in 1–2s.