Slow vbash script execution

Hi, folks.

I’m testing a vbash script I’ve built to create a prefix-list with 1300+ entries, and it turns out that running the script takes around 8 minutes to complete.

Is anyone else experiencing similarly long execution times?

Yes.

The issue is that each line is added to frr one at a time (with A LOT of overhead, since it fires up Python and all kinds of other stuff) instead of creating a single file and batching it into frr once all lines of the config have been processed.
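To make the batching idea concrete, here is a minimal Python sketch that renders every prefix-list entry into one block of FRR `ip prefix-list` lines in a single pass, rather than pushing each entry separately. `PrefixEntry`, `render_prefix_list` and the list name `CUSTOMER-IN` are hypothetical; only the `ip prefix-list NAME seq N permit|deny PREFIX` line format is FRR's.

```python
"""Sketch: render all prefix-list entries as ONE frr batch (hypothetical names)."""
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PrefixEntry:
    prefix: str              # e.g. "10.0.0.0/8"
    action: str = "permit"   # "permit" or "deny"


def render_prefix_list(name: str, entries: list[PrefixEntry], step: int = 5) -> str:
    """Return every entry as FRR 'ip prefix-list' lines in a single string.

    Sequence numbers start at `step` and grow by `step`, leaving gaps so
    entries can be inserted later without renumbering.
    """
    lines = [
        f"ip prefix-list {name} seq {i * step} {e.action} {e.prefix}"
        for i, e in enumerate(entries, start=1)
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # 1300 synthetic /24s, roughly the size of the list in the original question.
    entries = [PrefixEntry(f"10.{i // 256}.{i % 256}.0/24") for i in range(1300)]
    batch = render_prefix_list("CUSTOMER-IN", entries)
    # One string that can be written and loaded once, not 1300 round-trips.
    print(batch.splitlines()[0])
    print(len(batch.splitlines()))
```

The point is only that the per-entry work becomes an in-process loop; the expensive part (handing config to frr) then happens once.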

The same thing happens with firewall rules as well (but there the target is nft instead of frr).

See this task regarding the issue: T5388 Something is fishy with commit and boot times when more than a few hundred static routes are being used

And this similar forum thread: Very long time commit

Hi @Apachez, how could I batch all of it into frr? Do you have any examples?

In theory, you could edit /run/frr/config/frr.conf and have the appropriate frr process(es) restarted.

There is also the -tcli (transactional CLI) syntax of zebra, which frr uses, that can be used for batch-loading new configuration (without altering the frr.conf file, since the changes will end up there anyway).

Note, however, that if you do such sideloading, your custom changes might get lost the next time you reboot the box, or whenever a config commit is run for some other reason, since that will regenerate the entire config.

In that case you could add your modifications to the /config/scripts/commit/post-hooks.d directory (as a script).

For more information see Command Scripting — VyOS 1.4.x (sagitta) documentation

Once the VyOS maintainers have addressed T5388, you will no longer need to sideload your changes (they can then be part of the VyOS config without the extra processing time that exists today due to the massive overhead).

Forgot to mention: there is also the vtysh binary, which can accept a batch file through -f (or, for testing, you can add the config manually line by line or via copy-paste).
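A minimal sketch of the `vtysh -f` route: write all config lines to one temporary file and make a single vtysh call. It assumes the file holds configuration lines in the same form they would take in frr.conf; the helper names and the injectable `runner` parameter are hypothetical (the latter exists so the logic can be tested without FRR installed).

```python
"""Sketch: load a whole batch of FRR config through one `vtysh -f` call."""
from __future__ import annotations

import os
import subprocess
import tempfile
from typing import Callable, Sequence


def _run_vtysh(cmd: list[str]) -> None:
    # check=True raises CalledProcessError if vtysh exits non-zero.
    subprocess.run(cmd, check=True)


def load_batch_via_vtysh(
    config_lines: Sequence[str],
    runner: Callable[[list[str]], None] = _run_vtysh,
) -> list[str]:
    """Write every line to one temp file, load it with one vtysh call.

    Returns the command that was executed. The temp file is removed
    afterwards whether or not the load succeeded.
    """
    with tempfile.NamedTemporaryFile("w", suffix=".conf", delete=False) as f:
        f.write("\n".join(config_lines) + "\n")
        path = f.name
    try:
        cmd = ["vtysh", "-f", path]
        runner(cmd)  # a single process start for the whole batch
        return cmd
    finally:
        os.remove(path)
```

On a real box this would be called as `load_batch_via_vtysh(lines)`. As noted above, anything loaded this way bypasses the VyOS config and may be overwritten by the next commit or reboot, which is why it would need to live in a commit post-hook.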

Hi @Apachez, btw my issue isn’t even at the committing stage yet; it’s purely adding lines to the prefix-list until it’s complete, so it’s already slow at this early stage. Committing is relatively fast once everything has been built up, though.

Hi @Apachez, have you tried parallelism to see if the processing time is reduced by splitting the execution across multiple cores?

Yes, but when you add stuff through VyOS config mode there is a massive overhead of Python scripts being fired etc., instead of queuing the config changes, dumping them into a single file and then importing it into frr or nft (which in many cases would also be atomic).

What do you mean by parallelism in this case?

The issue is that each line in the VyOS config is sent through a large set of Python scripts, so your routing, prefix-lists, firewall etc. are built up as if you triggered the same scripts manually in the CLI, line by line. So to fix task T5388, the way config mode handles a commit must be rebuilt in the backend by a VyOS maintainer.

Once this is fixed, loading a VyOS config with 1000 rules, routes, prefix-lists or route-maps will take a few seconds rather than the several minutes it takes right now.

There’s sanity checking happening to help prevent misconfigurations. After that, it builds the configuration for the system to work with internally.

Every time you set something, it’s going to check if it meets any constraints that exist for that node. Then when a commit happens, there will be some overall checks happening.

You have to consider that VyOS is essentially a big CLI wrapper for a number of underlying systems that all require configuration in different ways. The consolidation into a common configuration syntax makes things easier, and abstracts those underlying components should they ever need to change, say, switching from Quagga to FRR. The end user can be none the wiser about the underlying change because they see the same configuration. The wrapper handles the specific configuration for each daemon for you.

The system tries to prevent an accidental misconfiguration that might otherwise break one of those underlying daemons by doing some of those sanity checks.

Bulk loading of text in a CLI has always been difficult no matter the product because there’s always some overhead of even just handling the screen echos after each line. I remember trying to load large iptables rules back before ipset existed, and dang did it take a while. The addition of ipset greatly increased load speeds for certain rule sets.

The maintainers might be able to introduce some optimizations, but I think there will always be a lag when trying to bulk “type” a number of configuration lines.

The problem seems to be that for EVERY line in the VyOS config there is this flow:

start python
compile stuff
run scripts
start more pythons
compile more stuff
run further scripts
return to previous script
do stuff
exit

So with, for example, 1000 lines of firewall rules, static routes, prefix-lists or route-maps, that’s roughly 8000 events (the numbers are just illustrative) that need to complete.

Instead, it could be done like this:

start python
compile stuff
run scripts
start more pythons
compile more stuff
run further scripts
return to previous script
do stuff
inner loop of the current section
load batch into frr or nft
exit

That would go from 8000 events down to 10 events for the same work.

The latter approach can still perform the needed checks (the config cannot be trusted, since the user could simply have replaced the /config/config.boot file), but without the current massive overhead of: start python, do stuff, stop python, next line, start python, do stuff, stop python, repeat.
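A rough way to see the per-line overhead described in the two flows above is to time starting a fresh Python interpreter for every line versus once for the whole batch. In this sketch the per-line "work" is a hypothetical stand-in (normalizing a prefix with `ipaddress`), not VyOS's real config scripts, so absolute timings say nothing about VyOS itself; only the shape of the difference matters.

```python
"""Sketch: one interpreter start per line vs. one per batch (stand-in work)."""
import subprocess
import sys
import time

# Hypothetical per-line work: parse each prefix argument and print it normalized.
CHECK = "import ipaddress, sys\nfor a in sys.argv[1:]: print(ipaddress.ip_network(a))"


def per_line(prefixes):
    out = []
    for p in prefixes:
        # A new Python process for every single line, like the current flow.
        r = subprocess.run([sys.executable, "-c", CHECK, p],
                           capture_output=True, text=True, check=True)
        out.extend(r.stdout.split())
    return out


def batched(prefixes):
    # One Python process handles every line in an in-process loop.
    r = subprocess.run([sys.executable, "-c", CHECK, *prefixes],
                       capture_output=True, text=True, check=True)
    return r.stdout.split()


if __name__ == "__main__":
    prefixes = [f"10.0.{i}.0/24" for i in range(30)]
    for fn in (per_line, batched):
        t0 = time.perf_counter()
        result = fn(prefixes)
        print(f"{fn.__name__}: {len(result)} prefixes in {time.perf_counter() - t0:.2f}s")
```

Both functions produce the same output; the per-line version simply pays the interpreter start-up cost once per prefix, which is the cost the batched flow avoids.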

That’s because different checks happen at different times: some when a set command is sent, and others when a commit is initiated. One could argue that catching an error at the moment someone defines a configuration with set is more desirable than doing all the checking at the end and outputting potentially many errors at once.

The current scripting environment is essentially the same as copying and pasting all the commands into an SSH session.

I get that you want to check this when typing the line in config mode.

But why also do this when the config is committed or loaded during boot?

It’s just wrong that it should take 10 minutes or more for a VyOS box to boot just because it has a few hundred static routes configured along with some firewall rules.
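The split between set-time and commit-time checks can be sketched as follows. `PrefixListSession`, `ConfigError`, and the specific checks (prefix syntax at set, duplicate sequence numbers at commit) are hypothetical choices for illustration, not how VyOS actually structures its validators.

```python
"""Sketch: node-local checks at `set`, whole-config checks once at `commit`."""
from __future__ import annotations

import ipaddress
from collections import Counter


class ConfigError(ValueError):
    """Raised when a configuration fails validation."""


class PrefixListSession:
    def __init__(self, name: str) -> None:
        self.name = name
        self.pending: list[tuple[int, str, str]] = []  # queued until commit

    def set(self, seq: int, action: str, prefix: str) -> None:
        # Set-time: validate only this one node's format and fail immediately.
        if action not in ("permit", "deny"):
            raise ConfigError(f"invalid action {action!r}")
        net = ipaddress.ip_network(prefix)  # ValueError if malformed or host bits set
        self.pending.append((seq, action, str(net)))

    def commit(self) -> str:
        # Commit-time: checks that need the whole list, run once.
        counts = Counter(seq for seq, _, _ in self.pending)
        dupes = sorted(seq for seq, n in counts.items() if n > 1)
        if dupes:
            raise ConfigError(f"duplicate sequence numbers: {dupes}")
        # Render everything in one go, ready to be loaded as a single batch.
        return "\n".join(
            f"ip prefix-list {self.name} seq {seq} {action} {prefix}"
            for seq, action, prefix in sorted(self.pending)
        )
```

A typo such as `10.0.0.1/8` is rejected the moment it is entered, while a clash between two entries can only be detected once the whole list is known.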

Exactly. If the config has been validated previously, then at boot time it should simply come up in the same state as before, saving a lot of processing time.

Because there are different types of checks happening, not the exact same checks at both set and commit. Checks at set are mostly format-related: did you enter something in the format of a MAC address, or is there a random J in there that doesn’t belong? Is that hex key 128 bits, or did you only put in 120?
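The format-only checks mentioned above are cheap to express. This sketch assumes colon-separated MAC addresses and plain hex keys; `is_mac_address` and `is_hex_key` are hypothetical names, not VyOS's actual validators.

```python
"""Sketch: set-time format checks for a MAC address and a 128-bit hex key."""
import re

_MAC_RE = re.compile(r"[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5}")
_HEX_RE = re.compile(r"[0-9a-fA-F]+")


def is_mac_address(value: str) -> bool:
    # fullmatch rejects stray characters, such as a random "J".
    return _MAC_RE.fullmatch(value) is not None


def is_hex_key(value: str, bits: int = 128) -> bool:
    # Each hex digit carries 4 bits, so a 128-bit key needs exactly 32 digits.
    return _HEX_RE.fullmatch(value) is not None and len(value) * 4 == bits
```

A key of only 120 bits is 30 hex digits, so `is_hex_key("ab" * 15)` returns `False`.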

As you already pointed out the config.boot could be modified before the system is powered on so it still needs some validation on startup.

I’m not saying there’s no room for improvement, but you’re unfairly comparing a large wrapper written in Python to a native application written in (likely optimized) C. You’re further comparing commands typed, which not only are being validated, but text is then being sent to the screen (even without errors you’ll get the prompt of where you are in the config tree). This all takes processing time, and added together it does result in some lag compared to a C application that is loading and validating a text file/input with no output to the screen until the end.

If you have a better way to handle this, by all means submit a PR with the appropriate changes. The team is great to work with.

Well, wouldn’t a basic SHA/MD5 check be enough to ensure it remains as it was before rebooting? That way, much of the startup time would be saved.

Where would that hash be stored? What’s to stop someone from just changing the hash? Or changing the config and updating the hash?

Do you agree with me that if someone does that on purpose, then it’s their responsibility if anything goes wrong in the process? A sane user, like you and me, surely wouldn’t do it. :wink:

But if I were a malicious actor that gained access to your system I could cause a lot of havoc just by changing a hash, no?

Users and administrators alike have never failed to surprise me with some of their actions.

I don’t disagree that load time is a valid criticism. I just don’t think the solution is as simple as many may think.

If you’ve gained access to the system, then you’ve answered your own question, period. It doesn’t even matter what else could be done: once you have access, that’s about it. :wink:

This could, for example, be solved by having a second config called, I dunno, “config.verified” or “config.cached”, with a cryptographically signed hash at the bottom to verify that the content of the file hasn’t been modified (for example, by accident by the admin). Then the prechecks for EVERY line, back and forth, wouldn’t have to be done again, since they were already done once.

This way we would avoid the scenario of an admin uploading a custom config that might contain errors.

And regarding creating PRs, thanks; I have already created several over the past months.
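The “config.cached with a signed hash” idea could look roughly like this with an HMAC. The trailer format, the function names and the key handling are all assumptions for illustration. This catches accidental or unauthorized edits by anyone without the key, but, as pointed out above, someone with root access who can read the key can simply re-sign a modified file, so where the key lives remains the open question. On any mismatch, the safe fallback is the full per-line validation.

```python
"""Sketch: HMAC-signed cached config (trailer format and names are hypothetical)."""
from __future__ import annotations

import hashlib
import hmac

_MARKER = "\n# hmac-sha256: "  # assumed trailer appended to the cached config


def sign_config(config_text: str, key: bytes) -> str:
    # Append an HMAC-SHA256 tag computed over the exact config text.
    tag = hmac.new(key, config_text.encode(), hashlib.sha256).hexdigest()
    return config_text + _MARKER + tag + "\n"


def verify_cached_config(signed_text: str, key: bytes) -> str | None:
    """Return the config body if the tag matches, else None (run full checks)."""
    body, sep, tag = signed_text.rpartition(_MARKER)
    if not sep:
        return None  # no tag at all: treat the file as unverified
    expected = hmac.new(key, body.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking, via timing, how many characters matched.
    if hmac.compare_digest(expected.encode(), tag.strip().encode()):
        return body
    return None
```

At boot, a `None` result would mean “fall back to validating everything”, so a tampered or missing cache costs time but never skips the checks.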