For testing you can use, for example, TRex: https://trex-tgn.cisco.com/
The most critical thing to me when it comes to testing is to have reproducible tests, at least if the goal is to compare one solution/vendor/model with another.
Tests you do for troubleshooting, on the other hand, don't have to be as scientific.
That is, the test you did last week with device A should be comparable with the test you will do with device B next week.
And by a test I mean multiple runs, where you remove the best and worst results and take the average of the remaining ones.
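As a minimal sketch of that averaging step (the function name and the sample numbers are made up for illustration, they are not measurements):

```python
def trimmed_mean(results):
    """Drop the single best and single worst result, then average the rest."""
    if len(results) < 3:
        raise ValueError("need at least 3 runs to drop best and worst")
    ordered = sorted(results)
    kept = ordered[1:-1]  # everything except the lowest and highest value
    return sum(kept) / len(kept)


# Hypothetical throughput results in Mbps from five runs against the same device.
runs_mbps = [941.2, 938.7, 902.4, 940.1, 943.9]
print(round(trimmed_mean(runs_mbps), 1))
```

With only a handful of runs this mainly protects against one outlier in each direction; more runs give a steadier average.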
If you have a lot of money you can buy one of the hardware-based (ASIC/FPGA) load generators (Avalanche, IXIA etc.). If you stick to a specific model and keep track of which firmware version you use, the tests will be consistent over time.
If you do software-based testing, such as with TRex and other options, the best approach in my opinion is to not change the software version during the test period.
This is a common error in the GPU results that LTT and GN (over at YouTube) compare over time: the OS, the drivers and the games themselves have all changed between tests, which means the result of one card cannot be reliably compared with another card.
When it comes to the device under test, it would also be nice to use a standardized config when comparing different vendors/models. For example, enable all offloading options and document which of them weren't supported by the hardware.
And finally, when it comes to the tests themselves, I'm often interested in:
- Download only.
- Upload only.
- Both download and upload at the same time.
- Both Mbps (megabits per second) and pps (packets per second).
- Latency with and without load.
- Jitter with and without load.
- Testing both full-size frames such as 1518-byte Ethernet vs minimum-size frames such as 64-byte Ethernet.
- Testing between 2 interfaces of the device under test (like int1 + int2) vs all 24 interfaces (or however many the device has).
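To show why Mbps and pps need to be reported together with the frame size, here is a rough sketch of the theoretical Ethernet line rate. It assumes the usual 20 bytes of per-frame overhead on the wire (preamble, start-of-frame delimiter and inter-frame gap); the function names are just for illustration.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# 7 bytes preamble + 1 byte start-of-frame delimiter + 12 bytes inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def max_pps(link_bps, frame_bytes):
    """Theoretical maximum packets per second for a given frame size."""
    return link_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def frame_mbps(pps, frame_bytes):
    """Throughput in Mbps counting only the frame bytes, not the overhead."""
    return pps * frame_bytes * 8 / 1e6


for size in (64, 1518):
    pps = max_pps(1e9, size)  # 1 Gbps link
    print(size, int(pps), round(frame_mbps(pps, size), 1))
```

On a 1 Gbps link, 64-byte frames work out to roughly 1.49 million pps while 1518-byte frames are only about 81 thousand pps, so a device can look fine in Mbps with large frames and still fall over on small ones.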
For the reporting, don't just record total CPU usage (since VyOS is a software-based router) but also usage per core, including temperatures in case thermal throttling kicks in.
And if there is time and interest, also add more “advanced” tests such as IPsec, WireGuard, QoS etc.
Also test single vs multistream, and IMIX as a test pattern to get a result based on mixed packet sizes.
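One way to get per-core numbers on a Linux-based box is to sample /proc/stat twice and compare the counters. The sketch below is only an illustration with made-up helper names, not something VyOS ships; it treats idle and iowait as idle time.

```python
def parse_proc_stat(text):
    """Return {core_name: (busy_ticks, total_ticks)} from /proc/stat content."""
    cores = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines start with "cpu0", "cpu1", ...; plain "cpu" is the aggregate.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        ticks = [int(v) for v in fields[1:]]
        # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
        idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
        total = sum(ticks[:8])  # guest time is already counted in user/nice
        cores[fields[0]] = (total - idle, total)
    return cores


def per_core_usage(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    usage = {}
    for name, (busy1, total1) in after.items():
        busy0, total0 = before.get(name, (0, 0))
        delta = total1 - total0
        usage[name] = 100.0 * (busy1 - busy0) / delta if delta > 0 else 0.0
    return usage


# Usage idea: read /proc/stat, wait a second while the test runs, read it again,
# then call per_core_usage(parse_proc_stat(first), parse_proc_stat(second)).
```

This matters because a single core pinned at 100% can be the bottleneck while the overall average still looks low. On many Linux systems temperatures can be read in a similar way from /sys/class/thermal/thermal_zone*/temp (in millidegrees Celsius), though which sensors show up depends on the hardware.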
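To get a feel for what an IMIX pattern means for the numbers, here is a small sketch that computes the average frame size of a mix. The 7:4:1 mix of 64/594/1518-byte frames used here is an assumption: it is one commonly cited "simple IMIX", but definitions differ between tools, so use whatever your load generator actually sends.

```python
def average_frame_size(mix):
    """Weighted average frame size in bytes for a {frame_bytes: weight} mix."""
    total_weight = sum(mix.values())
    return sum(size * weight for size, weight in mix.items()) / total_weight


# Assumed example mix: 7 small, 4 medium and 1 full-size frame per 12 packets.
simple_imix = {64: 7, 594: 4, 1518: 1}
print(round(average_frame_size(simple_imix), 1))
```

The average frame size is what ties the pps and Mbps results of an IMIX run together, so it is worth writing down next to the results.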
A quick and dirty test, which isn't reproducible at all but is better than nothing, is to put your client on one interface and the internet on another and just visit https://speed.cloudflare.com/