QoS/QoE support in VyOS?

Since it came up on this thread:

I am curious as to the state of VyOS’s QoS or QoE configurability today; what is missing or felt to be needed?


To me I would say documentation (well, sort of :slight_smile:)


Most of it is from the Linux kernel docs, so it should be up to date and such.

However, for a user new to QoS (which can quickly turn into a rabbit hole), I would say to have a couple of examples along with WHY one method is better than another.

Like a common use case of simple bandwidth throttling, as in another thread asking how to throttle downloads from the WAN to 20 Mbps:

Which of the methods should be used: a shaper towards the LAN, a limiter towards the WAN, or rate-control towards the WAN? Or should they be used in combination?

But also tricks such as why to prioritize ACKs (or rather make sure they won’t get dropped, which is why RED/WRED is very bad for throughput), especially if you have an asymmetric link (more down than up).


Were it me, I would deprecate wred, sfq, sfb, and drr, all in favor of cake, or in some cases htb+cake.


Keep in mind, though, that VyOS is used in many different places, not just at home or at the edge of the network.

Given that would you still do that? (I’m not saying you’re wrong, I’m only asking because I realise you understand this stuff very well)


I have a tendency to lecture in the hope that one day, with deeper universal understanding, I won’t have to work so hard!

Short answer: anywhere there is a fast->slow transition on any network, fq+AQM technologies are needed to manage it (RFC 7567). Some of the backstory behind that RFC (which updates one first published in 1992) is that there is, and has always been, a large and vocal contingent in the IETF opposed to fair queuing, preferring an endless quest for the perfect AQM and infinitely fiddly means of rationing traffic in every direction. I start with just simple fq+AQM and then aim to deliberately tune that to meet the application workloads.

Long answers:

Flow queuing has an important property that distinguishes it from fair queuing: it puts an incentive into the network for applications to use slightly less than their fair share at any given moment in order to observe zero delay. Basically:

“if a flow’s input rate is less than the sum of all the other flows’ output rates, it goes out first. Otherwise it is evenly mixed with packets (bytes, actually) of all the other flows.” This makes it possible for VoIP, videoconferencing, gaming, ARP, and DNS packets to fly past the fatter flows. Only fat flows observe queueing, and they can take their own measures to cope with it: delay-based TCPs (such as BBR) “just work”; otherwise drop or ECN marking will take place based on the structure of a secondary AQM technology, or tail drop if none exists.

This part is now implemented pretty thoroughly throughout Linux and Apple products today; it is part of the TCP scheduler for Linux also, and of fq_pie, cake, and fq_codel. It is not, however, implemented in switches or hardware routers, which tend to use strict priority queues and the WRED AQM at best. (Switches get their own form of FQ by servicing different ports as rapidly as possible.)


In looking over the QoS document, I noted one flaw described for RED: it can indeed starve ACKs. This flaw is not widely known; I was glad to read about it again here.

VJ (Van Jacobson), the author of RED, together with Kathie Nichols, spent about 16 years trying to come up with something that behaved properly. “There is no information in the average”: https://pollere.net/Pdfdocs/QrantJul06.pdf

He couldn’t even get a paper refuting his own work published, since RED was considered so authoritative that the refutation couldn’t get past blind review! See “RED in a Different Light” on jg’s Ramblings.

Anyway, after they published Codel - and based on the uptake of RED we had observed in the field (any of you using it? - zero) - we had thought all the router makers would adopt it within one or two generations, but it requires a key yet elegant change to the data path: carrying a timestamp along with every packet. PIE appeared, but its rate estimator was unstable…

Anyway, Codel alone is very gentle, absolutely far less damaging than the wild swings RED would make in its averages, OK with ACKs, and useful all by itself so long as traffic is well mixed.

Much traffic is not well mixed. So Eric Dumazet’s and my efforts to top SFQ came into play at roughly the same time, culminating in SFQRED, which was the best thing ever for about two months, before VJ and Kathie topped it with Codel, and then Eric spent a weekend writing fq_codel… it was a fun couple of months!! I wish more people had played with SFQRED!

Anyway, that sort of gets to criticising prior approaches.

- SFQ has scaling problems because the fixed-length FIFOs need to grow larger and larger as bandwidth grows. Because it is per-packet, it can also tail-drop a lot of pure ACKs, leading to stalling in the other direction, which is what led to the fairly common combination of SFQ + ACK prioritization.
- RED was just plain broken in too many circumstances.
- SFB: meh.
- DRR is like byte-fair SFQ and you can weight it; a good building block.
- QFQ likewise.

fq_codel is byte-fair, like DRR++; its AQM drops from head rather than tail so you always get the most recent data, turns itself off when the queue is below one MTU, and aims by default for 5 ms of queueing delay while still accepting bursts.
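As a concrete (hypothetical) illustration of those defaults, this is roughly what configuring fq_codel with plain tc looks like; the interface name eth0 is a placeholder, and the parameters shown are already the defaults:

```shell
# Replace the root qdisc on eth0 with fq_codel (eth0 is a placeholder).
# target = acceptable standing queue delay; interval ~ a worst-case RTT.
tc qdisc replace dev eth0 root fq_codel target 5ms interval 100ms ecn
```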

In the general case, fq_codel all by itself, without any fiddly bits, is just fine for the majority of internet traffic, and especially good for bidirectional traffic of all sorts (and diffserv is underused anyway). It can achieve full throughput in both directions at the same time, while still working well for low-rate anything.

And then there was the elephant left in the corporate room: doing FQ everywhere requires peering deeply into the packet and constructing a five-tuple hash, which is difficult to do in real time, at high forwarding rates, in off-the-shelf hardware. (x86 boxes and their Ethernet cards tend to do this offloaded, or really fast; the same hash is used by the RSS systems to distribute packets to cores.)

WRED is sometimes built on a strict diffserv priority queue, where the lower bands can be completely starved by the higher ones. If you are in an environment where diffserv is trusted and in use, it has some nice theoretical, but few actual, properties. Some use ARED (much better than RED, but still based on averages), but I still do not have any clue as to how much of it is out there.

Anyway, CAKE is as close as we could come to beating WRED’s last compelling feature (traffic differentiation), with better AQM, FQ, diffserv handling, ACK processing, an integral shaper better than htb, and so on…

…except that it does not do strict priority queuing, instead allocating 3-4 bands of service with all classes able to borrow. We view outright starvation of certain traffic classes as a very bad footgun, and a disincentive to mark packets as background, for example. We still see demand for other forms of htb + some classifier + fq_codel or cake in the future, but I would rather like a head-to-head comparison of WRED vs cake by someone experienced with configuring both, with use cases to subject them to, because the Codel AQM is so much better in the first place.
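For reference, a minimal cake invocation via tc might look like the following; the interface name and bandwidth are assumptions for illustration:

```shell
# Hypothetical example: shape to 20 Mbit/s with cake, keeping per-host
# fairness correct behind NAT and honoring diffserv markings in 4 tins.
tc qdisc replace dev eth0 root cake bandwidth 20mbit diffserv4 nat
```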

Whew! Hope that helps.


I had been writing custom queuing systems for many years for many applications


This does not mean all the design choices for cake were right. I have been collecting input on how to improve it over here:

Including, sigh, the often-requested feature of a strict priority queue.

Cake is also intensely programmable (see the qosify project), and there is another great project called cake-autorate that uses active measurements to make Starlink and 5G’s latencies less awful.


and to finish up: byte-wise FQ and a drop-from-head AQM mean you do not have to have a priority queue for pure ACKs… or for QUIC. In highly asymmetric scenarios it helps to drop ACKs even more intelligently: “Adding ack-filtering to cake”, http://blog.cerowrt.org/
(ack filtering will never work for QUIC, though)


To me it’s good to leave the options available in VyOS so the admin can choose, for whatever reason, whichever method is preferred.

However, guidelines (i.e. recommendations on which to use in which scenario, including example configs) would be helpful.

Right now the admin is on their own making that decision, without any further input. For example, the use case I mentioned previously: someone wants to limit downloads through the WAN. Should a shaper, limiter or rate-control (or a combination) be used, and why?
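As one possible shape for such a guideline (a sketch only; verify the syntax against your VyOS release, and the policy and interface names here are made up): shaping egress towards the LAN is usually preferred over policing ingress from the WAN, since a shaper queues and drops intelligently while a limiter simply discards.

```shell
# Hypothetical VyOS config sketch: throttle WAN downloads to 20 Mbit/s
# by shaping egress on the LAN-facing interface (eth1 is a placeholder).
set qos policy shaper DOWNLOAD-20M bandwidth '20mbit'
set qos policy shaper DOWNLOAD-20M default bandwidth '100%'
set qos policy shaper DOWNLOAD-20M default queue-type 'fq-codel'
set qos interface eth1 egress 'DOWNLOAD-20M'
```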

One of the worst things we have done to the internet was making QoS so complicated and fiddly. The core techniques of FQ and AQM are straightforward; if only we had figured out BQL’s insight (bytes = time) long ago, and that overlarge device rings and NAPI processing were at the root of the Bufferbloat problem! I spent many, many months researching the literature and trying all the qdiscs we had back in 2011, observing no effect until BQL arrived. Nothing lined up with the theories! It was maddening! Smart people were telling me algorithm X, Y, or Z worked, and none did, which sparked writing this paper for SIGCOMM: https://conferences.sigcomm.org/sigcomm/2014/doc/slides/137.pdf

FQ and AQM techniques are actually very simple and lightweight, and would have been deployed sooner had we understood the device-ring problem.
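The bytes = time insight is plain arithmetic: a buffer’s added delay is its size in bits divided by the link rate. A sketch, with numbers chosen purely for illustration:

```shell
# A 256-packet ring of 1500-byte packets on a 100 Mbit/s link:
ring=256; mtu=1500; rate_bps=100000000
delay_ms=$(( ring * mtu * 8 * 1000 / rate_bps ))
echo "${delay_ms} ms of potential buffering"   # prints "30 ms of potential buffering"
```

The same ring on a 10 Mbit/s link would add ten times that delay, which is why fixed-size rings sized for fast links bloat slow ones.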

My hope is that, seeing these new features show up in the rolling releases, folks will jump in and polish them. (I am assuming, however, that you are also compiling in cake?)

I am good at queuing, bad at UI. There has been a lot of demand for cake over in the pfsense universe, and perhaps you will pull people from there for both the fq_codel native line rate stuff and for cake, in the long run. BSD still does not have a BQL-like mechanism in play, and you cannot run fq_codel at line rate there.


Just reading this thread…

@Apachez, I think your suggestion of documenting use cases and the most appropriate choices is sound and would be a good addition.

Without speaking for @dtaht, my read of his comments is that the default QoS/QoE setting can and should be changed.

The current default_qdisc is pfifo_fast. Changing the default to cake or fq_codel would be a simple change that a) does not limit the admin’s options and b) would be a general improvement.

For instance, we set the following on all of our VyOS boxes:

```
set system sysctl custom net.core.default_qdisc value 'cake'
```

There are likely a few other similar networking defaults (that I’m not familiar with) that might be worthwhile to update as well.


There are other ways to accomplish a good default. I do not know if VyOS’s use cases include NAT; if so, patching cake to make NAT support the default would be worthwhile. Also, at least in my world, we have found the diffserv4 setting more useful than the default diffserv3 since COVID, as it respects Zoom and MS Teams preferences. Lastly, you can compile cake (and/or fq_codel) into the default kernel rather than as a module.

There are nuances. NAT support adds overhead even if you are not NATting (but forgetting to turn on NAT support is one of the most common cake errors). As the default qdisc on a hardware multi-queued device, the diffserv classification grows less useful as you add hardware queues.

Adding or changing options in an mq environment is a pain; you have to walk the tree with tc change or replace commands.
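A hedged sketch of what that walk can look like with stock tc (the device name and the option being changed are assumptions):

```shell
# On a multi-queue device each hardware queue has its own child qdisc,
# so an option change must be applied to every child, e.g. enabling ecn:
for handle in $(tc qdisc show dev eth0 | awk '/fq_codel/ {print $3}'); do
    tc qdisc change dev eth0 handle "$handle" fq_codel ecn
done
```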

If you are using tc to track packet drop and mark stats, you will need to parse the tc output differently there (although drops do show up via the SNMP and ifconfig stats). I keep trying to show people that ECN marking (lossless congestion control) is a thing!
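For example (interface name assumed), the per-qdisc counters, including ECN marks, are visible with:

```shell
# Drops, ECN marks, and per-flow stats, per mq child where applicable:
tc -s qdisc show dev eth0
# Interface-level drops also show up in the standard link counters:
ip -s link show dev eth0
```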

bqlmon is a useful tool.

Over here is a highly entertaining VyOS user’s Bufferbloat story, with patches: https://blog.apnic.net/2024/02/12/unbloating-the-buffers/


The term “bufferbloat” is, however, often misused or misunderstood.

Having buffers in a switch/router/firewall is not a bad thing - rather the opposite.

There are, however, a few specific use cases where buffering (and delaying) a packet to make sure it reaches its destination is a bad thing, and that is when dealing with VoIP codecs, which generally speaking don’t want a packet that is more than ~30 ms late, because by decoding time the codec has already moved on and extrapolated the missing packets.