BBR and FQ as New Defaults


I just wanted to give my opinion/input on something I saw in the latest October 2023 blog update. Specifically these points:

  • The kernel-level TCP congestion control algorithm is now BBR (Bottleneck Bandwidth and RTT) instead of the default CUBIC algorithm (T5489).
  • Similarly, the queuing discipline for network devices is now FQ (fair queue) instead of the default pfifo_fast (T5489).

I really do not think it is sane to force BBR and FQ as the defaults. They are good to have as configuration options for those who need or want them, but there is a reason that almost all network routers and firewalls default to FIFO (as VyOS also used to). My understanding is that BBR is only beneficial on endpoints, not on routers/forwarding devices, so unless you are running VyOS as a server terminating TCP services you probably should not use it. Please leave the defaults at pfifo_fast and CUBIC, with the option to change them if the end user so desires. Or, if VyOS is dead set on forcing these defaults, then make sure there is an option to change them back to sane router defaults.

See other discussions:

You can already change them using sysctl calls.

I set BBR and FQ using them on 1.3:

set system sysctl custom net.core.default_qdisc value 'fq'
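For completeness, the matching congestion-control setting on 1.3 would be along these lines (a sketch assuming the same 1.3 `custom` sysctl syntax; verify the exact key name on your version):

set system sysctl custom net.ipv4.tcp_congestion_control value 'bbr'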

there is a reason that almost all network routers and firewalls default to

Well, they also default to 2 weeks of TCP established. That is, a TCP session will occupy an entry in the conntrack table for 2 weeks even if not a single packet has passed through during that time.

I'm just saying that not all defaults are sane…

From a router point of view, FIFO is used interface to interface, but for things such as VPN encryption, tunneling, NAT, etc., it's VyOS itself that must deal with congestion, and here BBR will be helpful compared to CUBIC.

Also, some of the claims in that Mikrotik forum thread are somewhat outdated (for example, BBRv3 does have ECN support):

Another data point is that systemd (which Debian, and therefore also VyOS, uses as backend) defaults to fq_codel rather than pfifo_fast (if we want to use defaults as an argument):

As described in ⚓ T5489 Change to BBR as TCP congestion control, or at least make it an config option, if you want to revert the changes you can do so through these config options (and then reboot the box; I don't think a reboot is needed, but better safe than sorry):

set system sysctl parameter net.core.default_qdisc value 'pfifo_fast'
set system sysctl parameter net.ipv4.tcp_congestion_control value 'cubic'
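To confirm the revert took effect, a sketch of checking the live values from a shell (the interface name eth0 is a placeholder):

sysctl net.core.default_qdisc
sysctl net.ipv4.tcp_congestion_control
tc qdisc show dev eth0

With the revert applied, the first two should report pfifo_fast and cubic respectively.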

Agree, the super long TCP established timeout in Linux is definitely not a sane default for a router/firewall. It brings up a good point: the defaults set by general-purpose Linux distros (like Debian) for use as a general server do not always make good defaults for using Linux/Debian as a router.

As for BBR and VPN: tunneling protocols use UDP, ESP, GRE, etc., so I'm not sure how BBR TCP congestion control benefits those; again, it seems BBR would be better on the end hosts making connections through the VPN tunnels.

I still think FIFO is the better overall default qdisc for a router/firewall, and let people change it to suit their particular environment. However, FQ_CODEL would be a better choice than FQ as a default qdisc for a router, IMO; as the manpages state, "FQ (Fair Queue) is a classless packet scheduler meant to be mostly used for locally generated traffic". That seems to be more of a corner use case in a router than a candidate for default. But, to each their own. So long as we can change the defaults that VyOS decides for everyone, we should all be happy.


There is a lot to unpack here! I was not aware until recently that VyOS had never made the switch from pfifo_fast to fq_codel in the first place. We long ago ripped out support for pfifo_fast entirely in OpenWrt, and fq_codel is the default now in most of the mainstream Linux ecosystem, due to being more general purpose than sch_fq.

Before I dive into arguments (can you engage those on the bug? Viacheslav? Is that how I do that?): the best defaults for a router OS, particularly one targeted at powerful hardware, would be BBRv3 and CAKE. Second best would be BBR and fq_codel. Nearly last would be the combination of sch_fq + BBR, which is in a couple of demonstrable ways worse than pfifo_fast + CUBIC!

What I have always encouraged, however, is experimentation, so folks can understand congestion control problems more intuitively. These days I tend to recommend the crusader tool, coupled with packet captures, and then looking at the RTTs in Wireshark. There is also flent. Normally when I post to forums I get censored for putting in links, but I will do that in my next post.

I (perversely) love that y’all have made this basic change to the core OS and propagated it to your userbase, as it will be so much easier to demonstrate what damage you have just done to network performance overall!!


So here I go, usually triggering some sort of bot by supplying links.
A simple test topology for tests like these would be

server → 100Mbit → VyOS → 1Gbit → client. Easily set up with ethtool to control the link rates.
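A sketch of pinning those link rates with ethtool (interface names are placeholders; requires root, and NIC support for forced speeds):

ethtool -s eth1 speed 100 duplex full autoneg off   # 100Mbit toward the server
ethtool -s eth2 speed 1000 duplex full autoneg on   # 1Gbit toward the client (GbE requires autoneg)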

Flent script (“Flaws and features in the Flent network testing tool”): this has some ranty stuff and a set of simple scripts that I use to demonstrate fq, AQM, and congestion control behaviors.

Crusader (release “Automated build of v0.0.9-testing” · Zoxc/crusader · GitHub) is multiplatform, and has a nice plot of up, down, and up+down behaviors. However, it has no direct insight into congestion control itself (just fq), for which I take a packet capture and use Wireshark to plot the RTTs.

OK, here goes!


OK, at its heart there are a ton of misunderstandings going on.

With the 6.1 release of Linux, a new feature for sch_fq landed that is really important and beneficial to those running containerized services such as k8s. For the first time, backpressure exists in Linux across containers, so sch_fq + TCP facing inwards, or outwards across the containers, works really, really well, especially with BBR. WIN!

However, all the hoopla about this does not apply to multiple boxes or the rest of the internet. I have tried to get the people posting about this from Cilium to clarify that, but so far no luck.

It is still the case that:

“sch_fq is for TCP servers, fq_codel for routers” - Eric Dumazet (who is the author of both).

If your workload includes forwarding packets from VMs, VPNs, non-locally hosted boxes, UDP, QUIC, etc., sch_fq treats those as second-class citizens, and worse, has a fixed packet limit and no AQM.

In this enormous and sometimes testy debate, I hopefully managed to give a good set of tests and a successful conclusion to systemd's choice of fq_codel by default: Change default default qdisc from fq_codel to sch_fq · Issue #9725 · systemd/systemd · GitHub. And the beauty of someone reproducing that experiment with containers and Linux 6.1 is that it should work right now!!

But the remainder of the internet won’t work that way.

It has been 5 years since that debate, and my hope, actually, is that a router OS targeted primarily at higher-end platforms would test and evaluate CAKE as the default qdisc.


Some good info here. Thanks for the posts!


Do I understand you correctly that in your opinion having this as default:


would be a better option than the current one of these?


Also, IMHO, defaulting to qdisc=cake would be a waste of CPU resources, judging from various tests I have seen comparing cake to fq_codel.

However, I do think having that as an easy VyOS option (rather than having to go through the sysctl config path) would be preferred (that is, the ability to switch between available qdiscs and congestion controls).

As suggested in ⚓ T5489 Change to BBR as TCP congestion control, or at least make it an config option regarding “set system option congestion-control” (but also add “set system option qdisc”).

Yes, I am suggesting BBR + fq_codel as a good sysctl default (accompanied by measurements showing the differences between pfifo_fast, fq, and fq_codel).
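As plain sysctl settings (e.g. a drop-in under /etc/sysctl.d/ on a stock Debian box; the tcp_bbr module must be available for the second line to take effect), that suggested default would look like:

net.core.default_qdisc = fq_codel
net.ipv4.tcp_congestion_control = bbr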

However, I have another thing to unpack. Cake's reputation as a CPU hog is a little undeserved; it comes from the most visible (and fiddly) use case of using it in conjunction with a shaper, where it is single threaded, and the embedded market would run out of CPU sooner than with the alternative of htb+fq_codel. If you have the CPU, cake manages packets better, particularly in the NAT case, and where you have some needed diffserv in use.
It's the htb that's the CPU killer, darn it. htb + fifo sucks too!

However, things are different when you run at a native line rate via the sysctl: 1) you typically get one instance of the qdisc per core; 2) with BQL (Byte Queue Limits) backpressure, nearly universal now on Ethernet, fq_codel is, oh, call it 20x lighter than a shaper (I am not kidding), and the amount of CPU overhead it adds is barely measurable on a kernel profile of a box pushing gbits. Cake is a little more visible in this case, but is doing a lot more: breaking up GSO in particular, plus a rather long list of other things including some flood protection, per-host fq, etc.
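At native line rate, attaching fq_codel directly (no shaper, relying on BQL for backpressure) is a one-liner per interface; a sketch with a placeholder device name:

tc qdisc replace dev eth0 root fq_codel   # replace whatever root qdisc is present
tc -s qdisc show dev eth0                 # inspect drop/mark statistics afterwards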

There are billions of totally non-finicky instances of fq_codel out there, doing their job, silently without fanfare.

One huge difference you get from fq_codel or cake versus a FIFO is an enormous improvement in simultaneous bidirectional throughput: up + down can actually saturate the wire with traffic going in both directions. I am often quite frustratedly pointing out that too few people test that very common scenario.


I have created a PR for ⚓ T5489 Change to BBR as TCP congestion control, or at least make it an config option to change from “fq” to “fq_codel” as default qdisc in VyOS (“bbr” as default congestion control introduced by T5489: Add sysctl TCP congestion control by default to BBR by sever-sever · Pull Request #2224 · vyos/vyos-1x · GitHub remains):


PR 2349 (change default qdisc from “fq” to “fq_codel”) has been merged, so if/when the nightly build succeeds the change should exist in VyOS 1.5-rolling from 2023-10-09 or newer (it has also been backported to VyOS 1.4 sagitta):


Wonderful, thx! I hope people notice an immediate difference in fast → slow network behaviors. Sourcing a test from the VyOS box, while simultaneously doing another test through the box, should be a very different experience as well. For added bennies, revert back to pfifo_fast with a limit of 10000 and cringe at what happens while also trying to surf the web.

This is just for native rates (10, 100, 1000, 2500Mbit, 10Gbit+), with or without flow control.
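For the pfifo_fast experiment above, a sketch of reverting a single interface (pfifo_fast takes its queue depth from the device txqueuelen, hence both lines; the device name is a placeholder):

ip link set dev eth0 txqueuelen 10000       # deep FIFO: hello, bufferbloat
tc qdisc replace dev eth0 root pfifo_fast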

If, for example, the bottleneck link is a 1Gbit/40Mbit cable connection, you can see a needed improvement with something as simple as:

tc qdisc replace dev whatever root cake bandwidth 40Mbit docsis nat ack-filter diffserv4  # yes, this is now single threaded, but 600Mbit shaping is easy

There are options for other forms of transport such as DSL, different diffserv models, etc. I wish the nat option was on by default (a kernel patch to change this in VyOS would be nice; then it would also work automatically in sysctl mode), and during covid most cake users I know switched to diffserv4, as that better respects the (optional) markings most videoconferencing services use.

The older sqm-scripts have options to do this, complicatedly, with fq_codel, but the relative simplicity of cake here is a boon.

Shaping inbound is 4 lines of code, if needed. In the cable case we currently recommend shaping inbound at these high rates to 85% of the measured native throughput, but test first: in the WISP world especially, many ISPs have over the past few years added QoE solutions like Preseem, Paraqum, or Bequant that handle that better.
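A sketch of those few lines for inbound shaping via the usual IFB redirect (device names and the 850Mbit figure, roughly 85% of a 1Gbit downlink, are placeholders):

ip link add name ifb0 type ifb                     # virtual device to shape on
ip link set dev ifb0 up
tc qdisc add dev eth0 handle ffff: ingress         # grab ingress traffic on the WAN port
tc filter add dev eth0 parent ffff: matchall action mirred egress redirect dev ifb0
tc qdisc add dev ifb0 root cake bandwidth 850Mbit nat ingress

Cake's ingress keyword makes it account for the bandwidth arriving at the shaper rather than leaving it, which matters when you are shaping on the wrong side of the bottleneck.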

The hope was, when we wrote RFC 7567 (IETF Recommendations Regarding Active Queue Management), that in the end every link would gain some kind of fq+AQM technology and FIFOs would vanish from the world of networking.