That is a remarkably good explanation and should end up on a website
or blog somewhere. A piece targeted
at gamers with the Hz values....
Post by Jonathan Morton
- Are the latency spikes real? The fact that they disappear with sqm suggests so, but what could cause such short spikes? Is it related to the PowerBoost?
I think I can explain what's going on here. TL;DR - the ISP is still using a dumb FIFO, but have resized it to something reasonable. The latency spikes result from an interaction between several systems.
"PowerBoost" is just a trade name for the increasingly-common practice of configuring a credit-mode shaper (typically a token bucket filter, or TBF for short) with a very large bucket. It refills that bucket at a fixed rate (100Mbps in your case), up to a specified maximum, and drains it proportionately for every byte sent over the link. Packets are sent when there are enough tokens in the bucket to transmit them in full, and not before. There may be a second TBF with a much smaller bucket, to enforce a limit on the burst rate (say 300Mbps) - but let's assume that isn't used here, so the true limit is the 1Gbps link rate.
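The refill-and-spend logic is simple enough to sketch in a few lines of Python. This is only an illustration of the mechanism described above - the 100Mbps rate and 10MB bucket are guesses for the sake of the example, not the ISP's actual parameters, and it starts with a full bucket for clarity:

```python
class TokenBucket:
    """Minimal sketch of a credit-mode shaper (token bucket filter)."""

    def __init__(self, rate_bps, bucket_bytes):
        self.rate = rate_bps / 8.0      # fill rate, bytes per second
        self.capacity = bucket_bytes    # maximum stored "credit"
        self.tokens = bucket_bytes      # assume a full bucket for clarity
        self.last = 0.0                 # timestamp of the last update

    def can_send(self, now, pkt_bytes):
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_bytes:
            self.tokens -= pkt_bytes    # spend credit for this packet
            return True                 # send immediately
        return False                    # not enough credit: queue it
```

While the bucket has credit, every packet passes instantly; once it runs dry, packets can only leave as fast as the 100Mbps refill allows.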
Until the bucket empties, packets are sent over the link as soon as they arrive, so the observed latency is minimal and the throughput converges on a high value. The moment the bucket empties, however, packets are queued and throughput instantaneously drops to 100Mbps. The queue fills up quickly and overflows; packets are then dropped. TCP rightly interprets this as its cue to back off.
The maximum inter-flow induced latency is consistently about 125ms. This is roughly what you'd expect from a dumb FIFO that's been sized to 1x BDP, and *much* better than typical ISP configurations to date. I'd still much rather have the sub-millisecond induced latencies that Cake achieves, but this is a win for the average Joe Oblivious.
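As a back-of-envelope check on that figure: a FIFO drained at the shaped rate induces a maximum delay of (buffer size) / (drain rate), so 125ms at 100Mbps implies a buffer of roughly 1.5MB:

```python
# Queue delay = buffer_bytes / drain_rate, so a 125 ms ceiling at
# 100 Mbit/s pins down the FIFO size the ISP chose.
rate_bytes_per_s = 100e6 / 8        # 100 Mbit/s in bytes per second
delay_s = 0.125                     # observed maximum induced latency
buffer_bytes = rate_bytes_per_s * delay_s
print(buffer_bytes)                 # 1562500.0, i.e. about 1.5 MB
```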
So why the spikes? Well, TCP backs off when it sees the packet losses, and it continues to do so until the queue drains enough to stop losing packets. However, this leaves TCP transmitting at less than the shaped rate, so the TBF starts filling its bucket with the leftovers. Latency returns to minimum because the queue is empty. TCP then gradually grows its congestion window again to probe the path; different TCPs do this in different patterns, but it'll generally take much more than one RTT at this bandwidth. Windows, I believe, increases cwnd by one packet per RTT (i.e. it is Reno-like).
So by the time TCP gets back up to 100Mbps, the TBF has stored quite a few spare tokens in its bucket. These now start to be spent, while TCP continues probing *past* 100Mbps, still not seeing the true limit. By the time the bucket empties, TCP is transmitting considerably faster than 100Mbps, such that the *average* throughput since the last loss is *exactly* 100Mbps - which is precisely what a TBF is designed to enforce. (Technically the bucket starts empty rather than full, but it usually fills up before the user gets around to measuring anything.)
So then the TBF runs out of spare tokens and slams on the brakes, the queue rapidly fills and overflows, packets are lost, TCP reels, retransmits, and backs off again. Rinse, repeat.
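The whole cycle can be reproduced with a toy time-stepped model - all the numbers here are illustrative assumptions (100Mbps fill rate, a 10MB bucket, the ~1.5MB FIFO implied by the 125ms spikes, a 10ms RTT, and a crude Reno-like sender that adds one packet per RTT and backs off hard on loss), not measurements of this ISP:

```python
# Toy model of the TBF/TCP interaction: long quiet stretches while the
# bucket refills and TCP probes upward, then a brief burst of queueing
# and loss when the bucket runs dry. All parameters are assumptions.
FILL = 100e6 / 8        # shaped rate, bytes/s
BUCKET = 10e6           # token bucket capacity, bytes
FIFO = 1.5625e6         # FIFO limit: ~125 ms of queue at 100 Mbit/s
DT = 0.01               # step length; also the assumed RTT, seconds
MSS = 1500              # packet size, bytes

tokens, queue, rate = BUCKET, 0.0, FILL / 2
spikes = []             # steps at which the FIFO overflowed
for step in range(1000):
    tokens = min(BUCKET, tokens + FILL * DT)    # refill credit
    arriving = rate * DT
    sendable = min(arriving + queue, tokens)    # spend credit first
    queue += arriving - sendable
    tokens -= sendable
    if queue > FIFO:    # overflow: drops, and the sender backs off hard
        queue = FIFO
        rate = FILL / 4
        spikes.append(step)
    else:
        rate += MSS / DT    # additive increase: one MSS per RTT

print(spikes)           # widely spaced bursts, quiet in between
```

Running it shows exactly the pattern in the graphs: losses arrive in short, periodic clusters separated by long intervals of empty queue.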
- Would you enable sqm on this connection? By doing so I miss out on the higher rate for the first few seconds.
Yes, I would absolutely use SQM here. It'll both iron out those latency spikes and reduce packet loss, and what's more it'll prevent congestion-related latency and loss from affecting any but the provoking flow(s).
IMHO, the benefits of PowerBoost are illusory. When you've got 100Mbps steady-state, tripling that for a short period is simply not perceptible in most applications. Even Web browsing, which typically involves transfers smaller than the size of the bucket, is limited by latency not bandwidth, once you get above a couple of Mbps. For a real performance benefit - for example, speeding up large software updates - bandwidth increases need to be available for minutes, not seconds, so that a gigabyte or more can be transferred at the higher speed.
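The arithmetic behind that last point, using a hypothetical 1GB update as the example: even if the boost lasted the whole transfer (which the bucket doesn't allow), it would save well under a minute.

```python
# How long does a 1 GB download take at the steady rate vs the boosted
# rate? (1 GB and 300 Mbit/s are illustrative figures.)
size_bits = 1e9 * 8
print(size_bits / 100e6)    # 80.0 seconds at the steady 100 Mbit/s
print(size_bits / 300e6)    # ~26.7 seconds if boosted the entire way
```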
- What are the actual downsides of not enabling sqm in this case?
Those latency spikes would be seen by latency-sensitive applications as jitter, which is one of the most insidious problems to cope with in a realtime interactive system. They coincide with momentary spikes in packet loss (which unfortunately is not represented in dslreports' graphs) which are also difficult to cope with.
That means your VoIP or videoconference session will glitch and drop out periodically whenever a bulk transfer is started by some background application (Steam, Windows Update) or by someone else in your house (a visiting niece bulk-uploads an SD card full of holiday photos to Instagram; your wife cues up Netflix for the evening) - unless the application has deliberately increased its own latency, with internal buffering and redundant transmissions, to compensate.
That also means your online game session, under similar circumstances, will occasionally fail to show you an enemy move in time for you to react to it, or even delay or fail to register your own actions because the packets notifying the server of them were queued and/or lost. Sure, 125ms is a far cry from the multiple seconds we often see, but it's still problematic for gamers - it corresponds to just 8Hz, when they're running their monitors at 144Hz and their mice at 1000Hz.
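To put those numbers side by side:

```python
# A 125 ms stall is one update every eighth of a second - 8 Hz - while
# the gamer's hardware runs more than an order of magnitude faster.
spike_s = 0.125
print(1 / spike_s)      # 8.0 effective updates per second
print(144 * spike_s)    # 18.0 monitor frames drawn during one stall
print(1000 * spike_s)   # 125.0 mouse polls queued up behind it
```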
- Jonathan Morton
Bloat mailing list