Discussion:
powerboost and sqm
Jonas Mårtensson
2018-06-29 10:56:17 UTC
Hi,

I have a 100/100 Mbit/s (advertised speed) connection over fiber (p2p, not
PON). The actual link rate is 1 Gbit/s. My ISP seems to be using
burst-tolerant shaping (similar to powerboost) as can be seen in this
speedtest where the download rate is 300+ Mbit/s and the upload rate is
around 150 Mbit/s for the first few seconds:

http://www.dslreports.com/speedtest/35205027

One could debate why they are doing this, but my questions are more
related to the impact on the quality of my connection. The ISP's shaper used
to introduce some bufferbloat, especially on the downlink, and I've been
using sqm for a while to mitigate this. But recently they seem to have
changed some configuration since the bufferbloat is now almost zero, except
for some very short spikes which only show up when I check "Hi-Res
BufferBloat" in test preferences (see speedtest above). When I enable sqm
on my router with htb/fq_codel or cake the spikes disappear:

htb/fq_codel:
http://www.dslreports.com/speedtest/35205620

cake:
http://www.dslreports.com/speedtest/35205718

Another difference is that the "Re-xmit" percentage (which I guess is
related to packet loss) is much higher without sqm enabled. Intuitively
this makes sense since temporarily allowing a higher rate should result in
more buffer overflow when the rate is decreased.

So, what do you think:

- Are the latency spikes real? The fact that they disappear with sqm
suggests so but what could cause such short spikes? Is it related to the
powerboost?

- Would you enable sqm on this connection? By doing so I miss out on the
higher rate for the first few seconds. What are the actual downsides of not
enabling sqm in this case?

/Jonas
Sebastian Moeller
2018-06-29 11:23:02 UTC
Hi Jonas,

nice data.

> On Jun 29, 2018, at 12:56, Jonas Mårtensson <***@gmail.com> wrote:
>
> Hi,
>
> I have a 100/100 Mbit/s (advertised speed) connection over fiber (p2p, not PON). The actual link rate is 1 Gbit/s. My ISP seems to be using burst-tolerant shaping (similar to powerboost) as can be seen in this speedtest where the download rate is 300+ Mbit/s and the upload rate is around 150 Mbit/s for the first few seconds:
>
> http://www.dslreports.com/speedtest/35205027
>
> It can be discussed why they are doing this but my questions are more related to the impact on the quality of my connection. The ISP's shaper used to introduce some bufferbloat, especially on the downlink, and I've been using sqm for a while to mitigate this. But recently they seem to have changed some configuration since the bufferbloat is now almost zero, except for some very short spikes which only show up when I check "Hi-Res BufferBloat" in test preferences (see speedtest above). When I enable sqm on my router with htb/fq_codel or cake the spikes disappear:
>
> htb/fq_codel:
> http://www.dslreports.com/speedtest/35205620
>
> cake:
> http://www.dslreports.com/speedtest/35205718
>
> Another difference is that the "Re-xmit" percentage (which I guess is related to packet loss) is much higher without sqm enabled. Intuitively this makes sense since temporarily allowing a higher rate should result in more buffer overflow when the rate is decreased.
>
> So, what do you think:
>
> - Are the latency spikes real?

I believe so, but I would try to use flent's rrul test to see whether these also show up outside of a browser-based test.


> The fact that they disappear with sqm suggests so but what could cause such short spikes? Is it related to the powerboost?

I do not think so, as you have these spikes all over the upload test while the boost only lasts for around a third of the test.

>
> - Would you enable sqm on this connection?

Personally I would, but for me the per-host isolation is one of the features I really like to have.

> By doing so I miss out on the higher rate for the first few seconds. What are the actual downsides of not enabling sqm in this case?

Well, you lose some bandwidth, but that seems to be it, and you gain all the other nice features that cake in particular offers.

Best Regards
Sebastian

>
> /Jonas
> _______________________________________________
> Bloat mailing list
> ***@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
Jonas Mårtensson
2018-06-30 06:26:25 UTC
>
> > - Are the latency spikes real?
>
> I believe so, but I would try to use flent's rrul test to see
> whether these also show up outside of a browser-based test.
>

I played around with flent a bit, here are some example plots:

https://dl.dropbox.com/s/facariwkp5x5dh1/flent.zip?dl=1

The short spikes are not seen with flent so I'm led to believe these are
just a result of running the "Hi-Res" dslreports test in a browser. In the
flent rrul test, up to about 10 ms induced latency can be seen during the
"powerboost" phase but after that it is almost zero. I'm curious about how
this is implemented on the ISP side. If anything, sqm seems to induce a bit
more latency during the "steady-state" phase.

/Jonas
Jonathan Morton
2018-06-30 07:20:13 UTC
> On 30 Jun, 2018, at 9:26 am, Jonas Mårtensson <***@gmail.com> wrote:
>
> In the flent rrul test, up to about 10 ms induced latency can be seen during the "powerboost" phase but after that it is almost zero. I'm curious about how this is implemented on the ISP side.

Now that is a completely different result. It looks as though the ISP has *some* kind of smart queue (could just be SFQ) attached to its shaper, but not to the link hardware itself. During PowerBoost, until the bucket fills up, you see the latter's dumb FIFO instead.
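
As a rough illustration of what "some kind of smart queue (could just be SFQ)" means, here is a minimal Python sketch of stochastic fair queuing under simplifying assumptions: flows are hashed into a fixed number of sub-queues, served round-robin, and when a shared packet limit is exceeded the tail of the longest sub-queue is dropped. The class, its parameters and the demo flows are hypothetical; real SFQ also perturbs its hash periodically and usually accounts in bytes, both of which this sketch leaves out.

```python
import zlib
from collections import deque
from collections.abc import Hashable
from typing import Optional


class SFQ:
    """Toy stochastic fair queue: hash flows into sub-queues, serve them round-robin."""

    def __init__(self, n_queues: int = 16, limit: int = 64) -> None:
        self.queues: list[deque] = [deque() for _ in range(n_queues)]
        self.limit = limit  # total packets allowed across all sub-queues
        self.count = 0
        self.next_q = 0     # round-robin pointer

    def index(self, flow: Hashable) -> int:
        # Stable hash of a flow tuple such as (src, dst, sport, dport, proto).
        return zlib.crc32(repr(flow).encode()) % len(self.queues)

    def enqueue(self, flow: Hashable, pkt: object) -> None:
        self.queues[self.index(flow)].append(pkt)
        self.count += 1
        if self.count > self.limit:
            # Over the shared limit: drop from the tail of the longest sub-queue,
            # so the flow hogging the buffer pays for the overload.
            max(self.queues, key=len).pop()
            self.count -= 1

    def dequeue(self) -> Optional[object]:
        # Visit sub-queues in turn and return the head of the first non-empty one.
        for _ in range(len(self.queues)):
            q = self.queues[self.next_q]
            self.next_q = (self.next_q + 1) % len(self.queues)
            if q:
                self.count -= 1
                return q.popleft()
        return None


def demo() -> list:
    """A bulk flow floods the queue, then one game packet arrives; return drain order."""
    sfq = SFQ(n_queues=16, limit=64)
    bulk = ("10.0.0.2", "192.0.2.1", 50000, 443, "tcp")
    # Pick a game-like flow that lands in a different sub-queue than the bulk flow.
    candidates = (("10.0.0.3", "192.0.2.9", port, 3074, "udp") for port in range(1000, 1100))
    game = next(f for f in candidates if sfq.index(f) != sfq.index(bulk))
    for i in range(100):
        sfq.enqueue(bulk, f"bulk{i}")
    sfq.enqueue(game, "game0")
    out = []
    while (pkt := sfq.dequeue()) is not None:
        out.append(pkt)
    return out
```

In `demo()` the queue is already full when the game packet arrives. A plain 64-packet FIFO would simply tail-drop it, while here a bulk packet is dropped instead and the game packet comes out first or second.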

To understand whether the ISP is using SFQ or something that actually performs AQM, you'll need to get a packet dump and look for ECN marks. The pattern of these might also reveal whether it's a Codel-family or a RED-family AQM (the latter including PIE).
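
Checking a capture for ECN marks comes down to reading the two ECN bits of each IP header (RFC 3168). A minimal sketch, assuming you have already pulled the raw IPv4 or IPv6 headers out of the dump as `bytes`; the function names are hypothetical. CE marks can only appear if the endpoints negotiated ECN in the first place.

```python
from collections import Counter
from collections.abc import Iterable

# ECN field values from RFC 3168: the two least-significant bits of the
# IPv4 TOS byte or the IPv6 Traffic Class.
ECN_NAMES = {0b00: "Not-ECT", 0b01: "ECT(1)", 0b10: "ECT(0)", 0b11: "CE"}


def ecn_codepoint(ip_header: bytes) -> str:
    """Return the ECN codepoint name for a raw IPv4 or IPv6 header."""
    version = ip_header[0] >> 4
    if version == 4:
        tos = ip_header[1]  # the whole TOS byte
    elif version == 6:
        # Traffic Class straddles bytes 0 and 1: low nibble of byte 0, high nibble of byte 1.
        tos = ((ip_header[0] & 0x0F) << 4) | (ip_header[1] >> 4)
    else:
        raise ValueError(f"not an IP header (version {version})")
    return ECN_NAMES[tos & 0b11]


def ecn_summary(headers: Iterable[bytes]) -> Counter:
    """Count ECN codepoints across packets, e.g. to see how often CE appears."""
    return Counter(ecn_codepoint(h) for h in headers)
```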

- Jonathan Morton
Pete Heist
2018-06-30 07:46:27 UTC
> On Jun 30, 2018, at 8:26 AM, Jonas Mårtensson <***@gmail.com> wrote:
>
> I played around with flent a bit, here are some example plots:
>
> https://dl.dropbox.com/s/facariwkp5x5dh1/flent.zip?dl=1
>
> The short spikes are not seen with flent so I'm led to believe these are just a result of running the "Hi-Res" dslreports test in a browser. In the flent rrul test, up to about 10 ms induced latency can be seen during the "powerboost" phase but after that it is almost zero. I'm curious about how this is implemented on the ISP side. If anything, sqm seems to induce a bit more latency during the "steady-state" phase.

You may also want to try running flent with --socket-stats and making a tcp_rtt plot. You should see a significant difference in TCP RTT between sfq and anything that uses CoDel.
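
The difference Pete expects comes from what each queue reacts to: SFQ only drops when its buffer overflows, while CoDel drops (or ECN-marks) based on how long packets have waited. Below is a simplified sketch of CoDel's drop decision using the RFC 8289 defaults of a 5 ms target and a 100 ms interval. It omits parts of the real algorithm, such as remembering `count` across dropping episodes and the minimum-backlog check, and the class name is hypothetical.

```python
import math

TARGET = 0.005    # seconds; CoDel's default target sojourn time (RFC 8289)
INTERVAL = 0.100  # seconds; CoDel's default interval (RFC 8289)


class CoDelSketch:
    """Simplified CoDel drop decision, evaluated once per dequeued packet."""

    def __init__(self) -> None:
        self.first_above = None  # deadline after which persistent delay triggers dropping
        self.dropping = False
        self.count = 0           # drops in the current dropping episode
        self.drop_next = 0.0

    def should_drop(self, now: float, sojourn: float) -> bool:
        if sojourn < TARGET:
            # The queue is draining fine: forget any pending episode.
            self.first_above = None
            self.dropping = False
            return False
        if self.first_above is None:
            self.first_above = now + INTERVAL  # start the grace period
            return False
        if not self.dropping:
            if now >= self.first_above:
                # Delay stayed above target for a whole interval: start dropping.
                self.dropping = True
                self.count = 1
                self.drop_next = now + INTERVAL / math.sqrt(self.count)
                return True
            return False
        if now >= self.drop_next:
            # Control law: successive drops come closer together (interval / sqrt(count)).
            self.count += 1
            self.drop_next += INTERVAL / math.sqrt(self.count)
            return True
        return False
```

Fed a constant 10 ms sojourn time, the sketch waits one interval before its first drop and then drops at shrinking spacings (100 ms, then about 71 ms, then about 58 ms), which is what pulls the standing queue, and hence TCP RTT, back down.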

Also, double-check the basics: that you’re truly in control of the queue, and that the device running sqm isn’t running out of CPU and has solid device drivers that aren’t causing periodic pauses or other anomalies (the same applies to your client device). I’ve been sideswiped by such things before when testing sqm, and making theory and experiment fully agree can take science and time.

Pete
Dave Taht
2018-06-30 11:22:38 UTC
or, like... just ask 'em?
On Sat, Jun 30, 2018 at 3:46 AM Pete Heist <***@heistp.net> wrote:
> On Jun 30, 2018, at 8:26 AM, Jonas Mårtensson <***@gmail.com> wrote:
> I'm curious about how this is implemented on the ISP side.



--

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Jonas Mårtensson
2018-07-01 21:49:56 UTC
On Sat, Jun 30, 2018 at 9:46 AM Pete Heist <***@heistp.net> wrote:

>
> On Jun 30, 2018, at 8:26 AM, Jonas Mårtensson <***@gmail.com>
> wrote:
>
>>
>> I played around with flent a bit, here are some example plots:
>
> https://dl.dropbox.com/s/facariwkp5x5dh1/flent.zip?dl=1
>
> The short spikes are not seen with flent so I'm led to believe these are
> just a result of running the "Hi-Res" dslreports test in a browser. In the
> flent rrul test, up to about 10 ms induced latency can be seen during the
> "powerboost" phase but after that it is almost zero. I'm curious about how
> this is implemented on the ISP side. If anything, sqm seems to induce a bit
> more latency during the "steady-state" phase.
>
>
> You may also want to try running flent with --socket-stats and making a
> tcp_rtt plot. You should see a significant difference in TCP RTT between
> sfq and anything that uses CoDel.
>

In case anyone is curious, I tried this and the TCP RTT plot looks very
similar to the ping RTT plot, i.e. the latencies are the same.
Benjamin Cronce
2018-07-04 20:25:27 UTC
A strict token bucket without fair queuing can cause bursts of packet loss
for all flows. In my experience with a low (single-digit ms) RTT, my former
50 Mb/s connection would accept a 1 Gb/s burst and ACK all of the data. The
sender would then assume I had a 1 Gb/s link and keep sending at 1 Gb/s.
Around the 200 ms mark there would be a sharp transition, and all of my
traffic would suddenly see ~5% loss for the rest of that second. Once
steady state was reached, it was fine. The severity seemed to have a
baseline set by the ratio between the provisioned rate and the burst rate,
with a multiplier that varied, not quite linearly, with the link's current
utilization. At ~0% average utilization, bursts that outlasted the bucket
could induce the maximum effect; past 80% utilization it was not much of an
issue.
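
Benjamin's numbers allow a back-of-the-envelope estimate of the bucket size. A sketch, assuming the bucket started full and the sender ran at the full 1 Gb/s from the start (TCP slow start makes the real ramp gentler, so treat the result as rough); the function name is hypothetical.

```python
def implied_bucket_bits(burst_bps: float, fill_bps: float, burst_s: float) -> float:
    """Token credit (in bits) needed to sustain burst_bps for burst_s seconds.

    Assumes the bucket started full and drained at (burst_bps - fill_bps) the whole time.
    """
    if burst_bps <= fill_bps:
        raise ValueError("burst rate must exceed the refill rate")
    return (burst_bps - fill_bps) * burst_s


# ~1 Gb/s burst on a 50 Mb/s plan, with loss starting around the 200 ms mark
bits = implied_bucket_bits(1e9, 50e6, 0.200)
print(f"{bits / 1e6:.0f} Mbit ~ {bits / 8 / 1e6:.2f} MB")
```

Under those assumptions the bucket would hold about 190 Mbit, i.e. roughly 24 MB of credit.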

I could reliably recreate the issue by loading a high-bandwidth video on
YouTube and jumping around the timeline to unbuffered segments. I had
anywhere from 6 ms to 12 ms latency to YouTube CDNs, depending on the route
and the datacenter. Not only could I measure the issue with ICMP at 100
samples per second, but I could reliably see issues in-game with both UDP-
and TCP-based games. Simply shaping to 1-2 Mb/s below my provisioned rate and
enabling CoDel seemed to alleviate the issue to the point where I no longer
noticed it.

Jonathan Morton
2018-06-29 12:22:15 UTC
> On 29 Jun, 2018, at 1:56 pm, Jonas Mårtensson <***@gmail.com> wrote:
>
> So, what do you think:
>
> - Are the latency spikes real? The fact that they disappear with sqm suggests so but what could cause such short spikes? Is it related to the powerboost?

I think I can explain what's going on here. TL;DR - the ISP is still using a dumb FIFO, but has resized it to something reasonable. The latency spikes result from an interaction between several systems.

"PowerBoost" is just a trade name for the increasingly-common practice of configuring a credit-mode shaper (typically a token bucket filter, or TBF for short) with a very large bucket. It refills that bucket at a fixed rate (100Mbps in your case), up to a specified maximum, and drains it proportionately for every byte sent over the link. Packets are sent when there are enough tokens in the bucket to transmit them in full, and not before. There may be a second TBF with a much smaller bucket, to enforce a limit on the burst rate (say 300Mbps) - but let's assume that isn't used here, so the true limit is the 1Gbps link rate.

Until the bucket empties, packets are sent over the link as soon as they arrive, so the observed latency is minimal and the throughput converges on a high value. The moment the bucket empties, however, packets are queued and throughput instantaneously drops to 100Mbps. The queue fills up quickly and overflows; packets are then dropped. TCP rightly interprets this as its cue to back off.
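
The credit-mode shaper described in the last two paragraphs can be sketched in Python as below. The class and parameter names are hypothetical, packets are assumed to be 1500 bytes, and the 1 MB bucket used in the usage note is purely illustrative since the ISP's real value is unknown. The sketch has no queue of its own: a refused packet would be queued or dropped by whatever sits in front of the shaper.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Credit-mode shaper: credit (bytes) accrues at rate_bps / 8 per second up to
    bucket_bytes; a packet may leave only if there is credit for the whole packet."""
    rate_bps: float      # refill rate, e.g. 100e6 for a 100 Mbit/s plan
    bucket_bytes: float  # maximum stored credit, i.e. the size of the "boost"
    tokens: float = 0.0  # current credit; starts empty unless told otherwise
    last_t: float = 0.0  # time of the last refill, in seconds

    def _refill(self, now: float) -> None:
        earned = (now - self.last_t) * self.rate_bps / 8
        self.tokens = min(self.bucket_bytes, self.tokens + earned)
        self.last_t = now

    def try_send(self, now: float, size: int) -> bool:
        """Spend credit for a whole packet if available; otherwise refuse it."""
        self._refill(now)
        if self.tokens >= size:
            self.tokens -= size
            return True
        return False


def burst_duration(tb: TokenBucket, line_bps: float, pkt: int = 1500,
                   horizon: float = 10.0) -> float:
    """Offer back-to-back packets at line rate; return the time of the first refusal."""
    gap = pkt * 8 / line_bps
    i = 0
    while i * gap < horizon:
        if not tb.try_send(i * gap, pkt):
            return i * gap
        i += 1
    return horizon
```

With a full 1 MB bucket, a 100 Mbit/s refill rate and a 1 Gbit/s line, `burst_duration` returns about 8.9 ms, matching bucket / (line rate - refill rate) = 8 Mbit / 900 Mbit/s. With the default empty bucket it refuses the very first packet and returns 0.0.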

The maximum inter-flow induced latency is consistently about 125ms. This is roughly what you'd expect from a dumb FIFO that's been sized to 1x BDP, and *much* better than typical ISP configurations to date. I'd still much rather have the sub-millisecond induced latencies that Cake achieves, but this is a win for the average Joe Oblivious.
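
The FIFO arithmetic behind these paragraphs fits in a few lines. The assumption (not stated by the ISP) is that "1x BDP" means the buffer holds rate × RTT worth of data for some nominal RTT; a full drop-tail FIFO then adds that nominal RTT of delay, and with traffic arriving at 1 Gbit/s it fills within milliseconds of the bucket running dry. The function names are hypothetical, and Jonas reads his own graphs differently further down the thread.

```python
import math


def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the pipe."""
    return rate_bps * rtt_s / 8


def fifo_full_delay(buffer_bytes: float, drain_bps: float) -> float:
    """Worst-case queuing delay: time for the bottleneck to drain a full FIFO."""
    return buffer_bytes * 8 / drain_bps


def fifo_fill_time(buffer_bytes: float, arrival_bps: float, drain_bps: float) -> float:
    """Time for an empty FIFO to overflow when arrivals outpace the drain rate."""
    if arrival_bps <= drain_bps:
        return math.inf  # the queue never fills
    return buffer_bytes * 8 / (arrival_bps - drain_bps)


buf = bdp_bytes(100e6, 0.125)  # 1x BDP for a hypothetical 125 ms nominal RTT
print(buf, fifo_full_delay(buf, 100e6), fifo_fill_time(buf, 1e9, 100e6))
```

That is a 1.5625 MB buffer which adds 125 ms when full and overflows roughly 14 ms after the shaper starts queuing, if traffic keeps arriving at 1 Gbit/s.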

So why the spikes? Well, TCP backs off when it sees the packet losses, and it continues to do so until the queue drains enough to stop losing packets. However, this leaves TCP transmitting at less than the shaped rate, so the TBF starts filling its bucket with the leftovers. Latency returns to minimum because the queue is empty. TCP then gradually grows its congestion window again to probe the path; different TCPs do this in different patterns, but it'll generally take much more than one RTT at this bandwidth. Windows, I believe, increases cwnd by one packet per RTT (i.e. is Reno-like).
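
To put "much more than one RTT" into numbers, here is a sketch assuming Reno-style additive increase, 1500-byte packets, a window halved to half the BDP by the loss, and a hypothetical 20 ms RTT (the thread does not state Jonas's exact RTT). Jonas points out later that Windows 10 actually uses CUBIC, which grows differently. The function name is hypothetical.

```python
def reno_recovery_time(rate_bps: float, rtt_s: float, mss_bytes: int = 1500) -> float:
    """Seconds for a Reno-style sender to regrow from cwnd = BDP/2 back to BDP.

    Assumes congestion avoidance adds one MSS per RTT and that the loss left the
    window at half the bandwidth-delay product.
    """
    bdp_packets = rate_bps * rtt_s / 8 / mss_bytes
    rtts_needed = bdp_packets / 2  # one extra packet per RTT
    return rtts_needed * rtt_s


print(reno_recovery_time(100e6, 0.020))  # hypothetical 20 ms RTT
```

At 100 Mbps that is about 1.7 s, more than 80 round trips, during which the unused tokens keep accumulating in the bucket.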

So by the time TCP gets back up to 100Mbps, the TBF has stored quite a few spare tokens in its bucket. These now start to be spent, while TCP continues probing *past* 100Mbps, still not seeing the true limit. By the time the bucket empties, TCP is transmitting considerably faster than 100Mbps, such that the *average* throughput since the last loss is *exactly* 100Mbps - this last is what TBF is designed to enforce; it technically doesn't start with a full bucket but an empty one, but the bucket usually fills up before the user gets around to measuring it.
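
The claim that the bucket runs dry exactly when the average rate since the last loss reaches 100 Mbps can be checked numerically. A sketch under stated assumptions: the bucket is empty right after the loss and never reaches its cap (the peak credit here is about 42 Mbit), no queue builds, and the sender restarts at half the shaped rate and ramps linearly at 30 Mbit/s per second, which corresponds to one 1500-byte packet per hypothetical 20 ms RTT. The function name is hypothetical. Analytically the bucket empties at T = 2(R - r0)/a, when the sender is already at 2R - r0, i.e. 150 Mbps here.

```python
def boost_cycle(shaped_bps: float, restart_bps: float, ramp_bps_per_s: float,
                dt: float = 1e-4, max_t: float = 60.0) -> tuple[float, float]:
    """Return (seconds until the bucket is empty again, send rate at that moment).

    Starts with an empty bucket right after a loss. While the sender is below the
    shaped rate, unused credit piles up; once above it, the credit is spent.
    """
    tokens = 0.0  # credit in bits
    t = 0.0
    while t < max_t:
        send_bps = restart_bps + ramp_bps_per_s * t
        tokens += (shaped_bps - send_bps) * dt
        t += dt
        if tokens <= 0.0:
            return t, restart_bps + ramp_bps_per_s * t
    raise RuntimeError("bucket did not empty within max_t")


T, rate = boost_cycle(100e6, 50e6, 30e6)
print(f"bucket empty after {T:.2f} s, sender at {rate / 1e6:.0f} Mbit/s")
```

The simulation lands at roughly 3.3 s and 150 Mbit/s: half again the shaped rate at the moment the TBF slams on the brakes.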

So then the TBF runs out of spare tokens and slams on the brakes, the queue rapidly fills and overflows, packets are lost, TCP reels, retransmits, and backs off again. Rinse, repeat.

> - Would you enable sqm on this connection? By doing so I miss out on the higher rate for the first few seconds.

Yes, I would absolutely use SQM here. It'll both iron out those latency spikes and reduce packet loss, and what's more it'll prevent congestion-related latency and loss from affecting any but the provoking flow(s).

IMHO, the benefits of PowerBoost are illusory. When you've got 100Mbps steady-state, tripling that for a short period is simply not perceptible in most applications. Even Web browsing, which typically involves transfers smaller than the size of the bucket, is limited by latency not bandwidth, once you get above a couple of Mbps. For a real performance benefit - for example, speeding up large software updates - bandwidth increases need to be available for minutes, not seconds, so that a gigabyte or more can be transferred at the higher speed.
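
How little a short boost buys on a large transfer can be seen with an idealised transfer-time model. A sketch, assuming the bucket is full at the start, the sender jumps straight to the boost rate (real TCP ramps up), and a hypothetical 600 Mbit bucket that sustains 300 Mbps for 3 s on a 100 Mbps plan; Jonas's actual bucket size is unknown. The function name is hypothetical.

```python
def transfer_time(size_bits: float, steady_bps: float, boost_bps: float,
                  bucket_bits: float) -> float:
    """Idealised seconds to move size_bits through a token-bucket "boost" shaper."""
    if boost_bps <= steady_bps or bucket_bits <= 0:
        return size_bits / steady_bps  # no boost at all
    boost_s = bucket_bits / (boost_bps - steady_bps)  # how long the boost lasts
    boost_capacity = boost_bps * boost_s               # bits sent during the boost
    if size_bits <= boost_capacity:
        return size_bits / boost_bps                   # finishes inside the boost
    return boost_s + (size_bits - boost_capacity) / steady_bps


GB = 8e9  # one gigabyte (10^9 bytes) in bits
print(transfer_time(GB, 100e6, 300e6, 600e6), transfer_time(GB, 100e6, 100e6, 0))
```

Under these assumptions a 1 GB download takes 74 s with the boost versus 80 s without, a saving of under 10%. Files small enough to fit inside the boost see the full 3x speed-up, which is where the remaining benefit lies.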

> What are the actual downsides of not enabling sqm in this case?


Those latency spikes would be seen by latency-sensitive applications as jitter, which is one of the most insidious problems to cope with in a realtime interactive system. They coincide with momentary spikes in packet loss (which unfortunately is not represented in dslreports' graphs) which are also difficult to cope with.

That means that if a bulk transfer is started up by some background application (Steam, Windows Update) or by someone else in your house (visiting niece bulk-uploads an SD card full of holiday photos to Instagram, wife cues up Netflix for the evening), your VoIP or videoconference session will glitch and drop out periodically, unless it has deliberately increased its own latency with internal buffering and redundant transmissions to compensate.

That also means your online game session, under similar circumstances, will occasionally fail to show you an enemy move in time for you to react to it, or even delay or fail to register your own actions because the packets notifying the server of them were queued and/or lost. Sure, 125ms is a far cry from the multiple seconds we often see, but it's still problematic for gamers - it corresponds to just 8Hz, when they're running their monitors at 144Hz and their mice at 1000Hz.

- Jonathan Morton
Dave Taht
2018-06-29 14:00:13 UTC
That is a remarkably good explanation and should end up on a website
or blog somewhere. A piece targeted
at gamers, with the Hz values...
Sebastian Moeller
2018-06-29 14:42:26 UTC
Hi Dave,

> On Jun 29, 2018, at 16:00, Dave Taht <***@gmail.com> wrote:
>
> That is a remarkably good explanation and should end up on a website
> or blog somewhere.

+1

> A piece targetted
> at gamers with the hz values....

I do not believe that using Hz will make things easier to digest and understand; after all, everybody understands time, but few people intuitively understand reciprocals. At that point it might be more intuitive to give the distance light travels in fiber during the latency increase under load ("you could be playing your FPS against an opponent on the moon at that bufferbloat level").
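
Both framings are easy to compute. A sketch assuming light in fiber travels at roughly 2 × 10^8 m/s (about two thirds of c; the exact figure depends on the fiber) and an average Earth-Moon distance of about 384,400 km; the helper names are hypothetical. Since latency under load is measured as round-trip time, the equivalent opponent distance is half the fiber path.

```python
FIBER_M_PER_S = 2.0e8  # approximate speed of light in glass fiber (assumption)
MOON_KM = 384_400      # average Earth-Moon distance, km


def as_rate_hz(latency_s: float) -> float:
    """Express a delay as an update rate, the way 125 ms was framed as 8 Hz."""
    return 1 / latency_s


def frames_elapsed(latency_s: float, refresh_hz: float) -> float:
    """Display frames that go by while a packet sits in the queue."""
    return latency_s * refresh_hz


def opponent_distance_km(extra_rtt_s: float) -> float:
    """One-way fiber distance that would add the same round-trip time."""
    return FIBER_M_PER_S * extra_rtt_s / 2 / 1000


def rtt_to_moon_s() -> float:
    """Round-trip time to an opponent on the Moon, if fiber could reach it."""
    return 2 * MOON_KM * 1000 / FIBER_M_PER_S


print(as_rate_hz(0.125), frames_elapsed(0.125, 144), opponent_distance_km(0.125))
print(rtt_to_moon_s())
```

So 125 ms is 8 Hz, 18 frames at 144 Hz, or an opponent about 12,500 km away, while "on the moon" corresponds to roughly 3.8 s of extra latency, the multiple-second territory of classic bufferbloat.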

Best Regards
Sebastian


Jonas Mårtensson
2018-06-29 15:45:09 UTC
Hi Jonathan,

thanks for the detailed analysis. Some comments:

> "PowerBoost" is just a trade name for the increasingly-common practice of
> configuring a credit-mode shaper (typically a token bucket filter, or TBF
> for short) with a very large bucket.


Do you have any data indicating this is increasingly common? I believe
Comcast, who introduced the trade name, stopped doing it.


> The maximum inter-flow induced latency is consistently about 125ms. This
> is roughly what you'd expect from a dumb FIFO that's been sized to 1x BDP,
> and *much* better than typical ISP configurations to date. I'd still much
> rather have the sub-millisecond induced latencies that Cake achieves, but
> this is a win for the average Joe Oblivious.
>

Where do you get 125 ms from my results? Do you mean 25 ms? That's what the
latency hovers around except for the short spikes maxing out at around 260
ms during upload. Compared to the idle latency, 25 ms means the induced
latency is close to the sub-milliseconds Cake achieves (again, except for
the spikes).

> So why the spikes?


Yeah, I still do not fully get this from your explanation. Are you saying
that the buffer in the shaper is 260 ms? But this doesn't look like typical
bufferbloat since I don't even see it without high resolution.


> Windows, I believe, increases cwnd by one packet per RTT (ie. is
> Reno-like).
>

No, Windows 10 uses CUBIC these days.
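
Unlike Reno's one packet per RTT, CUBIC (as specified in RFC 8312) regrows its window along a cubic curve and returns to the pre-loss window W_max after K = cbrt(W_max(1 - beta)/C) seconds, with C = 0.4 and beta = 0.7. The sketch below assumes a window of about 167 packets (100 Mbps × a hypothetical 20 ms RTT, 1500-byte packets). At windows this small CUBIC's TCP-friendly estimate tends to dominate, so the sender grows at least as fast as a Reno-like flow would; the code includes that estimate too. Function names are hypothetical, and the real algorithm has further details (e.g. fast convergence) that are omitted.

```python
C = 0.4           # CUBIC scaling constant (RFC 8312)
BETA_CUBIC = 0.7  # CUBIC multiplicative decrease factor (RFC 8312)


def cubic_k(w_max: float) -> float:
    """Seconds after a loss until the cubic curve is back at w_max (segments)."""
    return (w_max * (1 - BETA_CUBIC) / C) ** (1 / 3)


def cubic_window(t: float, w_max: float) -> float:
    """W_cubic(t) = C * (t - K)^3 + W_max, in segments, t seconds after the loss."""
    return C * (t - cubic_k(w_max)) ** 3 + w_max


def tcp_friendly_window(t: float, w_max: float, rtt_s: float) -> float:
    """RFC 8312's estimate of the window a Reno-like flow would have reached by time t."""
    return w_max * BETA_CUBIC + 3 * (1 - BETA_CUBIC) / (1 + BETA_CUBIC) * t / rtt_s


def effective_window(t: float, w_max: float, rtt_s: float) -> float:
    """CUBIC uses whichever is larger, so it never falls below the TCP-friendly estimate."""
    return max(cubic_window(t, w_max), tcp_friendly_window(t, w_max, rtt_s))


w_max = 100e6 * 0.020 / 8 / 1500  # ~167 segments
print(cubic_k(w_max), effective_window(1.0, w_max, 0.020))
```

Here K is about 5 s, but one second after the loss the TCP-friendly estimate (about 143 segments) is already above the cubic curve (about 141), so recovery at this scale looks more Reno-like than the cubic curve alone suggests.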


> Yes, I would absolutely use SQM here. It'll both iron out those latency
> spikes and reduce packet loss, and what's more it'll prevent
> congestion-related latency and loss from affecting any but the provoking
> flow(s).
>

Still not sure about the latency spikes but I probably agree about the
packet loss. Another thing I like about sqm is that I can set it to use ECN.


> IMHO, the benefits of PowerBoost are illusory. When you've got 100Mbps
> steady-state, tripling that for a short period is simply not perceptible in
> most applications. Even Web browsing, which typically involves transfers
> smaller than the size of the bucket, is limited by latency not bandwidth,
> once you get above a couple of Mbps. For a real performance benefit - for
> example, speeding up large software updates - bandwidth increases need to
> be available for minutes, not seconds, so that a gigabyte or more can be
> transferred at the higher speed.
>

I think I agree. For some things, like downloading a large photo or a not
so large software update, I guess it could make a small difference,
although I don't know how perceptible it is.


> Those latency spikes would be seen by latency-sensitive applications as
> jitter, which is one of the most insidious problems to cope with in a
> realtime interactive system. They coincide with momentary spikes in packet
> loss (which unfortunately is not represented in dslreports' graphs) which
> are also difficult to cope with.
>
> That means your VoIP or videoconference session will glitch and drop out
> periodically, unless it has deliberately increased its own latency with
> internal buffering and redundant transmissions to compensate, if a bulk
> transfer is started up by some background application (Steam, Windows
> Update) or by someone else in your house (visiting niece bulk-uploads an SD
> card full of holiday photos to Instagram, wife cues up Netflix for the
> evening).
>

Yes, this is a good motivation of course. Still, I will try to investigate
a bit more how real those latency spikes are and how big the impact of the
packet loss is.

/Jonas