Discussion:
[Cerowrt-devel] DOCSIS 3+ recommendation?
Dave Taht
2015-03-19 17:11:11 UTC
On Thu, Mar 19, 2015 at 6:53 AM, <***@reed.com> wrote:
> How many years has it been since Comcast said they were going to fix bufferbloat in their network within a year?

It is unfair to lump every individual in an organization together. All
orgs have people trying to do the right thing(s), and sometimes,
eventually, they win. All that is required for evil to triumph is for
good people to do nothing, and docsis 3.1 is entering trials. Some
competition still exists there for both modems (8? providers?) and
CMTSes (3). My hope is that if we can continue to poke at it,
eventually a better modem and cmts setup will emerge, from someone.

http://www-personal.umich.edu/~jlawler/aue/sig.html

Or one of the CMTS vendors will ship something that works better. The
ARRIS study had many flaws (their LRED was lousy), although their SFQ
enhancement was quite interesting:

preso: http://snapon.lab.bufferbloat.net/~d/trimfat/Cloonan_Presentation.pdf
paper: http://snapon.lab.bufferbloat.net/~d/trimfat/Cloonan_Paper.pdf

Cynical as I am, I do think it helps to have knowledgeable
people such as yourself rattling the cages, and certainly I was
pleased with the results of my recent explosion at Virgin - 2000+ hits
on the web site! 150 +1s! So I do plan to start blogging again
(everyone tired of my long emails? wait till you see the blog!)

> And LTE operators haven't even started.

And we haven't worked our magic on them, nor conducted sufficient
research on how they could get it more right. That said, there has
been progress in that area as well, and certainly quite a few papers
demonstrating their problems.

> THat's a sign that the two dominant sectors of "Internet Access" business are refusing to support quality Internet service. (the old saying about monopoly AT&T: "we don't care. we don't have to." applies to these sectors).
>
> Have fun avoiding bufferbloat in places where there is no "home router" you can put fq_codel into.

Given the game theory here, this is why my own largest bet has been on
trying to resuscitate the home router and small business firewall
markets.

Covering bets are on at least some ISPs (maybe not in the US) getting
it right, on regulation, etc.

Forces I am actively working against include the plans juniper and
cisco are pimping for moving ISP cpe into the cloud.

> It's almost as if the cable companies don't want OTT video or simultaneous FTP and interactive gaming to work. Of course not. They'd never do that.

I do understand there are strong forces against us, especially in the USA.

I ended up writing a MUCH longer blog entry for this, I do hope I get
around to getting that site up.

>
>
>
> On Wednesday, March 18, 2015 3:50pm, "Jonathan Morton" <***@gmail.com> said:
>
>> Right, so until 3.1 modems actually become available, it's probably best to
>> stick with a modem that already supports your subscribed speed, and manage
>> the bloat separately with shaping and AQM.
>>
>> - Jonathan Morton
>>
>
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-***@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel



--
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb
Livingood, Jason
2015-03-19 19:58:20 UTC
On 3/19/15, 1:11 PM, "Dave Taht" <***@gmail.com> wrote:

>On Thu, Mar 19, 2015 at 6:53 AM, <***@reed.com> wrote:
>> How many years has it been since Comcast said they were going to fix
>>bufferbloat in their network within a year?

I'm not sure anyone ever said it'd take a year. If someone did (even if it
was me) then it was in the days when the problem appeared less complicated
than it is, and I apologize for that. Let's face it - the problem is
complex and the software that has to be fixed is everywhere. As I said
about IPv6: if it were easy, it'd be done by now. ;-)

>>It's almost as if the cable companies don't want OTT video or
>>simultaneous FTP and interactive gaming to work. Of course not. They'd
>>never do that.

Sorry, but that seems a bit unfair. It flies in the face of what we have
done and are doing. We've underwritten some of Dave's work, we got
CableLabs to underwrite AQM work, and I personally pushed like heck to get
AQM built into the default D3.1 spec (it had CTO-level awareness & support,
and was due to Greg White's work at CableLabs). We are starting to field
test D3.1 gear now, by the way. We made some bad bets too, such as trying
to underwrite an OpenWRT-related program with ISC, but not every tactic
will always be a winner.

As for existing D3.0 gear, it's not for lack of trying. Has any DOCSIS
network of any scale in the world solved it? If so, I have something to
use to learn from and apply here at Comcast - and I'd **love** an
introduction to someone who has so I can get this info.

But usually there are rational explanations for why something is still not
done. One of them is that the at-scale operational issues are more
complicated than some people realize. And there is always a question of
prioritization - things like running out of IPv4 addresses and not
having service trump more subtle things like buffer bloat (and the effort
to get vendors to support v6 has been tremendous).

>I do understand there are strong forces against us, especially in the USA.

I'm not sure there are any forces against this issue. It's more a question
of awareness - it is not apparent that it is more urgent than other work in
everyone's backlog. For example, the number of ISP customers even aware of
buffer bloat is probably 0.001%; if customers aren't asking for it, the
product managers have a tough time arguing to prioritize buffer bloat work
over new feature X or Y.

One suggestion I have made to increase awareness is that there be a nice,
web-based, consumer-friendly latency-under-load / bloat test that you
could get people to run as they do speed tests today. (If someone thinks
they can actually deliver this, I will try to fund it - ping me off-list.)
I also think a better job can be done explaining buffer bloat - it's hard
to make an 'elevator pitch' about it.

It reminds me a bit of IPv6 several years ago. Rather than saying in
essence 'you operators are dummies' for not already fixing this, maybe
assume the engineers all 'get it' and want to do it. Because we really do
get it and want to do something about it. Then ask those operators what
they need to convince their leadership and their suppliers and product
managers and whomever else that it needs to be resourced more effectively
(see above for an example).

We're at least part of the way there in DOCSIS networks. It is in D3.1 by
default, and we're starting trials now. And probably within 18-24 months
we won't buy any DOCSIS CPE that is not 3.1.

The question for me is how and when to address it in DOCSIS 3.0.

- Jason
d***@reed.com
2015-03-19 20:29:21 UTC
I do think engineers operating networks get it, and that Comcast's engineers really get it, as I clarified in my followup note.

The issue is indeed prioritization of investment, engineering resources and management attention. The teams at Comcast in the engineering side have been the leaders in "bufferbloat minimizing" work, and I think they should get more recognition for that.

I disagree a little bit about not having a test that shows the issue, and about the value such a test would have in demonstrating the issue to users. Netalyzr has been doing an amazing job on this since before the bufferbloat term was invented. Every time I've talked about this issue I've suggested running Netalyzr, so I have a personal set of comments from people all over the world who have run Netalyzr on their home networks, on hotel networks, etc.

When I have brought up these measurements from Netalyzr (which are not aimed at showing the problem as users experience it) I observe an interesting reaction from many industry insiders: the results are not "sexy enough for stupid users" and also "no one will care".

I think the reaction characterizes the problem correctly - but the second part is the most serious objection. People don't need a measurement tool, they need to know that this is why their home network sucks sometimes.





Greg White
2015-03-19 23:18:10 UTC
Netalyzr is great for network geeks, but hardly consumer-friendly, and even
so the "network buffer measurements" part is buried in 150 other statistics.
Why couldn't Ookla* add a simultaneous "ping" test to their throughput
test? When was the last time someone leaned on them?


*I realize not everyone likes the Ookla tool, but it is popular and about
as "sexy" as you are going to get with a network performance tool.

-Greg



MUSCARIELLO Luca IMT/OLN
2015-03-20 08:18:35 UTC
I agree. Having that ping included in Ookla would help a lot more.

Luca


David P. Reed
2015-03-20 13:31:29 UTC
The mystery in most users' minds is that a ping taken at a time when there is no load doesn't tell them anything at all about why the network connection will suck when their kid is uploading to YouTube.

So giving them ping time is meaningless.
I think most network engineers think ping time is a useful measure of a badly bufferbloated system. It is not.

The only measure is ping time under maximum load of raw packets.

And that requires a way to test maximum load rtt.

There is no problem with that ... other than that to understand why and how that is relevant you have to understand Internet congestion control.

Having had to testify before CRTC about this, I learned that most access providers (the Canadian ones) claim that such measurements are never made as a measure of quality, and that you can calculate expected latency by using Little's lemma from average throughput. And that dropped packets are the right measure of quality of service.
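Ironically, Little's lemma applied honestly makes our case for us. A back-of-the-envelope sketch (the buffer size and drain rate below are illustrative assumptions, not measurements from any particular modem):

```python
# Little's-law estimate of queueing delay for a bloated CPE buffer.
# Figures are assumed for illustration only.
buffer_bits = 256 * 1024 * 8      # a 256 KiB FIFO, in bits
drain_rate_bps = 8_000_000        # an 8 Mbit/s uplink

# Little's law: occupancy L = arrival rate lambda * waiting time W,
# so a full buffer drained at line rate waits W = L / lambda.
delay_s = buffer_bits / drain_rate_bps

print(f"queueing delay when the buffer is full: {delay_s * 1000:.0f} ms")
```

Note that "average throughput" alone never surfaces this quarter-second of delay; it only appears once you ask how full the queue actually gets under load.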

Ookla ping time is useless in a context where even the "experts" wearing ties from the top-grossing Internet firms are so confused - and maybe deliberately misleading. They had to be forced, by a ruling during the proceeding, to provide any data they had about congestion in their networks, and then responded that they had no data - they never measured queueing delay, and disputed that it mattered. The proper measure of congestion, they said, was throughput.

I kid you not.

So Ookla ping time is useless against such public ignorance.




-- Sent with K-@ Mail - the evolution of emailing.
Sebastian Moeller
2015-03-20 13:46:29 UTC
Permalink
Hi David,

On Mar 20, 2015, at 14:31 , David P. Reed <***@reed.com> wrote:

> The mystery in most users' minds is that ping at a time when there is no load doesn't tell them anything at all about why the network connection will suck when their kid is uploading to youtube.

But it does, by giving a baseline to compare the ping time under load against ;)

>
> So giving them ping time is meaningless.
> I think most network engineers think ping time is a useful measure of a badly bufferbloated system. It is not.
>
> The only measure is ping time under maximum load of raw packets.

Why raw packets? But yes, I agree; I think "ping" in this discussion is shorthand for "latency measurement under load", which is a bit unwieldy to write out. The typical speed tests are almost there: they already create (half of) the required maximum load, and they already report a "ping" number back - but that is the best-case RTT, i.e. the baseline against which to compare the latency-under-load number (obviously both numbers should be measured exactly the same way). Measuring latency while simultaneously saturating both up- and downlink would be even better, but measuring it during simplex saturation should already give meaningful numbers.
I think it would be great if speedtest sites could agree to measure and report such a number, so that end customers had data to base their ISP selection on (at least those fortunate few that actually have ISP choice…).
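As a sketch of how such a reported number could look: compare the median probe RTT measured during saturation against the idle baseline, and turn the difference into a consumer-friendly grade. This is purely illustrative - the grading thresholds and sample values below are made up, not part of any existing speedtest:

```python
import statistics

def induced_latency_ms(idle_rtts, loaded_rtts):
    """Latency under load minus the unloaded baseline, both in ms.

    Medians are used so a few lost or outlier probes don't dominate."""
    return statistics.median(loaded_rtts) - statistics.median(idle_rtts)

def grade(induced_ms):
    # Thresholds are invented for illustration only.
    if induced_ms < 30:
        return "A"
    if induced_ms < 100:
        return "B"
    if induced_ms < 400:
        return "C"
    return "F"

# Example: ~20 ms baseline, ~475 ms while the uplink is saturated.
idle = [19.8, 20.1, 20.3, 19.9]
loaded = [455.0, 480.2, 510.7, 470.1]
extra = induced_latency_ms(idle, loaded)
print(round(extra), grade(extra))  # → 455 F
```

A single letter grade like this is the kind of number even a non-technical customer could use to compare ISPs.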

>
> And that requires a way to test maximum load rtt.
>
> There is no problem with that ... other than that to understand why and how that is relevant you have to understand Internet congestion control.
>
> Having had to testify before CRTC about this, I learned that most access providers (the Canadian ones) claim that such measurements are never made as a measure of quality, and that you can calculate expected latency by using Little's lemma from average throughput. And that dropped packets are the right measure of quality of service.
>
> Ookla ping time is useless in a context where even the "experts" wearing ties from the top grossing Internet firms are so confused. And maybe deliberately misleading on purpose... they had to be forced to provide any data they had about congestion in their networks by a ruling during the proceeding and then responded that they had no data - they never measured queueing delay and disputed that it mattered. The proper measure of congestion was throughput.
>
> I kid you not.
>
> So Ookla ping time is useless against such public ignorance.

But if people made their choice of (higher, more expensive) service tiers dependent on behavior at "capacity", as approximated by a speedtest's latency-under-(full-)load number, it would become much easier for ISPs to actually respond; even marketing can realize that this can be monetized ;)

Best Regards
Sebastian
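[Editorial aside: the Little's-lemma claim quoted above - that expected latency can be calculated from average throughput - only works if you also know the standing queue. Little's law says L = λW, so the delay W = L/λ depends directly on how much data L sits buffered ahead of you. A toy computation with made-up numbers shows why a fully utilized link can have excellent throughput and terrible latency at the same time:]

```python
# Little's law: L = lambda * W  =>  W = L / lambda.
# Throughput alone (lambda) says nothing about the delay W unless you
# also know the standing queue L. The numbers below are illustrative.

link_rate_bps = 10_000_000                 # 10 Mbit/s uplink, fully utilized
for buffer_bytes in (64_000, 1_280_000):   # modest vs. bloated buffer
    standing_queue_bits = buffer_bytes * 8          # L: a full buffer
    delay_s = standing_queue_bits / link_rate_bps   # W = L / lambda
    print(f"{buffer_bytes:>9} B buffered -> {delay_s * 1000:.0f} ms of queueing delay")
```

Same throughput in both cases; only the buffer sizing changes, and the queueing delay goes from ~51 ms to over a second.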


[...]
MUSCARIELLO Luca IMT/OLN
2015-03-20 14:05:36 UTC
Permalink
I don't know.
From my personal experience, I feel like the "experts" wearing ties watch
the speed meter and the needle moving across the red bar.

We just need to be sure about the colors: when the latency goes into the
crazy region, the needle has to cross a RED bar! GREEN is good, RED is bad
(exceptions apply in case of color blindness).

Maybe I'm oversimplifying... but not that much...

If your solution is to educate people with ties on Internet congestion
control, I feel bad...

Luca


On 03/20/2015 02:31 PM, David P. Reed wrote:
> [...]
>
> On Mar 20, 2015, MUSCARIELLO Luca IMT/OLN
> <***@orange.com> wrote:
>
> I agree. Having that ping included in Ookla would help a lot more
>
> Luca
>
>
> On 03/20/2015 12:18 AM, Greg White wrote:
>
> [...]
Sebastian Moeller
2015-03-20 10:07:19 UTC
Permalink
Hi All,

I guess I have nothing to say that most of you don’t know already, but...

On Mar 20, 2015, at 00:18 , Greg White <***@CableLabs.com> wrote:

> Netalyzr is great for network geeks, hardly consumer-friendly, and even so
> the "network buffer measurements" part is buried in 150 other statistics.

	The bigger issue with netalyzr is that it is a worst-case probe: an unrelenting UDP "flood" that does not concurrently measure the responsiveness/latency of unrelated flows. In all fairness, it does not even test the worst case, as it floods up- and downlink sequentially and seems to use the same port for all packets. This kind of traffic is well suited to measuring the worst-case buffering available to misbehaving ((D)DOS) flows, not necessarily the amount of effective buffering that well-behaved flows encounter.
	Also, the help text for the "network buffer measurements" section of the results report seems actually misleading, in that the DOS-style traffic used is assumed to be representative of normal traffic (and no allowance is made for AQMs that manage normal, responsive traffic better).
	It would be so sweet if they could also concurrently measure the ICMP RTT (or that of another type of timestamped TCP or UDP flow) to, say, a well-connected CDN, to give a first approximation of the effect of link saturation on other, competing flows - and then report the change in that number caused by link saturation as the actual indicator of effective buffering...
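A rough sketch of that concurrent-probe idea follows. It is purely illustrative: a real test would probe a well-connected reflector (e.g. a CDN node) and generate the load with bulk TCP transfers, not a local UDP blast; the echo server and all parameters here are stand-ins.

```python
import socket
import threading
import time

def udp_echo_server(sock):
    """Stand-in for a well-connected reflector; echoes every datagram."""
    while True:  # daemon thread, dies with the process
        data, addr = sock.recvfrom(2048)
        sock.sendto(data, addr)

def probe_rtts(addr, n=10, interval=0.01, timeout=1.0):
    """RTTs (in ms) of small probes, sent alongside whatever load is running."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(timeout)
    rtts = []
    for _ in range(n):
        t0 = time.monotonic()
        s.sendto(b"probe", addr)
        try:
            s.recvfrom(2048)
            rtts.append((time.monotonic() - t0) * 1000.0)
        except socket.timeout:
            pass  # lost probes are informative too, but skipped here
        time.sleep(interval)
    return rtts

def flood(addr, seconds=0.3):
    """Crude load generator, standing in for link saturation."""
    f = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        f.sendto(b"x" * 1400, addr)

# Demo over loopback; there is no real bottleneck queue on loopback, so
# the loaded numbers only become interesting on an actual access link.
srv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
srv.bind(("127.0.0.1", 0))
threading.Thread(target=udp_echo_server, args=(srv,), daemon=True).start()
addr = srv.getsockname()

idle = probe_rtts(addr)                       # baseline, no load
loader = threading.Thread(target=flood, args=(addr,), daemon=True)
loader.start()
loaded = probe_rtts(addr)                     # same probes, under load
loader.join()
print("idle probes:", len(idle), "loaded probes:", len(loaded))
```

The change between the idle and loaded RTT distributions, not either number alone, is the indicator of effective buffering Sebastian describes.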


> Why couldn't Ookla* add a simultaneous "ping" test to their throughput
> test? When was the last time someone leaned on them?
>
>
> *I realize not everyone likes the Ookla tool, but it is popular and about
> as "sexy" as you are going to get with a network performance tool.

	I think you are right; instead of trying to get better tools out, we might have a better chance of getting small modifications into existing tools.

Best Regards
Sebastian

>
> -Greg
>
>
>
> On 3/19/15, 2:29 PM, "***@reed.com" <***@reed.com> wrote:
>
>> I do think engineers operating networks get it, and that Comcast's
>> engineers really get it, as I clarified in my followup note.
>>
>> The issue is indeed prioritization of investment, engineering resources
>> and management attention. The teams at Comcast in the engineering side
>> have been the leaders in "bufferbloat minimizing" work, and I think they
>> should get more recognition for that.
>>
>> I disagree a little bit about not having a test that shows the issue, and
>> the value the test would have in demonstrating the issue to users.
>> Netalyzer has been doing an amazing job on this since before the
>> bufferbloat term was invented. Every time I've talked about this issue
>> I've suggested running Netalyzer, so I have a personal set of comments
>> from people all over the world who run Netalyzer on their home networks,
>> on hotel networks, etc.
>>
>> When I have brought up these measurements from Netalyzr (which are not
>> aimed at showing the problem as users experience) I observe an
>> interesting reaction from many industry insiders: the results are not
>> "sexy enough for stupid users" and also "no one will care".
>>
>> I think the reaction characterizes the problem correctly - but the second
>> part is the most serious objection. People don't need a measurement
>> tool, they need to know that this is why their home network sucks
>> sometimes.
>>
>>
>>
>>
>>
>> On Thursday, March 19, 2015 3:58pm, "Livingood, Jason"
>> <***@cable.comcast.com> said:
>>
>>> On 3/19/15, 1:11 PM, "Dave Taht" <***@gmail.com> wrote:
>>>
>>>> On Thu, Mar 19, 2015 at 6:53 AM, <***@reed.com> wrote:
>>>>> How many years has it been since Comcast said they were going to fix
>>>>> bufferbloat in their network within a year?
>>>
>>> I'm not sure anyone ever said it'd take a year. If someone did (even if
>>> it was me) then it was in the days when the problem appeared less
>>> complicated than it is and I apologize for that. Let's face it - the
>>> problem is complex and the software that has to be fixed is everywhere.
>>> As I said about IPv6: if it were easy, it'd be done by now. ;-)
>>>
>>>>> It's almost as if the cable companies don't want OTT video or
>>>>> simultaneous FTP and interactive gaming to work. Of course not. They'd
>>>>> never do that.
>>>
>>> Sorry, but that seems a bit unfair. It flies in the face of what we have
>>> done and are doing. We've underwritten some of Dave's work, we got
>>> CableLabs to underwrite AQM work, and I personally pushed like heck to
>>> get AQM built into the default D3.1 spec (had CTO-level awareness &
>>> support, and was due to Greg White's work at CableLabs). We are starting
>>> to field test D3.1 gear now, by the way. We made some bad bets too, such
>>> as trying to underwrite an OpenWRT-related program with ISC, but not
>>> every tactic will always be a winner.
>>>
>>> As for existing D3.0 gear, it's not for lack of trying. Has any DOCSIS
>>> network of any scale in the world solved it? If so, I have something to
>>> use to learn from and apply here at Comcast - and I'd **love** an
>>> introduction to someone who has so I can get this info.
>>>
>>> But usually there are rational explanations for why something is still
>>> not done. One of them is that the at-scale operational issues are more
>>> complicated than some people realize. And there is always a case of
>>> prioritization - meaning things like running out of IPv4 addresses and
>>> not having service trump more subtle things like buffer bloat (and the
>>> effort to get vendors to support v6 has been tremendous).
>>>
>>>> I do understand there are strong forces against us, especially in the
>>>> USA.
>>>
>>> I'm not sure there are any forces against this issue. It's more a
>>> question of awareness - it is not apparent it is more urgent than other
>>> work in everyone's backlog. For example, the number of ISP customers
>>> even aware of buffer bloat is probably 0.001%; if customers aren't
>>> asking for it, the product managers have a tough time arguing to
>>> prioritize buffer bloat work over new feature X or Y.
>>>
>>> One suggestion I have made to increase awareness is that there be a
>>> nice, web-based, consumer-friendly latency under load / bloat test that
>>> you could get people to run as they do speed tests today. (If someone
>>> thinks they can actually deliver this, I will try to fund it - ping me
>>> off-list.)
>>> I also think a better job can be done explaining buffer bloat - it's
>>> hard to make an 'elevator pitch' about it.
>>>
>>> It reminds me a bit of IPv6 several years ago. Rather than saying in
>>> essence 'you operators are dummies' for not already fixing this, maybe
>>> assume the engineers all 'get it' and want to do it. Because we really
>>> do get it and want to do something about it. Then ask those operators
>>> what they need to convince their leadership and their suppliers and
>>> product managers and whomever else that it needs to be resourced more
>>> effectively (see above for example).
>>>
>>> We're at least part of the way there in DOCSIS networks. It is in D3.1
>>> by default, and we're starting trials now. And probably within 18-24
>>> months we won't buy any DOCSIS CPE that is not 3.1.
>>>
>>> The question for me is how and when to address it in DOCSIS 3.0.
>>>
>>> - Jason
>>>
>>
>>
>> _______________________________________________
>> Bloat mailing list
>> ***@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>
> _______________________________________________
> Cerowrt-devel mailing list
> Cerowrt-***@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/cerowrt-devel
Rich Brown
2015-03-20 13:50:13 UTC
Permalink
On Mar 19, 2015, at 7:18 PM, Greg White <***@CableLabs.com> wrote:

> Netalyzr is great for network geeks, hardly consumer-friendly, and even so
> the "network buffer measurements" part is buried in 150 other statistics.
> Why couldn't Ookla* add a simultaneous "ping" test to their throughput
> test? When was the last time someone leaned on them?
>
>
> *I realize not everyone likes the Ookla tool, but it is popular and about
> as "sexy" as you are going to get with a network performance tool.
>
> -Greg

Back in July, I contacted the support groups at Ookla, speedof.me, and testmy.net, and all three responded, "Hmmm... We'll refer that to our techies for review." and I never heard back.

It seems to be hard to attract attention when there's only one voice crying in the wilderness. It might be worth sending a note to:

- Speedtest.net <***@speedtest.net> or open a ticket at: https://www.ookla.com/support
- SpeedOfMe <***@speedof.me>
- TestMyNet <***@testmy.net>

I append my (somewhat edited) note from July for your email drafting pleasure.

Rich

--- Sample Letter ---

Subject: Add latency measurements (min/max)

I have been using NAME-OF-SERVICE for quite a while to measure my network's performance. I had a couple thoughts that could make it more useful to me and others who want to test their network.

Your page currently displays a single "latency" value of the ping time before the data transfers begin. It would be really helpful to report real-time min/max latency measurements made *during the up and downloads*.

Why is latency interesting? Because when it's not well controlled, it completely destroys people's internet for voice, gaming, other time-sensitive traffic, and even everyday web browsing. As you may know, many routers (home and otherwise) buffer more data than can be sent, and this can dramatically affect latency for everyone using that router.

I'm asking you to consider implementing the web-equivalent of the "Quick Test for Bufferbloat" that's on the Bufferbloat site. (I'm a member of the Bufferbloat team.) http://www.bufferbloat.net/projects/cerowrt/wiki/Quick_Test_for_Bufferbloat

Please get back to me if you have questions.

Many thanks!

YOUR NAME
YOUR SIG

--- end of sample ---

> [...]
Pedro Tumusok
2015-03-29 17:36:11 UTC
Permalink
DSLReports has a new speed tester up - does anybody know Justin or some of
the other people over there?

http://www.dslreports.com/speedtest

Maybe somebody on here could even lend a hand in getting them to implement
features like ping under load etc.


Pedro



On Fri, Mar 20, 2015 at 2:50 PM, Rich Brown <***@gmail.com> wrote:

> On Mar 19, 2015, at 7:18 PM, Greg White <***@CableLabs.com> wrote:
>
> > Netalyzr is great for network geeks, hardly consumer-friendly, and even
> so
> > the "network buffer measurements" part is buried in 150 other statistics.
> > Why couldn't Ookla* add a simultaneous "ping" test to their throughput
> > test? When was the last time someone leaned on them?
> >
> >
> > *I realize not everyone likes the Ookla tool, but it is popular and about
> > as "sexy" as you are going to get with a network performance tool.
> >
> > -Greg
>
> Back in July, I contacted the support groups at Ookla, speedof.me, and
> testmy.net, and all three responded, "Hmmm... We'll refer that to our
> techies for review." and I never heard back.
>
> It seems to be hard to attract attention when there's only one voice
> crying in the wilderness. It might be worth sending a note to:
>
> - Speedtest.net <***@speedtest.net> or open a ticket at:
> https://www.ookla.com/support
> - SpeedOfMe <***@speedof.me>
> - TestMyNet <***@testmy.net>
>
> I append my (somewhat edited) note from July for your email drafting
> pleasure.
>
> Rich
>
> --- Sample Letter ---
>
> Subject: Add latency measurements (min/max)
>
> I have been using NAME-OF-SERVICE for quite a while to measure my
> network's performance. I had a couple thoughts that could make it more
> useful to me and others who want to test their network.
>
> Your page currently displays a single "latency" value of the ping time
> before the data transfers begin. It would be really helpful to report
> real-time min/max latency measurements made *during the up and downloads*.
>
> Why is latency interesting? Because when it's not well controlled, it
> completely destroys people's internet for voice, gaming, other
> time-sensitive traffic, and even everyday web browsing. As you may know,
> many routers (home and otherwise) buffer more data than can be sent, and
> this can dramatically affect latency for everyone using that router.
>
> I'm asking you to consider implementing the web-equivalent of the "Quick
> Test for Bufferbloat" that's on the Bufferbloat site. (I'm a member of the
> Bufferbloat team.)
> http://www.bufferbloat.net/projects/cerowrt/wiki/Quick_Test_for_Bufferbloat
>
> Please get back to me if you have questions.
>
> Many thanks!
>
> YOUR NAME
> YOUR SIG
>
> --- end of sample ---
>
> > On 3/19/15, 2:29 PM, "***@reed.com" <***@reed.com> wrote:
> >
> >> I do think engineers operating networks get it, and that Comcast's
> >> engineers really get it, as I clarified in my followup note.
> >>
> >> The issue is indeed prioritization of investment, engineering resources
> >> and management attention. The teams at Comcast in the engineering side
> >> have been the leaders in "bufferbloat minimizing" work, and I think they
> >> should get more recognition for that.
> >>
> >> I disagree a little bit about not having a test that shows the issue,
> and
> >> the value the test would have in demonstrating the issue to users.
> >> Netalyzer has been doing an amazing job on this since before the
> >> bufferbloat term was invented. Every time I've talked about this issue
> >> I've suggested running Netalyzer, so I have a personal set of comments
> >> from people all over the world who run Netalyzer on their home networks,
> >> on hotel networks, etc.
> >>
> >> When I have brought up these measurements from Netalyzr (which are not
> >> aimed at showing the problem as users experience) I observe an
> >> interesting reaction from many industry insiders: the results are not
> >> "sexy enough for stupid users" and also "no one will care".
> >>
> >> I think the reaction characterizes the problem correctly - but the
> second
> >> part is the most serious objection. People don't need a measurement
> >> tool, they need to know that this is why their home network sucks
> >> sometimes.
> >>
> >>
> >>
> >>
> >>
> >> On Thursday, March 19, 2015 3:58pm, "Livingood, Jason"
> >> <***@cable.comcast.com> said:
> >>
> >>> On 3/19/15, 1:11 PM, "Dave Taht" <***@gmail.com> wrote:
> >>>
> >>>> On Thu, Mar 19, 2015 at 6:53 AM, <***@reed.com> wrote:
> >>>>> How many years has it been since Comcast said they were going to fix
> >>>>> bufferbloat in their network within a year?
> >>>
> >>> I'm not sure anyone ever said it'd take a year. If someone did (even
> >>> if it was me) then it was in the days when the problem appeared less
> >>> complicated than it is and I apologize for that. Let's face it - the
> >>> problem is complex and the software that has to be fixed is
> >>> everywhere. As I said about IPv6: if it were easy, it'd be done by
> >>> now. ;-)
> >>>
> >>>>> It's almost as if the cable companies don't want OTT video or
> >>>>> simultaneous FTP and interactive gaming to work. Of course not.
> They'd
> >>>>> never do that.
> >>>
> >>> Sorry, but that seems a bit unfair. It flies in the face of what we
> >>> have done and are doing. We've underwritten some of Dave's work, we
> >>> got CableLabs to underwrite AQM work, and I personally pushed like
> >>> heck to get AQM built into the default D3.1 spec (had CTO-level
> >>> awareness & support, and was due to Greg White's work at CableLabs).
> >>> We are starting to field test D3.1 gear now, by the way. We made some
> >>> bad bets too, such as trying to underwrite an OpenWRT-related program
> >>> with ISC, but not every tactic will always be a winner.
> >>>
> >>> As for existing D3.0 gear, it's not for lack of trying. Has any DOCSIS
> >>> network of any scale in the world solved it? If so, I have something to
> >>> use to learn from and apply here at Comcast - and I'd **love** an
> >>> introduction to someone who has so I can get this info.
> >>>
> >>> But usually there are rational explanations for why something is still
> >>> not done. One of them is that the at-scale operational issues are more
> >>> complicated than some people realize. And there is always a case of
> >>> prioritization - meaning things like running out of IPv4 addresses and
> >>> not having service trump more subtle things like buffer bloat (and the
> >>> effort to get vendors to support v6 has been tremendous).
> >>>
> >>>> I do understand there are strong forces against us, especially in the
> >>>> USA.
> >>>
> >>> I'm not sure there are any forces against this issue. It's more a
> >>> question of awareness - it is not apparent it is more urgent than
> >>> other work in everyone's backlog. For example, the number of ISP
> >>> customers even aware of buffer bloat is probably 0.001%; if customers
> >>> aren't asking for it, the product managers have a tough time arguing
> >>> to prioritize buffer bloat work over new feature X or Y.
> >>>
> >>> One suggestion I have made to increase awareness is that there be a
> >>> nice, web-based, consumer-friendly latency under load / bloat test
> >>> that you could get people to run as they do speed tests today. (If
> >>> someone thinks they can actually deliver this, I will try to fund it -
> >>> ping me off-list.) I also think a better job can be done explaining
> >>> buffer bloat - it's hard to make an 'elevator pitch' about it.
> >>>
> >>> It reminds me a bit of IPv6 several years ago. Rather than saying in
> >>> essence 'you operators are dummies' for not already fixing this, maybe
> >>> assume the engineers all 'get it' and want to do it. Because we really
> >>> do get it and want to do something about it. Then ask those operators
> >>> what they need to convince their leadership and their suppliers and
> >>> product managers and whomever else that it needs to be resourced more
> >>> effectively (see above for example).
> >>>
> >>> We're at least part of the way there in DOCSIS networks. It is in D3.1
> >>> by default, and we're starting trials now. And probably within 18-24
> >>> months we won't buy any DOCSIS CPE that is not 3.1.
> >>>
> >>> The question for me is how and when to address it in DOCSIS 3.0.
> >>>
> >>> - Jason
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >> _______________________________________________
> >> Bloat mailing list
> >> ***@lists.bufferbloat.net
> >> https://lists.bufferbloat.net/listinfo/bloat
> >
> > _______________________________________________
> > Bloat mailing list
> > ***@lists.bufferbloat.net
> > https://lists.bufferbloat.net/listinfo/bloat
>
> _______________________________________________
> Bloat mailing list
> ***@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>



--
Best regards / Mvh
Jan Pedro Tumusok
Jonathan Morton
2015-03-30 07:06:59 UTC
Permalink
> On 29 Mar, 2015, at 20:36, Pedro Tumusok <***@gmail.com> wrote:
>
> Dslreports got a new speedtester up, anybody know Justin or some of the other people over there?
>
> http://www.dslreports.com/speedtest
>
> Maybe somebody on here could even lend a hand in getting them to implement features like ping under load etc.

I gave that test a quick try. It measured my download speed well enough, but the upload…

Let’s just say it effectively measured the speed to my local webcache, not to the server itself.

- Jonathan Morton
Pedro Tumusok
2015-03-30 13:56:15 UTC
Permalink
On Mon, Mar 30, 2015 at 9:06 AM, Jonathan Morton <***@gmail.com>
wrote:

>
> > On 29 Mar, 2015, at 20:36, Pedro Tumusok <***@gmail.com>
> wrote:
> >
> > Dslreports got a new speedtester up, anybody know Justin or some of the
> other people over there?
> >
> > http://www.dslreports.com/speedtest
> >
> > Maybe somebody on here could even lend a hand in getting them to
> implement features like ping under load etc.
>
> I gave that test a quick try. It measured my download speed well enough,
> but the upload

>
> Let’s just say it effectively measured the speed to my local webcache, not
> to the server itself.
>
>
Hi Jonathan,

I forwarded your observation to Justin over at dslreports and got the
following answer. I'm a bit in the middle here, but I'll see if I can get
him to join the mailing list; he seemed interested in bufferbloat at least.
Also, would you mind doing another test and sharing the result link, so he
can check closer?

""Let’s just say it effectively measured the speed to my local webcache,
not to the server itself."

If he runs an outbound squid, then as the upload speed is measured
continuously, it will measure the speed to the squid - although why would
you configure it to cache POSTs when the URL is unique and includes all the
no-cache headers? It doesn't make them any faster; we still have to wait
for the result. I believe his situation to be rather unique. In 190,000
tests nobody else has had that case.

Either way, if he re-runs and sends me a link, I can look at the log and
work out how to "patch" the upload speed at the end to the right number,
because it waits for the real reply from the server, with the amount
uploaded, and no web cache can fake that.

On the other things, yes, I'd like to hear about buffer bloat ideas.
The simultaneous up+down mode would also be interesting.

Since I control the servers and don't use a CDN, I'm thinking of running
ng-capture constantly, then inspecting the tcptraces in real time. If
there is stuff that can drop out of that that is revealing, it can be
rolled into the results. Even without any buffer bloat issues, being able
to see packet loss directly would be a win for people with bad lines. It
would also pick up those TCP disasters between client and server when they
can't agree on something or things are fragmented, etc."



What do you guys think about his packet capture ideas? There are a lot of
users over on dslreports, so if they put up a speedtest with bufferbloat
"detectors" it could be another way to get the snowball rolling.

I also removed the cerowrt-devel list; I don't think we need to cross-post
to that one.
--
Best regards / Mvh
Jan Pedro Tumusok
Jonathan Morton
2015-03-30 14:18:03 UTC
Permalink
> On 30 Mar, 2015, at 16:56, Pedro Tumusok <***@gmail.com> wrote:
>
> Also would you mind doing another test and share the link for result, so he can check closer?

I still have the result link for the earlier one:

http://www.dslreports.com/speedtest/193440

Here’s another result without using the webcache, but keeping everything else the same:

http://www.dslreports.com/speedtest/199524

For context, this is through the PowerBook which is running cake, 8Mbps down and 1Mbps up. This is somewhat less than the capacity of the link, but since it’s 3G it varies quite a lot anyway.

> on the other things, yes I'd like to hear about buffer bloat ideas.

The number-one desirable feature is to carry on measuring the latency while the throughput test is ongoing. This is relevant to anyone who wants to make a VoIP call or play an online game while using the connection for something else. If you have several people in one household, it’s often hard to coordinate their activity to avoid such multitasking.

- Jonathan Morton
Pedro Tumusok
2015-03-30 14:20:51 UTC
Permalink
On Mon, Mar 30, 2015 at 4:18 PM, Jonathan Morton <***@gmail.com>
wrote:

>
> > On 30 Mar, 2015, at 16:56, Pedro Tumusok <***@gmail.com>
> wrote:
> >
> > Also would you mind doing another test and share the link for result, so
> he can check closer?
>
> I still have the result link for the earlier one:
>
> http://www.dslreports.com/speedtest/193440
>
> Here’s another result without using the webcache, but keeping everything
> else the same:
>
> http://www.dslreports.com/speedtest/199524
>
> For context, this is through the PowerBook which is running cake, 8Mbps
> down and 1Mbps up. This is somewhat less than the capacity of the link,
> but since it’s 3G it varies quite a lot anyway.
>
> > on the other things, yes I'd like to hear about buffer bloat ideas.
>
> The number-one desirable feature is to carry on measuring the latency
> while the throughput test is ongoing. This is relevant to anyone who wants
> to make a VoIP call or play an online game while using the connection for
> something else. If you have several people in one household, it’s often
> hard to coordinate their activity to avoid such multitasking.
>
> - Jonathan Morton
>
>
I will forward it to him; hopefully he will sign up here also, so the smart
people can talk to each other without me messing it up :)


--
Best regards / Mvh
Jan Pedro Tumusok
Dave Taht
2015-03-30 14:55:03 UTC
Permalink
I think the most effective thing would be to add bufferbloat testing
infrastructure to the web browsers themselves. There are already
plenty of tools for measuring web performance (YSlow for Firefox,
the successor to the Chrome web page benchmarker) more or less built in...
measuring actual network performance under load is not much of a
reach.

The issues this would resolve are:

1) speed - the test(s) could use native apis within the browser and
thus achieve higher rates of speed than is possible with javascript
(and monitor cpu usage)
2) we could rigorously define the tests to have similar features to
netperf-wrapper
3) we could get much better tcp statistics as in with TCP_INFO
4) output formats could still be json as we do today, but plotted better
5) ?

Problems are:

0) Convincing users to use (and believe) them
1) Suitable server targets for the tests themselves
2) Although the browsers are basically in a nearly quarterly update
cycle, it would still take time for the tests to be widely available
even if they were ready today
3) Convincing the browser makers that they could use such tests
4) Writing the tests (in C and C++)
5) The outcry at speedtest, et al, for obsoleting their tools
(think microsoft vs "stacker")
6) Bloating up the browsers still further
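
A minimal sketch of the measurement such a test would report - purely
illustrative (nothing any browser ships), assuming the test has collected
RTT samples both while the link is idle and while it is saturated:

```python
# Hypothetical summary step for a latency-under-load test: compare RTT
# samples taken on a quiet link against samples taken while upload/download
# streams saturate it, and report the latency the queueing induced.
import statistics


def induced_latency_ms(idle_rtts, loaded_rtts):
    """Median latency (ms) added by queueing when the link is under load."""
    return statistics.median(loaded_rtts) - statistics.median(idle_rtts)


def summarize(idle_rtts, loaded_rtts):
    return {
        "idle_median_ms": statistics.median(idle_rtts),
        "loaded_median_ms": statistics.median(loaded_rtts),
        "induced_ms": induced_latency_ms(idle_rtts, loaded_rtts),
    }


if __name__ == "__main__":
    idle = [22, 24, 23, 25, 22]          # ms, link quiet
    loaded = [180, 240, 210, 260, 190]   # ms, link saturated
    print(summarize(idle, loaded))
```

The point of reporting the induced delta, rather than raw loaded RTT, is
that it isolates the buffer's contribution from the path's baseline.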
Pedro Tumusok
2015-03-30 16:05:19 UTC
Permalink
On Mon, Mar 30, 2015 at 4:55 PM, Dave Taht <***@gmail.com> wrote:

> I think the most effective thing would be to add bufferbloat testing
> infrastructure to the web browsers themselves. There are already
> plenty of tools for measuring web performance (whyslow for firefox,
> the successor to chrome web page benchmarker) more or less built in...
> measuring actual network performance under load is not much of a
> reach.
>
>
Dave,

That is feature creep; we originally discussed having continuous ping
measurement under load.
New ideas not so welcome ;)

--
Best regards / Mvh
Jan Pedro Tumusok
Jesper Dangaard Brouer
2015-03-31 05:07:35 UTC
Permalink
On Mon, 30 Mar 2015 18:05:19 +0200 Pedro Tumusok <***@gmail.com> wrote:

[...]
> That is feature creep, we originally discussed having continuous ping
> measurement under load.
> New ideas not so welcome ;)

I agree, we just want Justin (http://www.dslreports.com/speedtest) to
also measure and report on ping/latency under load.

After we have this basic step, we can refine it further, e.g. with
Jonathan's Hz measurement.

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Sr. Network Kernel Developer at Red Hat
Author of http://www.iptv-analyzer.org
LinkedIn: http://www.linkedin.com/in/brouer
Pedro Tumusok
2015-04-01 17:21:54 UTC
Permalink
Justin,

If you got any questions, don't feel shy :)
If there is any testing I can help with etc, let me know.

Pedro

On Tue, Mar 31, 2015 at 7:07 AM, Jesper Dangaard Brouer <***@redhat.com>
wrote:

>
> On Mon, 30 Mar 2015 18:05:19 +0200 Pedro Tumusok <***@gmail.com>
> wrote:
>
> [...]
> > That is feature creep, we originally discussed having continuous ping
> > measurement under load.
> > New ideas not so welcome ;)
>
> I agree, we just want Justin (http://www.dslreports.com/speedtest) to
> also measure and report on ping/latency under load.
>
> After we have this basic step, we can refine it further, e.g. with
> Jonathan HZ measurement.
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Sr. Network Kernel Developer at Red Hat
> Author of http://www.iptv-analyzer.org
> LinkedIn: http://www.linkedin.com/in/brouer
>



--
Best regards / Mvh
Jan Pedro Tumusok
Pedro Tumusok
2015-04-07 17:16:49 UTC
Permalink
I managed to remove a "t" on the end there....

Pedro
---------- Forwarded message ----------
From: Pedro Tumusok <***@gmail.com>
Date: Tue, Apr 7, 2015 at 7:15 PM
Subject: Re: [Bloat] Latency Measurements in Speed Test suites (was: DOCSIS
3+ recommendation?)
To: jb <***@gmail.com>
Cc: bloat <***@lists.bufferbloat.ne>


On Sat, Apr 4, 2015 at 2:03 PM, jb <***@gmail.com> wrote:

> Hi,
> Thanks for the indirect invitation to the list I'm looking forward to
> following it and learning stuff.
>
> If I had one question now, it would be this. Should I put effort into
> measuring ongoing RTT during the "full" download and upload phases, or
> should I put effort into running packet captures at some of the test
> locations, storing them, and processing them later, with the idea that this
> will reveal in conjunction with speed test result files, actual TCP RTT
> times during times when a residential connection is full. Even packet
> capturing at one location would capture a lot of individuals because of the
> many to one setup of the server-client.
>

From earlier discussions on here, I think the RTT during the "full"
download and upload phases is what most people on here want to see, since
it will tell us how any latency-sensitive service is affected by the
buffers in the equipment when the connection is under heavy load. In tune
with your "We are not interested in making your ISP look good", on here we
are interested in making your device vendor look "bad" - so that they
actually fix this. It's about them fixing their hardware; the solutions
are there, they just need to do it.
Having full captures is icing on the cake, but just showing the delay
induced by the equipment is 90% of the way. But as with all projects, the
last 10% is the toughest part.

Personally I think that if your speed test shows people the "true"
experience, ie bad, they can expect from their setup, it will be immensely
valuable for you and the bufferbloat project. And of course you will be
copied on it, as before, by other speed test sites/tools.

Bad-analogy-wise, having the best speed is like putting a high-performance
Porsche engine into a VW Beetle. You have the potential for going very
fast, but it will be a very scary and uncomfortable ride, especially if
you compare it to using that engine in a "real" Porsche.





> Anything that is produced from this project might have to be crunched by
> an interested person other than me; given the number of different devices
> and browsers and undocumented limits, I've still got my hands full making
> sure the maximum number of people get a "competitive" speed reading.
> Making sure the majority drive their connection to capacity is probably a
> necessary condition for accurate conclusions on anything else anyway.
>
>
I assume that a lot of the people on here can use it for their daily work,
and I think there are some academics/scientists on here that would love to
have a copy of the data for research.

Pedro


> On Thu, Apr 2, 2015 at 4:21 AM, Pedro Tumusok <***@gmail.com>
> wrote:
>
>> Justin,
>>
>> If you got any questions, don't feel shy :)
>> If there is any testing I can help with etc, let me know.
>>
>> Pedro
>>
>> On Tue, Mar 31, 2015 at 7:07 AM, Jesper Dangaard Brouer <
>> ***@redhat.com> wrote:
>>
>>>
>>> On Mon, 30 Mar 2015 18:05:19 +0200 Pedro Tumusok <
>>> ***@gmail.com> wrote:
>>>
>>> [...]
>>> > That is feature creep, we originally discussed having continuous ping
>>> > measurement under load.
>>> > New ideas not so welcome ;)
>>>
>>> I agree, we just want Justin (http://www.dslreports.com/speedtest) to
>>> also measure and report on ping/latency under load.
>>>
>>> After we have this basic step, we can refine it further, e.g. with
>>> Jonathan HZ measurement.
>>>
>>> --
>>> Best regards,
>>> Jesper Dangaard Brouer
>>> MSc.CS, Sr. Network Kernel Developer at Red Hat
>>> Author of http://www.iptv-analyzer.org
>>> LinkedIn: http://www.linkedin.com/in/brouer
>>>
>>
>>
>>
>> --
>> Best regards / Mvh
>> Jan Pedro Tumusok
>>
>>
>> _______________________________________________
>> Bloat mailing list
>> ***@lists.bufferbloat.net
>> https://lists.bufferbloat.net/listinfo/bloat
>>
>>
>


--
Best regards / Mvh
Jan Pedro Tumusok




--
Best regards / Mvh
Jan Pedro Tumusok
Livingood, Jason
2015-03-30 16:20:03 UTC
Permalink
Could be via a browser plugin?

Regards,
Jason

Jason Livingood
Comcast - Internet Services

> On Mar 30, 2015, at 11:55, Dave Taht <***@gmail.com> wrote:
>
> I think the most effective thing would be to add bufferbloat testing
> infrastructure to the web browsers themselves. There are already
> plenty of tools for measuring web performance (whyslow for firefox,
> the successor to chrome web page benchmarker) more or less built in...
> measuring actual network performance under load is not much of a
> reach.
>
> The issues this would resolve are:
>
> 1) speed - the test(s) could use native apis within the browser and
> thus achieve higher rates of speed than is possible with javascript
> (and monitor cpu usage)
> 2) we could rigorously define the tests to have similar features to
> netperf-wrapper
> 3) we could get much better tcp statistics as in with TCP_INFO
> 4) output formats could still be json as we do today, but plotted better
> 5) ?
>
> Problems are:
>
> 0) Convincing users to use (and believe) them
> 1) Suitable server targets for the tests themselves
> 2) Although the browsers are basically in a nearly quarterly update
> cycle, it would still take time for the tests to be widely available
> even if they were ready today
> 3) Convincing the browser makers that they could use such tests
> 4) Writing the tests (in C and C++)
> 5) The outcry at speedtest, et al, for obsoleting their tools
> (think microsoft vs "stacker")
> 6) Bloating up the browsers still further
> _______________________________________________
> Bloat mailing list
> ***@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>
Pedro Tumusok
2015-03-31 01:30:41 UTC
Permalink
Justin has signed up for this list, I will just copy the question he sent
me directly over at DSLReports and let you chime in with your views.

--

I've subscribed to the list.

Currently the 'saturation' meter is pinging away at an unrelated server
(dslreports.com). Probably it should ping away, and with higher frequency,
at one of the servers streaming data in, because then there are more likely
to be filled buffers en route? Or are the bloated buffers mainly at the
customer end of the experience?

I can make the saturation meter display RTT directly, and continue during
the test as a preference. I don't really want to have it pinging away
during the test because it probably slows the result down. Actually I'll
have to check that. Definitely on slow lines it would (like GPRS and 3G).

tcptrace on the server side of one stream would immediately reveal average
and peak RTT and more. I wonder if that is the goal to be shooting for
rather than these more indirect measurements.

What is the buffer bloat opinion on the ESNet page?

»fasterdata.es.net/network-tuning ··· r-bloat/

they say more, not fewer, buffers are needed for 10gig, and it's only a
problem with residential.

On Mon, Mar 30, 2015 at 4:20 PM, Pedro Tumusok <***@gmail.com>
wrote:

>
>
> On Mon, Mar 30, 2015 at 4:18 PM, Jonathan Morton <***@gmail.com>
> wrote:
>
>>
>> > On 30 Mar, 2015, at 16:56, Pedro Tumusok <***@gmail.com>
>> wrote:
>> >
>> > Also would you mind doing another test and share the link for result,
>> so he can check closer?
>>
>> I still have the result link for the earlier one:
>>
>> http://www.dslreports.com/speedtest/193440
>>
>> Here’s another result without using the webcache, but keeping everything
>> else the same:
>>
>> http://www.dslreports.com/speedtest/199524
>>
>> For context, this is through the PowerBook which is running cake, 8Mbps
>> down and 1Mbps up. This is somewhat less than the capacity of the link,
>> but since it’s 3G it varies quite a lot anyway.
>>
>> > on the other things, yes I'd like to hear about buffer bloat ideas.
>>
>> The number-one desirable feature is to carry on measuring the latency
>> while the throughput test is ongoing. This is relevant to anyone who wants
>> to make a VoIP call or play an online game while using the connection for
>> something else. If you have several people in one household, it’s often
>> hard to coordinate their activity to avoid such multitasking.
>>
>> - Jonathan Morton
>>
>>
> I will forward it to him, hopefully he will signup here also, so the smart
> people can talk to each other and not having me mess it up :)
>
>
> --
> Best regards / Mvh
> Jan Pedro Tumusok
>
>


--
Best regards / Mvh
Jan Pedro Tumusok
Jonathan Morton
2015-03-31 04:14:31 UTC
Permalink
> Currently the 'saturation' meter is pinging away at an unrelated server (dslreports.com)
> probably it should ping away, and with higher frequency, at one of the servers streaming data in? because then there are more likely to be filled buffers en-route?
> or are the bloated buffers mainly at the customer end of the experience.

Mostly at the customer end (see below). The core is generally built with sufficient capacity and small buffers (compared to the link capacity). I recommend pinging a topologically-nearby server rather than a central one.

> I can make the saturation meter display RTT directly, and continue during the test as an preference. I don't really want to have it pinging away during the test because it probably slows the result down. Actually I'll have to check that. Definitely on slow lines it would (like GPRS and 3G).

A simple ping, without artificially added payload, is about 64 bytes. A small UDP packet (whose payload is just a unique cookie) can be used for the same purpose, and is less likely to experience artificial prioritisation. Four of those a second makes a quarter of a kilobyte per second each way. That’ll be noticeable on GPRS and analogue modems, but not to anyone else. I say that as someone who regularly uses 3G.

A concept I’d like to introduce you to is “network responsiveness”, which is measured in Hz rather than ms, and thus goes down when latency goes up. A responsiveness of 10.0Hz corresponds to a 100ms latency, and that’s a useful, rule-of-thumb baseline for acceptable VoIP and gaming performance. It can be compared fairly directly to the framerate of a video or a graphics card.
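
As a sketch of those two rules of thumb (function names are mine, purely
illustrative): responsiveness is just the reciprocal of latency, and the
probe overhead is packet size times probe rate.

```python
# Re-deriving the two numbers quoted above: a 100ms latency is a 10Hz
# responsiveness, and four 64-byte probes per second cost about a quarter
# of a kilobyte per second each way.

def responsiveness_hz(latency_ms):
    """Network responsiveness in Hz: how many round trips fit in a second."""
    return 1000.0 / latency_ms

def probe_overhead_bytes_per_sec(packet_bytes=64, probes_per_sec=4):
    """Bandwidth cost, each way, of a continuous small-packet probe stream."""
    return packet_bytes * probes_per_sec

print(responsiveness_hz(100))          # 10.0 Hz - the VoIP/gaming baseline
print(probe_overhead_bytes_per_sec())  # 256 bytes/s, ~a quarter KB each way
```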

> tcptrace on the server side of one stream would immediately reveal average and peak RTT and more. I wonder if that is the goal to be shooting for rather than these more indirect measurements.
>
> What is the buffer bloat opinion on the ESNet page?
>
> »fasterdata.es.net/network-tuning ··· r-bloat/
>
> they say more not less buffers are needed for 10gig, and its only a problem with residential.

Datacentres and the public Internet are very different places. You can’t generalise from one to the other. The RTTs are very different, for a start - LAN vs WAN scales.

At 10Gbps, a megabyte of buffer will drain in about a millisecond. What’s more, a megabyte might be enough, because chances are such a fat link is being used by lots of TCP sessions in parallel, so you only need to worry about one or two of those bursting at a given instant. Since buffers (after the first couple of packets) are used to absorb bursts, that’s all you might need.

Frankly, one of our present problems is getting consumer-grade router hardware to work reliably at 100Mbps or so, which is just starting to become widely available. There’s only so much you can do with a wimpy, single-core, cost-optimised MIPS, even if it’s attached to lots of GigE and 802.11ac hardware; I’m using an ancient Pentium-MMX as a surprisingly accurate model for these things. Sufficient buffering isn’t the problem here - it just can’t turn packets around fast enough.

On a more typical rural consumer connection, at 1Mbps, a megabyte of buffer will take about 10 seconds to drain, and is therefore obviously oversized. Even at 10Mbps, it’ll take a whole second to drain, which is painful. The AQM systems we’re working on are an answer to that problem - they will automatically act to keep the buffers at a more sensible fill level. They also isolate flows from each other, so that one bursting or otherwise misbehaving flow won’t interfere (adding that draining latency) with a sparse, latency-sensitive one like VoIP or gaming.
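
The drain-time arithmetic above can be re-derived in a couple of lines (a
sketch; note the "about 10 seconds" in the text rounds up the 8 s that a
decimal megabyte gives at exactly 1Mbps):

```python
# Time for a full buffer to empty: buffer size in bits divided by link rate.

def drain_seconds(buffer_bytes, link_bits_per_sec):
    return buffer_bytes * 8 / link_bits_per_sec

MB = 1_000_000
print(drain_seconds(MB, 10e9))  # ~0.0008 s: under a millisecond at 10 Gbps
print(drain_seconds(MB, 10e6))  # ~0.8 s: most of a second at 10 Mbps
print(drain_seconds(MB, 1e6))   # ~8 s: the painful rural-connection case
```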

It is that last scenario, which the great majority of consumers experience in practice, which we’d like you to address by measuring latency under load.

- Jonathan Morton
Toke Høiland-Jørgensen
2015-03-30 14:42:03 UTC
Permalink
Jonathan Morton <***@gmail.com> writes:

> The number-one desirable feature is to carry on measuring the latency
> while the throughput test is ongoing. This is relevant to anyone who
> wants to make a VoIP call or play an online game while using the
> connection for something else. If you have several people in one
> household, it’s often hard to coordinate their activity to avoid such
> multitasking.

+1 on this!

Otherwise pretty cool test; no plugins and manages to push quite a bit
of data, at least in the upload direction:
http://www.dslreports.com/speedtest/199468

-Toke
Livingood, Jason
2015-03-20 13:57:01 UTC
Permalink
>*I realize not everyone likes the Ookla tool, but it is popular and about
>as "sexy" as you are going to get with a network performance tool.

Ookla has recently been acquired by Ziff-Davis
(http://finance.yahoo.com/news/ziff-davis-acquires-ookla-120100454.html).
I am not sure how that may influence their potential involvement. I have
suggested they add this test previously. I also suggested it be added to
the FCC's SamKnows / Measuring Broadband America platform and that the
FCC potentially does a one-off special report on the results.
David P. Reed
2015-03-20 14:08:27 UTC
Permalink
SamKnows is carefully constructed politically to claim that everyone has great service and no problems are detected. They were constructed by opponents of government supervision - the corporate FCC lobby.

Don't believe they have any incentive to measure customer-relevant measures.

M-Lab is better by far. But control by Google automatically discredits its data. As well as the claims by operators that measurements by independent parties violate their trade secrets. Winning that battle requires a group that can measure while supporting a very expensive defense against lawsuits by operators making such claims of trade secrecy.

Criticizing M-Lab is just fodder for the operators' lobby in DC.

On Mar 20, 2015, "Livingood, Jason" <***@cable.comcast.com> wrote:
>>*I realize not everyone likes the Ookla tool, but it is popular and
>about
>>as "sexy" as you are going to get with a network performance tool.
>
>Ookla has recently been acquired by Ziff-Davis
>(http://finance.yahoo.com/news/ziff-davis-acquires-ookla-120100454.html).
>I am not sure how that may influence their potential involvement. I have
>suggested they add this test previously. I also suggested it be added to
>the FCC's SamKnows / Measuring Broadband America platform and that the
>FCC potentially does a one-off special report on the results.
>
>- Jason

-- Sent with K-@ Mail - the evolution of emailing.
MUSCARIELLO Luca IMT/OLN
2015-03-20 14:14:48 UTC
Permalink
FYI, we have this in France.

http://www.arcep.fr/index.php?id=8571&tx_gsactualite_pi1[uid]=1701&tx_gsactualite_pi1[annee]=&tx_gsactualite_pi1[theme]=&tx_gsactualite_pi1[motscle]=&tx_gsactualite_pi1[backID]=26&cHash=f558832b5af1b8e505a77860f9d555f5&L=1

ARCEP is the equivalent of FCC in France.

User QoS on fixed access is measured by third parties.
The tests they run can be improved, of course, but the concept is right.
The data is then published periodically.

Luca


On 03/20/2015 03:08 PM, David P. Reed wrote:
> M-Lab is better by far. But control by Google automatically discredits
> it's data. As well as the claims by operators that measurements by
> independent parties violate their trade secrets. Winning that battle
> requires a group that can measure while supporting a very expensive
> defense against lawsuits by operators making such claim of trade secrecy.
Matt Mathis
2015-03-20 14:48:36 UTC
Permalink
Section 7.2 of
https://tools.ietf.org/html/draft-ietf-ippm-model-based-metrics-04 includes
a bufferbloat test. It is however somewhat underspecified.

Thanks,
--MM--
The best way to predict the future is to create it. - Alan Kay

Privacy matters! We know from recent events that people are using our
services to speak in defiance of unjust governments. We treat privacy and
security as matters of life and death, because for some users, they are.

On Fri, Mar 20, 2015 at 7:14 AM, MUSCARIELLO Luca IMT/OLN <
***@orange.com> wrote:

> FYI, we have this in France.
>
> http://www.arcep.fr/index.php?id=8571&tx_gsactualite_pi1[
> uid]=1701&tx_gsactualite_pi1[annee]=&tx_gsactualite_pi1[
> theme]=&tx_gsactualite_pi1[motscle]=&tx_gsactualite_pi1[backID]=26&cHash=
> f558832b5af1b8e505a77860f9d555f5&L=1
>
> ARCEP is the equivalent of FCC in France.
>
> User QoS is measured in the fixed access by third parties.
> The tests they run can be ameliorated of course but the concept is right.
> The data is then published periodically.
>
> Luca
>
>
> On 03/20/2015 03:08 PM, David P. Reed wrote:
>
>> M-Lab is better by far. But control by Google automatically discredits
>> it's data. As well as the claims by operators that measurements by
>> independent parties violate their trade secrets. Winning that battle
>> requires a group that can measure while supporting a very expensive defense
>> against lawsuits by operators making such claim of trade secrecy.
>>
>
> _______________________________________________
> Bloat mailing list
> ***@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/bloat
>
Jim Gettys
2015-03-20 13:48:28 UTC
Permalink
On Thu, Mar 19, 2015 at 3:58 PM, Livingood, Jason <
***@cable.comcast.com> wrote:

> On 3/19/15, 1:11 PM, "Dave Taht" <***@gmail.com> wrote:
>
> >On Thu, Mar 19, 2015 at 6:53 AM, <***@reed.com> wrote:
> >> How many years has it been since Comcast said they were going to fix
> >>bufferbloat in their network within a year?
>
> I'm not sure anyone ever said it'd take a year. If someone did (even if it
> was me) then it was in the days when the problem appeared less complicated
> than it is and I apologize for that. Let's face it - the problem is
> complex and the software that has to be fixed is everywhere. As I said
> about IPv6: if it were easy, it'd be done by now. ;-)
>

I think the hope was that the buffer size control feature in DOCSIS
could at least be used to cut bufferbloat down to the "traditional" 100ms
level, as I remember the sequence of events. But reality intervened: buggy
implementations by too many vendors, is what I remember hearing from Rich
Woundy.


>
> >>It's almost as if the cable companies don't want OTT video or
> >>simultaneous FTP and interactive gaming to work. Of course not. They'd
> >>never do that.
>
> Sorry, but that seems a bit unfair. It flies in the face of what we have
> done and are doing. We've underwritten some of Dave's work, we got
> CableLabs to underwrite AQM work, and I personally pushed like heck to get
> AQM built into the default D3.1 spec (had CTO-level awareness & support,
> and was due to Greg White's work at CableLabs). We are starting to field
> test D3.1 gear now, by the way. We made some bad bets too, such as trying
> to underwrite an OpenWRT-related program with ISC, but not every tactic
> will always be a winner.
>
> As for existing D3.0 gear, it's not for lack of trying. Has any DOCSIS
> network of any scale in the world solved it? If so, I have something to
> use to learn from and apply here at Comcast - and I'd **love** an
> introduction to someone who has so I can get this info.
>
> But usually there are rational explanations for why something is still not
> done. One of them is that the at-scale operational issues are more
> complicated than some people realize. And there is always a case of
> prioritization - meaning things like running out of IPv4 addresses and not
> having service trump more subtle things like buffer bloat (and the effort
> to get vendors to support v6 has been tremendous).
>
> >I do understand there are strong forces against us, especially in the USA.
>
> I'm not sure there are any forces against this issue. It's more a question
> of awareness - it is not apparent it is more urgent than other work in
> everyone's backlog. For example, the number of ISP customers even aware of
> buffer bloat is probably 0.001%; if customers aren't asking for it, the
> product managers have a tough time arguing to prioritize buffer bloat work
> over new feature X or Y.
>

I agree with Jason on this one. We have to take bufferbloat mainstream to
generate "market pull". I've been reluctant in the past, before we had
solutions in hand: very early in this quest, Dave Clark noted that
"yelling fire without having the exits marked" could be counterproductive.
I think we have the exits marked now. Time to yell "Fire".

Even when you get to engineers in the organizations who build the
equipment, it's hard. First you have to explain that "more is not better",
and "some packet loss is good for you".

Day-to-day market pressures for other features mean that:
1) many/most of the engineers don't see bufferbloat as what they need to
work on in the next quarter/year.
2) their management doesn't see that working on it should take any of their
time. It won't help them sell the next set of gear.

***So we have to generate demand from the market.***

Now, I can see a couple ways to do this:

1) help expose the problem, preferably in a dead simple way that everyone
sees. If we can get Ookla to add a simple test to their test system, this
would be a good start. If not, other test sites are needed. Nice as
Netalyzer is, it a) tops out around 20Mbps, and b) buries the buffering
results among 50 other numbers.
2) Markets such as gaming are large, and very latency sensitive. Even
better, lots of geeks hang out there. So investing in educating that
submarket may help pull things through the system overall.
3) Competitive pressures can be very helpful: but this requires at least
one significant player in each product category to "get it". So these are
currently slow falling dominoes.


> One suggestion I have made to increase awareness is that there be a nice,
> web-based, consumer-friendly latency under load / bloat test that you
> could get people to run as they do speed tests today. (If someone thinks
> they can actually deliver this, I will try to fund it - ping me off-list.)
> I also think a better job can be done explaining buffer bloat - it's hard
> to make an 'elevator pitch' about it.
>

Yeah, the elevator pitch is hard, since a number of things around
bufferbloat are counterintuitive. I know, I've tried, and not really
succeeded. The best kinds of metaphors have been traffic related
("building parking lots at all the bottlenecks"), and explanations like
"packet loss is how the Internet enforces speed limits":
http://www.circleid.com/posts/20150228_packet_loss_how_the_internet_enforces_speed_limits/


>
> It reminds me a bit of IPv6 several years ago. Rather than saying in
> essence 'you operators are dummies' for not already fixing this, maybe
> assume the engineers all 'get it' and want to do it.


Many/most practicing engineers are still unaware of it, or, if they have
heard the word bufferbloat, still don't "get it" even though they see
bufferbloat's effects all the time.


> Because we really do
> get it and want to do something about it. Then ask those operators what
> they need to convince their leadership and their suppliers and product
> managers and whomever else that it needs to be resourced more effectively
> (see above for example).
>
> We're at least part of the way there in DOCSIS networks. It is in D3.1 by
> default, and we're starting trials now. And probably within 18-24 months
> we won't buy any DOCSIS CPE that is not 3.1.
>
> The question for me is how and when to address it in DOCSIS 3.0.
>

We should talk at IETF.


>
> - Jason
>
>
>
>
Livingood, Jason
2015-03-20 14:11:12 UTC
Permalink
On 3/20/15, 9:48 AM, "Jim Gettys" <***@freedesktop.org<mailto:***@freedesktop.org>> wrote:
I think this was the hope that the buffer size control feature in Docsis could at least be used to cut bufferbloat down to the "traditional" 100ms level, as I remember the sequence of events. But reality intervened: buggy implementations by too many vendors, is what I remember hearing from Rich Woundy.

Indeed!

If I can re-prioritize some work (and fight some internal battles) to do a buffer bloat trial this year (next few months) - would folks here be willing to give input on the design / parameters? It would not be perfect but would be along the lines of ‘what’s the best we can do regarding buffer bloat with the equipment/software/systems/network we have now’.

Even when you get to engineers in the organizations who build the equipment, it's hard. First you have to explain that "more is not better", and "some packet loss is good for you".

That’s right, Jim. The “some packet loss is good” part is, from what I have seen, the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible, not to mention that you should never fill a link to capacity (meaning either that there should never be a bottleneck link anywhere on the Internet and/or that congestion should never occur anywhere).

***So we have to generate demand from the market.***

+1

1) help expose the problem, preferably in a dead simple way that everyone sees. If we can get Ookla to add a simple test to their test system, this would be a good start. If not, other test sites are needed. Nice as Netalyzer is, it a) tops out around 20Mbps, and b) buries the buffering results among 50 other numbers.

+1

2) Markets such as gaming are large, and very latency sensitive. Even better, lots of geeks hang out there. So investing in educating that submarket may help pull things through the system overall.

Consumer segments like gamers are very important. I suggest getting them coordinated in some manner. Create a campaign like #GamersAgainstBufferbloat / GamersAgainstBufferbloat.org or something.

We should talk at IETF.

Wish I were there! I will be in Amsterdam at the RIPE Atlas hack-a-thon. Some cool work happening on that measurement platform!

Jason
Michael Welzl
2015-03-20 14:54:07 UTC
Permalink
Folks,

I think I have just seen this statement a little too often:

> That’s right, Jim. The “some packet loss is good” part is from what I have seen the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible (..)

I understand the "wrong mindset" thing and the idea of AQM doing something better. Still, I'd like people to understand that packet loss often also comes with delay - for having to retransmit. This delay is not visible in the queue, but it's visible in the end system. It also comes with head-of-line blocking delay on the receiver side: at least with TCP, whatever has been received after a dropped packet needs to wait in the OS for the hole to be filled before it can be handed over to the application.

Here we're not talking a few ms more or less in the queue, we're talking an RTT, when enough DupACKs are produced to make the sender clock out the missing packet again. Else, we're talking an RTO, which can be much, much more than an RTT, and which is what TLP tries to fix (but TLP's timer is also 2 RTTs - so this is all about delay at RTT-and-higher magnitudes).

Again, significant delay can come from dropped packets - you just don't see it when all you measure is the queue. ECN can help.
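
To make the magnitudes concrete, here is a minimal back-of-the-envelope sketch in Python. The 40 ms RTT is an assumed example number, and the 4.3x RTO/RTT ratio is the median from the Google server figures quoted later in this thread; this is not a simulation, just the two recovery cases described above.

```python
# Back-of-the-envelope: extra end-host delay caused by one lost packet.

def loss_recovery_delay(rtt, rto, dupacks_available):
    """Approximate extra delay added by retransmitting one lost packet.

    With enough duplicate ACKs (>= 3, classic fast retransmit) the sender
    resends after roughly one extra RTT; otherwise it must wait for the
    retransmission timeout (RTO), which is often several RTTs long.
    """
    if dupacks_available >= 3:
        return rtt   # fast retransmit: ~1 RTT of head-of-line blocking
    return rto       # tail loss with no TLP: wait out the full RTO

rtt = 0.040          # assumed 40 ms path RTT
rto = 4.3 * rtt      # assumed median RTO/RTT ratio (~4.3)

mid_burst = loss_recovery_delay(rtt, rto, dupacks_available=10)
tail_loss = loss_recovery_delay(rtt, rto, dupacks_available=0)
print(mid_burst, tail_loss)   # the tail-loss case is several times worse
```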

Cheers,
Michael
Jim Gettys
2015-03-20 15:31:27 UTC
Permalink
On Fri, Mar 20, 2015 at 10:54 AM, Michael Welzl <***@ifi.uio.no> wrote:

> Folks,
>
> I think I have just seen this statement a little too often:
>
> > That’s right, Jim. The “some packet loss is good” part is from what I
> have seen the hardest thing for people to understand. People have been
> trained to believe that any packet loss is terrible (..)
>
> I understand the "wrong mindset" thing and the idea of AQM doing something
> better. Still, I'd like people to understand that packet loss often also
> comes with delay - for having to retransmit. This delay is not visible in
> the queue, but it's visible in the end system. It also comes with
> head-of-line blocking delay on the receiver side: at least with TCP,
> whatever has been received after a dropped packet needs to wait in the OS
> for the hole to be filled before it can be handed over to the application.
>
> Here we're not talking a few ms more or less in the queue, we're talking
> an RTT, when enough DupACKs are produced to make the sender clock out the
> missing packet again. Else, we're talking an RTO, which can be much, much
> more than an RTT, and which is what TLP tries to fix (but TLP's timer is
> also 2 RTTs - so this is all about delay at RTT-and-higher magnitudes).
>
> Again, significant delay can come from dropped packets - you just don't
> see it when all you measure is the queue. ECN can help.
>

And without AQM, the RTTs are often many times the actual speed-of-light
RTT, sometimes measured in seconds. And you eventually get the losses
anyway, as the bloated queues overflow.

So without AQM, you are often/usually in much, much, much worse shape;
better to suffer the loss and do the retransmit than wait forever.
- Jim


> Cheers,
> Michael
>
>
Michael Welzl
2015-03-20 15:39:11 UTC
Permalink
Sent from my iPhone

> On 20. mars 2015, at 16:31, Jim Gettys <***@freedesktop.org> wrote:
>
>
>
>> On Fri, Mar 20, 2015 at 10:54 AM, Michael Welzl <***@ifi.uio.no> wrote:
>> Folks,
>>
>> I think I have just seen this statement a little too often:
>>
>> > That’s right, Jim. The “some packet loss is good” part is from what I have seen the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible (..)
>>
>> I understand the "wrong mindset" thing and the idea of AQM doing something better. Still, I'd like people to understand that packet loss often also comes with delay - for having to retransmit. This delay is not visible in the queue, but it's visible in the end system. It also comes with head-of-line blocking delay on the receiver side: at least with TCP, whatever has been received after a dropped packet needs to wait in the OS for the hole to be filled before it can be handed over to the application.
>>
>> Here we're not talking a few ms more or less in the queue, we're talking an RTT, when enough DupACKs are produced to make the sender clock out the missing packet again. Else, we're talking an RTO, which can be much, much more than an RTT, and which is what TLP tries to fix (but TLP's timer is also 2 RTTs - so this is all about delay at RTT-and-higher magnitudes).
>>
>> Again, significant delay can come from dropped packets - you just don't see it when all you measure is the queue. ECN can help.
>
> And without AQM, the RTTs are often many times the actual speed-of-light RTT, sometimes measured in seconds. And you eventually get the losses anyway, as the bloated queues overflow.
>

Not necessarily with ECN. And where in a burst the loss occurs also matters.


> So without AQM, you are often/usually in much, much, much worse shape; better to suffer the loss and do the retransmit than wait forever.

sure!!


> - Jim
>
>>
>> Cheers,
>> Michael
>
Jonathan Morton
2015-03-20 16:31:53 UTC
Permalink
> On 20 Mar, 2015, at 16:54, Michael Welzl <***@ifi.uio.no> wrote:
>
> I'd like people to understand that packet loss often also comes with delay - for having to retransmit.

Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.

With ECN, of course, you don’t even have that caveat.

- Jonathan Morton
Michael Welzl
2015-03-20 20:59:37 UTC
Permalink
> On 20. mar. 2015, at 17.31, Jonathan Morton <***@gmail.com> wrote:
>
>
>> On 20 Mar, 2015, at 16:54, Michael Welzl <***@ifi.uio.no> wrote:
>>
>> I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
>
> Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.

Actually, no: as I said, the delay caused by a dropped packet can be more than 1 RTT - even much more under some circumstances. Consider this quote from the intro of https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 :

***
To get a sense of just how long the RTOs are in relation to
connection RTTs, following is the distribution of RTO/RTT values on
Google Web servers. [percentile, RTO/RTT]: [50th percentile, 4.3];
[75th percentile, 11.3]; [90th percentile, 28.9]; [95th percentile,
53.9]; [99th percentile, 214].
***

That would be for the unfortunate case where you drop a packet at the end of a burst and you don't have TLP or anything, and only an RTO helps...

Cheers,
Michael
David P. Reed
2015-03-20 23:47:01 UTC
Permalink
I think this is because there are a lot of packets in flight from end to end, meaning that the window is wide open and has way overshot the mark. This can happen if the receiving end keeps opening its window and has not encountered a lost frame. That is: the dropped or marked packets are not happening early enough.

Evaluating an RTO measure from an out of whack system that is not sending congestion signals is not a good source of data, unless you show the internal state of the endpoints that was going on at the same time.

Do the control theory.

On Mar 20, 2015, Michael Welzl <***@ifi.uio.no> wrote:
>
>> On 20. mar. 2015, at 17.31, Jonathan Morton <***@gmail.com>
>wrote:
>>
>>
>>> On 20 Mar, 2015, at 16:54, Michael Welzl <***@ifi.uio.no> wrote:
>>>
>>> I'd like people to understand that packet loss often also comes with
>delay - for having to retransmit.
>>
>> Or, turning it upside down, it’s always a win to drop packets (in the
>service of signalling congestion) if the induced delay exceeds the
>inherent RTT.
>
>Actually, no: as I said, the delay caused by a dropped packet can be
>more than 1 RTT - even much more under some circumstances. Consider
>this quote from the intro of
>https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 :
>
>***
>To get a sense of just how long the RTOs are in relation to
> connection RTTs, following is the distribution of RTO/RTT values on
> Google Web servers. [percentile, RTO/RTT]: [50th percentile, 4.3];
> [75th percentile, 11.3]; [90th percentile, 28.9]; [95th percentile,
> 53.9]; [99th percentile, 214].
>***
>
>That would be for the unfortunate case where you drop a packet at the
>end of a burst and you don't have TLP or anything, and only an RTO
>helps...
>
>Cheers,
>Michael

-- Sent with K-@ Mail - the evolution of emailing.
Michael Welzl
2015-03-21 00:08:00 UTC
Permalink
> On 21. mar. 2015, at 00.47, David P. Reed <***@reed.com> wrote:
>
> I think this is because there are a lot of packets in flight from end to end, meaning that the window is wide open and has way overshot the mark. This can happen if the receiving end keeps opening its window and has not encountered a lost frame. That is: the dropped or marked packets are not happening early enough.

... or they're so early that there are not enough RTT samples for a meaningful RTT measure.


> Evaluating an RTO measure from an out of whack system that is not sending congestion signals is not a good source of data, unless you show the internal state of the endpoints that was going on at the same time.
>
> Do the control theory.

Well - the RTO calculation can easily go out of whack when there is some variation, due to the + 4*RTTVAR bit. I don't need control theory to show that; a simple Excel sheet with a few realistic example numbers is enough. There's not much deep logic behind the 4*RTTVAR AFAIK - probably 4 worked OK in tests that Van did back then. That's okay, though, as fine tuning would mean making more assumptions about the path, which is unknown to TCP - it's just a conservative calculation, and the RTO being way too large often just doesn't matter much (thanks to DupACKs). Anyway, sometimes it can matter - and then a dropped packet can be pretty bad.
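
For reference, the calculation being discussed is the standard one from RFC 6298: RTO = SRTT + 4*RTTVAR, with a recommended 1-second floor. A small Python sketch (the jitter numbers are made-up examples) shows how the 4*RTTVAR term, plus the floor, keeps the RTO far above the smoothed RTT on a variable path:

```python
# RFC 6298 retransmission timeout: smoothed RTT plus four times the RTT
# variation, with the RFC's recommended 1-second lower bound.

ALPHA, BETA = 1 / 8, 1 / 4   # gains from RFC 6298

def update_rto(srtt, rttvar, sample):
    """Fold one RTT sample into SRTT/RTTVAR; return new state and RTO."""
    rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - sample)
    srtt = (1 - ALPHA) * srtt + ALPHA * sample
    return srtt, rttvar, max(srtt + 4 * rttvar, 1.0)

# A path jittering between 50 ms and 250 ms: the variance term (and the
# 1 s floor) keeps the RTO many times the ~90 ms smoothed RTT.
srtt, rttvar = 0.050, 0.025
for sample in [0.050, 0.250, 0.050, 0.250, 0.050]:
    srtt, rttvar, rto = update_rto(srtt, rttvar, sample)
print(round(srtt, 3), round(rto, 3))
```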

Cheers
Michael


>
> On Mar 20, 2015, Michael Welzl <***@ifi.uio.no> wrote:
>
> On 20. mar. 2015, at 17.31, Jonathan Morton <***@gmail.com> wrote:
>
>
> On 20 Mar, 2015, at 16:54, Michael Welzl <***@ifi.uio.no> wrote:
>
> I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
>
> Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
>
> Actually, no: as I said, the delay caused by a dropped packet can be more than 1 RTT - even much more under some circumstances. Consider this quote from the intro of https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 :
>
> ***
> To get a sense of just how long the RTOs are in relation to
> connection RTTs, following is the distribution of RTO/RTT values on
> Google Web servers. [percentile, RTO/RTT]: [50th percentile, 4.3];
> [75th percentile, 11.3]; [90th percentile, 28.9]; [95th percentile,
> 53.9]; [99th percentile, 214].
> ***
>
> That would be for the unfortunate case where you drop a packet at the end of a burst and you don't have TLP or anything, and only an RTO helps...
>
> Cheers,
> Michael
>
>
David Lang
2015-03-21 00:03:16 UTC
Permalink
On Fri, 20 Mar 2015, Michael Welzl wrote:

>> On 20. mar. 2015, at 17.31, Jonathan Morton <***@gmail.com> wrote:
>>
>>
>>> On 20 Mar, 2015, at 16:54, Michael Welzl <***@ifi.uio.no> wrote:
>>>
>>> I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
>>
>> Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
>
> Actually, no: as I said, the delay caused by a dropped packet can be more than 1 RTT - even much more under some circumstances. Consider this quote from the intro of https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 :

You are viewing this as a question of whether or not to drop a packet.

The problem is that isn't the actual question.

The question is whether to drop a packet early and have the sender slow down,
or to wait until the sender has filled the buffer to the point that all traffic
(including ACKs) is experiencing multi-second latency, and then drop a bunch of
packets.

In theory ECN would allow for feedback to the sender to have it slow down
without any packet being dropped, but in the real world it doesn't work that
well.

1. If you mark packets as congested if they have ECN and drop them if they
don't, programmers will mark everything ECN (and not slow transmission) because
doing so gives them an advantage over applications that don't mark their packets
with ECN

marking packets with ECN gives an advantage to them in mixed environments

2. If you mark packets as congested at a lower level than where you drop them,
no programmer is going to enable ECN because flows with ECN will be prioritized
below flows without ECN

If everyone uses ECN you don't have a problem, but if only some
users/applications do, there's no way to make it equal, so one or the other is
going to have an advantage, and programmers will game the system to do whatever
gives them the advantage.

David Lang
Steinar H. Gunderson
2015-03-21 00:13:07 UTC
Permalink
On Fri, Mar 20, 2015 at 05:03:16PM -0700, David Lang wrote:
> 1. If you mark packets as congested if they have ECN and drop them
> if they don't, programmers will mark everything ECN (and not slow
> transmission) because doing so gives them an advantage over
> applications that don't mark their packets with ECN

I'm not sure if this is actually true. Somehow TCP stacks appear to be tricky
enough to mess with that the people who are capable of gaming congestion
control algorithms are also wise enough not to do so. Granted, we are seeing
some mild IW escalation, but you could very well make a TCP that's
dramatically unfair to everything else and deploy that on your CDN, and
somehow we're not seeing that.

(OK, concession #2, “download accelerators” are doing really bad things with
multiple connections to gain TCP unfairness, but that's on the client side
only, not the server side.)

Based on this, I'm not convinced that people would bulk-mark their packets as
ECN-capable just to get ahead in the queues. It _is_ hard to know when to
drop and when to ECN-mark, though; maybe you could imagine the benefits of
ECN (for the flow itself) to be big enough that you don't actually need to
lower the drop probability (just make the ECN probability a bit higher),
but this is pure unfounded speculation on my behalf.

/* Steinar */
--
Homepage: http://www.sesse.net/
David Lang
2015-03-21 00:25:08 UTC
Permalink
On Sat, 21 Mar 2015, Steinar H. Gunderson wrote:

> On Fri, Mar 20, 2015 at 05:03:16PM -0700, David Lang wrote:
>> 1. If you mark packets as congested if they have ECN and drop them
>> if they don't, programmers will mark everything ECN (and not slow
>> transmission) because doing so gives them an advantage over
>> applications that don't mark their packets with ECN
>
> I'm not sure if this is actually true. Somehow TCP stacks appear to be tricky
> enough to mess with that the people who are capable of gaming congestion
> control algorithms are also wise enough not to do so. Granted, we are seeing
> some mild IW escalation, but you could very well make a TCP that's
> dramatically unfair to everything else and deploy that on your CDN, and
> somehow we're not seeing that.

It doesn't take deep mucking with the TCP stack. A simple iptables rule to OR a
bit on each packet as it's leaving the box would make the router think that the
system has ECN enabled (or do it on your local gateway if you think it gives
you higher priority over the wider network).

If you start talking about ECN and UDP, things are even simpler: there's no
need to go through the OS stack at all - craft your own packets and send the
raw packets.
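
As an illustration of how little effort "fake ECN" takes for an application's own UDP traffic, here is a hedged Python sketch: a single socket option sets the ECT(0) codepoint on outgoing datagrams, with no congestion reaction attached. The constant is the standard RFC 3168 codepoint; whether a given router then marks rather than drops this flow depends entirely on its AQM configuration.

```python
import socket

ECT0 = 0x02  # ECN-Capable Transport (0) codepoint in the IP TOS byte

# A UDP socket that merely advertises ECN capability - no congestion
# reaction is implemented, which is exactly the "fake ECN" concern.
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, ECT0)

# Every datagram now leaves with ECT(0) set; an AQM that marks ECN-capable
# packets instead of dropping them will treat this flow accordingly.
tos = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(tos)
sock.close()
```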

> (OK, concession #2, “download accelerators” are doing really bad things with
> multiple connections to gain TCP unfairness, but that's on the client side
> only, not the server side.)
>
> Based on this, I'm not convinced that people would bulk-mark their packets as
> ECN-capable just to get ahead in the queues.

Given the money they will spend and the cargo-cult steps that gamers will do in
the hope of gaining even a slight advantage, I can easily see this happening.

> It _is_ hard to know when to
> drop and when to ECN-mark, though; maybe you could imagine the benefits of
> ECN (for the flow itself) to be big enough that you don't actually need to
> lower the drop probability (just make the ECN probability a bit higher),
> but this is pure unfounded speculation on my behalf.

As I said, there are two possibilities

1. if you mark packets sooner than you would drop them, advantage non-ECN

2. if you mark packets and don't drop them until higher levels, advantage ECN,
and big advantage to fake ECN

David Lang
Jonathan Morton
2015-03-21 00:34:23 UTC
Permalink
> On 21 Mar, 2015, at 02:25, David Lang <***@lang.hm> wrote:
>
> As I said, there are two possibilities
>
> 1. if you mark packets sooner than you would drop them, advantage non-ECN
>
> 2. if you mark packets and don't drop them until higher levels, advantage ECN, and big advantage to fake ECN

3: if you have flow isolation with drop-from-longest-queue-on-overflow, faking ECN doesn’t matter to other traffic - it just turns the faker’s allocation of queue into a dumb, non-AQM one. No problem.

- Jonathan Morton
David Lang
2015-03-21 00:38:26 UTC
Permalink
On Sat, 21 Mar 2015, Jonathan Morton wrote:

>> On 21 Mar, 2015, at 02:25, David Lang <***@lang.hm> wrote:
>>
>> As I said, there are two possibilities
>>
>> 1. if you mark packets sooner than you would drop them, advantage non-ECN
>>
>> 2. if you mark packets and don't drop them until higher levels, advantage ECN, and big advantage to fake ECN
>
> 3: if you have flow isolation with drop-from-longest-queue-on-overflow, faking ECN doesn’t matter to other traffic - it just turns the faker’s allocation of queue into a dumb, non-AQM one. No problem.

so if every flow is isolated so that what it generates has no effect on any
other traffic, what value does ECN provide?

and how do you decide what the fair allocation of bandwidth is between all the
threads?

David Lang
Jonathan Morton
2015-03-21 00:43:58 UTC
Permalink
> On 21 Mar, 2015, at 02:38, David Lang <***@lang.hm> wrote:
>
> On Sat, 21 Mar 2015, Jonathan Morton wrote:
>
>>> On 21 Mar, 2015, at 02:25, David Lang <***@lang.hm> wrote:
>>>
>>> As I said, there are two possibilities
>>>
>>> 1. if you mark packets sooner than you would drop them, advantage non-ECN
>>>
>>> 2. if you mark packets and don't drop them until higher levels, advantage ECN, and big advantage to fake ECN
>>
>> 3: if you have flow isolation with drop-from-longest-queue-on-overflow, faking ECN doesn’t matter to other traffic - it just turns the faker’s allocation of queue into a dumb, non-AQM one. No problem.
>
> so if every flow is isolated so that what it generates has no effect on any other traffic, what value does ECN provide?

A *genuine* ECN flow benefits from reduced packet loss and smoother progress, because the AQM can signal congestion to it without dropping.

> and how do you decide what the fair allocation of bandwidth is between all the threads?

Using DRR. This is what fq_codel does already, as it happens. As does cake.

In other words, the last half-dozen posts have been an argument about a solved problem.

- Jonathan Morton
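
For readers unfamiliar with it, the DRR (deficit round robin) scheme Jonathan refers to can be sketched in a few lines of Python. The flow names, packet sizes, and 1500-byte quantum are illustrative assumptions; fq_codel's real dequeue path also runs CoDel per queue.

```python
from collections import deque

# Minimal deficit round robin (DRR) sketch: each flow accumulates a
# quantum of byte credit per round, so capacity is shared roughly
# equally no matter how aggressively any single flow sends.

def drr_dequeue(flows, quantum, budget):
    """Serve up to `budget` packets from `flows` (name -> deque of sizes)."""
    deficits = dict.fromkeys(flows, 0)
    served = []
    while budget > 0 and any(flows.values()):
        for name, queue in flows.items():
            if not queue:
                continue
            deficits[name] += quantum
            while queue and queue[0] <= deficits[name] and budget > 0:
                size = queue.popleft()
                deficits[name] -= size
                served.append((name, size))
                budget -= 1
    return served

flows = {
    "bulk": deque([1500] * 8),  # greedy bulk transfer, big packets
    "game": deque([100] * 8),   # sparse latency-sensitive flow
}
out = drr_dequeue(flows, quantum=1500, budget=10)
print(out)
```

Even though the bulk flow has eight full-size packets queued, the sparse flow's small packets drain almost immediately, which is the flow-isolation property being argued for here.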
Michael Welzl
2015-03-22 04:15:48 UTC
Permalink
> On 20. mar. 2015, at 19.25, David Lang <***@lang.hm> wrote:
>
> On Sat, 21 Mar 2015, Steinar H. Gunderson wrote:
>
>> On Fri, Mar 20, 2015 at 05:03:16PM -0700, David Lang wrote:
>>> 1. If you mark packets as congested if they have ECN and drop them
>>> if they don't, programmers will mark everything ECN (and not slow
>>> transmission) because doing so gives them an advantage over
>>> applications that don't mark their packets with ECN
>>
>> I'm not sure if this is actually true. Somehow TCP stacks appear to be tricky
>> enough to mess with that the people who are capable of gaming congestion
>> control algorithms are also wise enough not to do so. Granted, we are seeing
>> some mild IW escalation, but you could very well make a TCP that's
>> dramatically unfair to everything else and deploy that on your CDN, and
>> somehow we're not seeing that.
>
> It doesn't take deep mucking with the TCP stack. A simple iptables rule to OR a bit on as it's leaving the box would make the router think that the system has ECN enabled (or do it on your local gateway if you think it gives you higher priority over the wider network)
>
> If you start talking about ECN and UDP things are even simpler, there's no need to go through the OS stack at all, craft your own packets and send the raw packets
>
>> (OK, concession #2, “download accelerators” are doing really bad things with
>> multiple connections to gain TCP unfairness, but that's on the client side
>> only, not the server side.)
>>
>> Based on this, I'm not convinced that people would bulk-mark their packets as
>> ECN-capable just to get ahead in the queues.
>
> Given the money they will spend and the cargo-cult steps that gamers will do in the hope of gaining even a slight advantage, I can easily see this happening
>
>> It _is_ hard to know when to
>> drop and when to ECN-mark, though; maybe you could imagine the benefits of
>> ECN (for the flow itself) to be big enough that you don't actually need to
>> lower the drop probability (just make the ECN probability a bit higher),
>> but this is pure unfounded speculation on my behalf.
>
> As I said, there are two possibilities
>
> 1. if you mark packets sooner than you would drop them, advantage non-ECN

Agreed, with a risk of starvation of ECN flows as we've seen - this is not easy to get right and shouldn't be "just done somehow".


> 2. if you mark packets and don't drop them until higher levels, advantage ECN, and big advantage to fake ECN

Same level as you would normally drop is what the RFC recommends. Result: advantage ECN mostly because of the end-to-end effects I was explaining earlier, not because of the immediate queuing behavior (as figure 14 in https://www.duo.uio.no/handle/10852/37381 shows). "Big advantage to fake ECN" is the part I don't buy; I explained in more detail in the AQM list.

Cheers,
Michael
Michael Welzl
2015-03-21 00:15:35 UTC
Permalink
> On 21. mar. 2015, at 01.03, David Lang <***@lang.hm> wrote:
>
> On Fri, 20 Mar 2015, Michael Welzl wrote:
>
>>> On 20. mar. 2015, at 17.31, Jonathan Morton <***@gmail.com> wrote:
>>>> On 20 Mar, 2015, at 16:54, Michael Welzl <***@ifi.uio.no> wrote:
>>>> I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
>>> Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
>>
>> Actually, no: as I said, the delay caused by a dropped packet can be more than 1 RTT - even much more under some circumstances. Consider this quote from the intro of https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 :
>
> You are viewing this as a question to drop a packet or not drop a packet.
>
> The problem is that isn't the actual question.
>
> The question is to drop a packet early and have the sender slow down, or wait until the sender has filled the buffer to the point that all traffic (including acks) is experiencing multi-second latency and then drop a bunch of packets.
>
> In theory ECN would allow for feedback to the sender to have it slow down without any packet being dropped, but in the real world it doesn't work that well.

I think it's about time we finally turn it on in the real world.


> 1. If you mark packets as congested if they have ECN and drop them if they don't, programmers will mark everything ECN (and not slow transmission) because doing so gives them an advantage over applications that don't mark their packets with ECN

I heard this before but don't buy this as being a significant problem (and haven't seen evidence thereof either). Getting more queue space and occasionally getting a packet through that others don't isn't that much of an advantage - it comes at the cost of latency for your own application too unless you react to congestion.


> marking packets with ECN gives an advantage to them in mixed environments
>
> 2. If you mark packets as congested at a lower level than where you drop them, no programmer is going to enable ECN because flows with ECN will be prioritized below flows without ECN

Well.... longer story. Let me just say that marking where you would otherwise drop would be fine as a starting point. You don't HAVE to mark lower than you'd drop.
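A minimal sketch of the policy described here, in the spirit of RFC 3168: at exactly the point where the AQM would have dropped a packet, mark it CE instead, but only if the flow declared itself ECN-capable (ECT). The packet representation and function names are illustrative, not any real qdisc's API.

```python
# Toy model of "mark where you would otherwise drop" (RFC 3168 style).
# The AQM's drop decision is unchanged; only the punishment differs by
# ECN capability. All names here are illustrative.

ECT = "ECT"          # ECN-Capable Transport
CE = "CE"            # Congestion Experienced
NOT_ECT = "NOT_ECT"  # legacy, non-ECN flow

def on_congestion_signal(pkt):
    """Called when the AQM decides this packet must carry a congestion
    signal. Mark ECN-capable packets; drop the rest, as before."""
    if pkt["ecn"] == ECT:
        pkt["ecn"] = CE   # signal congestion without losing the packet
        return "marked"
    return "dropped"

# One ECN-capable packet and one legacy packet hit the same threshold:
p1 = {"ecn": ECT}
p2 = {"ecn": NOT_ECT}
print(on_congestion_signal(p1), on_congestion_signal(p2))  # marked dropped
```

Note the key property of this starting point: ECN-capable and legacy flows face the same marking/dropping threshold, so neither class is prioritized over the other.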


> If everyone use ECN you don't have a problem, but if only some users/applications do, there's no way to make it equal, so one or the other is going to have an advantage, programmers will game the system to do whatever gives them the advantage

I don't buy this at all. Game to gain what advantage? Anyway I can be more aggressive than everyone else if I want to, by backing off less, or not backing off at all, with or without ECN. Setting ECN-capable lets me do this while also getting a few more packets through without dropping - but packets get dropped at the hard queue limit anyway. So what's the big deal? What is the major gain that can be gained over others?

Cheers,
Michael
David Lang
2015-03-21 00:29:00 UTC
Permalink
On Sat, 21 Mar 2015, Michael Welzl wrote:

>> On 21. mar. 2015, at 01.03, David Lang <***@lang.hm> wrote:
>>
>> On Fri, 20 Mar 2015, Michael Welzl wrote:
>>
>>>> On 20. mar. 2015, at 17.31, Jonathan Morton <***@gmail.com> wrote:
>>>>> On 20 Mar, 2015, at 16:54, Michael Welzl <***@ifi.uio.no> wrote:
>>>>> I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
>>>> Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
>>>
>>> Actually, no: as I said, the delay caused by a dropped packet can be more than 1 RTT - even much more under some circumstances. Consider this quote from the intro of https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 :
>>
>> You are viewing this as a question to drop a packet or not drop a packet.
>>
>> The problem is that isn't the actual question.
>>
>> The question is to drop a packet early and have the sender slow down, or wait
>> until the sender has filled the buffer to the point that all traffic
>> (including acks) is experiencing multi-second latency and then drop a bunch
>> of packets.
>>
>> In theory ECN would allow for feedback to the sender to have it slow down
>> without any packet being dropped, but in the real world it doesn't work that
>> well.
>
> I think it's about time we finally turn it on in the real world.
>
>
>> 1. If you mark packets as congested if they have ECN and drop them if they
>> don't, programmers will mark everything ECN (and not slow transmission)
>> because doing so gives them an advantage over applications that don't mark
>> their packets with ECN
>
> I heard this before but don't buy this as being a significant problem (and
> haven't seen evidence thereof either). Getting more queue space and
> occasionally getting a packet through that others don't isn't that much of an
> advantage - it comes at the cost of latency for your own application too
> unless you react to congestion.

but the router will still be working to reduce traffic, so more non-ECN flows
will get packets dropped to reduce the load

>
>> marking packets with ECN gives an advantage to them in mixed environments
>>
>> 2. If you mark packets as congested at a lower level than where you drop
>> them, no programmer is going to enable ECN because flows with ECN will be
>> prioritized below flows without ECN
>
> Well.... longer story. Let me just say that marking where you would otherwise
> drop would be fine as a starting point. You don't HAVE to mark lower than
> you'd drop.
>
>
>> If everyone use ECN you don't have a problem, but if only some
>> users/applications do, there's no way to make it equal, so one or the other
>> is going to have an advantage, programmers will game the system to do
>> whatever gives them the advantage
>
> I don't buy this at all. Game to gain what advantage? Anyway I can be more
> aggressive than everyone else if I want to, by backing off less, or not
> backing off at all, with or without ECN. Setting ECN-capable lets me do this
> with also getting a few more packets through without dropping - but packets
> get dropped at the hard queue limit anyway. So what's the big deal? What is
> the major gain that can be gained over others?

for gamers, even a small gain can be major. Don't forget that there's also the
perceived advantage "If I do this, everyone else's packets will be dropped and
mine will get through, WIN!!!"

David Lang
Michael Welzl
2015-03-22 04:10:25 UTC
Permalink
> On 20. mar. 2015, at 19.29, David Lang <***@lang.hm> wrote:
>
> On Sat, 21 Mar 2015, Michael Welzl wrote:
>
>>> On 21. mar. 2015, at 01.03, David Lang <***@lang.hm> wrote:
>>>
>>> On Fri, 20 Mar 2015, Michael Welzl wrote:
>>>
>>>>> On 20. mar. 2015, at 17.31, Jonathan Morton <***@gmail.com> wrote:
>>>>>> On 20 Mar, 2015, at 16:54, Michael Welzl <***@ifi.uio.no> wrote:
>>>>>> I'd like people to understand that packet loss often also comes with delay - for having to retransmit.
>>>>> Or, turning it upside down, it’s always a win to drop packets (in the service of signalling congestion) if the induced delay exceeds the inherent RTT.
>>>>
>>>> Actually, no: as I said, the delay caused by a dropped packet can be more than 1 RTT - even much more under some circumstances. Consider this quote from the intro of https://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01 :
>>>
>>> You are viewing this as a question to drop a packet or not drop a packet.
>>>
>>> The problem is that isn't the actual question.
>>>
>>> The question is to drop a packet early and have the sender slow down, or wait until the sender has filled the buffer to the point that all traffic (including acks) is experiencing multi-second latency and then drop a bunch of packets.
>>>
>>> In theory ECN would allow for feedback to the sender to have it slow down without any packet being dropped, but in the real world it doesn't work that well.
>>
>> I think it's about time we finally turn it on in the real world.
>>
>>
>>> 1. If you mark packets as congested if they have ECN and drop them if they don't, programmers will mark everything ECN (and not slow transmission) because doing so gives them an advantage over applications that don't mark their packets with ECN
>>
>> I heard this before but don't buy this as being a significant problem (and haven't seen evidence thereof either). Getting more queue space and occasionally getting a packet through that others don't isn't that much of an advantage - it comes at the cost of latency for your own application too unless you react to congestion.
>
> but the router will still be working to reduce traffic, so more non-ECN flows will get packets dropped to reduce the load
>
>>
>>> marking packets with ECN gives an advantage to them in mixed environments
>>>
>>> 2. If you mark packets as congested at a lower level than where you drop them, no programmer is going to enable ECN because flows with ECN will be prioritized below flows without ECN
>>
>> Well.... longer story. Let me just say that marking where you would otherwise drop would be fine as a starting point. You don't HAVE to mark lower than you'd drop.
>>
>>
>>> If everyone use ECN you don't have a problem, but if only some users/applications do, there's no way to make it equal, so one or the other is going to have an advantage, programmers will game the system to do whatever gives them the advantage
>>
>> I don't buy this at all. Game to gain what advantage? Anyway I can be more aggressive than everyone else if I want to, by backing off less, or not backing off at all, with or without ECN. Setting ECN-capable lets me do this with also getting a few more packets through without dropping - but packets get dropped at the hard queue limit anyway. So what's the big deal? What is the major gain that can be gained over others?
>
> for gamers, even a small gain can be major. Don't forget that there's also the perceived advantage "If I do this, everyone else's packets will be dropped and mine will get through, WIN!!!"

I just addressed this with a message to the AQM list (should soon be in the archives: http://www.ietf.org/mail-archive/web/aqm/current/maillist.html ). In short, I don't see any clear indications for this "benefit". And clearly game developers also want low delay - blowing up the queue creates more delay... and without clear knowledge about how many flows are actively filling up the queue in parallel, there is a risk of creating extra delay with this for no actual benefit whatsoever.

Cheers,
Michael
Jonathan Morton
2015-03-20 18:14:03 UTC
Permalink
> On 20 Mar, 2015, at 16:11, Livingood, Jason <***@cable.comcast.com> wrote:
>
>> Even when you get to engineers in the organizations who build the equipment, it's hard. First you have to explain that "more is not better", and "some packet loss is good for you".
>
> That’s right, Jim. The “some packet loss is good” part is from what I have seen the hardest thing for people to understand. People have been trained to believe that any packet loss is terrible, not to mention that you should never fill a link to capacity (meaning either there should never be a bottleneck link anywhere on the Internet and/or that congestion should never occur anywhere).

That’s a rather interesting combination of viewpoints to have - and very revealing, too, of a fundamental disconnect between their mental theory of how the Internet works and how the Internet is actually used.

So here are some talking points that might be useful in an elevator pitch. The wording will need to be adjusted to circumstances.

In short, they’re thinking only about the *core* Internet. There, not being the bottleneck is a reasonably good idea, and packet loss is a reasonable metric of performance. Buffers are used to absorb momentary bursts exceeding the normal rate, and since the link is supposed to never be congested, it doesn’t matter for latency how big those buffers are. Adding capacity to satisfy that assumption is relatively easy, too - just plug in another 10G Ethernet module for peering, or another optical transceiver on a spare light-frequency for transit. Or so I hear.

But nobody sees the core Internet except a few technician types in shadowy datacentres. At least 99.999% of Internet users have to deal with the last mile on a daily basis - and it’s usually the last mile that is the bottleneck, unless someone *really* screwed up on a peering arrangement. The key technologies in the last mile are the head-end, the CPE modem, and the CPE router; the last two might be in the same physical box as each other. Those three are where we’re focusing our attention.

There, the basic assumption that the link should never be loaded to capacity is utter bunk. The only common benchmarks of Internet performance that most people have access to (and which CPE vendors perform) are to do precisely that, and see just how big they can make the resulting bandwidth number. And as soon as anyone starts a big TCP/IP-based upload or download, such as a software update or a video, the TCP stack in any modern OS will do its level best to load the link to capacity - and beyond. This is more than a simple buffer - of *any* size - can deal with.

As an aside, it’s occasionally difficult to convince last-mile ISPs that packet loss (of several percent, due to line quality, not congestion) *is* a problem. But in that case, it’s probably because it would cost money (and thus profit margin) to send someone out to fix the underlying physical cause. It really is a different world.

Once upon a time, the receive window of TCP was limited to 64KB, and the momentary bursts that could be expected from a single flow were limited accordingly. Those days are long gone. Given the chance, a modern TCP stack will increase the receive and congestion window to multi-megabyte proportions. Even on a premium, 100Mbps cable or FTTC downlink (which most consumers can’t afford and often can’t even obtain), that corresponds to roughly a whole second of buffering; an order of magnitude above the usual rule of thumb for buffer sizing. On slower links, the proportions are even more outrageous. Something to think about next time you’re negotiating microseconds with a high-frequency trading outfit.
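The arithmetic in the paragraph above can be checked directly. Queue-induced delay is just queued bytes divided by link rate; the 12 MB window below is an illustrative "multi-megabyte" value, not a measured one.

```python
# Delay contributed by a full queue = queued bytes / link drain rate.
link_bps = 100e6                 # 100 Mbit/s premium cable/FTTC downlink
link_bytes_per_s = link_bps / 8  # 12.5 MB/s

window_bytes = 12e6              # an illustrative multi-megabyte TCP window
delay_s = window_bytes / link_bytes_per_s
print(round(delay_s, 2))         # ~0.96: roughly a whole second of buffering

# The usual rule of thumb (one ~100 ms RTT of buffering) would instead be:
rule_of_thumb_bytes = link_bytes_per_s * 0.1
print(rule_of_thumb_bytes / 1e6) # 1.25 MB -- an order of magnitude less
```

On a 10 Mbit/s link the same window sits in queue for nearly ten seconds, which is why the proportions get even more outrageous on slower links.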

I count myself among the camp of “packet loss is bad”. However, I have the sense to realise that if more packets are persistently coming into a box than can be sent out the other side, some of those packets *will* be lost, sooner or later. What AQM does is to signal (either through early loss or ECN marking) to the TCP endpoints that the link capacity has been reached, and it can stop pushing now - please - thank you. This allows the buffer to do its designed job of absorbing momentary bursts.

Given that last-mile links are often congested, it becomes important to distinguish between latency-sensitive and throughput-sensitive traffic flows. VoIP and online gaming are the most obvious examples of latency-sensitive traffic, but Web browsing is *also* more latency-sensitive than throughput-sensitive, for typical modern Web pages. Video streaming, software updates and uploading photos are good examples of throughput-sensitive applications; latency doesn’t matter much to them, since all they want to do is use the full link capacity.

The trouble is that often, in the same household, there are several different people using the same last-mile link, and they will tend to get home and spend their leisure time on the Internet at roughly the same time as each other. The son fires up his console to frag some noobs, and Mother calls her sister over VoIP; so far so good. But then Father decides on which movie to watch later that evening and starts downloading it, and the daughter starts uploading photos from her school field trip to goodness knows where. So there are now two latency-sensitive and two throughput-sensitive applications using this single link simultaneously, and the throughput-sensitive ones have immediately loaded the link to capacity in both directions (one each).

So what happens then? You tell me - you know your hardware the best. Or haven’t you measured its behaviour under those conditions? Oh, for shame!

Okay, I’ll tell you what happens with 99.9% of head-end and CPE hardware out there today: Mother can’t hear her sister properly any more, nor vice versa. And not just because the son has just stormed out of his bedroom yelling about lag and how he would have pwned that lamer if only that crucial shot had actually gone where he knows he aimed it. But as far as Father and the daughter are concerned, the Internet is still working just fine - look, the progress bars are ticking along nicely! - until, that is, Father wants to read the evening news, but the news site’s front page takes half a minute to load, and half the images are missing when it does.

And Father knows that calling the ISP in the morning (when their call centre is open) won’t help. They’ll run tests and find absolutely nothing wrong, and not-so-subtly imply that he (or more likely his wife) is an idiotic time-waster. Of course, a weekday morning isn't when everyone’s using it, so nothing *is* wrong. The link is uncongested at the time of testing, latency is as low as it should be, and there’s no line-quality packet loss. The problem has mysteriously disappeared - only to reappear in the evening. It’s not even weather related, and the ISP insists that they have adequate backhaul and peering capacity.

So why? Because the throughput-sensitive applications fill not only the link capacity but the buffers in front of it (on both sides). Since it takes time for a packet at the back of each queue to reach the link, this induces latency - typically *hundreds* of milliseconds of it, and sometimes even much more than that; *minutes* in extreme cases. But both a VoIP call and a typical online game require latencies *below one hundred* milliseconds for optimum performance. That’s why Mother and the son had their respective evening activities ruined, and Father’s experience with the news site is representative of a particularly bad case.

The better AQM systems now available (eg. fq_codel) can separate latency-sensitive traffic from throughput-sensitive traffic and give them both the service they need. This will give your customers a far better experience in the reasonably common situation I just outlined - but only if you put it in your hardware product and make sure that it actually works. Otherwise, you’ll start losing customers to the first competitor who does.
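The flow-separation idea behind fq_codel can be sketched in a few lines: hash each flow's 5-tuple into its own queue, then serve the active queues round-robin so a bulk transfer cannot starve a sparse latency-sensitive flow. This is a toy model under stated assumptions, not the real qdisc (which adds DRR byte accounting and per-queue CoDel).

```python
from collections import defaultdict, deque

def flow_id(pkt):
    # Hash the 5-tuple so each flow lands in its own queue.
    return hash((pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"]))

queues = defaultdict(deque)

def enqueue(pkt):
    queues[flow_id(pkt)].append(pkt)

def dequeue_round_robin():
    """Serve one packet from each active flow in turn."""
    out = []
    for fid in list(queues):
        if queues[fid]:
            out.append(queues[fid].popleft())
        if not queues[fid]:
            del queues[fid]   # drop empty queues from the rotation
    return out

# A bulk download has queued 5 packets; a VoIP flow queues 1 behind it.
for i in range(5):
    enqueue({"src": "A", "dst": "B", "sport": 1, "dport": 80, "proto": 6, "n": i})
enqueue({"src": "C", "dst": "D", "sport": 2, "dport": 5060, "proto": 17, "n": 0})

# The VoIP packet goes out in the first round, not behind the bulk queue.
first_round = dequeue_round_robin()
print(len(first_round))  # 2: one packet from each flow
```

With a single FIFO, the VoIP packet would wait behind all five bulk packets; with per-flow queues, its worst-case wait is one packet per competing flow.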

- Jonathan Morton