Discussion:
quick review and rant of "Identifying and Handling Non Queue Building Flows in a Bottleneck Link"
Add Reply
Dave Taht
2018-10-29 04:02:45 UTC
Reply
Permalink
Dear Greg:

I don't feel like commenting much on ietf matters these days
but, jeeze, I'm really tired of rehashed arguments based on bad papers
that aren't even cited in the document, that aren't talking about the
real problem.

I get that some magical solution to a non-problem is wanted here:

https://tools.ietf.org/id/draft-white-tsvwg-nqb-00.txt

but then I just read this part... and had to cringe and respond.

Flow queueing approaches (such as fq_codel RFC 8290 [RFC8290]), on
the other hand, achieve latency improvements by associating packets
into "flow" queues and then prioritizing "sparse flows", i.e. packets
that arrive to an empty flow queue. Flow queueing does not attempt
to differentiate between flows on the basis of value (importance or
latency-sensitivity), it simply gives preference to sparse flows, and
tries to guarantee that the non-sparse flows all get an equal share

Nope, that's not what it does. please strike the sentence "it simply
gives preference... " and replace with something like "all flows
get as fully mixed as possible."

It's just slightly better min/max fair than most other fair queuing
mechanisms.

Better:

Flow queueing approaches (such as fq_codel RFC 8290 [RFC8290]), on
the other hand, achieve latency improvements by breaking flows back
into individual packets, and then prioritizing "sparse flows", i.e. packets
that arrive to an empty flow queue that empties every round.

Flow queueing does not attempt to differentiate between flows on the
basis of value (importance or latency-sensitivity), it simply
interleaves flows as best as possible, giving a slight preference to
sparse flows.

As a result, fq mechanisms are appropriate for unmanaged environments
and general internet traffic.

Editorial:

Are. Have. A full deployment is basically done now. fq_codel is
universally "on" on most linuxes, with sch_fq and pacing filling in the
rest. a FQ technique is a near universal default in the cloud
now. deployed in the 10s of millions on commercial linux APs. fq_codel
is the default in OSX wifi. It's now in freebsd, and there was a great
thread recently on how it all works in pfsense here:
https://forum.netgate.com/topic/112527/playing-with-fq_codel-in-2-4/709

Got any users for this other stuff yet?

Downsides to this approach can include loss of low latency performance
due to hash collisions (where a sparse flow shares a queue with a
bulk data flow)

sch_fq has *no* buckets. it's *perfectly* fair to millions of tcp flows.

fq_codel hash collision probability is sqrt(buckets), 1024 has thus far
been pretty good (32). When you combine that with the probability of
there being a fat flow to collide with (call it 3%? of all flows), the
real world impact of these collisions is nearly immeasurable against
normal traffic.

vs a single queue!!! which imposes the queue-length from one fat flow on
*all* flows. The probability of that happening is 100%.

Jeeze. Describing a "loss of low latency performance" this way is a 99%
lie vs what you are comparing against, and it's because I'm ranting
today since I've encountered this lie a dozen times in a dozen documents
as if repetition made it true!

NOW...

I NOTE HOWEVER: We happen to agree that that one collision in a few
hundred in the case of something you really really care about is too
much. So we added not only 8 way set associative hashing to sch_cake
(collision probability of roughly zero), but full support for diffserv
classification, and it's shipping in linux 4.19. (been available for two
years in openwrt, ubnt, evenroute and elsewhere). It's got great support
for docsis framing also.

I sure hope more people outside the ietf try it. Oh yea, we also
defaulted sch_cake to per host FQ (running simultaneously with the per
flow FQ), which works, even through nat, and that does a great job of
giving low latency to all those IOT devices you might have... keeps
torrents even more under control...

https://kernelnewbies.org/Linux_4.19#Better_networking_experience_with_the_CAKE_queue_management_algorithm

Anyway, observed use of diffserv markings is at nearly nil, and comcast
rewrites all traffic it doesn't recognise as CS1, so we wash that clean
on inbound.

And most of the time, at real bandwidths, fq_codel more than sufficies,
except when you need to toss off a one-liner for a shaper, like

tc qdisc add dev eth0 root cake docsis bandwidth 20mbit ack-filter

"complexity in managing a large number of queues"

fq_codel, um, a couple hundred lines of code. Trivial, compared to just
about anything real. tcp is a few thousand, for example. It's smaller
than most device drivers. The core of the algorithm is 20 lines.

(cake is more complicated)

Please stop calling it complicated. The codebase for nearly any queuing
system is going to be within 10-20% of each other...

THE REAL ELEPHANT IN THE ROOM is SHAPING.

... the 680ms of totally gratiutious buffering comcast CMTSes have at
100mbit, and 280 at 10mbit up. Having to outbound shape is not a problem
with any hardware we have deployed, even with cpus designed in the late
80s. Inbound shaping of the "Badwidth" being provided by ISPs today is
CPU intensive, easily 95% of the overall cost and complexity of doing
queuing "right". Can we have a discussion on fixing SHAPING somewhere
in the ietf? Can we get ISPs to at the very least, buffer no more than a
85ms of data?

"and
the scheduling (typically DRR) that enforces that each non-sparse
flow gets an equal fraction of link bandwidth causes problems with
VPNs and other tunnels"

strike "non-sparse" and say "saturating". The problem is MOST VPN
traffic is not saturating either, and when it is, it's usually managed
by a lousy FIFO queue in the first place.

So fq_codel gained the ability to also hash stuff in a terminating
tunnel with really nice results for voip in particular a few years
back. I can't find the test right now, but we observed peak latency and
jitter in the 100ms range before doing that, and 2ms after.

Sure, in contrast to other saturating flows outside the tunnel, you
might get less bandwidth, but all the vpn bandwidth you do get, is
goodwidth. For all traffic.

And it's per vpn tunnel. These days I see one vpn per user...

Did I rant already that the vast majority of flows are non-saturating?

I really wish, just once, one day that the l4s effort would actually try
real traffic in a real scenario, with web, voip, dns, gaming. I've
reached the point where I just start calling crazy conclusions based on
crazy assumptions "big buck bunny scenarios".

start with an hour's packet capture of your home gateways normal
traffic. count the number of distinct flows.... or your office gateways'...

, exhibits poor behavior with less-aggressive
CA algos, e.g. LEDBAT, and exhibits poor behavior with RMCAT CA
algos.

I would not call it poor. I would call it undesirable based on the
intent of the algorithm's designers. HOWEVER:

LEDBAT advocates think that imposing 100ms of queuing delay on all flows
is *ok*. To me, that's *poor*. In fact - *nuts*.

https://perso.telecom-paristech.fr/drossi/paper/rossi14comnet-b.pdf

I really wish I wasn't outnumbered on writing that paper, and could
release a sequel to this based on real world results...

'cause fq_codel users run bittorrent all day long and don't notice its
there. Every freaking day. I'm running 10 torrents right now. Never
notice. I knew when we wrote that we'd obsolete LPCC's brain damaged
100ms induced latency concept as a "good" method... *entirely*, and we
did, and *WE DON'T CARE*. You shouldn't either.

Try running torrent all day on cable without an inbound fq_codel
shaper. Just try it. Chose.

* "Exhibits poor behavior with rmcat CA algos."

The relevant paper for this was terrible, and based on an artificial benchmark not modeling real traffic, and was at 2Mbit to boot, by a very biased observer.

Videoconferencing traffic works GREAT with fq_codel at higher bandwidths
against all forms of real traffic. 20mbit down/4mbit up and above.

2Mbit is just going to always suck for videoconferencing, above that...

To quote VJ : "low rate videoconferencing never gets dropped"

Which is really the measured case, in real traffic, on real networks,
doing videoconferencing. Try it.

" In effect the network element is making a decision as to what
constitutes a flow, and then forcing all such flows to take equal
bandwidth at every instant."

Still wouldn't say it that way. It's maximizing entropy that matters.
Microbursts get ripped out. Jitter vanishes. Only fat flows see "equal
bandwidth".

My biggest bugaboo with the l4s thing is that somehow y'all think that
traffic will pre-arrive, automagically dispersed in statistically random
ways, and that does not happen. Ever. Traffic is naturally bursty and
needs to be broken up.

Anyway this rant is somewhat mis-directed. I do hope some magical way of
identifying sparser flows appears, other than FQ already does so
beautifully, but until then I really wish more people would repeat less
bullshit and go actually use the stuff they are dissing and understand
it more deeply.

It certainly is possible to do better entropy e2e than we do.

There has certainly been wonderful things with fq + pacing of late, and
I look forward to things like the new etx scheduler and intel hardware
support for timerwheels to be widely deployed.
Toke Høiland-Jørgensen
2018-10-29 14:15:13 UTC
Reply
Permalink
Hi Greg

Since Dave's email prompted me to read your draft, I thought I'd chime
3.2. Queuing behavior analysis
Similar to the queue protection function outlined in the previous
section, it may be feasible to devise a real time flow analyzer for a
node that would identify flows that are causing queue build up, and
redirect those flows to the QB queue, leaving the remaining flows in
the NQB queue.
To me, this sounds exactly like a description of what the sparse flow
optimisation of FQ-CoDel, so I'm puzzled why you don't believe this to
be the case? Sure, FQ-CoDel uses its per-flow queueing system as a
mechanism to achieve this, but if you really can't do per-flow queueing,
it is conceivable that the mechanism could be implemented without it.
The high-level logic is basically (in your terms) "on enqueue, does this
flow already have a packet queued? if yes, it's a QB flow, if no, it's
an NQB flow". It is simple, but quite effective.

I've recently analysed the behaviours that fall out of this simple
mechanism in some detail, which may be relevant to your efforts. See
this publication in IEEE Communication Letters:
https://doi.org/10.1109/LCOMM.2018.2871457

-Toke
Greg White
2018-10-31 00:12:39 UTC
Reply
Permalink
Hi Toke, thanks for the pointer to your paper, I had not seen it before.

I agree that could be a candidate algorithm. It is certainly simple. It may not be the only (or perhaps even the best) solution for the dual-queue case though. I'm thinking in the context of an L4S TCP flow, which can respond quickly to "new" ECN markings and achieve link saturation with ultra low (but not zero) queuing delay. A good property for a queue protection algorithm would be that these L4S flows could be consistently placed in the NQB queue. I think that the simple approach you mentioned would result in many L4S flows being deemed Queue Building.

Additionally, we've observed applications that send variable sized "messages" at a fixed rate (e.g. 20 messages/second) where the message sometimes exceeds the MTU and results in two closely spaced (possibly back-to-back) packets. This is a flow that I think should be considered to be NQB, but would get flagged as QB by the simple approach. You described this case in your paper, where you state that the first Q bytes of each burst will be treated as NQB (the first packet in the case we're talking about here), but the rest will be treated as QB. Assuming that message latency is important for these sorts of applications, this is equivalent to saying that the entire burst is considered as QB. In the fq_codel case, the message latency would be something like Q(n-1)(N+1)/R (assuming no other sparse flow arrivals), something like 1.3ms using the example values in your paper (plus n=2, N=10) which may be ok. In the dual-queue case it is a bigger deal, because the remaining packets would be put at the end of the QB queue, which could have a latency of 10 or 20 ms.

So, a queue protection function that provides a bit more (but still limited) allowance for a flow to have packets in queue would likely work better in the dual-queue case.

-Greg



On 10/29/18, 8:15 AM, "Toke Høiland-Jørgensen" <***@toke.dk> wrote:

Hi Greg

Since Dave's email prompted me to read your draft, I thought I'd chime
3.2. Queuing behavior analysis
Similar to the queue protection function outlined in the previous
section, it may be feasible to devise a real time flow analyzer for a
node that would identify flows that are causing queue build up, and
redirect those flows to the QB queue, leaving the remaining flows in
the NQB queue.
To me, this sounds exactly like a description of what the sparse flow
optimisation of FQ-CoDel, so I'm puzzled why you don't believe this to
be the case? Sure, FQ-CoDel uses its per-flow queueing system as a
mechanism to achieve this, but if you really can't do per-flow queueing,
it is conceivable that the mechanism could be implemented without it.
The high-level logic is basically (in your terms) "on enqueue, does this
flow already have a packet queued? if yes, it's a QB flow, if no, it's
an NQB flow". It is simple, but quite effective.

I've recently analysed the behaviours that fall out of this simple
mechanism in some detail, which may be relevant to your efforts. See
this publication in IEEE Communication Letters:
https://doi.org/10.1109/LCOMM.2018.2871457

-Toke
Toke Høiland-Jørgensen
2018-11-01 13:25:01 UTC
Reply
Permalink
Post by Greg White
Hi Toke, thanks for the pointer to your paper, I had not seen it before.
You're welcome :)
Post by Greg White
I agree that could be a candidate algorithm. It is certainly simple.
It may not be the only (or perhaps even the best) solution for the
dual-queue case though. I'm thinking in the context of an L4S TCP
flow, which can respond quickly to "new" ECN markings and achieve link
saturation with ultra low (but not zero) queuing delay. A good
property for a queue protection algorithm would be that these L4S
flows could be consistently placed in the NQB queue. I think that the
simple approach you mentioned would result in many L4S flows being
deemed Queue Building.
Yes, I think you are right (depending on traffic mix, of course). It
might be possible to tweak it to work better, though. E.g., by changing
the threshold (moving flows to QB if they end up with more than X ms of
queue). This would only work if you start out all flows at NQB, with the
associated aggressive marking behaviour; so I'm not sure if a normal TCP
flow would ever manage to get to QB state before getting clobbered by
the NQB markings...
Post by Greg White
Additionally, we've observed applications that send variable sized
"messages" at a fixed rate (e.g. 20 messages/second) where the message
sometimes exceeds the MTU and results in two closely spaced (possibly
back-to-back) packets. This is a flow that I think should be
considered to be NQB, but would get flagged as QB by the simple
approach. You described this case in your paper, where you state that
the first Q bytes of each burst will be treated as NQB (the first
packet in the case we're talking about here), but the rest will be
treated as QB. Assuming that message latency is important for these
sorts of applications, this is equivalent to saying that the entire
burst is considered as QB. In the fq_codel case, the message latency
would be something like Q(n-1)(N+1)/R (assuming no other sparse flow
arrivals), something like 1.3ms using the example values in your paper
(plus n=2, N=10) which may be ok. In the dual-queue case it is a
bigger deal, because the remaining packets would be put at the end of
the QB queue, which could have a latency of 10 or 20 ms.
Sure, it's by no means a perfect mechanism. But it makes up for that by
it's simplicity, IMO. And it does work really well for *a lot* of
today's latency-sensitive traffic.

(In your case of two-MTU messages, you could tune the quantum to allow
those; but of course you can construct examples that won't work).
Post by Greg White
So, a queue protection function that provides a bit more (but still
limited) allowance for a flow to have packets in queue would likely
work better in the dual-queue case.
Yeah, that's tricky, especially if you want it to be very accurate in
its distinction; which I sort of gather that you do, right?

-Toke
Dave Taht
2018-11-01 19:13:19 UTC
Reply
Permalink
Post by Toke Høiland-Jørgensen
Post by Greg White
Hi Toke, thanks for the pointer to your paper, I had not seen it before.
You're welcome :)
Post by Greg White
I agree that could be a candidate algorithm. It is certainly simple.
It may not be the only (or perhaps even the best) solution for the
dual-queue case though. I'm thinking in the context of an L4S TCP
flow, which can respond quickly to "new" ECN markings and achieve link
saturation with ultra low (but not zero) queuing delay. A good
property for a queue protection algorithm would be that these L4S
flows could be consistently placed in the NQB queue. I think that the
simple approach you mentioned would result in many L4S flows being
deemed Queue Building.
Yes, I think you are right (depending on traffic mix, of course). It
might be possible to tweak it to work better, though. E.g., by changing
the threshold (moving flows to QB if they end up with more than X ms of
queue). This would only work if you start out all flows at NQB, with the
associated aggressive marking behaviour; so I'm not sure if a normal TCP
flow would ever manage to get to QB state before getting clobbered by
the NQB markings...
Post by Greg White
Additionally, we've observed applications that send variable sized
"messages" at a fixed rate (e.g. 20 messages/second) where the message
sometimes exceeds the MTU and results in two closely spaced (possibly
back-to-back) packets. This is a flow that I think should be
considered to be NQB, but would get flagged as QB by the simple
approach. You described this case in your paper, where you state that
the first Q bytes of each burst will be treated as NQB (the first
packet in the case we're talking about here), but the rest will be
treated as QB. Assuming that message latency is important for these
sorts of applications, this is equivalent to saying that the entire
burst is considered as QB. In the fq_codel case, the message latency
would be something like Q(n-1)(N+1)/R (assuming no other sparse flow
arrivals), something like 1.3ms using the example values in your paper
(plus n=2, N=10) which may be ok. In the dual-queue case it is a
bigger deal, because the remaining packets would be put at the end of
the QB queue, which could have a latency of 10 or 20 ms.
Sure, it's by no means a perfect mechanism. But it makes up for that by
it's simplicity, IMO. And it does work really well for *a lot* of
today's latency-sensitive traffic.
(In your case of two-MTU messages, you could tune the quantum to allow
those; but of course you can construct examples that won't work).
sch_fq has a default quantum of 2 mtu, with a initial burst of 10.
There's all sorts of interesting work inside that to "right-size" the
ongoing gso offloads and a major new advance over there on calculating
rtts properly is described here:

https://lwn.net/Articles/766564/

...

there was at one point, a fq_pie implementation that used the rbtree
in sch_fq to
achieve perfect fairness.

we often tune fq_codel's quantum as low as 300, at low rates.
Post by Toke Høiland-Jørgensen
Post by Greg White
So, a queue protection function that provides a bit more (but still
limited) allowance for a flow to have packets in queue would likely
work better in the dual-queue case.
For inbound shaping sch_cake defaults to 2mtu at higher rates.

This kind of opens a question in that, what is a typical target
bandwidth for l4s applications?

the videoconferencing paper I dissed in my earlier rant focused only
at 2mbits (and drew the conclusion that sfq was the best option for
the cc algo's 300ms desired range)

I have generally been focused on badwidths in the 4mbit-40gbit range.
Post by Toke Høiland-Jørgensen
Yeah, that's tricky, especially if you want it to be very accurate in
its distinction; which I sort of gather that you do, right?
In part, that is why I would like the language in the problem
statement clarified. To give a concrete counterexample,
think upon the ultimate choice of a crc32 algorithm and the parameters
that drove that choice.

If an accuracy of identifying "sparse flows" or NQB flows can be
specified, then all sorts of algorithms can be thrown into the fray.
AI. Bloom filters. rbtrees. regexes. cookoo++ hashes... and boring
old fq tech. :)
Post by Toke Høiland-Jørgensen
-Toke
_______________________________________________
Bloat mailing list
https://lists.bufferbloat.net/listinfo/bloat
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Greg White
2018-11-06 04:17:38 UTC
Reply
Permalink
Hi Dave,

Thanks for the constructive review comments on the text of the draft, including pointers to references that could be included. I'll take those comments on in the next revision.

As I know you are aware, the majority of bottleneck links are not linux machines (running tc and feeding packets to a continuous dedicated link), so comparing lines of kernel code as a metric for complexity isn't the most useful. In the bottleneck links that I am concerned about (and other similar links), it is advantageous to tightly couple the queuing and scheduling mechanisms with the hardware-optimized discontinuous MAC layer, and also take into account engineering tradeoffs that are different than those that apply in a linux machine.

The point about bittorrent and RMCAT wasn't about latency, it was about throughput. I have come to share the view that others have expressed that it is a valuable property of the internet that congestion control algorithm developers are free to innovate. FQ in the bottleneck link diminishes this ability in some important ways. Some applications (cloud backup, mass uploads like the icloud photo library updates, bittorrent, etc) might not be interested in per-flow fairness with TCP, but would prefer to be less aggressive and thus take a smaller fraction of the link when there are competing TCP flows (note, I am not arguing that the *current* ledbat protocol is the right one). Other applications (RMCAT) might in the future aim to be per-flow fair, but need slightly longer to adapt than FQ allows.

Some folks are just uncomfortable with the departure from the "end-to-end" principal with fq (i.e. a middlebox gets to determine what constitutes a "flow", then enforces that all fat "flows" get equal bandwidth at every instant). Fq_codel is a pragmatic solution for some systems, and it gives a real undeniable benefit in common cases with current applications/mixes, but it isn't perfect. Perhaps sch_cake is better in some cases, but it makes a lot more decisions on behalf of the user that might be ok now, but may not be in the future. In light of all of the above, I'm open to considering other alternatives. So, I don't see this as a non-problem, and I don't see the potential solutions as being in any way magical.

-Greg
Post by Toke Høiland-Jørgensen
Post by Greg White
Hi Toke, thanks for the pointer to your paper, I had not seen it before.
You're welcome :)
Post by Greg White
I agree that could be a candidate algorithm. It is certainly simple.
It may not be the only (or perhaps even the best) solution for the
dual-queue case though. I'm thinking in the context of an L4S TCP
flow, which can respond quickly to "new" ECN markings and achieve link
saturation with ultra low (but not zero) queuing delay. A good
property for a queue protection algorithm would be that these L4S
flows could be consistently placed in the NQB queue. I think that the
simple approach you mentioned would result in many L4S flows being
deemed Queue Building.
Yes, I think you are right (depending on traffic mix, of course). It
might be possible to tweak it to work better, though. E.g., by changing
the threshold (moving flows to QB if they end up with more than X ms of
queue). This would only work if you start out all flows at NQB, with the
associated aggressive marking behaviour; so I'm not sure if a normal TCP
flow would ever manage to get to QB state before getting clobbered by
the NQB markings...
Post by Greg White
Additionally, we've observed applications that send variable sized
"messages" at a fixed rate (e.g. 20 messages/second) where the message
sometimes exceeds the MTU and results in two closely spaced (possibly
back-to-back) packets. This is a flow that I think should be
considered to be NQB, but would get flagged as QB by the simple
approach. You described this case in your paper, where you state that
the first Q bytes of each burst will be treated as NQB (the first
packet in the case we're talking about here), but the rest will be
treated as QB. Assuming that message latency is important for these
sorts of applications, this is equivalent to saying that the entire
burst is considered as QB. In the fq_codel case, the message latency
would be something like Q(n-1)(N+1)/R (assuming no other sparse flow
arrivals), something like 1.3ms using the example values in your paper
(plus n=2, N=10) which may be ok. In the dual-queue case it is a
bigger deal, because the remaining packets would be put at the end of
the QB queue, which could have a latency of 10 or 20 ms.
Sure, it's by no means a perfect mechanism. But it makes up for that by
it's simplicity, IMO. And it does work really well for *a lot* of
today's latency-sensitive traffic.
(In your case of two-MTU messages, you could tune the quantum to allow
those; but of course you can construct examples that won't work).
sch_fq has a default quantum of 2 mtu, with a initial burst of 10.
There's all sorts of interesting work inside that to "right-size" the
ongoing gso offloads and a major new advance over there on calculating
rtts properly is described here:

https://lwn.net/Articles/766564/

...

there was at one point, a fq_pie implementation that used the rbtree
in sch_fq to
achieve perfect fairness.

we often tune fq_codel's quantum as low as 300, at low rates.
Post by Toke Høiland-Jørgensen
Post by Greg White
So, a queue protection function that provides a bit more (but still
limited) allowance for a flow to have packets in queue would likely
work better in the dual-queue case.
For inbound shaping sch_cake defaults to 2mtu at higher rates.

This kind of opens a question in that, what is a typical target
bandwidth for l4s applications?

the videoconferencing paper I dissed in my earlier rant focused only
at 2mbits (and drew the conclusion that sfq was the best option for
the cc algo's 300ms desired range)

I have generally been focused on badwidths in the 4mbit-40gbit range.
Post by Toke Høiland-Jørgensen
Yeah, that's tricky, especially if you want it to be very accurate in
its distinction; which I sort of gather that you do, right?
In part, that is why I would like the language in the problem
statement clarified. To give a concrete counterexample,
think upon the ultimate choice of a crc32 algorithm and the parameters
that drove that choice.

If an accuracy of identifying "sparse flows" or NQB flows can be
specified, then all sorts of algorithms can be thrown into the fray.
AI. Bloom filters. rbtrees. regexes. cookoo++ hashes... and boring
old fq tech. :)
Post by Toke Høiland-Jørgensen
-Toke
_______________________________________________
Bloat mailing list
https://lists.bufferbloat.net/listinfo/bloat
--

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740

Michael Welzl
2018-11-01 10:39:18 UTC
Reply
Permalink
Hi,
Post by Dave Taht
I don't feel like commenting much on ietf matters these days
but, jeeze,
(snip)
Post by Dave Taht
Did I rant already that the vast majority of flows are non-saturating?
That's a bug, not a feature - and you seem to treat it as an unchangeable fact.
Despite the undebatable importance of bufferbloat and its impact on e2e packet latency, this is only one of the factors playing into the "latency" that I perceive when I click on the link as I surf the Internet.
Flow completion time has to do with saturation as well.

Cheers,
Michael
Dave Taht
2018-11-01 14:20:34 UTC
Reply
Permalink
Post by Michael Welzl
Hi,
Post by Dave Taht
I don't feel like commenting much on ietf matters these days
but, jeeze,
(snip)
Post by Dave Taht
Did I rant already that the vast majority of flows are non-saturating?
That's a bug, not a feature - and you seem to treat it as an unchangeable fact.
It's been the case for 50 years. Until we have one sole source of data
(which given industry consolidation is seemingly increasingly likely),
coming from one big machine (unlikely). The original cablelabs (2012?)
study of this stuff assumed that growth of a web page would be past
6MB by now, it instead is under 3.
Post by Michael Welzl
Despite the undebatable importance of bufferbloat and its impact on e2e packet latency, this is only one of the factors playing into the "latency" that I perceive when I click on the link as I surf the Internet.
Doing a breakdown of that latency - most of that seems solved...

Background prefetch
DNS lookup
SSL connection negotiation
The actual transfer.
Screen draw....

I'm still missing your point. Is looking for "sparseness" part of a
CCN-like effort?

In my mind QUIC, when basically nailed up, with its improvements to
tcp (0-rtt crypto, pacing, bbr) is in the end game here, and "if only"
fq existed on the cmts's as I'd had so much hope for after the arris
study

bufferbloat would be basically be over and dealt with.

And most flows would still be short.
Post by Michael Welzl
Flow completion time has to do with saturation as well.
FCT was not a subject of that draft.

My (admittedly ranty) points were:

* Mis-representation of how fq_codel actually worked for political purposes,
in particular mischaracterizing the behavior for videoconferencing
over more typical links...

* a complaint that despite all the work on bufferbloat, measured CMTS
buffering was still in the 680ms range at 100mbit. If anyone had
bothered to click on the pfsense discussion, I "subtly" pointed at a
FIOS measurement of 120ms...

Still waiting for that first step from the US cable industry. Things
are getting MUCH better in the past 2 years (see
http://www.dslreports.com/speedtest/results/bufferbloat ) just not (if
you scroll down) in the US.

I don't, incidentally, know why they are getting better... certainly I
have some good numbers on the size of the wifi deployment...

* Pointing out that sch_cake did diffserv, per host and per flow fq,
and docsis framing. Some of which had applicability to the draft.

We have not the slightest intent to bring that to the ietf, but it
does tackle and solve what little there was of a "collision" problem,
thoroughly... the diffserv stuff might help (I routinely give dns a
boost)... feel free to go reflash a router and see if it helps in your
scenarios.

Inbound shaping is cpu intensive, but it works pretty well.
Post by Michael Welzl
Cheers,
Michael
_______________________________________________
Bloat mailing list
https://lists.bufferbloat.net/listinfo/bloat
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Michael Welzl
2018-11-04 18:16:51 UTC
Reply
Permalink
Hi,
Post by Dave Taht
Post by Michael Welzl
Despite the undebatable importance of bufferbloat and its impact on e2e packet latency, this is only one of the factors playing into the "latency" that I perceive when I click on the link as I surf the Internet.
Doing a breakdown of that latency - most of that seems solved...
Background prefetch
DNS lookup
SSL connection negotiation
The actual transfer.
Screen draw....
I'm still missing your point. Is looking for "sparseness" part of a
CCN-like effort?
No, it’s just about flow completion time (“The actual transfer”) above being a function not only of the queue length, but also of the capacity the flow gets to use. Hence the push for a larger initial window.
Post by Dave Taht
Post by Michael Welzl
Flow completion time has to do with saturation as well.
FCT was not a subject of that draft.
Right - sorry for side-tracking.
I read them - I didn’t want to get into this debate, it was only a side comment about not all flows being limited, and there being some value in better capacity usage too.

Cheers,
Michael
Michael Welzl
2018-11-01 19:12:11 UTC
Reply
Permalink
I was thinking about the web. You’re right about all the rest.

Cheers
Michael


Sent from my iPhone
Post by Michael Welzl
Post by Dave Taht
I don't feel like commenting much on ietf matters these days
but, jeeze,
(snip)
Post by Dave Taht
Did I rant already that the vast majority of flows are non-saturating?
That's a bug, not a feature - and you seem to treat it as an unchangeable fact.
Why would you think that saturating flows should be common? A very large percentage of Internet traffic is streaming audio/video and that should never saturate a link, it should be pacing the data to the rate of the content.
DNS queries are not going to be saturating.
queries to check cache validity are not going to be saturating.
microservices calls (including most IoT data) and their replies are not going to be saturating, in part because they don't have much to say, and in part because even if they do have more to say, they aren't going to ramp up to high packet rates before they run out of data to send.
It's only bulk transfers of data that are possibly going to be saturating, and they are only going to saturate their allowed share of the slowest link in the path. On all other links they are going to be non-saturating.
As links get faster, things that would have been saturating years ago fail to saturate the new, faster links.
So what would the Internet look like if it didn't have the vast majority of flows being non-saturating?
David Lang
David Lang
2018-11-01 18:15:25 UTC
Reply
Permalink
re-sending since it bounced the first time
Post by Michael Welzl
Post by Dave Taht
I don't feel like commenting much on ietf matters these days
but, jeeze,
(snip)
Post by Dave Taht
Did I rant already that the vast majority of flows are non-saturating?
That's a bug, not a feature - and you seem to treat it as an unchangeable fact.
Why would you think that saturating flows should be common? A very large
percentage of Internet traffic is streaming audio/video and that should never
saturate a link, it should be pacing the data to the rate of the content.
DNS queries are not going to be saturating.
queries to check cache validity are not going to be saturating.
microservices calls (including most IoT data) and their replies are not going
to be saturating, in part because they don't have much to say, and in part
because even if they do have more to say, they aren't going to ramp up to
high packet rates before they run out of data to send.
It's only bulk transfers of data that are possibly going to be saturating,
and they are only going to saturate their allowed share of the slowest link
in the path. On all other links they are going to be non-saturating.
As links get faster, things that would have been saturating years ago fail to
saturate the new, faster links.
So what would the Internet look like if it didn't have the vast majority of
flows being non-saturating?
David Lang
Loading...