Discussion:
lwn.net's tcp small queues vs wifi aggregation solved
Dave Taht
2018-06-21 04:58:59 UTC
Nice war story. I'm glad this last problem with the fq_codel wifi code
is solved, and the article points to a few usb wifi dongles that work
better now.

https://lwn.net/SubscriberLink/757643/b25587e3593e9f9e/



--

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Toke Høiland-Jørgensen
2018-06-21 09:22:46 UTC
Dave Taht <***@gmail.com> writes:

> Nice war story. I'm glad this last problem with the fq_codel wifi code
> is solved

This wasn't specific to the fq_codel wifi code, but hit all WiFi devices
that were running TCP on the local stack. Which would be mostly laptops,
I guess...

-Toke
Eric Dumazet
2018-06-21 12:55:45 UTC
On 06/21/2018 02:22 AM, Toke Høiland-Jørgensen wrote:
> Dave Taht <***@gmail.com> writes:
>
>> Nice war story. I'm glad this last problem with the fq_codel wifi code
>> is solved
>
> This wasn't specific to the fq_codel wifi code, but hit all WiFi devices
> that were running TCP on the local stack. Which would be mostly laptops,
> I guess...

Yes.

Also switching TCP stack to always GSO has been a major gain for wifi in my tests.

(TSQ budget is based on sk_wmem_alloc, tracking truesize of skbs, and not having
GSO is considerably inflating the truesize/payload ratio)
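
To make the truesize/payload point concrete, here is a rough back-of-the-envelope sketch. Only the idea that the budget is charged in truesize rather than payload comes from the note above. The per-skb overhead, MSS, segment count, and budget are illustrative round numbers, not values read out of the kernel, and real truesize accounting is more involved than a single fixed overhead.

```python
# All constants are hypothetical round numbers chosen for illustration only.
SKB_OVERHEAD = 768       # assumed fixed memory cost charged per skb
MSS = 1448               # assumed TCP payload bytes per wire packet
BUDGET = 128 * 1024      # assumed truesize budget available to the socket


def truesize_ratio(segments_per_skb: int) -> float:
    """Truesize divided by payload for an skb carrying N MSS-sized segments."""
    payload = segments_per_skb * MSS
    return (payload + SKB_OVERHEAD) / payload


def payload_within_budget(segments_per_skb: int) -> int:
    """Payload bytes that fit in BUDGET when the budget is charged in truesize."""
    payload = segments_per_skb * MSS
    truesize = payload + SKB_OVERHEAD
    skbs = BUDGET // truesize
    return skbs * payload


if __name__ == "__main__":
    for segs in (1, 44):  # 1 = no GSO, 44 = one large GSO skb (assumed size)
        print(segs, round(truesize_ratio(segs), 3), payload_within_budget(segs))
```

Under these assumptions the one-segment-per-skb case spends roughly a third of the budget on overhead, so noticeably less payload is in flight below the stack for the same budget.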

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a6b2a1dc2a2105f178255fe495eb914b09cb37a
tcp: switch to GSO being always on

I expect SACK compression to also give a nice boost to wifi.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e
tcp: add SACK compression

Lastly I am working on adding ACK compression in TCP stack itself.
Dave Taht
2018-06-21 15:18:07 UTC
On Thu, Jun 21, 2018 at 5:55 AM, Eric Dumazet <***@gmail.com> wrote:
>
>
> On 06/21/2018 02:22 AM, Toke Høiland-Jørgensen wrote:
>> Dave Taht <***@gmail.com> writes:
>>
>>> Nice war story. I'm glad this last problem with the fq_codel wifi code
>>> is solved
>>
>> This wasn't specific to the fq_codel wifi code, but hit all WiFi devices
>> that were running TCP on the local stack. Which would be mostly laptops,
>> I guess...
>
> Yes.
>
> Also switching TCP stack to always GSO has been a major gain for wifi in my tests.
>
> (TSQ budget is based on sk_wmem_alloc, tracking truesize of skbs, and not having
> GSO is considerably inflating the truesize/payload ratio)
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0a6b2a1dc2a2105f178255fe495eb914b09cb37a
> tcp: switch to GSO being always on
>
> I expect SACK compression to also give a nice boost to wifi.
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e
> tcp: add SACK compression
>
> Lastly I am working on adding ACK compression in TCP stack itself.

One thing I've seen repeatedly on mac80211 aircaps is a tendency for
clients to use up two TXOPs rather than one.

scenario:

1) A tcp burst arrives at the client
2) A single ack migrates down the client stack into the driver, into
the device, which then attempts to compete for airtime on that TXOP
for that single ack, sometimes waiting 10s of msec to get that op
3) a bunch more acks arrive "slightly late"[1], and then get queued
for the next TXOP, waiting, again sometimes 10s of msec

(similar scenario in a client making a quick string of web related requests)

This is a case where inserting a teeny bit more latency to fill up the
queue (ugh!), or a driver having some way to ask the probability of
seeing more data in the next 10us, or... something like that, could help.
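
The two-TXOP scenario above can be sketched as a toy model. It assumes that the driver commits whatever is queued when it starts contending, that every channel access takes a fixed delay, and that later arrivals wait for the next access. The function name, the fixed delay, and the timings are all hypothetical; real contention is random and hardware behaviour varies.

```python
def count_txops(ack_times_us, hold_us, access_delay_us):
    """Return a list of (grant_time_us, frames_sent), one entry per TXOP.

    ack_times_us: times at which ACKs reach the driver.
    hold_us: how long the driver waits for more frames before contending.
    access_delay_us: assumed fixed time needed to win the channel.
    """
    pending = sorted(ack_times_us)
    txops = []
    free_at = 0.0  # time at which the previous TXOP was granted
    i = 0
    while i < len(pending):
        start = max(pending[i], free_at)
        cutoff = start + hold_us  # frames arriving by now join this TXOP
        j = i
        while j < len(pending) and pending[j] <= cutoff:
            j += 1
        grant = cutoff + access_delay_us
        txops.append((grant, j - i))
        free_at = grant
        i = j
    return txops


if __name__ == "__main__":
    acks = [0, 5, 8, 12]  # one early ACK, three "slightly late" ones
    print(count_txops(acks, hold_us=0, access_delay_us=10_000))
    print(count_txops(acks, hold_us=20, access_delay_us=10_000))
```

In this model a 20us hold turns two channel accesses into one, and the last ACK leaves at about 10ms instead of 20ms.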

...

[1] if you need coffee through your nose this morning, regarding usage
of the phrase "slightly late", read
http://www.rawbw.com/~svw/superman.html

Caleb Cushing
2018-06-21 15:31:18 UTC
actually... all of my devices, including my desktop connect through wifi
these days... and only one of them isn't running some variant of linux.

--
Caleb Cushing

http://xenoterracide.com
Stephen Hemminger
2018-06-21 15:46:11 UTC
On Thu, 21 Jun 2018 10:31:18 -0500
Caleb Cushing <***@gmail.com> wrote:

> actually... all of my devices, including my desktop connect through wifi
> these days... and only one of them isn't running some variant of linux.
>

Sigh. My experience with wifi is that it is not stable enough for that.
With both APs I have tried (Linksys ACM3200 and Netgear WNDR3800) I still see random dropouts.
Not sure if these are device resets (i.e. workarounds) or other issues.

These happen independent of firmware (vendor, OpenWRT, or LEDE).
So my suspicion is that the Wifi hardware is shite and that the firmware is trying
to work around and mask the problem.
Caleb Cushing
2018-06-21 17:41:10 UTC
I'm not disagreeing, just saying that wifi is much more prevalent now than
just laptops... literally I only have a cord for emergency use

--
Caleb Cushing

http://xenoterracide.com
Dave Taht
2018-06-21 15:50:30 UTC
On Thu, Jun 21, 2018 at 8:31 AM, Caleb Cushing <***@gmail.com> wrote:
> actually... all of my devices, including my desktop connect through wifi
> these days... and only one of them isn't running some variant of linux.

Yes, the tendency of manufacturers to hook things up to the more
convenient, but overbuffered and more opaque, USB bus has become an
increasingly large problem (canonical example: raspberry pi). In the
case of LTE, especially, everything is a USB dongle, and the CDC_ETH
driver and device spec actually mandate at least 32k of on-chip
buffering on the other side of the bus.

We had tried at one point (5 years ago) to find ways to apply
something BQL-like to this but failed.

I am currently getting miserable performance out of the one LTE dongle
I have (16K/sec up) but have not gone and fiddled with it with more
modern kernels. I ended up
just tethering via an android phone, which cracks 1mbit up.
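
For a feel of what that much buffering costs at these rates, here is a small worked calculation. It rests on assumptions that are not stated above: that "32k" means 32 KiB, that "16K/sec" means 16 KiB per second, that "1mbit" means 1,000,000 bits per second, and that the buffer is actually full. The helper name is illustrative.

```python
def drain_time_seconds(buffer_bytes: int, rate_bytes_per_s: float) -> float:
    """Time needed to empty a full buffer at a given sustained send rate."""
    return buffer_bytes / rate_bytes_per_s


if __name__ == "__main__":
    buffer_bytes = 32 * 1024         # assumed reading of "32k"
    dongle_rate = 16 * 1024          # assumed reading of "16K/sec", in bytes/s
    tether_rate = 1_000_000 / 8      # assumed reading of "1mbit", in bytes/s
    print(drain_time_seconds(buffer_bytes, dongle_rate))   # seconds of queueing
    print(drain_time_seconds(buffer_bytes, tether_rate))
```

Under those readings a full 32 KiB device buffer adds about two seconds of delay at the dongle's uplink rate, and roughly a quarter of a second at the tethered rate.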

The quality of the wifi drivers for USB is almost uniformly miserable,
and out of tree.

David Collier-Brown
2018-06-21 16:29:18 UTC
On 21/06/18 11:18 AM, Dave Taht wrote
> This is a case where inserting a teeny bit more latency to fill up the
> queue (ugh!), or a driver having some way to ask the probability of
> seeing more data in the
> next 10us, or... something like that, could help.

Hmmn, that sounds like a pattern seen in physical switching systems:
someone with knowledge that another car is coming (especially if it's 
unexpected) waves a flag at the dispatcher to warn them to leave space
and avoid a nasty ka-thump and the extra strain on the couplers (;-))

--dave


--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
***@spamcop.net | -- Mark Twain
Jonathan Morton
2018-06-21 16:54:08 UTC
>> This is a case where inserting a teeny bit more latency to fill up the
>> queue (ugh!), or a driver having some way to ask the probability of
>> seeing more data in the
>> next 10us, or... something like that, could help.
>
> Hmmn, that sounds like a pattern seen in physical switching systems: someone with knowledge that another car is coming (especially if it's unexpected) waves a flag at the dispatcher to warn them to leave space and avoid a nasty ka-thump and the extra strain on the couplers (;-))

A more relevant railway analogy would be that a passenger train keeps its doors open while waiting for the departure signal to clear, permitting more passengers to board. At large stations the crew will press a TRTS (Train Ready To Start) button on the platform about half a minute before departure time, to prompt setting of the departure route in time, but a conflicting movement may delay the signal actually clearing.

- Jonathan Morton
Kathleen Nichols
2018-06-21 16:43:28 UTC
On 6/21/18 8:18 AM, Dave Taht wrote:

> This is a case where inserting a teeny bit more latency to fill up the
> queue (ugh!), or a driver having some way to ask the probability of
> seeing more data in the
> next 10us, or... something like that, could help.
>

Well, if the driver sees the arriving packets, it could infer that an
ack will be produced shortly and will need a sending opportunity.

Kathie

(we tried this mechanism out for cable data head ends at Com21 and it
went into a patent that probably belongs to Arris now. But that was for
cable. It is a fact universally acknowledged that a packet of data must
be in want of an acknowledgement.)
Dave Taht
2018-06-21 19:17:21 UTC
On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <***@pollere.com> wrote:
> On 6/21/18 8:18 AM, Dave Taht wrote:
>
>> This is a case where inserting a teeny bit more latency to fill up the
>> queue (ugh!), or a driver having some way to ask the probability of
>> seeing more data in the
>> next 10us, or... something like that, could help.
>>
>
> Well, if the driver sees the arriving packets, it could infer that an
> ack will be produced shortly and will need a sending opportunity.

Certainly in the case of wifi and lte and other simplex technologies
this seems feasible...

'cept that we're all busy finding ways to do ack compression this
month, and thus the "two big tcp packets = 1 ack" rule is going away.
Still, an estimate with a short timeout might help.

Another thing I've longed for (sometimes) is finding out whether an
application like a web browser signalling the OS that it has a batch
of network packets coming would help...

web browser:
setsockopt(batch_everything)
parse the web page, generate all your dns, tcp requests, etc, etc
setsockopt(release_batch)
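
The batch_everything and release_batch options above are hypothetical; no such socket options exist. The nearest existing per-socket mechanism on Linux is TCP_CORK, which holds back partial segments on a single TCP socket until it is cleared. It does not batch across sockets and does nothing for DNS, so it is only a rough analogue of the idea. A minimal sketch, where send_batched is an illustrative helper name and the option is skipped on platforms that lack it:

```python
import socket


def send_batched(sock: socket.socket, chunks) -> None:
    """Send several small writes on one TCP socket as a corked batch.

    Uses TCP_CORK where available (Linux); elsewhere it just sends normally.
    """
    cork = getattr(socket, "TCP_CORK", None)  # not defined on every platform
    if cork is not None:
        sock.setsockopt(socket.IPPROTO_TCP, cork, 1)  # start holding data
    try:
        for chunk in chunks:
            sock.sendall(chunk)
    finally:
        if cork is not None:
            sock.setsockopt(socket.IPPROTO_TCP, cork, 0)  # release the batch
```

Usage would look like send_batched(conn, [b"GET / ", b"HTTP/1.1\r\n\r\n"]) on an already connected socket.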

> Kathie
>
> (we tried this mechanism out for cable data head ends at Com21 and it
> went into a patent that probably belongs to Arris now. But that was for
> cable. It is a fact universally acknowledged that a packet of data must
> be in want of an acknowledgement.)

voip doesn't behave this way, but for recognisable protocols like tcp
and perhaps quic...

Sebastian Moeller
2018-06-21 19:41:26 UTC
Hi All,

> On Jun 21, 2018, at 21:17, Dave Taht <***@gmail.com> wrote:
>
> On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <***@pollere.com> wrote:
>> On 6/21/18 8:18 AM, Dave Taht wrote:
>>
>>> This is a case where inserting a teeny bit more latency to fill up the
>>> queue (ugh!), or a driver having some way to ask the probability of
>>> seeing more data in the
>>> next 10us, or... something like that, could help.
>>>
>>
>> Well, if the driver sees the arriving packets, it could infer that an
>> ack will be produced shortly and will need a sending opportunity.
>
> Certainly in the case of wifi and lte and other simplex technologies
> this seems feasible...
>
> 'cept that we're all busy finding ways to do ack compression this
> month and thus the
> two big tcp packets = 1 ack rule is going away. Still, an estimate,
> with a short timeout
> might help.

That short timeout seems essential: just because a link is wireless does not mean the ACKs for passing TCP packets will appear shortly. Who knows what routing happens after the wireless link (think city-wide mesh network)? In a way, such a solution should first figure out whether waiting has any chance of being useful, by looking at the typical delay between data packets and the matching ACKs.
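
One way to picture "figure out whether waiting has any chance of being useful" is a running estimate of the data-to-ACK delay that gates the hold decision. This is only a sketch under assumptions: the class name, the smoothing factor, and the decision rule are made up for illustration, and matching ACKs to data packets is taken as given.

```python
class AckDelayEstimator:
    """Smoothed estimate of the delay between a data packet and its ACK."""

    def __init__(self, max_hold_us: float, alpha: float = 0.125):
        self.max_hold_us = max_hold_us  # longest delay we are willing to add
        self.alpha = alpha              # assumed EWMA smoothing factor
        self.estimate_us = None         # no samples seen yet

    def observe(self, data_seen_us: float, ack_seen_us: float) -> None:
        """Feed one (data arrival, matching ACK) timestamp pair."""
        sample = ack_seen_us - data_seen_us
        if self.estimate_us is None:
            self.estimate_us = sample
        else:
            self.estimate_us += self.alpha * (sample - self.estimate_us)

    def worth_waiting(self) -> bool:
        """Hold a frame only if ACKs typically show up within the hold limit."""
        return self.estimate_us is not None and self.estimate_us <= self.max_hold_us
```

With a local TCP stack the estimate stays small and holding may pay off; when the endpoint sits behind a long mesh path the estimate exceeds the limit and the sketch never waits.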

>
> Another thing I've longed for (sometimes) is whether or not an
> application like a web
> browser signalling the OS that it has a batch of network packets
> coming would help...

To make up for the fact that wireless unfortunately has a very high per-packet overhead, it just tries to "hide" that overhead by amortizing it over more than one data packet. How about trying to find a better, less wasteful MAC instead ;) (and now we have two problems...) Now really, from a latency perspective it clearly is better to avoid overhead than to use "batching" to better amortize it, since batching increases latency (I stipulate that there are conditions in which clever batching will not increase the noticeable latency, if it can hide inside another latency-increasing process).
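
The amortization argument can be put into a small formula: with a fixed per-transmission overhead and n packets per aggregate, the useful share of airtime is n * payload / (overhead + n * payload). The sketch below uses made-up timings (100us of overhead, 20us of payload per packet) purely for illustration; they are not measurements of any real PHY.

```python
def airtime_efficiency(n_packets: int, payload_us: float, overhead_us: float) -> float:
    """Fraction of a transmission's airtime that carries payload."""
    useful = n_packets * payload_us
    return useful / (overhead_us + useful)


if __name__ == "__main__":
    for n in (1, 4, 32):
        # Hypothetical numbers: 20us of payload per packet, 100us fixed overhead.
        print(n, round(airtime_efficiency(n, payload_us=20, overhead_us=100), 3))
```

Efficiency climbs with n, but filling a larger aggregate means waiting for more packets, which is the latency cost described above.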

>
> web browser:
> setsockopt(batch_everything)
> parse the web page, generate all your dns, tcp requests, etc, etc
> setsockopt(release_batch)
>
>> Kathie
>>
>> (we tried this mechanism out for cable data head ends at Com21 and it
>> went into a patent that probably belongs to Arris now. But that was for
>> cable. It is a fact universally acknowledged that a packet of data must
>> be in want of an acknowledgement.)
>
> voip doesn't behave this way, but for recognisable protocols like tcp
> and perhaps quic...

I note that for voip, waiting does not make sense, as all packets carry information and keeping jitter low will noticeably increase a call's perceived quality (if just by allowing the application to use a small de-jitter buffer and hence less latency). There is a reason why wifi's voice access class both has the highest probability to get the next tx-slot and also is not allowed to send aggregates (whether that is fully sane is another question, one I do not feel competent to answer).
I also think that on a docsis system it is probably a decent heuristic to assume that the endpoints will be a few milliseconds away at most (and only due to the coarse docsis grant-request clock).

Best Regards
Sebastian


Toke Høiland-Jørgensen
2018-06-21 19:51:55 UTC
Sebastian Moeller <***@gmx.de> writes:

> To make up for the fact that wireless unfortunately has a
> very high per-packet overhead, it just tries to "hide" it by
> amortizing it over more than one data packet. How about trying
> to find a better, less wasteful MAC instead ;) (and now we have
> two problems...) Now really, from a latency perspective it
> clearly is better to avoid overhead than to use "batching" to
> better amortize it, since batching increases latency (I stipulate
> that there are conditions in which clever batching will not
> increase the noticeable latency if it can hide inside another
> latency-increasing process).

Seems that 802.11ax will have some interesting features to this end.
Specifically, the spectrum can be split, allowing smaller chunks of it
to be used for reverse path transmissions (full-duplex at last?).

https://en.wikipedia.org/wiki/802.11ax#Technical_improvements

Also, 1024-QAM on 160 MHz channels; omg...

-Toke
Dave Taht
2018-06-21 19:54:21 UTC
On Thu, Jun 21, 2018 at 12:41 PM, Sebastian Moeller <***@gmx.de> wrote:
> Hi All,
>
>> On Jun 21, 2018, at 21:17, Dave Taht <***@gmail.com> wrote:
>>
>> On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <***@pollere.com> wrote:
>>> On 6/21/18 8:18 AM, Dave Taht wrote:
>>>
>>>> This is a case where inserting a teeny bit more latency to fill up the
>>>> queue (ugh!), or a driver having some way to ask the probability of
>>>> seeing more data in the
>>>> next 10us, or... something like that, could help.
>>>>
>>>
>>> Well, if the driver sees the arriving packets, it could infer that an
>>> ack will be produced shortly and will need a sending opportunity.
>>
>> Certainly in the case of wifi and lte and other simplex technologies
>> this seems feasible...
>>
>> 'cept that we're all busy finding ways to do ack compression this
>> month and thus the
>> two big tcp packets = 1 ack rule is going away. Still, an estimate,
>> with a short timeout
>> might help.
>
> That short timeout seems essential: just because a link is wireless does not mean the ACKs for passing TCP packets will appear shortly. Who knows what routing happens after the wireless link (think city-wide mesh network)? In a way, such a solution should first figure out whether waiting has any chance of being useful, by looking at the typical delay between data packets and the matching ACKs.

We are, in this discussion, having a few issues with multiple contexts.
Mine (and Eric's) is improving the behavior of wifi clients (laptops,
handhelds), where the tcp stack is local.

packet pairing estimates on routers... well, if you get an aggregate
"in", you should be able to get an aggregate "out" when it traverses
the same driver. routerwise, ack compression "done right" will help a
bit... it's the "done right" part that's the sticking point.

>
>>
>> Another thing I've longed for (sometimes) is whether or not an
>> application like a web
>> browser signalling the OS that it has a batch of network packets
>> coming would help...
>
> To make up for the fact that wireless unfortunately has a very high per-packet overhead, it just tries to "hide" that overhead by amortizing it over more than one data packet. How about trying to find a better, less wasteful MAC instead ;) (and now we have two problems...)

On my bad days I'd really like to have a do-over on wifi. The only
hope I've had has been for LiFi or a resurrection of

I haven't poked into what's going on in 5G lately (the mac is
"better", but towers being distant does not help), nor have I been
tracking 802.11ax for a few years. Lower latency was all over the
802.11ax standard when I last paid attention.

Has 802.11ad gone anywhere?


Sebastian Moeller
2018-06-21 20:11:43 UTC
Hi Dave,

> On Jun 21, 2018, at 21:54, Dave Taht <***@gmail.com> wrote:
>
> On Thu, Jun 21, 2018 at 12:41 PM, Sebastian Moeller <***@gmx.de> wrote:
>> Hi All,
>>
>>> On Jun 21, 2018, at 21:17, Dave Taht <***@gmail.com> wrote:
>>>
>>> On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <***@pollere.com> wrote:
>>>> On 6/21/18 8:18 AM, Dave Taht wrote:
>>>>
>>>>> This is a case where inserting a teeny bit more latency to fill up the
>>>>> queue (ugh!), or a driver having some way to ask the probability of
>>>>> seeing more data in the
>>>>> next 10us, or... something like that, could help.
>>>>>
>>>>
>>>> Well, if the driver sees the arriving packets, it could infer that an
>>>> ack will be produced shortly and will need a sending opportunity.
>>>
>>> Certainly in the case of wifi and lte and other simplex technologies
>>> this seems feasible...
>>>
>>> 'cept that we're all busy finding ways to do ack compression this
>>> month and thus the
>>> two big tcp packets = 1 ack rule is going away. Still, an estimate,
>>> with a short timeout
>>> might help.
>>
>> That short timeout seems essential: just because a link is wireless does not mean the ACKs for passing TCP packets will appear shortly. Who knows what routing happens after the wireless link (think city-wide mesh network)? In a way, such a solution should first figure out whether waiting has any chance of being useful, by looking at the typical delay between data packets and the matching ACKs.
>
> We are in this discussion, having a few issues with multiple contexts.
> Mine (and eric's) is in improving wifi clients (laptops, handhelds)
> behavior, where the tcp stack is local.

Ah, sorry, I got this wrong and was looking at this from the AP's perspective; sorry for the noise... and thanks for the patience.

>
> packet pairing estimates on routers... well, if you get an aggregate
> "in", you should be able to get an aggregate "out" when it traverses
> the same driver. routerwise, ack compression "done right" will help a
> bit... it's the "done right" part that's the sticking point.

How will ACK compression help? If done aggressively it will thin out the ACK stream, potentially making ACK aggregation infeasible, no? On the other hand, if the stream is sparse enough, maybe not aggregating is not too painful? I guess I am just slow today...

Best Regards
Sebastian

Kathleen Nichols
2018-06-22 14:01:37 UTC
On 6/21/18 12:17 PM, Dave Taht wrote:
> On Thu, Jun 21, 2018 at 9:43 AM, Kathleen Nichols <***@pollere.com> wrote:
>> On 6/21/18 8:18 AM, Dave Taht wrote:
>>
>>> This is a case where inserting a teeny bit more latency to fill up the
>>> queue (ugh!), or a driver having some way to ask the probability of
>>> seeing more data in the
>>> next 10us, or... something like that, could help.
>>>
>>
>> Well, if the driver sees the arriving packets, it could infer that an
>> ack will be produced shortly and will need a sending opportunity.
>
> Certainly in the case of wifi and lte and other simplex technologies
> this seems feasible...
>
> 'cept that we're all busy finding ways to do ack compression this
> month and thus the
> two big tcp packets = 1 ack rule is going away. Still, an estimate,
> with a short timeout
> might help.

It would be a poor algorithm that assumed the answer was "1" or "2" or
"42". It would be necessary to analyze data to see if something adaptive
is possible, and it may not be. Your original note was looking for a way
to find out whether the probability of seeing more data in the next 10us
was sufficiently large to delay "a teeny bit", so that would be the
problem statement.
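
One way to make that problem statement concrete is to keep a short history of inter-arrival gaps and estimate, from that history alone, how often the next packet showed up within the 10us horizon. The following is only a sketch: the class name `GapEstimator`, the 64-sample window, and the 0.5 threshold are made-up illustration values, and whether any estimator of this kind is good enough is exactly the open question.

```python
from collections import deque


class GapEstimator:
    """Empirical estimate of P(next packet arrives within horizon_us)."""

    def __init__(self, horizon_us=10.0, window=64):
        self.horizon_us = horizon_us
        self.gaps = deque(maxlen=window)  # most recent inter-arrival gaps
        self.last_arrival = None

    def observe(self, arrival_us):
        """Record the arrival time (in microseconds) of one packet."""
        if self.last_arrival is not None:
            self.gaps.append(arrival_us - self.last_arrival)
        self.last_arrival = arrival_us

    def probability(self):
        """Fraction of remembered gaps that were within the horizon."""
        if not self.gaps:
            return 0.0
        hits = sum(1 for gap in self.gaps if gap <= self.horizon_us)
        return hits / len(self.gaps)

    def should_wait(self, threshold=0.5):
        """Delay 'a teeny bit' only if more data looks likely enough."""
        return self.probability() >= threshold
```

A driver would call `observe()` per packet and consult `should_wait()` before committing a frame. With no history the estimate is 0.0, so the sketch never delays on a cold start.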

>
> Another thing I've longed for (sometimes) is whether or not an
> application like a web
> browser signalling the OS that it has a batch of network packets
> coming would help...
>
> web browser:
> setsockopt(batch_everything)
> parse the web page, generate all your dns, tcp requests, etc, etc
> setsockopt(release_batch)
>
>> Kathie
>>
>> (we tried this mechanism out for cable data head ends at Com21 and it
>> went into a patent that probably belongs to Arris now. But that was for
>> cable. It is a fact universally acknowledged that a packet of data must
>> be in want of an acknowledgement.)
>
> voip doesn't behave this way, but for recognisable protocols like tcp
> and perhaps quic...
>
>
>
>
Jonathan Morton
2018-06-22 14:12:30 UTC
> On 22 Jun, 2018, at 5:01 pm, Kathleen Nichols <***@pollere.com> wrote:
>
> Your original note was looking for a way
> for finding out if the probability of seeing more data in the next 10us
> was sufficiently large to delay "a teeny bit" so that would be the
> problem statement.

I would instead frame the problem as "how can we get hardware to incorporate extra packets, which arrive between the request and grant phases of the MAC, into the same TXOP?" Then we no longer need to think probabilistically, or induce unnecessary delay in the case that no further packets arrive.

- Jonathan Morton
Michael Richardson
2018-06-22 14:49:46 UTC
Jonathan Morton <***@gmail.com> wrote:
>> Your original note was looking for a way
>> for finding out if the probability of seeing more data in the next 10us
>> was sufficiently large to delay "a teeny bit" so that would be the
>> problem statement.

> I would instead frame the problem as "how can we get hardware to
> incorporate extra packets, which arrive between the request and grant
> phases of the MAC, into the same TXOP?" Then we no longer need to
> think probabilistically, or induce unnecessary delay in the case that
> no further packets arrive.

I've never looked at the ring/buffer/descriptor structure of the ath9k, but
with most ethernet devices, they would just continue reading descriptors
until it was empty. Is there some reason that something similar can not
occur?

Or is the problem at a higher level?
Or is that we don't want to enqueue packets so early, because it's a source
of bloat?

--
] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works | network architect [
] ***@sandelman.ca http://www.sandelman.ca/ | ruby on rails [
Jonathan Morton
2018-06-22 15:02:55 UTC
> On 22 Jun, 2018, at 5:49 pm, Michael Richardson <***@sandelman.ca> wrote:
>
>> I would instead frame the problem as "how can we get hardware to
>> incorporate extra packets, which arrive between the request and grant
>> phases of the MAC, into the same TXOP?" Then we no longer need to
>> think probabilistically, or induce unnecessary delay in the case that
>> no further packets arrive.
>
> I've never looked at the ring/buffer/descriptor structure of the ath9k, but
> with most ethernet devices, they would just continue reading descriptors
> until it was empty. Is there some reason that something similar can not
> occur?
>
> Or is the problem at a higher level?
> Or is that we don't want to enqueue packets so early, because it's a source
> of bloat?

The question is of when the aggregate frame is constructed and "frozen", using only the packets in the queue at that instant. When the MAC grant occurs, transmission must begin immediately, so most hardware prepares the frame in advance of that moment - but how far in advance?

Behaviour suggests that it can be as soon as the MAC request is issued, in response to the *first* packet arriving in the queue - so a second TXOP is required for the *subsequent* packets arriving a microsecond later, even though there's technically still plenty of time to reform the aggregate then.

In principle it should be possible to delay frame construction until the moment the radio is switched on; there is a short period consumed by a data-independent preamble sequence. In the old days, HW designers would have bent over backwards to make that happen.
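
To illustrate why the freeze point matters, here is a toy model, not a description of any real MAC. The function `count_txops` and its parameters are hypothetical, it ignores transmission time and aggregate size limits, and it assumes a fixed delay between request and grant.

```python
def count_txops(arrivals_us, grant_delay_us, freeze_at_grant):
    """Count the TXOPs needed to send every packet in arrivals_us.

    arrivals_us: sorted packet arrival times at the queue, in microseconds.
    grant_delay_us: assumed fixed time between MAC request and grant.
    freeze_at_grant: if False, the aggregate holds only the packets queued
        when the request was issued; if True, it also takes packets that
        arrive before the grant.
    """
    txops = 0
    i = 0
    now = 0.0
    while i < len(arrivals_us):
        # The request goes out as soon as a packet is waiting.
        request_time = max(now, arrivals_us[i])
        grant_time = request_time + grant_delay_us
        cutoff = grant_time if freeze_at_grant else request_time
        # Everything queued by the cutoff rides in this aggregate.
        while i < len(arrivals_us) and arrivals_us[i] <= cutoff:
            i += 1
        txops += 1
        now = grant_time  # transmission duration is ignored in this model
    return txops
```

With four acks arriving a microsecond apart and a 50us grant delay, freezing at request time costs two TXOPs (one ack alone, then the other three), while freezing at grant time costs one.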

- Jonathan Morton
Michael Richardson
2018-06-22 21:55:05 UTC
Jonathan Morton <***@gmail.com> wrote:
>>> I would instead frame the problem as "how can we get hardware to
>>> incorporate extra packets, which arrive between the request and grant
>>> phases of the MAC, into the same TXOP?" Then we no longer need to
>>> think probabilistically, or induce unnecessary delay in the case that
>>> no further packets arrive.
>>
>> I've never looked at the ring/buffer/descriptor structure of the ath9k, but
>> with most ethernet devices, they would just continue reading descriptors
>> until it was empty. Is there some reason that something similar can not
>> occur?
>>
>> Or is the problem at a higher level?
>> Or is that we don't want to enqueue packets so early, because it's a source
>> of bloat?

> The question is of when the aggregate frame is constructed and
> "frozen", using only the packets in the queue at that instant. When
> the MAC grant occurs, transmission must begin immediately, so most
> hardware prepares the frame in advance of that moment - but how far in
> advance?

Oh, I understand now. The aggregate frame has to be constructed, and it's
this frame that is actually in the xmit queue. I'm guessing that it's in the
hardware, because if it was in the driver, then we could perhaps do something?

> In principle it should be possible to delay frame construction until
> the moment the radio is switched on; there is a short period consumed
> by a data-independent preamble sequence. In the old days, HW designers
> would have bent over backwards to make that happen.

--
] Never tell me the odds! | ipv6 mesh networks [
] Michael Richardson, Sandelman Software Works | network architect [
] ***@sandelman.ca http://www.sandelman.ca/ | ruby on rails [
Toke Høiland-Jørgensen
2018-06-25 10:38:24 UTC
Michael Richardson <***@sandelman.ca> writes:

> Jonathan Morton <***@gmail.com> wrote:
> >>> I would instead frame the problem as "how can we get hardware to
> >>> incorporate extra packets, which arrive between the request and grant
> >>> phases of the MAC, into the same TXOP?" Then we no longer need to
> >>> think probabilistically, or induce unnecessary delay in the case that
> >>> no further packets arrive.
> >>
> >> I've never looked at the ring/buffer/descriptor structure of the ath9k, but
> >> with most ethernet devices, they would just continue reading descriptors
> >> until it was empty. Is there some reason that something similar can not
> >> occur?
> >>
> >> Or is the problem at a higher level?
> >> Or is that we don't want to enqueue packets so early, because it's a source
> >> of bloat?
>
> > The question is of when the aggregate frame is constructed and
> > "frozen", using only the packets in the queue at that instant. When
> > the MAC grant occurs, transmission must begin immediately, so most
> > hardware prepares the frame in advance of that moment - but how far in
> > advance?
>
> Oh, I understand now. The aggregate frame has to be constructed, and it's
> this frame that is actually in the xmit queue. I'm guessing that it's in the
> hardware, because if it was in the driver, then we could perhaps do
> something?

No, it's in the driver for ath9k. So it would be possible to delay it
slightly to try to build a larger one. The timing constraints are too
tight to do it reactively when the request is granted, though; so
delaying would result in idleness if there are no other flows to queue
before then...

Even for devices that build aggregates in firmware or hardware (as all
AC chipsets do), it might be possible to throttle the queues at higher
levels to try to get better batching. It's just not obvious that there's
an algorithm that can do this in a way that will "do no harm" for other
types of traffic, for instance...

-Toke
Jim Gettys
2018-06-25 23:54:18 UTC
On Mon, Jun 25, 2018 at 6:38 AM Toke Høiland-Jørgensen <***@toke.dk> wrote:

> Michael Richardson <***@sandelman.ca> writes:
>
> > Jonathan Morton <***@gmail.com> wrote:
> > >>> I would instead frame the problem as "how can we get hardware to
> > >>> incorporate extra packets, which arrive between the request and
> grant
> > >>> phases of the MAC, into the same TXOP?" Then we no longer need
> to
> > >>> think probabilistically, or induce unnecessary delay in the case
> that
> > >>> no further packets arrive.
> > >>
> > >> I've never looked at the ring/buffer/descriptor structure of the
> ath9k, but
> > >> with most ethernet devices, they would just continue reading
> descriptors
> > >> until it was empty. Is there some reason that something similar
> can not
> > >> occur?
> > >>
> > >> Or is the problem at a higher level?
> > >> Or is that we don't want to enqueue packets so early, because
> it's a source
> > >> of bloat?
> >
> > > The question is of when the aggregate frame is constructed and
> > > "frozen", using only the packets in the queue at that instant.
> When
> > > the MAC grant occurs, transmission must begin immediately, so most
> > > hardware prepares the frame in advance of that moment - but how
> far in
> > > advance?
> >
> > Oh, I understand now. The aggregate frame has to be constructed, and
> it's
> > this frame that is actually in the xmit queue. I'm guessing that it's
> in the
> > hardware, because if it was in the driver, then we could perhaps do
> > something?
>
> No, it's in the driver for ath9k. So it would be possible to delay it
> slightly to try to build a larger one. The timing constraints are too
> tight to do it reactively when the request is granted, though; so
> delaying would result in idleness if there are no other flows to queue
> before then...
>
> Even for devices that build aggregates in firmware or hardware (as all
> AC chipsets do), it might be possible to throttle the queues at higher
> levels to try to get better batching. It's just not obvious that there's
> an algorithm that can do this in a way that will "do no harm" for other
> types of traffic, for instance...
>
>
>

Isn't this sort of delay a natural consequence of a busy channel?

What matters is not conserving txops *all the time*, but only when the
channel is busy and there aren't more txops available....

So when you are trying to transmit on a busy channel, that contention time
will naturally increase, since you won't
be able to get a transmit opportunity immediately. So you should queue up
more packets into an aggregate in that case.

We only care about conserving txops when they are scarce, not when they are
abundant.

This principle is why a window system as crazy as X11 is competitive: it
naturally becomes more efficient in the
face of load (more and more requests batch up and are handled at maximum
efficiency, so the system is at maximum
efficiency at full load).

Or am I missing something here?

Jim
Jonathan Morton
2018-06-26 00:07:30 UTC
>> No, it's in the driver for ath9k. So it would be possible to delay it
>> slightly to try to build a larger one. The timing constraints are too
>> tight to do it reactively when the request is granted, though; so
>> delaying would result in idleness if there are no other flows to queue
>> before then...


There has to be some sort of viable compromise here. How about initiating the request immediately, then building the aggregate when the request completes transmission? That should give at least the few microseconds required for the immediately following acks to reach the queue, and be included in the same aggregate.

> Isn't this sort of delay a natural consequence of a busy channel?
>
> What matters is not conserving txops *all the time*, but only when the channel is busy and there aren't more txops available....
>
> So when you are trying to transmit on a busy channel, that contention time will naturally increase, since you won't be able to get a transmit opportunity immediately. So you should queue up more packets into an aggregate in that case.
>
> We only care about conserving txops when they are scarce, not when they are abundant.
>
> This principle is why a window system as crazy as X11 is competitive: it naturally becomes more efficient in the face of load (more and more requests batch up and are handled at maximum efficiency, so the system is at maximum efficiency at full load).
>
> Or am I missing something here?

The problem is that currently every data aggregate received (one TXOP each from the AP) results in two TXOPs just to acknowledge them, the first one containing only a single ack. This is clearly wasteful, given the airtime overhead per TXOP relative to the raw data rate of modern wifi. Relying solely on backpressure would require that the channel was sufficiently busy to prevent the second TXOP from occurring until the following data aggregate is received, and that just seems too delicate to me.
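
As a back-of-the-envelope check on the airtime argument, the sketch below charges a fixed overhead per TXOP and adds the payload time at the PHY rate. Every number in it is an assumption chosen for illustration (20 acks of 80 bytes, 100us of contention/preamble/block-ack overhead per TXOP, 300 Mbit/s); none of them are measurements.

```python
def airtime_us(n_txops, payload_bytes, overhead_us, phy_rate_mbps):
    """Total airtime in microseconds for a payload split over n_txops."""
    # 1 Mbit/s is one bit per microsecond, so bits / (Mbit/s) gives us.
    return n_txops * overhead_us + payload_bytes * 8 / phy_rate_mbps


# Illustrative numbers only; see the assumptions stated above.
ACK_BYTES = 20 * 80
split = airtime_us(2, ACK_BYTES, overhead_us=100, phy_rate_mbps=300)
merged = airtime_us(1, ACK_BYTES, overhead_us=100, phy_rate_mbps=300)
print(f"two TXOPs: {split:.1f} us, one TXOP: {merged:.1f} us")
```

In this model the saving from merging is exactly one per-TXOP overhead, and the faster the PHY rate, the larger that overhead looms relative to the payload time.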

- Jonathan Morton
David Lang
2018-06-26 00:21:34 UTC
On Tue, 26 Jun 2018, Jonathan Morton wrote:

>> We only care about conserving txops when they are scarce, not when they are abundant.
>>
>> This principle is why a window system as crazy as X11 is competitive: it naturally becomes more efficient in the face of load (more and more requests batch up and are handled at maximum efficiency, so the system is at maximum efficiency at full load).
>>
>> Or am I missing something here?
>
> The problem is that currently every data aggregate received (one TXOP each
> from the AP) results in two TXOPs just to acknowledge them, the first one
> containing only a single ack. This is clearly wasteful, given the airtime
> overhead per TXOP relative to the raw data rate of modern wifi. Relying
> solely on backpressure would require that the channel was sufficiently busy to
> prevent the second TXOP from occurring until the following data aggregate is
> received, and that just seems too delicate to me.

If there are no other stations competing for airtime, why does it matter that we
use two txops? [1]

If there are no other stations that you are competing with for airtime, go ahead
and use it. If there are other stations that you are competing with for airtime,
you are unlikely to get the txop immediately, so as long as you can keep
updating the rf packet to send until the txop actually happens, the later data
will get folded in.

There will be a few times when you do get the txop immediately, and so you do
end up 'wasting' a txop, but the vast majority of the time you will be able to
combine the packets.

Now, the trick is figuring out how long we can wait to finalize the rf packet.

David Lang


[1] (ignoring the hidden transmitter problem for the moment)
Simon Barber
2018-06-26 00:36:11 UTC
Most hardware needs the packet finalized before it starts to contend for the medium (as far as I’m aware - let me know if you know differently). One issue is that if RTS/CTS is in use, then the packet duration needs to be known in advance (or at least by the midpoint of the RTS transmission).

Simon


> On Jun 25, 2018, at 5:21 PM, David Lang <***@lang.hm> wrote:
>
> On Tue, 26 Jun 2018, Jonathan Morton wrote:
>
>>> We only care about conserving txops when they are scarce, not when they are abundant.
>>> This principle is why a window system as crazy as X11 is competitive: it naturally becomes more efficient in the face of load (more and more requests batch up and are handled at maximum efficiency, so the system is at maximum efficiency at full load).
>>> Or am I missing something here?
>>
>> The problem is that currently every data aggregate received (one TXOP each from the AP) results in two TXOPs just to acknowledge them, the first one containing only a single ack. This is clearly wasteful, given the airtime overhead per TXOP relative to the raw data rate of modern wifi. Relying solely on backpressure would require that the channel was sufficiently busy to prevent the second TXOP from occurring until the following data aggregate is received, and that just seems too delicate to me.
>
> If there are no other stations competing for airtime, why does it matter that we use two txops? [1]
>
> If there are no other stations that you are competing with for airtime, go ahead and use it. If there are other stations that you are competing with for airtime, you are unlikely to get the txop immediately, so as long as you can keep updating the rf packet to send until the txop actually happens, the later data will get folded in.
>
> There will be a few times when you do get the txop immediately, and so you do end up 'wasting' a txop, but the vast majority of the time you will be able to combine the packets.
>
> Now, the trick is figuring out how long we can wait to finalize the rf packet.
>
> David Lang
>
>
> [1] (ignoring the hidden transmitter problem for the moment)
Jonathan Morton
2018-06-26 00:44:00 UTC
> On 26 Jun, 2018, at 3:36 am, Simon Barber <***@superduper.net> wrote:
>
> Most hardware needs the packet finalized before it starts to contend for the medium (as far as I’m aware - let me know if you know differently). One issue is that if RTS/CTS is in use, then the packet duration needs to be known in advance (or at least mid point of the RTS transmission).

This is a valid argument. I think we could successfully argue for a delay of 1ms, if there isn't already enough data in the queue to fill an aggregate, after the oldest packet arrives until a request is issued.
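
Stated as a rule, the proposal is: contend for the medium as soon as the queue can fill an aggregate, otherwise hold off until the oldest packet has waited some maximum time. A minimal sketch follows, with the hypothetical name `should_request` and with the hold time left as a parameter, since its right value (1ms here) is precisely what is up for debate.

```python
def should_request(queued_bytes, oldest_age_us, aggregate_bytes, max_hold_us):
    """Return True when the driver should stop waiting and issue a request.

    queued_bytes: bytes currently waiting in the queue.
    oldest_age_us: how long the oldest queued packet has waited.
    aggregate_bytes: assumed size at which an aggregate counts as full.
    max_hold_us: longest tolerated hold-off (1000 for the 1ms proposal).
    """
    if queued_bytes == 0:
        return False  # nothing to send
    if queued_bytes >= aggregate_bytes:
        return True   # a full aggregate is ready, no reason to wait
    return oldest_age_us >= max_hold_us  # waited long enough
```

The cost is visible in the last line: a lone packet with nothing behind it always pays the full `max_hold_us` in added latency.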

> If there are no other stations competing for airtime, why does it matter that we use two txops?

One further argument would be power consumption. Radio transmitters eat batteries for lunch; the only consistently worse offender I can think of is a display backlight, assuming the software is efficient.

- Jonathan Morton
Jim Gettys
2018-06-26 00:52:08 UTC
On Mon, Jun 25, 2018 at 8:44 PM Jonathan Morton <***@gmail.com>
wrote:

> > On 26 Jun, 2018, at 3:36 am, Simon Barber <***@superduper.net> wrote:
> >
> > Most hardware needs the packet finalized before it starts to contend for
> the medium (as far as I’m aware - let me know if you know differently). One
> issue is that if RTS/CTS is in use, then the packet duration needs to be
> known in advance (or at least mid point of the RTS transmission).
>
> This is a valid argument. I think we could successfully argue for a delay
> of 1ms, if there isn't already enough data in the queue to fill an
> aggregate, after the oldest packet arrives until a request is issued.
>
> > If there are no other stations competing for airtime, why does it matter
> that we use two txops?
>
> One further argument would be power consumption. Radio transmitters eat
> batteries for lunch; the only consistently worse offender I can think of is
> a display backlight, assuming the software is efficient.
>
>
>
Not clear if this is true; we need current data.

In OLPC days, we measured the receive/transmit power consumption, and
transmit took essentially no more power than receive. The dominant power
consumption was due to signal processing the RF, not the transmitter. Just
listening sucked power....

Does someone understand what current 802.11 and actual chip sets consume
for power?

Jim


> - Jonathan Morton
>
>
David Lang
2018-06-26 00:56:05 UTC
On Tue, 26 Jun 2018, Jonathan Morton wrote:

>> On 26 Jun, 2018, at 3:36 am, Simon Barber <***@superduper.net> wrote:
>>
>> Most hardware needs the packet finalized before it starts to contend for the
>> medium (as far as I’m aware - let me know if you know differently). One issue
>> is that if RTS/CTS is in use, then the packet duration needs to be known in
>> advance (or at least mid point of the RTS transmission).
>
> This is a valid argument. I think we could successfully argue for a delay of
> 1ms, if there isn't already enough data in the queue to fill an aggregate,
> after the oldest packet arrives until a request is issued.

Why does the length of the txop need to be known at the time that it's
requested?

I could see an argument that fairness algorithms need this info, but the per
txop overhead is _so_ much larger than the data transmission, that you would
have to add a huge amount of data to noticeably affect the length of the
transmission.

Remember, in wifi you don't ask a central point for permission to use X amount
of airtime, you wait for everyone to stop transmitting (and then a random time)
and then start sending. Nothing else in the area knows that you are going to
start transmitting, and it's only once they start decoding the start of the rf
packet you are sending that they can see how long it will be before you finish.

>> If there are no other stations competing for airtime, why does it matter that we use two txops?
>
> One further argument would be power consumption. Radio transmitters eat
> batteries for lunch; the only consistently worse offender I can think of is a
> display backlight, assuming the software is efficient.

True, but this gets back to the question of how frequent this case is.

If you are in areas with congestion most of the time, so the common case is to
have to wait long enough for the data to be combined, then the difference in
power savings is going to be small.

'waiting just in case there is more to send' looks good on specific benchmarks,
but it adds latency all the time, even when it's not needed.

Now, using a travel analogy:

I think how we operate today is as if we were a train at a station: when we
first are ready to move, the doors are closed and everyone sits inside waiting
for permission to move (think of how annoyed you have been sitting in a closed
aircraft at an airport waiting to move), and anyone outside has to wait for the
next train.

But if instead we leave the doors open after we request permission, and only
close them when we know that we are going to be able to send very soon, late
arrivals can board.
Toke Høiland-Jørgensen
2018-06-26 11:16:54 UTC
David Lang <***@lang.hm> writes:

> On Tue, 26 Jun 2018, Jonathan Morton wrote:
>
>>> On 26 Jun, 2018, at 3:36 am, Simon Barber <***@superduper.net> wrote:
>>>
>>> Most hardware needs the packet finalized before it starts to contend for the
>>> medium (as far as I’m aware - let me know if you know differently). One issue
>>> is that if RTS/CTS is in use, then the packet duration needs to be known in
>>> advance (or at least mid point of the RTS transmission).
>>
>> This is a valid argument. I think we could successfully argue for a delay of
>> 1ms, if there isn't already enough data in the queue to fill an aggregate,
>> after the oldest packet arrives until a request is issued.
>
> why does the length of the txop need to be known at the time that it's
> requested?

Because that's how the hardware is designed. There are really two
discussions here: (1) what could we do with a clean-slate(ish) design,
and (2) what can we retrofit into existing drivers such as the ath9k.

I think that the answer to (1) is probably 'quite a lot', but
unfortunately the answer to (2) is 'not that much'. We could probably do
a little bit better in ath9k, but for anything newer all bets are off,
as this functionality has moved into firmware.

Now, if there was a hardware vendor that was paying attention and could
do the right thing throughout the stack, that would be awesome of
course. But for Linux at least, sadly it seems that most hardware
vendors can barely figure out how to get *any* driver upstream... :/

Also, from a conceptual point of view, I really think ACK timing issues
are best solved at the TCP stack level. Which Eric is already working on
(SACK compression is already in 4.18, with normal ACK compression to
follow).
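
The reason this can work at the TCP level is that cumulative ACKs supersede one another: a later ACK for the same flow acknowledges everything an earlier one did. The sketch below shows only that idea and is not the kernel's implementation. The function name `compress_acks` is hypothetical, and it deliberately ignores sequence-number wraparound, SACK blocks, and the fact that duplicate ACKs carry loss information that must not be thrown away.

```python
def compress_acks(pending):
    """Collapse queued pure ACKs, keeping only the newest one per flow.

    pending: list of (flow_id, ack_no) tuples in arrival order.
    Returns one (flow_id, ack_no) per flow, in order of first appearance.
    """
    newest = {}
    for flow, ack_no in pending:
        # A higher cumulative ACK number makes the earlier ones redundant.
        if flow not in newest or ack_no > newest[flow]:
            newest[flow] = ack_no
    return list(newest.items())
```

For example, three queued ACKs for one flow and one for another shrink to two packets, which is the kind of reduction that makes the question of which TXOP they land in matter less.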

-Toke
Dave Taht
2018-06-26 01:27:55 UTC
On Mon, Jun 25, 2018 at 5:44 PM, Jonathan Morton <***@gmail.com> wrote:
>> On 26 Jun, 2018, at 3:36 am, Simon Barber <***@superduper.net> wrote:
>>
>> Most hardware needs the packet finalized before it starts to contend for the medium (as far as I’m aware - let me know if you know differently). One issue is that if RTS/CTS is in use, then the packet duration needs to be known in advance (or at least mid point of the RTS transmission).
>
> This is a valid argument. I think we could successfully argue for a delay of 1ms, if there isn't already enough data in the queue to fill an aggregate, after the oldest packet arrives until a request is issued.

Whoa, nelly! In the context of the local tcp stack over wifi, I was
making an observation that I "frequently" saw a pattern of a single
ack txop followed by a bunch in a separate txop, and I suggested a
very short (10us) timeout before committing to the hw - not 1ms.

Aside from this anecdote we have not got real data or statistics. The
closest thing I have to a tool that can take apart wireless aircaps is
here: https://github.com/dtaht/airtime-pie-chart which can be hacked
to take more things apart than it currently does. Looking for this
pattern in more traffic would be revealing in multiple ways. Looking
for more patterns in bigger wifi networks would be good also.

I like Eric's suggestion of doing more ack compression higher up in the
tcp stack.

There are two other things I've suggested in the past we look at. 1)
The current fq_codel_for_wifi code has a philosophy of "one aggregate
in the hardware, one ready to go". A simpler modification to fit more
in would be to (wait the best case estimate for delivering the one in
the hardware - a bit), then form the one ready-to-go.

2) rate limiting mcast and smoothing mcast bursts over time, allowing
more unicast through. presently the mcast queue is infinite and very
bursty. 802.11 std actually suggests mcast be rate limited by htb,
where I'd use htb + fq + merging dup packets. I was routinely able to
blow up the c.h.i.p's wifi and the babel protocol by flooding it with
mcast, as the local mcast queue could easily grow 16+ seconds long.
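
For point 2, the simplest form of rate limiting is a single token bucket in front of the mcast queue. The sketch below is that and nothing more: it is not htb (no hierarchy or borrowing), it does no fq or duplicate merging, and the class name `TokenBucket` and its parameters are hypothetical.

```python
class TokenBucket:
    """Single token bucket: sustained rate plus a bounded burst."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.burst = burst_bytes
        self.tokens = burst_bytes  # start with a full bucket
        self.last = 0.0

    def allow(self, now_s, size_bytes):
        """Return True if a packet of size_bytes may be sent at now_s."""
        elapsed = now_s - self.last
        self.last = now_s
        # Refill for the elapsed time, never beyond the burst size.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False  # caller keeps the packet queued, or drops it
```

Packets refused by `allow()` would have to be queued with a bound or dropped; the bucket alone does not stop a queue from growing many seconds long.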

um, I'm giving a preso tomorrow and will run behind this thread. It's
nice to see the renewed enthusiasm here, keep it up.

>> If there are no other stations competing for airtime, why does it matter that we use two txops?
>
> One further argument would be power consumption. Radio transmitters eat batteries for lunch; the only consistently worse offender I can think of is a display backlight, assuming the software is efficient.

> - Jonathan Morton
>



--

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Simon Barber
2018-06-26 03:30:55 UTC
Current versions of Wireshark have an experimental feature I added to
expose airtime usage per packet and show 802.11 pcaps on a timeline.

Enable it under Preferences->Protocols->802.11 Radio

Simon

On June 25, 2018 6:27:59 PM Dave Taht <***@gmail.com> wrote:

> On Mon, Jun 25, 2018 at 5:44 PM, Jonathan Morton <***@gmail.com> wrote:
>>> On 26 Jun, 2018, at 3:36 am, Simon Barber <***@superduper.net> wrote:
>>>
>>> Most hardware needs the packet finalized before it starts to contend for
>>> the medium (as far as I’m aware - let me know if you know differently). One
>>> issue is that if RTS/CTS is in use, then the packet duration needs to be
>>> known in advance (or at least mid point of the RTS transmission).
>>
>> This is a valid argument. I think we could successfully argue for a delay
>> of 1ms, if there isn't already enough data in the queue to fill an
>> aggregate, after the oldest packet arrives until a request is issued.
>
> Whoa, nelly! In the context of the local tcp stack over wifi, I was
> making an observation that I "frequently" saw a pattern of a single
> ack txop followed by a bunch in a separate txop, and I suggested a
> very short (10us) timeout before committing to the hw - not 1ms.
>
> Aside from this anecdote we have not got real data or statistics. The
> closest thing I have to a tool that can take apart wireless aircaps is
> here: https://github.com/dtaht/airtime-pie-chart which can be hacked
> to take more things apart than it currently does. Looking for this
> pattern in more traffic would be revealing in multiple ways. Looking
> for more patterns in bigger wifi networks would be good also.
>
> I like Eric's suggestion of doing more ack compression higher up in the
> tcp stack.
>
> There are two other things I've suggested in the past we look at. 1)
> The current fq_codel_for_wifi code has a philosophy of "one aggregate
> in the hardware, one ready to go". A simpler modification to fit more
> in would be to (wait the best case estimate for delivering the one in
> the hardware - a bit), then form the one ready-to-go.
>
> 2) rate limiting mcast and smoothing mcast bursts over time, allowing
> more unicast through. presently the mcast queue is infinite and very
> bursty. 802.11 std actually suggests mcast be rate limited by htb,
> where I'd use htb + fq + merging dup packets. I was routinely able to
> blow up the c.h.i.p's wifi and the babel protocol by flooding it with
> mcast, as the local mcast queue could easily grow 16+ seconds long.
>
> um, I'm giving a preso tomorrow and will run behind this thread. It's
> nice to see the renewed enthusiasm here, keep it up.
>
>>> If there are no other stations competing for airtime, why does it matter
>>> that we use two txops?
>>
>> One further argument would be power consumption. Radio transmitters eat
>> batteries for lunch; the only consistently worse offender I can think of is
>> a display backlight, assuming the software is efficient.
>
>> - Jonathan Morton
>>
>
>
>
> --
>
> Dave Täht
> CEO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-669-226-2619

