Discussion:
[Bloat] No backpressure "shaper"+AQM
Benjamin Cronce
2018-07-24 20:11:00 UTC
Permalink
Maybe the Bobbie idea already would do this, but I did not see it
explicitly mentioned on its wiki.

The below is all about shaping ingress, not egress.

My issue is that my ISP already does a good job with their AQM, but nothing
is perfect, and their implementation of rate limiting has a kind of burst at
the start. According to speedtests, my 250Mb connection starts off around
500Mb/s and slowly drops over the next few seconds until a near-perfect
250Mb/s steady state.

The burst at the beginning adds a certain amount of destabilization to the
TCP flows, since the window quickly grows to 500Mb and then has to back down
by dropping. If I add my own traffic shaping and AQM, I can reduce the
reported TCP retransmissions from ~3% to ~0.1%.

The problem I'm running into is that by adding my own shaping, a measurable
amount of the benefit of their AQM is lost. While I am limited to Codel,
HFSC+Codel, or FairQ+Codel for now, I am actually doing a worse job of
anti-bufferbloat than my ISP is. Fewer latency spikes according to
DSLReports.

This also does not touch on the fact that adding back-pressure by its
nature increases latency. I cannot say whether it's a fundamental requirement
in order to better my current situation, but I am curious if there's a better
way. A thought that came to me is to do a light touch, like Bobbie: the
packets have already made their way here, so you don't want to aggressively
drop them; at the same time, I want the packets that already made the
journey to enter my network mostly unhindered.

That's when I thought of a backpressure-less AQM. Instead of having
backpressure and measuring sojourn time as a function of how long it takes
packets to get scheduled, predict an estimated sojourn time based on the
observed rate of flow, but allow packets to immediately vacate the queue.
The AQM would either mark ECN or drop the packet, but never delay the
packet.
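The idea can be sketched concretely: run a *virtual* queue that drains at the provisioned rate and use its implied backlog as a predicted sojourn time, marking or dropping when the prediction exceeds a target, while forwarding every packet immediately. A minimal Python sketch (class and parameter names are illustrative, not from any existing implementation):

```python
import time

class VirtualQueueAQM:
    """Backpressure-less AQM sketch: every packet is forwarded at once;
    a virtual queue drained at the target rate predicts what the sojourn
    time would have been, and that estimate drives mark/drop decisions."""

    def __init__(self, rate_bps, target_s=0.005):
        self.rate = rate_bps / 8.0   # virtual drain rate in bytes/sec
        self.target = target_s       # predicted-sojourn threshold
        self.vq_bytes = 0.0          # virtual backlog in bytes
        self.last = None

    def on_packet(self, size_bytes, now=None):
        """Return True if this packet should be ECN-marked (or dropped).
        The packet itself leaves immediately either way; nothing is queued."""
        now = time.monotonic() if now is None else now
        if self.last is not None:
            # Drain the virtual queue for the elapsed interval.
            self.vq_bytes = max(0.0, self.vq_bytes - (now - self.last) * self.rate)
        self.last = now
        self.vq_bytes += size_bytes
        # Sojourn time this packet *would* have seen in a real queue.
        return self.vq_bytes / self.rate > self.target
```

Arrivals at or below the configured rate keep the virtual backlog near zero and are never marked; a sustained overshoot grows the predicted sojourn past the target and marking begins, all without adding any delay.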

In summary, my ISP seems to have better latency with their AQM, but due to
their shaper, loss during the burst is much higher than desirable.

Maybe this will be mostly moot once I get fq_codel going on pfSense, but I
do find it an interesting issue.
Jonathan Morton
2018-07-24 20:33:45 UTC
Permalink
The problem I'm running into is that by adding my own shaping, a measurable amount of the benefit of their AQM is lost. While I am limited to Codel, HFSC+Codel, or FairQ+Codel for now, I am actually doing a worse job of anti-bufferbloat than my ISP is. Fewer latency spikes according to DSLReports.
We do know that applying SQM at the entry to the bottleneck link works much better than at the exit. It's a fundamental principle.
That's when I thought of a backpressure-less AQM. Instead of having backpressure and measuring sojourn time as a function of how long it takes packets to get scheduled, predict an estimated sojourn time based on the observed rate of flow, but allow packets to immediately vacate the queue. The AQM would either mark ECN or drop the packet, but never delay the packet.
It's a reasonable idea. The key point is to use a deficit-mode scheduler/shaper, rather than the credit-mode ones that are common (mainly TBF/HTB). The latter are why you have such a big, uncontrolled burst from the ISP in the first place.
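The distinction can be illustrated with a toy sketch (hypothetical classes, not actual tc qdiscs): a credit-mode shaper banks tokens during idle periods and then spends them at line rate, which is exactly the start-of-transfer burst described above, while a deficit-mode scheduler never carries credit across an idle period:

```python
class CreditShaper:
    """Token-bucket (credit-mode) style, as in TBF/HTB: idle time
    accumulates credit up to a burst size, which is then spent at
    line rate -- the source of the uncontrolled initial burst."""
    def __init__(self, rate_Bps, burst_bytes):
        self.rate, self.burst = rate_Bps, burst_bytes
        self.tokens, self.last = burst_bytes, 0.0

    def admit(self, size, now):
        # Refill tokens for elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= size:
            self.tokens -= size
            return True          # sent immediately, possibly at line rate
        return False             # must wait for more tokens

class DeficitShaper:
    """Deficit-mode (DRR-style): each service round grants one quantum,
    and the deficit is zeroed when the queue empties, so idle time
    never banks up into a later burst."""
    def __init__(self, quantum):
        self.quantum = quantum
        self.deficit = 0

    def service_round(self, queue):
        self.deficit += self.quantum
        sent = []
        while queue and queue[0] <= self.deficit:
            pkt = queue.pop(0)
            self.deficit -= pkt
            sent.append(pkt)
        if not queue:
            self.deficit = 0   # the crucial difference: no banked credit
        return sent
```

With a 5000-byte bucket, the credit shaper admits three back-to-back 1500-byte packets at a single instant after an idle spell; the deficit scheduler releases roughly one quantum per service round no matter how long the link sat idle.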

- Jonathan Morton
Benjamin Cronce
2018-07-24 20:48:35 UTC
Permalink
Post by Benjamin Cronce
Post by Benjamin Cronce
The problem I'm running into is that by adding my own shaping, a measurable
amount of the benefit of their AQM is lost. While I am limited to Codel,
HFSC+Codel, or FairQ+Codel for now, I am actually doing a worse job of
anti-bufferbloat than my ISP is. Fewer latency spikes according to
DSLReports.
We do know that applying SQM at the entry to the bottleneck link works
much better than at the exit. It's a fundamental principle.
Post by Benjamin Cronce
That's when I thought of a backpressure-less AQM. Instead of having
backpressure and measuring sojourn time as a function of how long it takes
packets to get scheduled, predict an estimated sojourn time based on the
observed rate of flow, but allow packets to immediately vacate the queue.
The AQM would either mark ECN or drop the packet, but never delay the
packet.
It's a reasonable idea. The key point is to use a deficit-mode
scheduler/shaper, rather than the credit-mode ones that are common (mainly
TBF/HTB). The latter are why you have such a big, uncontrolled burst from
the ISP in the first place.
- Jonathan Morton
From what I understand, the ISP is shaping on the core router and they're
using whatever algorithm happens to be implemented. It has been a few
years since I last talked to anyone from there, and it does seem to be
acting differently, so I am not sure if they purposefully made any changes.
When I did talk to them last time, they said they did not do any
purposeful configuration to combat bufferbloat, and whatever I was seeing
was entirely arbitrary. When their shaping was worse, it very much acted
like a sliding window: it ran at pretty much line rate (1Gb/s)
until ~200ms, at which point it started to clamp down very quickly and
reached a healthy steady state in ~2 seconds. But during that transition,
loss spikes were pretty bad. Now it feels like the window is just much
larger. I no longer see it hitting line rate, but it does seem to
be capped around 2x provisioned. When I was at 150Mb, it maxed out around
300Mb/s and slowly dropped to 150Mb/s. Now it maxes out around 500Mb/s with
roughly the same slope down to 250Mb/s.

Here is an example of what I'm seeing
https://www.dslreports.com/speedtest/36310277
While there are a few spikes on the download, when running many tests in a
row, I see fewer and smaller spikes than if I do my own shaping.
Dave Taht
2018-07-24 20:57:31 UTC
Permalink
Maybe the Bobbie idea already would do this, but I did not see it explicitly mentioned on its wiki.
you are basically correct below. bobbie's core idea was a kinder,
gentler, deficit mode policer, with something fq_codel-like aimed at
reducing accumulated bytes to below the set rate.

this is way different from conventional policers
The below is all about shaping ingress, not egress.
My issue is that my ISP already does a good job with their AQM but nothing is perfect and their implementation of rate limiting has a kind of burst at the start. According to speedtests, my 250Mb connection starts off around 500Mb/s and slowly drops over the next few seconds until a near perfect 250Mb/s steady state.
ideally their htb shaper would use fq_codel as the underlying qdisc.
Or at least reduce their burst size to something saner. I hope it's
not a policer.

You can usually "see" a policer in action. Long strings of packets are
dropped in a row.
The burst at the beginning adds a certain amount of destabilization to the TCP flows, since the window quickly grows to 500Mb and then has to back down by dropping. If I add my own traffic shaping and AQM, I can reduce the reported TCP retransmissions from ~3% to ~0.1%.
sure.
The problem I'm running into is that by adding my own shaping, a measurable amount of the benefit of their AQM is lost. While I am limited to Codel, HFSC+Codel, or FairQ+Codel for now, I am actually doing a worse job of anti-bufferbloat than my ISP is. Fewer latency spikes according to DSLReports.
? measured how.
This also does not touch on the fact that adding back-pressure by its nature increases latency. I cannot say whether it's a fundamental requirement in order to better my current situation, but I am curious if there's a better way. A thought that came to me is to do a light touch, like Bobbie: the packets have already made their way here, so you don't want to aggressively drop them; at the same time, I want the packets that already made the journey to enter my network mostly unhindered.
That's when I thought of a backpressure-less AQM.
I like the restating of what policers do....
Instead of having backpressure and measuring sojourn time as a function of how long it takes packets to get scheduled, predict an estimated sojourn time based on the observed rate of flow, but allow packets to immediately vacate the queue. The AQM would either mark ECN or drop the packet, but never delay the packet.
aqms don't delay packets. shapers do.
In summary, my ISP seems to have better latency with their AQM, but due to their shaper, loss during the burst is much higher than desirable.
Maybe this will be mostly moot once I get fq_codel going on pfSense, but I do find it an interesting issue.
I thought it's been in there for a while?
_______________________________________________
Bloat mailing list
https://lists.bufferbloat.net/listinfo/bloat
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Dave Taht
2018-07-24 21:31:16 UTC
Permalink
I'd actually written some code for this way back when. If you want to
know how policers currently work,
google for "tri-color policer". They worked OK in the T1 era, but suck
rocks now.

TL;DR

Earlier this week, while we ended up debugging a buggy comcast modem,
I was desperate enough to resume thinking it was time to revisit the
concept as a first stage filter prior to hitting cake.

I did think that an *aqm* that aimed for defeating a burst policer
rate (much like BBR is doing now) would be a goodness. Say you know
there is a policer upstream configured like yours...

Then... I thought we could build something lighter weight than
shaping, but the feature set built and built... A few useful
enhancements to the standard policer like deficits rather than tbf,
adding ECN, shooting equally at all flows it sees (fq), not shooting
at one flow for more than one packet in 4 (thus, voip suffers not),
trying to wait an RTT before shooting again (codel - actually pie in
this case).

As one example, I controlled the shooting schedule with a 2048-bit
bitmap (2 bits per flow), sampling every 5ms and keeping around 16 versions
(thus 80ms of history)
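One plausible reading of that bitmap scheme, sketched in Python (the exact semantics of the 2 bits per flow aren't spelled out here, so the saturating-counter interpretation below is an assumption):

```python
class ShootingSchedule:
    """Sketch of the described shooting schedule: 2 bits per flow across
    1024 flow slots (2048 bits total), one snapshot every 5 ms, with the
    last 16 snapshots retained (~80 ms of history). A flow is eligible
    to be 'shot at' (dropped/marked) only if no snapshot in the history
    window records a hit for it."""
    SLOTS, HISTORY = 1024, 16

    def __init__(self):
        # Each snapshot holds one saturating 2-bit counter per flow slot.
        self.snapshots = [[0] * self.SLOTS for _ in range(self.HISTORY)]
        self.head = 0

    def tick(self):
        """Advance to a new 5 ms sampling period, dropping the oldest."""
        self.head = (self.head + 1) % self.HISTORY
        self.snapshots[self.head] = [0] * self.SLOTS

    def record_hit(self, flow_hash):
        """Note that this flow was shot at during the current period."""
        slot = flow_hash % self.SLOTS
        snap = self.snapshots[self.head]
        snap[slot] = min(3, snap[slot] + 1)   # saturating 2-bit counter

    def may_shoot(self, flow_hash):
        """True only if the flow has no hits anywhere in the history."""
        slot = flow_hash % self.SLOTS
        return all(s[slot] == 0 for s in self.snapshots)
```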

I discarded the idea for several other reasons back then

ENOFUNDING
TBF policers are generally used in switches and routers that do it in
hardware. Everything I came up with was doable in HW (O(1)) (which
cake/fq_codel are not), but waiting 10 years for it to show up in ISP
hardware seemed harder than waiting 10 years to see ISP shapers get fixed.
Shaping, given the growing amount of multi-core, underutilized CPUs,
allowed us to be more gentle and achieve goals like better e2e host and
flow FQ, while allowing sparse flows to not be delayed very much.
ENOFUNDING
Identifying flows required taking a hash which slows things down.

Still, a "slightly better" bobbie aqm influenced policer has long
seemed doable so long as it's extremely lightweight.
Dave Taht
2018-07-26 04:42:00 UTC
Permalink
Post by Dave Taht
I'd actually written some code for this way back when, if you want to
know how policers currently work,
google for "tri-color policer". They worked ok in the T1 era, but suck
rocks now.
TL;DR
Earlier this week, while we ended up debugging a buggy comcast modem,
I was desperate enough to resume thinking it was time to revisit the
concept as a first stage filter prior to hitting cake.
I did think that an *aqm* that aimed for defeating a burst policer
rate (much like BBR is doing now) would be a goodness. Say you know
there is a policer upstream configured like yours...
Then... I thought we could build something lighter weight than
shaping, but the feature set built and built... A few useful
enhancements to the standard policer like deficits rather than tbf,
adding ECN, shooting equally at all flows it sees (fq), not shooting
at one flow for more than one packet in 4 (thus, voip suffers not),
trying to wait an RTT before shooting again (codel - actually pie in
this case).
I don't want this convo to die on details. Having a "thing" that is
less CPU-intensive than shaping yet more effective than policing would
be a goodness.
Post by Dave Taht
As one example, I controlled the shooting schedule with a 2048-bit
bitmap (2 bits per flow), sampling every 5ms and keeping around 16 versions
(thus 80ms of history)
Another variant was bloom filters. I'd played with murmur as an
alternate hash simply because this pull request was a gas:
https://github.com/bitly/dablooms/pull/19
Post by Dave Taht
I discarded the idea for several other reasons back then
ENOFUNDING
TBF policers are generally used in switches and routers that do it in
hardware. Everything I came up with was doable in HW (O(1)) (which
cake/fq_codel are not), but waiting 10 years for it to show up in ISP
hardware seemed harder than waiting 10 years to see ISP shapers get fixed.
Shaping, given the growing amount of multi-core, underutilized CPUs,
allowed us to be more gentle and achieve goals like better e2e host and
flow FQ, while allowing sparse flows to not be delayed very much.
ENOFUNDING
Identifying flows required taking a hash which slows things down.
Still, a "slightly better" bobbie aqm influenced policer has long
seemed doable so long as it's extremely lightweight.
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Benjamin Cronce
2018-07-24 21:39:29 UTC
Permalink
Post by Benjamin Cronce
Post by Benjamin Cronce
Maybe the Bobbie idea already would do this, but I did not see it
explicitly mentioned on its wiki.
you are basically correct below. bobbie's core idea was a kinder,
gentler, deficit mode policer, with something fq_codel-like aimed at
reducing accumulated bytes to below the set rate.
this is way different from conventional policers
Post by Benjamin Cronce
The below is all about shaping ingress, not egress.
My issue is that my ISP already does a good job with their AQM but
nothing is perfect and their implementation of rate limiting has a kind of
burst at the start. According to speedtests, my 250Mb connection starts off
around 500Mb/s and slowly drops over the next few seconds until a near
perfect 250Mb/s steady state.
ideally their htb shaper would use fq_codel as the underlying qdisc.
Or at least reduce their burst size to something saner. I hope it's
not a policer.
You can usually "see" a policer in action. Long strings of packets are
dropped in a row.
I feel as if this new configuration is not quite a policer, as it feels much
less abrupt than the old configuration. It used to have massive loss spikes
that wreaked havoc on other flows and made the fat TCP flows have a kind of
rebound. Their newer setup seems to be gentler. While there is an increased
rate of loss as it attempts to "slowly" settle at the provisioned rate, it's
not the cliff it used to be; it actually has a slope.
Post by Benjamin Cronce
Post by Benjamin Cronce
The burst at the beginning adds a certain amount of destabilization to
the TCP flows, since the window quickly grows to 500Mb and then has to
back down by dropping. If I add my own traffic shaping and AQM, I can reduce
the reported TCP retransmissions from ~3% to ~0.1%.
sure.
Post by Benjamin Cronce
The problem I'm running into is that by adding my own shaping, a measurable
amount of the benefit of their AQM is lost. While I am limited to Codel,
HFSC+Codel, or FairQ+Codel for now, I am actually doing a worse job of
anti-bufferbloat than my ISP is. Fewer latency spikes according to
DSLReports.
? measured how.
Just looking visually at the DSLReports graphs, I normally see maybe a
few 40ms-150ms ping spikes, while my own attempts to shape can give me
several 300ms spikes. I would really need a lot more samples and to actually
run the numbers on them, but just casually looking at them, I get the sense
that mine is worse.
Post by Benjamin Cronce
Post by Benjamin Cronce
This also does not touch on the fact that adding back-pressure by its
nature increases latency. I cannot say whether it's a fundamental requirement
in order to better my current situation, but I am curious if there's a better
way. A thought that came to me is to do a light touch, like Bobbie: the
packets have already made their way here, so you don't want to aggressively
drop them; at the same time, I want the packets that already made the
journey to enter my network mostly unhindered.
Post by Benjamin Cronce
That's when I thought of a backpressure-less AQM.
I like the restating of what policers do....
I think I need to look at the definition of a policer. I always thought of
them as a strict cut-off. I'm not talking about mass-dropping packets
beyond a rate, just doing something like Codel where a packet here and
there gets dropped at an increasing rate until the observed rate normalizes.
Post by Benjamin Cronce
Post by Benjamin Cronce
Instead of having backpressure and measuring sojourn time as a function
of how long it takes packets to get scheduled, predict an estimated sojourn
time based on the observed rate of flow, but allow packets to immediately
vacate the queue. The AQM would either mark ECN or drop the packet, but
never delay the packet.
aqms don't delay packets. shapers do.
My described "AQM" is not a shaper in that it does not schedule
packets (it is effectively FIFO at line rate), but it does understand
bandwidth. It neither delays packets nor has a strict cut-off. It
essentially allows packets to flow through at line rate, but if the
"average" rate gets too high, it may decide to drop/mark the next packet.
It might be described as bufferless shaping where the goal is to minimize
packet loss: shaping purely by a gentle rate of increasing loss.

Of course this whole thought may be total rubbish, but I figured I'd throw
it out there.
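For what it's worth, the described behaviour can be sketched as an EWMA of the observed arrival rate driving a gently ramping drop/mark probability, with no queue at all (illustrative names and constants, not a tested design):

```python
import random

class RateAQM:
    """Sketch of 'shaping purely by a gentle rate of increasing loss':
    track an EWMA of the observed arrival rate; once it exceeds the
    provisioned rate, the drop/mark probability ramps up linearly with
    the overshoot, but no packet is ever queued or delayed."""

    def __init__(self, target_bps, max_p=0.02, alpha=0.1):
        self.target = target_bps
        self.max_p = max_p       # cap on drop probability (keep it gentle)
        self.alpha = alpha       # EWMA smoothing factor
        self.avg = 0.0           # smoothed arrival rate in bits/sec
        self.last = None

    def on_packet(self, size_bytes, now):
        """Return True if this packet should be dropped or ECN-marked."""
        if self.last is not None and now > self.last:
            inst = size_bytes * 8 / (now - self.last)
            self.avg += self.alpha * (inst - self.avg)
        self.last = now
        if self.avg <= self.target:
            return False                       # under rate: never touch it
        excess = (self.avg - self.target) / self.target
        p = min(self.max_p, self.max_p * excess)
        return random.random() < p             # gentle, probabilistic loss
```

Traffic at or under the provisioned rate passes untouched; an overshoot like the 2x burst above pushes the EWMA over the target and a small fraction of packets start getting dropped or marked until the senders back off.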
Post by Benjamin Cronce
Post by Benjamin Cronce
In summary, my ISP seems to have better latency with their AQM, but due
to their shaper, loss during the burst is much higher than desirable.
Post by Benjamin Cronce
Maybe this will be mostly moot once I get fq_codel going on pfSense, but
I do find it an interesting issue.
I thought it's been in there for a while?
Technically, but not practically. It should be easily available via the UI
with 2.4.4 which is slowly nearing release.
Jonathan Morton
2018-07-24 21:44:26 UTC
Permalink
Just looking visually at the DSLReports graphs, I normally see maybe a few 40ms-150ms ping spikes, while my own attempts to shape can give me several 300ms spikes. I would really need a lot more samples and to actually run the numbers on them, but just casually looking at them, I get the sense that mine is worse.
That could just be an artefact of your browser's scheduling latency. Try running an independent ping test alongside for verification.

Currently one of my machines has Chrome exhibiting frequent and very noticeable "hitching", while Firefox on the same machine is much smoother. Similar behaviour would easily be enough to cause such data anomalies.

- Jonathan Morton
Dave Taht
2018-07-24 21:58:21 UTC
Permalink
Post by Dave Taht
Maybe the Bobbie idea already would do this, but I did not see it explicitly mentioned on its wiki.
you are basically correct below. bobbie's core idea was a kinder,
gentler, deficit mode policer, with something fq_codel-like aimed at
reducing accumulated bytes to below the set rate.
this is way different from conventional policers
The below is all about shaping ingress, not egress.
My issue is that my ISP already does a good job with their AQM but nothing is perfect and their implementation of rate limiting has a kind of burst at the start. According to speedtests, my 250Mb connection starts off around 500Mb/s and slowly drops over the next few seconds until a near perfect 250Mb/s steady state.
ideally their htb shaper would use fq_codel as the underlying qdisc.
Or at least reduce their burst size to something saner. I hope it's
not a policer.
You can usually "see" a policer in action. Long strings of packets are
dropped in a row.
I feel as if this new configuration is not quite a policer, as it feels much less abrupt than the old configuration. It used to have massive loss spikes that wreaked havoc on other flows and made the fat TCP flows have a kind of rebound. Their newer setup seems to be gentler. While there is an increased rate of loss as it attempts to "slowly" settle at the provisioned rate, it's not the cliff it used to be; it actually has a slope.
well, ask 'em.
Post by Dave Taht
The burst at the beginning adds a certain amount of destabilization to the TCP flows, since the window quickly grows to 500Mb and then has to back down by dropping. If I add my own traffic shaping and AQM, I can reduce the reported TCP retransmissions from ~3% to ~0.1%.
sure.
The problem I'm running into is that by adding my own shaping, a measurable amount of the benefit of their AQM is lost. While I am limited to Codel, HFSC+Codel, or FairQ+Codel for now, I am actually doing a worse job of anti-bufferbloat than my ISP is. Fewer latency spikes according to DSLReports.
? measured how.
Just looking visually at the DSLReports graphs, I normally see maybe a few 40ms-150ms ping spikes, while my own attempts to shape can give me several 300ms spikes. I would really need a lot more samples and to actually run the numbers on them, but just casually looking at them, I get the sense that mine is worse.
too gentle we are perhaps. out of cpu you may be.
Post by Dave Taht
This also does not touch on the fact that adding back-pressure by its nature increases latency. I cannot say whether it's a fundamental requirement in order to better my current situation, but I am curious if there's a better way. A thought that came to me is to do a light touch, like Bobbie: the packets have already made their way here, so you don't want to aggressively drop them; at the same time, I want the packets that already made the journey to enter my network mostly unhindered.
That's when I thought of a backpressure-less AQM.
I like the restating of what policers do....
I think I need to look at the definition of a policer. I always thought of them as a strict cut-off. I'm not talking about mass-dropping packets beyond a rate, just doing something like Codel where a packet here and there gets dropped at an increasing rate until the observed rate normalizes.
bobs below the set rate long enough to drain the queue upstream.
Post by Dave Taht
Instead of having backpressure and measuring sojourn time as a function of how long it takes packets to get scheduled, predict an estimated sojourn time based on the observed rate of flow, but allow packets to immediately vacate the queue. The AQM would either mark ECN or drop the packet, but never delay the packet.
aqms don't delay packets. shapers do.
My described "AQM" is not a shaper in that it does not schedule packets (it is effectively FIFO at line rate), but it does understand bandwidth. It neither delays packets nor has a strict cut-off. It essentially allows packets to flow through at line rate, but if the "average" rate gets too high, it may decide to drop/mark the next packet. It might be described as bufferless shaping where the goal is to minimize packet loss: shaping purely by a gentle rate of increasing loss.
sure.
Of course this whole thought may be total rubbish, but I figured I'd throw it out there.
not rubbish, could be better than policing.
Post by Dave Taht
In summary, my ISP seems to have better latency with their AQM, but due to their shaper, loss during the burst is much higher than desirable.
Maybe this will be mostly moot once I get fq_codel going on pfSense, but I do find it an interesting issue.
I thought it's been in there for a while?
Technically, but not practically. It should be easily available via the UI with 2.4.4 which is slowly nearing release.
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Dave Taht
2018-07-24 22:12:04 UTC
Permalink
Post by Dave Taht
Post by Dave Taht
Maybe the Bobbie idea already would do this, but I did not see it explicitly mentioned on its wiki.
you are basically correct below. bobbie's core idea was a kinder,
gentler, deficit mode policer, with something fq_codel-like aimed at
reducing accumulated bytes to below the set rate.
this is way different from conventional policers
The below is all about shaping ingress, not egress.
My issue is that my ISP already does a good job with their AQM but nothing is perfect and their implementation of rate limiting has a kind of burst at the start. According to speedtests, my 250Mb connection starts off around 500Mb/s and slowly drops over the next few seconds until a near perfect 250Mb/s steady state.
ideally their htb shaper would use fq_codel as the underlying qdisc.
Or at least reduce their burst size to something saner. I hope it's
not a policer.
You can usually "see" a policer in action. Long strings of packets are
dropped in a row.
I feel as if this new configuration is not quite a policer, as it feels much less abrupt than the old configuration. It used to have massive loss spikes that wreaked havoc on other flows and made the fat TCP flows have a kind of rebound. Their newer setup seems to be gentler. While there is an increased rate of loss as it attempts to "slowly" settle at the provisioned rate, it's not the cliff it used to be; it actually has a slope.
well, ask 'em.
Post by Dave Taht
The burst at the beginning adds a certain amount of destabilization to the TCP flows, since the window quickly grows to 500Mb and then has to back down by dropping. If I add my own traffic shaping and AQM, I can reduce the reported TCP retransmissions from ~3% to ~0.1%.
sure.
The problem I'm running into is that by adding my own shaping, a measurable amount of the benefit of their AQM is lost. While I am limited to Codel, HFSC+Codel, or FairQ+Codel for now, I am actually doing a worse job of anti-bufferbloat than my ISP is. Fewer latency spikes according to DSLReports.
? measured how.
Just looking visually at the DSLReports graphs, I normally see maybe a few 40ms-150ms ping spikes, while my own attempts to shape can give me several 300ms spikes. I would really need a lot more samples and to actually run the numbers on them, but just casually looking at them, I get the sense that mine is worse.
too gentle we are perhaps. out of cpu you may be.
Post by Dave Taht
This also does not touch on the fact that adding back-pressure by its nature increases latency. I cannot say whether it's a fundamental requirement in order to better my current situation, but I am curious if there's a better way. A thought that came to me is to do a light touch, like Bobbie: the packets have already made their way here, so you don't want to aggressively drop them; at the same time, I want the packets that already made the journey to enter my network mostly unhindered.
That's when I thought of a backpressure-less AQM.
I like the restating of what policers do....
I think I need to look at the definition of a policer. I always thought of them as a strict cut-off. I'm not talking about mass-dropping packets beyond a rate, just doing something like Codel where a packet here and there gets dropped at an increasing rate until the observed rate normalizes.
bobs below the set rate long enough to drain the queue upstream.
s/queue/tbf
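A toy illustration of that bobbing behaviour (hypothetical class; the real Bobbie design may differ): periodically enforce a rate slightly below the provisioned one, just long enough for the upstream tbf to drain, then return to the full rate:

```python
class BobbingRate:
    """Sketch of 'bobbing': periodically dip the locally enforced rate
    below the provisioned rate, long enough to let the upstream token
    bucket (tbf) drain, then return to the full rate."""

    def __init__(self, set_rate_bps, dip_fraction=0.9, dip_s=0.1, period_s=1.0):
        self.set_rate = set_rate_bps
        self.dip_rate = set_rate_bps * dip_fraction
        self.dip_s = dip_s        # how long each dip lasts
        self.period_s = period_s  # how often to dip

    def rate_at(self, t):
        """Enforced rate at time t: dip for dip_s at the start of each period."""
        return self.dip_rate if (t % self.period_s) < self.dip_s else self.set_rate
```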
Post by Dave Taht
Post by Dave Taht
Instead of having backpressure and measuring sojourn time as a function of how long it takes packets to get scheduled, predict an estimated sojourn time based on the observed rate of flow, but allow packets to immediately vacate the queue. The AQM would either mark ECN or drop the packet, but never delay the packet.
aqms don't delay packets. shapers do.
My described "AQM" is not a shaper in that it does not schedule packets (it is effectively FIFO at line rate), but it does understand bandwidth. It neither delays packets nor has a strict cut-off. It essentially allows packets to flow through at line rate, but if the "average" rate gets too high, it may decide to drop/mark the next packet. It might be described as bufferless shaping where the goal is to minimize packet loss: shaping purely by a gentle rate of increasing loss.
sure.
Of course this whole thought may be total rubbish, but I figured I'd throw it out there.
not rubbish, could be better than policing.
Post by Dave Taht
In summary, my ISP seems to have better latency with their AQM, but due to their shaper, loss during the burst is much higher than desirable.
Maybe this will be mostly moot once I get fq_codel going on pfSense, but I do find it an interesting issue.
I thought it's been in there for a while?
Technically, but not practically. It should be easily available via the UI with 2.4.4 which is slowly nearing release.
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Benjamin Cronce
2018-07-25 00:11:29 UTC
Permalink
Post by Benjamin Cronce
Post by Benjamin Cronce
Just looking visually at the DSLReports graphs, I normally see maybe a
few 40ms-150ms ping spikes, while my own attempts to shape can give me
several 300ms spikes. I would really need a lot more samples and to actually
run the numbers on them, but just casually looking at them, I get the sense
that mine is worse.
That could just be an artefact of your browser's scheduling latency. Try
running an independent ping test alongside for verification.
Currently one of my machines has Chrome exhibiting frequent and very
noticeable "hitching", while Firefox on the same machine is much smoother.
Similar behaviour would easily be enough to cause such data anomalies.
- Jonathan Morton
Challenge accepted. 10 pings per second at my ISP's speedtest server. My
wife was watching Sing for the millionth time on Netflix during these tests.

Idle
Packets: sent=300, rcvd=300, error=0, lost=0 (0.0% loss) in 29.903240 sec
RTTs in ms: min/avg/max/dev: 1.554 / 2.160 / 3.368 / 0.179
Bandwidth in kbytes/sec: sent=0.601, rcvd=0.601

shaping
------------------------
During download
Packets: sent=123, rcvd=122, error=0, lost=1 (0.8% loss) in 12.203803 sec
RTTs in ms: min/avg/max/dev: 1.459 / 2.831 / 8.281 / 0.955
Bandwidth in kbytes/sec: sent=0.604, rcvd=0.599

During upload
Packets: sent=196, rcvd=195, error=0, lost=1 (0.5% loss) in 19.503948 sec
RTTs in ms: min/avg/max/dev: 1.608 / 3.247 / 5.471 / 0.853
Bandwidth in kbytes/sec: sent=0.602, rcvd=0.599

no shaping
-----------------------------
During download
Packets: sent=147, rcvd=147, error=0, lost=0 (0.0% loss) in 14.604027 sec
RTTs in ms: min/avg/max/dev: 1.161 / 2.110 / 13.525 / 1.069
Bandwidth in kbytes/sec: sent=0.603, rcvd=0.603

During upload
Packets: sent=199, rcvd=199, error=0, lost=0 (0.0% loss) in 19.802377 sec
RTTs in ms: min/avg/max/dev: 1.238 / 2.071 / 4.715 / 0.373
Bandwidth in kbytes/sec: sent=0.602, rcvd=0.602

Now I really feel like disabling shaping on my end. The TCP streams have
increased loss without shaping, but my ICMP looks better. Better flow
isolation? Need me some fq_codel or Cake. Going to set fq_codel to
something like target 3ms and a 45ms interval (RTT estimate). Due to CDNs
and regional gaming servers, something like 95% of everything is less than
30ms away and something like 80% is less than 15ms away.

Akamai 1-2ms
Netflix 2-3ms
Hulu 2-3ms
Cloudflare 9ms
Discord 9ms
World of Warcraft/Battle.Net 9ms
Youtube 12ms

Too short of tests, but interesting.
Post by Benjamin Cronce
Post by Benjamin Cronce
Just looking visually at the DSLReports graphs, I normally see maybe a
few 40ms-150ms ping spikes, while my own attempts to shape can give me
several 300ms spikes. I would really need a lot more samples and to actually
run the numbers on them, but just casually looking at them, I get the sense
that mine is worse.
too gentle we are perhaps. out of cpu you may be.
Possibly FairQ uses more CPU than expected, but I have load tested my
firewall using HFSC with ~8 queues shaped to 1Gb/s and Codel on the
queues. Using iperf, I was able to send ~1.4 million pps, about 1Gb/s of
64-byte UDP packets. pfSense was claiming about 1.4Mpps ingress on the LAN
and 1.4Mpps egress on the WAN. CPU was hovering around 17% on my quad core,
with the load roughly equal across all 4 cores. A Core i5 with low-latency
DDR3 and an Intel i350 NIC is nice.

MTU-sized packets with iperf using bi-directional TCP result in about
1.85Gb/s, which is in line with the ~940Mb/s per direction on Ethernet, and
something ridiculous like 4% CPU and 150 interrupts per second. This NIC is
magical. I'm assuming soft interrupts.