Discussion:
[Bloat] excellent result with OpenWrt 18.06.1 + CAKE on FTTH
Mikael Abrahamsson
2018-11-12 06:53:49 UTC
Hi,

I am running "stock" OpenWrt 18.06.1 on a WRT1200AC with
CAKE+piece_of_cake.qos and set to 250 down 100 up. This is on an ethernet
point-to-point FTTH connection in Stockholm, Sweden. Basically just
installed OpenWrt and then added the sqm-scripts-extra and luci-app-sqm
packages, went in and configured the correct settings in the web UI, and
then everything was great.

The biggest benefit of this FTTH setup is that I no longer have to deal
with the first-hop scheduler I had on my previous DOCSIS connection (which
also sometimes failed to deliver the advertised bandwidth, so I ended up
with 10-30 ms of bufferbloat).

http://www.dslreports.com/speedtest/41682104

The smokeping screenshots below show the difference between the DOCSIS and
FTTH schedulers, as well as the much lower access RTT (1-2 ms) and the
lower PDV (which seems to be several ms on DOCSIS but not on my P2P FTTH).

https://imgur.com/a/96dFdho

Thanks everybody for the excellent packaging and ease of use for end users
to get this to work. I've had this running now for 40 days without any
issue.
--
Mikael Abrahamsson email: ***@swm.pp.se
Dave Taht
2018-11-12 15:26:50 UTC
Post by Mikael Abrahamsson
Hi,
I am running "stock" OpenWrt 18.06.1 on a WRT1200AC with
CAKE+piece_of_cake.qos and set to 250 down 100 up. This is on an
ethernet point-to-point FTTH connection in Stockholm,
Sweden. Basically just installed OpenWrt and then added the
sqm-scripts-extra and luci-app-sqm packages, went in and configured
the correct settings in the web UI, and then everything was great.
Biggest benefit with this FTTH setup is that I don't have to
experience the first-hop scheduler I had with my previous DOCSIS
connection (that also sometimes didn't do advertised bandwidth so I
ended up getting 10-30ms of bufferbloat).
http://www.dslreports.com/speedtest/41682104
The smokeping screenshots below show the difference between DOCSIS and
FTTH scheduler, but the much lower access RTT (1-2 ms ) and the lower
PDV (which seems to be several ms on DOCSIS but not on my P2P FTTH).
https://imgur.com/a/96dFdho
Thanks everybody for the excellent packaging and ease of use for end
users to get this to work. I've had this running now for 40 days
without any issue.
After running a few days... (I imagine you've restarted cake a few times)

tc -s qdisc show dev your_device?
tc -s qdisc show dev your_ifbdevice?
Dave Taht
2018-11-12 15:36:51 UTC
I guess my biggest question is how bloated is the "Before cake"
version of the link?
Mikael Abrahamsson
2018-11-12 15:51:13 UTC
Post by Dave Taht
I guess my biggest question is how bloated is the "Before cake"
version of the link?
Not very.

http://www.dslreports.com/speedtest/41693199

I then ran another test while simultaneously running a different vendor's
speedtest:

http://www.dslreports.com/speedtest/41693256

Ping increased by only 5-10 ms while doing this.

If I then re-enable cake with 250000/100000 I get:

http://www.dslreports.com/speedtest/41693346

qdisc after this last test:

qdisc cake 8034: dev eth1.2 root refcnt 2 bandwidth 100Mbit besteffort
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw
overhead 0
Sent 391610860 bytes 650447 pkt (dropped 1430, overlimits 645558 requeues
0)
backlog 0b 0p requeues 0
memory used: 2425408b of 5000000b
capacity estimate: 100Mbit
min/max network layer size: 46 / 1514
min/max overhead-adjusted size: 46 / 1514
average network hdr offset: 14

Tin 0
thresh 100Mbit
target 5.0ms
interval 100.0ms
pk_delay 82us
av_delay 6us
sp_delay 1us
backlog 0b
pkts 651877
bytes 393761357
way_inds 11602
way_miss 3103
way_cols 0
drops 1430
marks 0
ack_drop 0
sp_flows 16
bk_flows 1
un_flows 0
max_len 18168
quantum 1514

qdisc ingress ffff: dev eth1.2 parent ffff:fff1 ----------------
Sent 896042971 bytes 760157 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
qdisc fq_codel 0: dev tun2025 root refcnt 2 limit 10240p flows 1024
quantum 1514 target 5.0ms interval 100.0ms memory_limit 4Mb ecn
Sent 21580 bytes 166 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0
maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
new_flows_len 0 old_flows_len 0
qdisc cake 8035: dev ifb4eth1.2 root refcnt 2 bandwidth 250Mbit besteffort
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw
overhead 0
Sent 912501460 bytes 754253 pkt (dropped 5904, overlimits 926439 requeues
0)
backlog 0b 0p requeues 0
memory used: 805712b of 12500000b
capacity estimate: 250Mbit
min/max network layer size: 60 / 1514
min/max overhead-adjusted size: 60 / 1514
average network hdr offset: 14

Tin 0
thresh 250Mbit
target 5.0ms
interval 100.0ms
pk_delay 650us
av_delay 429us
sp_delay 1us
backlog 0b
pkts 760157
bytes 921432581
way_inds 17426
way_miss 3168
way_cols 0
drops 5904
marks 0
ack_drop 0
sp_flows 7
bk_flows 1
un_flows 0
max_len 15104
quantum 1514


It seems to smooth out the flows better than my ISP's shaper.

These tests were done while the rest of the household was also using the
Internet for other things, so this is not a "clean room" result.
--
Mikael Abrahamsson email: ***@swm.pp.se
Dave Taht
2018-11-12 16:07:04 UTC
Post by Mikael Abrahamsson
Post by Dave Taht
I guess my biggest question is how bloated is the "Before cake"
version of the link?
Not very.
http://www.dslreports.com/speedtest/41693199
Kind of hard to argue with that. However, I tend to think dslreports
and browsers start to give inaccurate results at these speeds. Got rrul
available? The closest flent server I have to you is in Germany.

I also agree that, man, the base difference in "getting on" a fiber
network is really something you can feel. - 2ms vs 10+... you can
complete a short transaction before one can even start, on cable. To
me, this is probably the biggest perceptual difference between cable
and fiber behavior (more important than fixing bloat)

I miss being on sonic. And I didn't know, until recently, that my
modem was on badmodems.com's list.
Post by Mikael Abrahamsson
I then did another test while at the same time doing a different vendor
http://www.dslreports.com/speedtest/41693256
Ping just increased 5-10 ms when doing this.
http://www.dslreports.com/speedtest/41693346
I don't "get" the knee in the download curve here and the prior test.
--
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Mikael Abrahamsson
2018-11-12 16:22:29 UTC
Post by Dave Taht
Post by Mikael Abrahamsson
http://www.dslreports.com/speedtest/41693346
I don't "get" the knee in the download curve here and the prior test.
That's when I start a competing speedtest against the local Swedish
speedtest site, using an OSX app they ship.
--
Mikael Abrahamsson email: ***@swm.pp.se
Mikael Abrahamsson
2018-11-12 15:38:05 UTC
Post by Dave Taht
tc -s qdisc show dev your_device?
tc -s qdisc show dev your_ifbdevice?
I haven't restarted in 40 days and I don't remember restarting cake, so
this should be several weeks of data.

qdisc cake 8031: dev eth1.2 root refcnt 2 bandwidth 100Mbit besteffort
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw
overhead 0
Sent 70822286277 bytes 202513660 pkt (dropped 13984, overlimits 25350421
requeues 0)
backlog 0b 0p requeues 0
memory used: 5156288b of 5000000b
capacity estimate: 100Mbit
min/max network layer size: 42 / 1514
min/max overhead-adjusted size: 42 / 1514
average network hdr offset: 14

Tin 0
thresh 100Mbit
target 5.0ms
interval 100.0ms
pk_delay 4us
av_delay 1us
sp_delay 1us
backlog 0b
pkts 202527644
bytes 70842325936
way_inds 4939006
way_miss 11834545
way_cols 0
drops 13984
marks 512
ack_drop 0
sp_flows 2
bk_flows 1
un_flows 0
max_len 28766
quantum 1514

qdisc ingress ffff: dev eth1.2 parent ffff:fff1 ----------------
Sent 807912654344 bytes 631652827 pkt (dropped 0, overlimits 0 requeues
0)
backlog 0b 0p requeues 0
qdisc cake 8032: dev ifb4eth1.2 root refcnt 2 bandwidth 250Mbit besteffort
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw
overhead 0
Sent 829571211610 bytes 631641016 pkt (dropped 11811, overlimits
790055554 requeues 0)
backlog 0b 0p requeues 0
memory used: 4540528b of 12500000b
capacity estimate: 250Mbit
min/max network layer size: 60 / 1514
min/max overhead-adjusted size: 60 / 1514
average network hdr offset: 14

Tin 0
thresh 250Mbit
target 5.0ms
interval 100.0ms
pk_delay 1.2ms
av_delay 559us
sp_delay 1us
backlog 0b
pkts 631652827
bytes 829588333230
way_inds 12061686
way_miss 12913211
way_cols 1
drops 11811
marks 3589
ack_drop 0
sp_flows 1
bk_flows 1
un_flows 0
max_len 38444
quantum 1514
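
(A note on reading these counters: the sketch below is a minimal Python example, not something from the original post. It parses a few "name value" lines copied from the ifb4eth1.2 tin above and works out what fraction of packets CAKE dropped or ECN-marked. It assumes that "pkts" counts dropped packets as well, which is consistent with the output above: 631641016 sent plus 11811 dropped equals 631652827. The function name parse_counters is made up for illustration.)

```python
# Counter lines copied from the ifb4eth1.2 "Tin 0" block in this message.
TIN_BLOCK = """\
pkts 631652827
drops 11811
marks 3589
ack_drop 0
"""


def parse_counters(text: str) -> dict:
    """Collect 'name integer' lines into a dict; ignore everything else."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


c = parse_counters(TIN_BLOCK)
drop_ratio = c["drops"] / c["pkts"]   # fraction of packets dropped by the AQM
mark_ratio = c["marks"] / c["pkts"]   # fraction of packets ECN-marked instead
print(f"dropped {drop_ratio:.4%}, ECN-marked {mark_ratio:.4%}")
```

On these numbers that comes to roughly 0.002% of packets dropped over the whole period, with about a third as many marked as dropped.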
--
Mikael Abrahamsson email: ***@swm.pp.se
Dave Taht
2018-11-12 15:55:35 UTC
Post by Mikael Abrahamsson
Post by Dave Taht
tc -s qdisc show dev your_device?
tc -s qdisc show dev your_ifbdevice?
I haven't restarted in 40 days and I don't remember restarting cake, so
this should be several weeks of data.
qdisc cake 8031: dev eth1.2 root refcnt 2 bandwidth 100Mbit besteffort
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw
Pretty nice. There is not much need for it, but it would be interesting
to know how much CPU overhead a full set of uploads takes on this arch
at 100Mbit, with and without the ack-filter enabled.
Post by Mikael Abrahamsson
overhead 0
Sent 70822286277 bytes 202513660 pkt (dropped 13984, overlimits 25350421
requeues 0)
backlog 0b 0p requeues 0
memory used: 5156288b of 5000000b
capacity estimate: 100Mbit
min/max network layer size: 42 / 1514
min/max overhead-adjusted size: 42 / 1514
average network hdr offset: 14
Tin 0
thresh 100Mbit
target 5.0ms
interval 100.0ms
pk_delay 4us
av_delay 1us
sp_delay 1us
backlog 0b
pkts 202527644
bytes 70842325936
Don't use this connection much, do you? :)
Post by Mikael Abrahamsson
way_inds 4939006
way_miss 11834545
way_cols 0
drops 13984
It's nice to know AQM is still needed at these speeds.
Post by Mikael Abrahamsson
marks 512
and that you have at least one device with ecn enabled. Would this be
OSX or IOS perhaps?
Post by Mikael Abrahamsson
ack_drop 0
And how many ack-drops you get.
Post by Mikael Abrahamsson
sp_flows 2
bk_flows 1
un_flows 0
max_len 28766
I really hate GRO. At 100Mbit, without splitting, that's 2.5ms of
jitter. I've had a long-held dream of being able to do the "jamophone"
with less than 2.5ms latency end-to-end across town, with multichannel
48kHz 24-bit audio... at 130us, that leaves some room for the encoder.
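
(The jitter figures follow from serialization delay: the time needed to clock a given number of bytes onto the wire at the shaped rate. The Python sketch below is an illustration added here, not part of the original message. It uses the max_len and quantum values from the quoted tc output and the 100Mbit shaper rate; the 64 KiB case is the aggregate size mentioned further down. Link-layer overhead is ignored, so the results are only approximate.)

```python
def serialization_delay_us(size_bytes: int, rate_bit_per_s: float) -> float:
    """Time to transmit size_bytes at rate_bit_per_s, in microseconds."""
    return size_bytes * 8 / rate_bit_per_s * 1e6


RATE = 100e6  # 100 Mbit/s, the egress shaper rate in this thread

cases = [
    ("max_len 28766 (unsplit GRO aggregate)", 28766),
    ("quantum 1514 (one full-size frame)", 1514),
    ("64 KiB aggregate", 65536),
]
for label, size in cases:
    print(f"{label}: {serialization_delay_us(size, RATE):.0f} us")
```

This gives about 2.3 ms for the 28766-byte aggregate and about 0.12 ms for a single 1514-byte frame, in line with the rounded 2.5ms and 130us quoted above.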

I don't suppose you have someone else "across town" you could run some
benchmarks against?

I did an extremely successful test of sonic's fiber network in SF where
I got well under 2.5ms jitter for an 8 channel stream... then I left
SF. :(

I was kind of expecting 64k, though, here.
Post by Mikael Abrahamsson
quantum 1514
qdisc ingress ffff: dev eth1.2 parent ffff:fff1 ----------------
Sent 807912654344 bytes 631652827 pkt (dropped 0, overlimits 0 requeues
0)
backlog 0b 0p requeues 0
qdisc cake 8032: dev ifb4eth1.2 root refcnt 2 bandwidth 250Mbit besteffort
triple-isolate nonat nowash no-ack-filter split-gso rtt 100.0ms raw
overhead 0
Similarly, a CPU number under load would be useful. I note here that
splitting GSO has a big cost (primarily in routing table lookup), and at
these speeds you can probably disable it.

ack filtering will not help.
Post by Mikael Abrahamsson
Sent 829571211610 bytes 631641016 pkt (dropped 11811, overlimits
790055554 requeues 0)
backlog 0b 0p requeues 0
memory used: 4540528b of 12500000b
capacity estimate: 250Mbit
min/max network layer size: 60 / 1514
min/max overhead-adjusted size: 60 / 1514
average network hdr offset: 14
Tin 0
thresh 250Mbit
target 5.0ms
interval 100.0ms
pk_delay 1.2ms
av_delay 559us
sp_delay 1us
backlog 0b
pkts 631652827
bytes 829588333230
way_inds 12061686
way_miss 12913211
way_cols 1
drops 11811
marks 3589
ack_drop 0
sp_flows 1
bk_flows 1
un_flows 0
max_len 38444
I was also expecting 64k here. I imagine you are using modern Linuxes
that don't overuse TSO anymore, and OSX and Windows never got into it to
the extreme that Linux did.
Post by Mikael Abrahamsson
quantum 1514
--
_______________________________________________
Bloat mailing list
https://lists.bufferbloat.net/listinfo/bloat
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Mikael Abrahamsson
2018-11-12 16:21:13 UTC
Post by Dave Taht
Don't use this connection much, do you? :)
Last 4 week average is 300 kilobit/s up and 3000 kilobit/s down. So no.
Mostly streaming Netflix and similar things.
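
(As a rough cross-check, the "Sent" byte counters from the tc output earlier in the thread can be turned into long-run average rates. The Python sketch below is an added illustration. It assumes the counters cover the full 40 days of uptime mentioned above, which the thread only describes as "several weeks of data", so treat the result as an order-of-magnitude figure.)

```python
SECONDS_PER_DAY = 86_400


def average_bit_rate(total_bytes: int, days: float) -> float:
    """Mean rate in bit/s for total_bytes transferred over the given days."""
    return total_bytes * 8 / (days * SECONDS_PER_DAY)


UPTIME_DAYS = 40                # assumption: counters span the whole uptime
EGRESS_BYTES = 70_822_286_277   # "Sent" on eth1.2 (upload direction)
INGRESS_BYTES = 829_571_211_610  # "Sent" on ifb4eth1.2 (download direction)

up = average_bit_rate(EGRESS_BYTES, UPTIME_DAYS)
down = average_bit_rate(INGRESS_BYTES, UPTIME_DAYS)
print(f"average up:   {up / 1e3:.0f} kbit/s")
print(f"average down: {down / 1e6:.2f} Mbit/s")
```

Under that assumption the averages come out near 164 kbit/s up and 1.9 Mbit/s down, the same order of magnitude as the four-week figures above.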
Post by Dave Taht
Post by Mikael Abrahamsson
marks 512
and that you have at least one device with ecn enabled. Would this be
OSX or IOS perhaps?
I typically turn it on on every device where I remember to do so. There
are plenty of iOS devices in the household, but also ECN-enabled OSX
machines.
Post by Dave Taht
I don't suppose you have someone else "across town" you could run some
benchmarks against?
Surely. I can run anything you need; I have a 1GE Ubuntu machine
~3ms away. What tests do you want me to run? I have an Ubuntu laptop here
that I can run wired tests with. It already has flent installed, so just
tell me what you want me to do and test. If you want me to change qdisc
settings I'm going to need good instructions, as I am not proficient in
changing those settings.
Post by Dave Taht
Similarly, a cpu number under load. I note here, that splitting GSO
has a big cost, (primarily in routing table lookup) and you can at
these speeds, probably disable it.
sirq% peaks out around 35-40% when doing download at 250 megabit/s. Around
10% when doing upload at 100 megabit/s. Armada 385 is nice.
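
(For anyone wanting to reproduce a softirq figure like this without top: on Linux it can be derived from two samples of the aggregate "cpu" line in /proc/stat, whose first fields are user, nice, system, idle, iowait, irq, softirq, steal. The Python sketch below is an added illustration, not how the numbers above were obtained. The two sample lines are made up, and a stock OpenWrt router may not have Python installed, so this is meant to run on data copied off the box.)

```python
def softirq_percent(before: str, after: str) -> float:
    """Softirq share of CPU time between two /proc/stat 'cpu' lines."""
    def fields(line: str) -> list:
        # Drop the leading "cpu" label and keep user..steal (8 fields);
        # guest time is already included in user, so it is left out.
        return [int(v) for v in line.split()[1:9]]

    deltas = [a - b for b, a in zip(fields(before), fields(after))]
    total = sum(deltas)
    # Index 6 is softirq: user nice system idle iowait irq softirq steal
    return 100.0 * deltas[6] / total if total else 0.0


# Hypothetical samples taken some seconds apart during a download test.
SAMPLE_BEFORE = "cpu 100 0 50 800 10 5 35 0 0 0"
SAMPLE_AFTER = "cpu 150 0 80 1300 20 10 440 0 0 0"
print(f"sirq: {softirq_percent(SAMPLE_BEFORE, SAMPLE_AFTER):.1f}%")
```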
Post by Dave Taht
I was also expecting 64k here. I imagine you are using modern linuxes
that don't overuse TSO anymore, and osx and windows never got into it to
the extreme that linux did.
***@wrt1200-hemma:~# uname -a
Linux wrt1200-hemma 4.14.63 #0 SMP Wed Aug 15 20:42:39 2018 armv7l GNU/Linux
--
Mikael Abrahamsson email: ***@swm.pp.se
Dave Taht
2018-11-12 17:00:05 UTC
Post by Mikael Abrahamsson
Post by Dave Taht
Don't use this connection much, do you? :)
Last 4 week average is 300 kilobit/s up and 3000 kilobit/s down. So no.
Mostly streaming Netflix and similar things.
Post by Dave Taht
Post by Mikael Abrahamsson
marks 512
and that you have at least one device with ecn enabled. Would this be
OSX or IOS perhaps?
I typically turn it on on all devices I remember to turn it on. There are
plenty of iOS devices in the household, but also ECN enabled OSX machines.
Post by Dave Taht
I don't suppose you have someone else "across town" you could run some
benchmarks against?
Surely. I can run anything you need, I have 1GE ubuntu machine
~3ms away. What tests do you want me to run? I have ubuntu laptop here I
can run wired tests with. It already has flent installed, so just tell me
what you want me to do and test. If you want me to change qdisc settings
I'm going to need good instructions, I am not proficient in changing those
settings.
cake on or off would be a good start, then fq_codel with simple.qos.

irtt stats are more accurate at this high resolution. irtt is now
packaged and started automagically on Ubuntu, and is picked up
automatically by flent if available. Do check that it's on both boxes
and that the relevant port is not firewalled.

#!/bin/sh
# Run flent in high resolution - even 20ms samples aren't good enough anymore!
F="flent -x -s .02"
S=your_server
T="a meaningful title"

$F -H $S -t "$T" rrul
$F -H $S -t "$T" rrul_be
$F -H $S -t "$T-100" -l 20 --socket-stats --te=upload_streams=100 tcp_nup
$F -H $S -t "$T-500" -l 20 --socket-stats --te=upload_streams=500 tcp_nup
$F -H $S -t "$T" --te=download_streams=100 tcp_ndown

Of these, the most interesting one in cake's case is actually the
tcp_nup test, vs fq_codel, vs baseline.

I note that -s .02 and socket stats will EAT GB of ram to process and
take minutes to finalize.

If you feel really ambitious, the ECN vs. non-ECN behavior of your TCPs
on tcp_nup is rather interesting at this load.
Post by Mikael Abrahamsson
Post by Dave Taht
Similarly, a cpu number under load. I note here, that splitting GSO
has a big cost, (primarily in routing table lookup) and you can at
these speeds, probably disable it.
sirq% peaks out around 35-40% when doing download at 250 megabit/s. Around
10% when doing upload at 100 megabit/s. Armada 385 is nice.
Yep. I've universally switched to that on "the low end" when I can't
get an APU2. If only the wifi was better.
Post by Mikael Abrahamsson
Post by Dave Taht
I was also expecting 64k here. I imagine you are using modern linuxes
that don't overuse TSO anymore, and osx and windows never got into it to
the extreme that linux did.
Linux wrt1200-hemma 4.14.63 #0 SMP Wed Aug 15 20:42:39 2018 armv7l GNU/Linux
--
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740