Discussion:
[Bloat] DC behaviors today
Dave Taht
2017-12-04 04:19:33 UTC
Changing the topic, adding bloat.

Just from a telco/industry perspective:

Everything in the DC has moved to SFP28 interfaces at 25Gbit as the
server interconnect port. Everything ToR-wise is now QSFP28 - 100Gbit.
Mellanox ConnectX-5 cards are the current hotness, and their offload
enhancements (ASAP2 - which is sorta like DPDK on steroids) allow OVS
flow rules to be programmed into the card. We have a lot of customers
champing at the bit for that feature (disclaimer: I work for Nuage
Networks, and we are working on an enhanced OVS to do just that) for
NFV workloads.

What Jesper's been working on for ages has been to try and get Linux's
packet-per-second rate up for small packets, which last I heard was
hovering at the equivalent of about 4Gbit/s.
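(To put packets-per-second and bit-rate figures on the same axis, a
back-of-envelope conversion; the only assumption is the standard 20
bytes of preamble plus inter-frame gap each Ethernet frame carries on
the wire:)

# Rough Ethernet pps <-> bit-rate conversion (Python)
WIRE_OVERHEAD = 20  # preamble (8B) + inter-frame gap (12B) per frame

def pps_to_saturate(link_gbps, frame_bytes=64):
    """Packets per second needed to fill a link at a given frame size."""
    return link_gbps * 1e9 / ((frame_bytes + WIRE_OVERHEAD) * 8)

print(f"{pps_to_saturate(10) / 1e6:.2f} Mpps")  # ~14.88 Mpps: 10GE at 64B
print(f"{pps_to_saturate(4) / 1e6:.2f} Mpps")   # ~5.95 Mpps: ~4Gbit/s at 64B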

The route table lookup is also really expensive on the main CPU.
Does this stuff offload the route table lookup as well?

Are there rate-limiting features (e.g. on a per-customer basis, with
matches for IPv4 and IPv6 subnets)?
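(For concreteness, a minimal token-bucket sketch of that kind of
per-customer policing; the subnets, rates and function names here are
made up, and a real deployment would express this as tc/OVS rules or,
per the above, offload it to the NIC:)

import ipaddress
import time

class TokenBucket:
    """Refills at rate_bps, capped at burst_bytes."""
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0   # bytes per second
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)
        self.last = time.monotonic()

    def allow(self, pkt_len):
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= pkt_len:
            self.tokens -= pkt_len
            return True
        return False  # police: drop (or mark) the packet

# Hypothetical per-customer limits, matching IPv4 and IPv6 subnets.
limits = {
    ipaddress.ip_network("198.51.100.0/28"): TokenBucket(50_000_000, 96_000),
    ipaddress.ip_network("2001:db8:1::/48"): TokenBucket(50_000_000, 96_000),
}

def police(src_ip, pkt_len):
    addr = ipaddress.ip_address(src_ip)
    for net, bucket in limits.items():
        if addr.version == net.version and addr in net:
            return bucket.allow(pkt_len)
    return True  # no matching rule: pass unpoliced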
SFP+ 10Gbit MIGHT become the target for the home handoff, but I see a
lot of startup ISPs doing in-house-built, OVS-controlled, SD-WAN-style
deployments.

I still, however, can't see a coherent 10Gbit edge ecosystem. DOCSIS
3.1 now has 10Gbit channel support, which is fine and dandy, but the
Atom board I mentioned before is literally the only edge device out
there capable of doing that right now, and I am only seeing it used in
expensive SD-WAN + NFV boxen for corporates. The home 10Gbit-plus box
is still up for grabs in the market and no one is biting.

I have never thought there was much of a market for gbit to or from
the home. 40Mbit/s is enough for nearly everybody until >4K video with
smellovision and tactile feedback becomes a standard.

(I may regret saying this in 10 years)
Anyway, the nanopi folk are now producing a wide range of boards.
https://www.amazon.com/gp/product/B0728LPB2R/ref=oh_aui_detailpage_o00_s01?ie=UTF8&psc=1

Is this the same thing for cheaper?
http://www.friendlyarm.com/index.php?route=product/product&product_id=180
(but slow and non-free shipping)

It appears to be the H5.

Yep. Shipping from China takes a while. They also have a nice one-bay
NAS case of the right form factor for an SSD. They have a new octocore
board too, and at least the promise of a mainlined kernel across much
of their product line. Ordered one of those too.

And the Allwinner products (like the odroid) do look to be coming
along, also:
http://linux-sunxi.org/Linux_mainlining_effort
--
Matt Taggart
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Mikael Abrahamsson
2017-12-04 09:13:18 UTC
Post by Dave Taht
What Jesper's been working on for ages has been to try and get Linux's
packet-per-second rate up for small packets, which last I heard was
hovering at the equivalent of about 4Gbit/s.
You might want to look into what the VPP (https://fd.io/) peeps are
doing. They can at least forward packets at pretty impressive rates:
200Mpps with zero frame loss and a 2M-entry FIB, limited by NIC and
PCIe rather than CPU (on a many-core machine).
Post by Dave Taht
I have never thought there was much of a market for gbit to or from
the home. 40Mbit/s is enough for nearly everybody until >4K video with
smellovision and tactile feedback becomes a standard.
I'd say the sweet spot right now is in the 100-250 megabit/s range,
considering "cost of production" and "what people need/use". This
means it can still be done on 1 gigabit/s access links.

Anything faster than 1GE is going to be significantly more expensive
than 1GE, because 1GE is "good enough for most" when it comes to
hundreds of millions of households and their inter/intra-home needs.
For SME use, too, 1GE is good enough for a lot of use cases.

I personally now have 250/50, which is good enough for me, and I don't
want to pay 2x my current MRC to get 1000/100. However, if I had to
downgrade to 30 megabit/s I would most certainly notice it, and in my
market that would only be a 20-30% saving, which definitely isn't
worth it.
--
Mikael Abrahamsson email: ***@swm.pp.se
Joel Wirāmu Pauling
2017-12-04 09:31:00 UTC
I'm not going to pretend that 1Gig isn't enough for most people. But I
refuse to believe it's the network's equivalent of a 10A (20A,
depending on where you live in the world) residential AC distribution
circuit.

This isn't a question of what people need; it's more about what the
market can deliver. 10G PON (XG-PON) and others now make it a viable
service that can be - and is being - deployed in residential and
commercial access networks.

3 years ago, delivering an access network capable of anything beyond
2.5Gbit was pretty much a business-case non-starter.

The problem is that now that Retail Service Provider X can deliver a
post-gigabit service... what is capable of taking it off the ONU/CM
point in the home? As usual it's a follow-the-money question: once
RSPs can deliver Gbit+ they will need an ecosystem in the home to feed
into it, and right now there isn't a good technology platform that
supports it. 10GBase-X/10GBaseT is a non-starter due to the
variability in home wiring - arguably the 7-year leap from 100 to
1000Mbit was easy; it's been a gap of 12 years and counting for the
same jump again. It's not just the NICs and CPUs in the gateways, it's
the connector and in-home wiring problems as well.


Blatant plug / request:
I'm interested to hear opinions on this, as I'm giving a talk on this
very topic, 'The Long and Winding Road to 10Gbit+ in the Home', at
linux.conf.au (https://linux.conf.au/) in January. In particular, if
you have any home-network gore/horror stories and photos you would be
happy for me to include in the talk, please send them along.


-Joel
Mikael Abrahamsson
2017-12-04 10:18:49 UTC
Post by Joel Wirāmu Pauling
I'm not going to pretend that 1Gig isn't enough for most people. But I
refuse to believe it's the network's equivalent of a 10A (20A,
depending on where you live in the world) residential AC distribution
circuit.
That's a good analogy. I actually believe it is, at least for the next
5-10 years.
Post by Joel Wirāmu Pauling
This isn't a question of what people need; it's more about what the
market can deliver. 10G PON (XG-PON) and others now make it a viable
service that can be - and is being - deployed in residential and
commercial access networks.
Well, you're sharing that bandwidth with everybody else on the
splitter. Sounds to me like the service delivered over that would
instead be in the 2-3 gigabit/s range for the individual subscriber
(this is what I typically see on equivalent shared mediums: the
top-speed individual subscriptions end up at 20-40% of the max
theoretical speed the entire solution can deliver).
Post by Joel Wirāmu Pauling
The problem is that now that Retail Service Provider X can deliver a
post-gigabit service... what is capable of taking it off the ONU/CM
point in the home? As usual it's a follow-the-money question: once
RSPs can deliver Gbit+ they will need an ecosystem in the home to feed
into it, and right now there isn't a good technology platform that
supports it. 10GBase-X/10GBaseT is a non-starter due to the
variability in home wiring - arguably the 7-year leap from 100 to
1000Mbit was easy; it's been a gap of 12 years and counting for the
same jump again. It's not just the NICs and CPUs in the gateways, it's
the connector and in-home wiring problems as well.
As soon as one goes above 1GE, prices increase A LOT on everything
involved. I doubt we'll see any 2.5G-or-faster equipment in wide use
in homes/SMEs in the next 5 years.
Post by Joel Wirāmu Pauling
I'm interested to hear opinions on this, as I'm giving a talk on this
very topic, 'The Long and Winding Road to 10Gbit+ in the Home', at
linux.conf.au (https://linux.conf.au/) in January. In particular, if
you have any home-network gore/horror stories and photos you would be
happy for me to include in the talk, please send them along.
I am still waiting for a decently priced 10GE switch. I can get 1GE
24-port managed ones, fanless, for 100-200USD. As soon as I go 10GE,
the price jumps up a lot, and I get fans. The NICs aren't widely
available either, though they're not the biggest problem. My in-house
cabling can do 10GE, but I guess I'm an outlier.
--
Mikael Abrahamsson email: ***@swm.pp.se
Joel Wirāmu Pauling
2017-12-04 10:27:19 UTC
How do you deliver a switch when the wiring and port standard isn't
actually workable?

10GBase-T is out of voltage spec for SFP+; you can get copper SFP+
modules, but they are out of spec... 10GBase-T doesn't really work
over Cat5e for more than a couple of meters (if you are lucky), and
even Cat6 is only rated to 55m at best. There is a reason no one is
producing home copper 10G switches, and it's not just the NIC silicon
cost (that was a factor until recently, obviously, but only part of
the equation).

On the flip side:
Right now I am typing this over a 40gbit network built from the cheap
and readily available Tb3 port - it's daisy-chained and limited to 6
ports, but right now it's easily the cheapest and most effective
option. Pity that the fabled optical Tb3 cables are so damn
expensive... so you're limited to daisy-chains of 2m. They seem to
have screwed the pooch on the USB-C network standard quite badly -
which looked so promising - so for the moment it's Tb3 for me, at
least.
Pedro Tumusok
2017-12-04 10:43:29 UTC
For the home, or even SMB, I doubt that 10G to the user's PC is the
main use case. It's having an uplink capable of supporting more than
1G - that 1G does not necessarily need to be generated by only one
host on the LAN.



Pedro
--
Best regards / Mvh
Jan Pedro Tumusok
Joel Wirāmu Pauling
2017-12-04 10:47:23 UTC
Bingo - that's definitely step one: gateways capable of 10Gbit
becoming the norm.
Post by Pedro Tumusok
For the home, or even SMB, I doubt that 10G to the user's PC is the
main use case. It's having an uplink capable of supporting more than
1G - that 1G does not necessarily need to be generated by only one
host on the LAN.
Pedro Tumusok
2017-12-04 10:57:34 UTC
Looking at chipsets coming (or just arrived) from the chipset vendors,
I think we will see CPE with 10G SFP+ and 802.11ax in Q3/Q4 this year.
The price is of course a bit steeper than the 15USD USB DSL modem :P,
but it probably fits nicely in the SMB segment.

Pedro
Post by Joel Wirāmu Pauling
Bingo - that's definitely step one: gateways capable of 10Gbit
becoming the norm.
--
Best regards / Mvh
Jan Pedro Tumusok
Joel Wirāmu Pauling
2017-12-04 10:59:14 UTC
Oh, we have these in the enterprise segment already. The main use case
is VNF-on-an-edge-device for SDN applications right now. But even so,
the range of vendors/devices is pretty limited.
Post by Pedro Tumusok
Looking at chipsets coming (or just arrived) from the chipset vendors,
I think we will see CPE with 10G SFP+ and 802.11ax in Q3/Q4 this year.
The price is of course a bit steeper than the 15USD USB DSL modem :P,
but it probably fits nicely in the SMB segment.
Mikael Abrahamsson
2017-12-04 12:44:09 UTC
Post by Pedro Tumusok
Looking at chipsets coming (or just arrived) from the chipset vendors,
I think we will see CPE with 10G SFP+ and 802.11ax in Q3/Q4 this year.
The price is of course a bit steeper than the 15USD USB DSL modem :P,
but it probably fits nicely in the SMB segment.
https://kb.netgear.com/31408/What-SFP-modules-are-compatible-with-my-Nighthawk-X10-R9000-router

This has been available for a while now. The only use case I see for
it is Comcast's 2 gigabit/s service; that's the only one I know of
that would fit this product (since it has no downlink 10GE ports).
--
Mikael Abrahamsson email: ***@swm.pp.se
d***@reed.com
2017-12-04 19:59:57 UTC
I suggest we stop talking about throughput, which has been the mistaken idea in networking for 30-40 years.

Almost all networking ends up being about end-to-end response time in a multiplexed system.

Or put another way: "It's the Latency, Stupid".

I get (and have come to expect) 27 msec RTTs under significant load
from a Boston suburb to Sunnyvale, CA.

I get 2 microsecond RTTs within my house (using 10 GigE).

What will we expect tomorrow?

This is related to bufferbloat, because queueing delay is just not a
good thing in these contexts - contexts where latency matters. We
provision multiplexed networks based on "peak capacity" never being
reached.

Consequently, 1 Gig to the home is "table stakes". And in DOCSIS 3.1
deployments that is what is being delivered, cheap, today.

And 10 Gig within the home is becoming "table stakes", especially for
applications that need quick response to human interaction.

A single NVMe drive already delivers around 11 Gb/s at its interface.
That's what is needed in the network to "impedance match".

802.11ax already gives around 10 Gb/s wireless (and will be on the
market soon).

The folks who think that 1 Gb/s to the home would only matter if you
had to transfer at that rate 8 hours a day are just not thinking
clearly about what "responsiveness" means.

For a different angle on this, think about what the desirable "channel
change time" is if a company like Netflix were covering all the
football (soccer, to Americans) games in the world. You'd like to fill
the playout buffer in 100 msec so that changing to a new channel is
responsive. The 4K sports you are watching in "real time" needs
buffering, and you want no more than a second or two of delay from
camera to screen. So buffering up 1 second of a newly selected 4K
video stream in 100 msec, on demand, is why you need such speeds. Do
the math.

VR sports coverage - even more so.
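(Doing that math, under assumed numbers - a 4K stream encoded at
25 Mb/s, a 1-second playout buffer, a 100 msec channel-change budget:)

stream_mbps = 25   # assumed 4K encode rate
buffer_s = 1.0     # playout buffer to fill on a channel change
budget_s = 0.1     # desired channel-change time

print(f"{stream_mbps * buffer_s / budget_s:.0f} Mb/s")  # 250 Mb/s burst
# ...per stream; several screens, or higher-rate VR encodes, push the
# burst requirement into gigabit territory.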
David Collier-Brown
2017-12-04 20:29:46 UTC
Do you think that "RTT to San Francisco" is a clear enough,
predictable enough measure that we can use it with non-technical users
and obfuscating salescritters?

--dave
who vaguely watched RTT to Charlottetown PEI, Vancouver and Washington
DC in a previous life
Post by d***@reed.com
I get (and have come to expect) 27 msec RTTs under significant load
from a Boston suburb to Sunnyvale, CA.
I get 2 microsecond RTTs within my house (using 10 GigE).
--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
***@spamcop.net | -- Mark Twain
Mikael Abrahamsson
2017-12-08 07:05:39 UTC
Post by d***@reed.com
I suggest we stop talking about throughput, which has been the
mistaken idea in networking for 30-40 years.
We need to talk about both latency and speed. Yes, speed is talked
about too much (relative to RTT), but it's not irrelevant.

The speed of light in fiber means RTT is approx 1ms per 100km, so from
Stockholm to SFO my RTT is never going to be significantly below 85ms
(8625km great circle). It's currently twice that.
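(A quick check of those figures; the only assumption is a typical
fiber group index of ~1.47, i.e. light in fiber travels at roughly
c/1.47:)

C_KM_S = 299_792    # speed of light in vacuum, km/s
FIBER_INDEX = 1.47  # typical group index of silica fiber

def min_rtt_ms(distance_km):
    """RTT floor for fiber laid exactly along the great circle."""
    return 2 * distance_km / (C_KM_S / FIBER_INDEX) * 1000

print(f"{min_rtt_ms(100):.2f} ms")   # ~0.98 ms round trip per 100 km
print(f"{min_rtt_ms(8625):.1f} ms")  # ~84.6 ms Stockholm<->SFO floor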

So we just have to accept that some services will never be deliverable
across the wider Internet, but will have to be deployed closer to the
customer (as per your examples, some need 1ms RTT to work well), and
that we need lower access latency and lower queuing delay. So yes,
agreed.

However, I am not going to concede that speed is a "mistaken idea
about networking". No amount of smarter queuing is going to fix the
problem if I don't have the throughput available that I need for my
application.
--
Mikael Abrahamsson email: ***@swm.pp.se
Luca Muscariello
2017-12-12 15:09:34 UTC
I think everything is about response time, even throughput.

If we compare the time to transmit a single packet from A to B
(including propagation delay, transmission delay and queuing delay)
with the time to move a much larger amount of data from A to B, we use
throughput in the second case because it is a normalized quantity
w.r.t. response time (bytes over delivery time). For a single
transmission we tend to use latency. But in the end, response time is
what matters.

Also, even instantaneous throughput is well defined only on a time
scale much larger than the min RTT (propagation + transmission
delays).

I agree also that, looking at video, latency and latency budgets are
better quantities than throughput - or at least more accurate.
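(One hedged way to write that down, with $T_p$ the propagation delay,
$T_q$ the queuing delay, $S$ the amount of data moved and $C$ the
bottleneck capacity:)

\[
T_{\mathrm{response}} \approx T_p + T_q + \frac{S}{C},
\qquad
\text{effective throughput} = \frac{S}{T_{\mathrm{response}}}
\]

(For a single packet the $T_p + T_q$ terms dominate and we call the
result latency; for large $S$ the $S/C$ term dominates and we quote
throughput - the same quantity viewed at two extremes.)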
Post by Mikael Abrahamsson
Post by d***@reed.com
I suggest we stop talking about throughput, which has been the
mistaken idea in networking for 30-40 years.
We need to talk about both latency and speed. Yes, speed is talked
about too much (relative to RTT), but it's not irrelevant.
However, I am not going to concede that speed is a "mistaken idea
about networking". No amount of smarter queuing is going to fix the
problem if I don't have the throughput available that I need for my
application.
Dave Taht
2017-12-12 18:36:55 UTC
Post by Luca Muscariello
I think everything is about response time, even throughput.
But in the end, response time is what matters.
Post by Mikael Abrahamsson
However, I am not going to concede that speed is a "mistaken idea
about networking". No amount of smarter queuing is going to fix the
problem if I don't have the throughput available that I need for my
application.
In terms of the bell curve here, throughput has increased much more
rapidly than latency has decreased for most people, and in an
increasing majority of human-interactive cases (like video streaming)
we often have enough throughput.

And the age-old argument of "just have overcapacity, always" tends to
work in these cases.

I tend not to care as much about how long things take when they do not
have the R/T deadlines that humans and steering wheels impose.

Propagation delay, while ultimately bounded by the speed of light, is
also affected by wires wrapping indirectly around the earth - much
slower than would be possible if we worked at it:

https://arxiv.org/pdf/1505.03449.pdf

Then there's inside the boxes themselves:

A lot of my struggle of late has been to get latencies and adequate
sampling techniques down below 3ms (my previous threshold for
rejecting things as having too much noise) - and despite trying fairly
hard, well... a process can't even sleep accurately much below 1ms on
bare-metal Linux. A dream of mine has been 8-channel high-quality
audio, with a video delay of not much more than 2.7ms, for AR
applications.

For comparison, an idle quad-core aarch64 and a dual-core x86_64:

***@nanopineo2:~# irtt sleep

Testing sleep accuracy...

Sleep Duration   Mean Error     % Error
1ns              13.353µs       1335336.9
10ns             14.34µs        143409.5
100ns            13.343µs       13343.9
1µs              12.791µs       1279.2
10µs             148.661µs      1486.6
100µs            150.907µs      150.9
1ms              168.001µs      16.8
10ms             131.235µs      1.3
100ms            145.611µs      0.1
200ms            162.917µs      0.1
500ms            169.885µs      0.0

***@nemesis:~$ irtt sleep

Testing sleep accuracy...

Sleep Duration   Mean Error     % Error
1ns              668ns          66831.9
10ns             672ns          6723.7
100ns            557ns          557.6
1µs              57.749µs       5774.9
10µs             63.063µs       630.6
100µs            67.737µs       67.7
1ms              153.978µs      15.4
10ms             169.709µs      1.7
100ms            186.685µs      0.2
200ms            176.859µs      0.1
500ms            177.271µs      0.0
d***@reed.com
2017-12-12 22:53:50 UTC
Luca's point tends to be correct - variable latency destroys the stability of flow-control loops, which destroys throughput, even when there is sufficient capacity to handle the load.

This is an indirect result of Little's Lemma (which is strictly true only for Poisson arrivals, but almost any arrival process shows a similar interaction between latency and throughput).
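(For reference, Little's law relates the time-average number in the
system $L$, the arrival rate $\lambda$, and the mean time in system
$W$:)

\[ L = \lambda W \]

(So at a given arrival rate, any growth in queue occupancy is paid for
directly as added time in system.)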

However, the other reason I say what I say so strongly is this:

Rant on.

Peak/average load ratios always exceed a factor of 10 or more, IRL. Only "benchmark setups" (or hot-rod races done for academic or marketing reasons, to claim some sort of "title") operate at peak supportable load for any significant part of the time.

The reason for this is not just "fat pipes are better", but that the bitrate of the underlying medium is an insignificant fraction of the system's operational and capital expense.

SLAs are specified in "uptime", not "bits transported", and a clogged pipe is defined as down when latency exceeds a small number.

Typical operating points of corporate networks where the users are happy are at a single-digit percentage of max load.

This is also true of computer buses, memory controllers and storage interfaces IRL. Again, latency is the primary measure, and the system never focuses on operating points anywhere near max throughput.

Rant off.
Jonathan Morton
2017-12-12 23:20:02 UTC
This is also true in the consumer space, and is the reason why ISPs
can save money by taking advantage of statistical multiplexing. On
average, I personally could be satisfied with a megabit, but it's a
real pain to download gigabyte-class software updates at that speed.

If it takes me literally days to download a game's beta version, any
comments I might have will be stale by the time they can be heard, and
in the meantime my other uses of the internet have been seriously
impaired. It's much more useful if I can download the same data in an
hour and spend the remaining time evaluating. So throughput is indeed
a factor in response time, once the size of the response is
sufficiently large.
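(The arithmetic, with an illustrative 30 GB download:)

def hours(gigabytes, mbps):
    """Transfer time in hours at a given link rate."""
    return gigabytes * 8e9 / (mbps * 1e6) / 3600

print(f"{hours(30, 1):.0f} h")    # ~67 h at 1 Mb/s - literally days
print(f"{hours(30, 100):.1f} h")  # ~0.7 h at 100 Mb/s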

Occasionally, of course, practically everyone in the country wants to tune
into coverage of some event at the same time. More commonly, they simply
get home from work and school at the same time every day. That breaks the
assumptions behind pure statistical multiplexing, and requires a greater
provisioning factor.

- Jonathan Morton
Mikael Abrahamsson
2017-12-13 10:20:35 UTC
Post by Jonathan Morton
Occasionally, of course, practically everyone in the country wants to
tune into coverage of some event at the same time. More commonly, they
simply get home from work and school at the same time every day. That
breaks the assumptions behind pure statistical multiplexing, and
requires a greater provisioning factor.
Reasonable operators have provisioning guidelines that look at actual
usage, although they probably look at it in 5-minute averages, not at
the millisecond scale discussed here.

So they might say "if the busy-hour average is over 50% three days in
a week", that triggers a provisioning alarm for the link, and a person
(or system) takes a more detailed look at the 5-minute average graph
and decides whether it needs to be upgraded or not.
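(A sketch of that trigger rule over 5-minute samples; the busy-hour
window, threshold and sample layout are all illustrative:)

from collections import defaultdict

BUSY_HOURS = range(19, 22)  # assumed busy window, 19:00-22:00

def needs_upgrade_review(samples, threshold=0.5, days_required=3):
    """samples: (weekday, hour, utilization 0..1) 5-minute averages.
    Flag the link if the busy-hour mean exceeds the threshold on at
    least days_required days."""
    per_day = defaultdict(list)
    for day, hour, util in samples:
        if hour in BUSY_HOURS:
            per_day[day].append(util)
    hot_days = sum(1 for u in per_day.values()
                   if u and sum(u) / len(u) > threshold)
    return hot_days >= days_required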

For me the interesting point is always "what's going on in the busy
hour of the day", never "what's the monthly average amount of data
transferred".

Of course, this can hide sub-second bufferbloat extremely well (and
has), but at least this is typically how statistical overprovisioning
is done. You look at actual usage and make sure your network is never
full for any sustained amount of time in normal operation, and you
perform upgrades well before growth results in the network being full.
--
Mikael Abrahamsson email: ***@swm.pp.se
Luca Muscariello
2017-12-13 10:45:54 UTC
+1 on all.

Except that Little's Law is very general: it applies to any ergodic
process, and just derives from the law of large numbers. And BTW,
Little's law is a very powerful law - we use it unconsciously all the
time.
Post by d***@reed.com
Luca's point tends to be correct - variable latency destroys the
stability of flow-control loops, which destroys throughput, even when
there is sufficient capacity to handle the load.
This is an indirect result of Little's Lemma (which is strictly true
only for Poisson arrivals, but almost any arrival process shows a
similar interaction between latency and throughput).
Neil Davies
2017-12-13 15:26:50 UTC
Permalink
Post by d***@reed.com
Luca's point tends to be correct - variable latency destroys the stability of flow control loops, which destroys throughput, even when there is sufficient capacity to handle the load.
This is an indirect result of Little's Lemma (which is strictly true only for Poisson arrival, but almost any arrival process will have a similar interaction between latency and throughput).
Actually it is true for general arrival patterns (I can’t lay my hands on the reference for the moment, but it was shown a while back) - what this points to is an underlying conservation law: “delay and loss” are conserved in a scheduling process. This comes out of the M/M/1/K/K queueing system and associated analysis.

There is a conservation law at work here (Kleinrock refers to this, at least in terms of delay, in 1965 - http://onlinelibrary.wiley.com/doi/10.1002/nav.3800120206/abstract).

All scheduling systems can do is “distribute” the resulting “delay and loss” differentially amongst the (instantaneous set of) competing streams.

Let me just repeat that - “delay and loss” are a conserved quantity. Scheduling can’t “destroy” it (it can influence higher-level protocol behaviour), but it cannot reduce the total amount of “delay and loss” being induced into the collective set of streams...
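To make the conservation claim concrete, here is a minimal simulation sketch (an illustration under assumed parameters: two Poisson classes sharing one exponential server). The same arrival/service sample path is run through FIFO and through strict non-preemptive priority; the per-class mean waits W_i shift, but the weighted sum of rho_i * W_i stays, up to simulation noise, the same.

# Kleinrock's conservation law: sum(rho_i * W_i) is invariant
# across work-conserving, non-preemptive disciplines.
import random

random.seed(1)
LAM = (0.3, 0.4)        # Poisson arrival rates of the two classes
MU = 1.0                # shared exponential service rate
N = 50_000              # arrivals per class

def gen_jobs():
    jobs = []
    for cls, lam in enumerate(LAM):
        t = 0.0
        for _ in range(N):
            t += random.expovariate(lam)
            jobs.append((t, cls, random.expovariate(MU)))
    return sorted(jobs)

def run(jobs, priority):
    """Single non-preemptive server; class 0 jumps the queue if priority."""
    waits = [[], []]
    queue, clock, i = [], 0.0, 0
    while i < len(jobs) or queue:
        while i < len(jobs) and jobs[i][0] <= clock:
            queue.append(jobs[i]); i += 1
        if not queue:
            clock = jobs[i][0]          # server idle: jump to next arrival
            continue
        if priority:
            queue.sort(key=lambda j: (j[1], j[0]))   # class, then FIFO
        arr, cls, svc = queue.pop(0)
        waits[cls].append(clock - arr)  # queueing delay only
        clock += svc
    return [sum(w) / len(w) for w in waits]

jobs = gen_jobs()
for name, prio in (("FIFO", False), ("priority", True)):
    W = run(jobs, prio)
    total = sum((LAM[c] / MU) * W[c] for c in range(2))
    print(f"{name:8s}  W0={W[0]:.2f}  W1={W[1]:.2f}  sum(rho*W)={total:.2f}")

Scheduling moves the delay between the classes; the weighted total does not move.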
Post by d***@reed.com
Rant on.
Peak/avg. load ratio always exceeds a factor of 10 or more, IRL. Only "benchmark setups" (or hot-rod races done for academic reasons or marketing reasons to claim some sort of "title") operate at peak supportable load any significant part of the time.
Have you considered what this means for the economics of the operation of networks? What other industry that “moves things around” (i.e. logistics or similar) builds a system with 10x as much infrastructure as its peak requirement?
Post by d***@reed.com
The reason for this is not just "fat pipes are better", but because bitrate of the underlying medium is an insignificant fraction of systems operational and capital expense.
Agree that (if you are the incumbent that ‘owns’ the low-level transmission medium) this is true (though the costs of lighting a new lambda are not trivial) - but that is not the experience of anyone else in the digital supply chain
Post by d***@reed.com
SLA's are specified in "uptime" not "bits transported", and a clogged pipe is defined as down when latency exceeds a small number.
Do you have any evidence you can reference for an SLA that treats a few ms as “down”? Most of the SLAs I’ve had dealings with use averages over fairly long time periods (e.g. a month) - and there is no quality in averages.
Post by d***@reed.com
Typical operating points of corporate networks where the users are happy are single-digit percentage of max load.
Or less - they also detest the costs that they have to pay the network providers to try and de-risk their applications. There is also the issue that, because they measure averages (over 5 min to 15 min), they completely fail to capture (for example) the 15 seconds when delay and jitter were high and the CEO’s video conference broke up.
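A toy illustration of that point, with purely made-up numbers: a 15-second latency spike that wrecks a video call all but vanishes inside even a 5-minute average, let alone a monthly one.

# One latency reading per second for 5 minutes; 15 s of trouble.
base_ms, spike_ms = 20.0, 500.0
samples = [base_ms] * 300
samples[100:115] = [spike_ms] * 15

print(f"5-min mean = {sum(samples) / len(samples):.0f} ms")  # 44 ms, looks fine
print(f"worst 15 s = {spike_ms:.0f} ms")                     # what the user saw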
Post by d***@reed.com
This is also true of computer buses and memory controllers and storage interfaces IRL. Again, latency is the primary measure, and the system never focuses on operating points anywhere near max throughput.
Agreed - but wouldn’t it be nice if they could? I’ve worked on h/w systems where we designed the system to run near its limits (the set-top box market is pretty cut-throat: the closer to saturation you can run while still delivering the acceptable outcome, the cheaper the box and the greater the profit margin for the set-top box provider)
Post by d***@reed.com
Rant off.
Cheers

Neil
Jonathan Morton
2017-12-13 16:41:16 UTC
Permalink
Post by Neil Davies
Have you considered what this means for the economics of the operation of
networks? What other industry that “moves things around” (i.e. logistics or
similar) builds a system with 10x as much infrastructure as its peak
requirement?

Ten times peak demand? No.

Ten times average demand estimated at time of deployment, and struggling
badly with peak demand a decade later, yes. And this is the transportation
industry, where a decade is a *short* time - like less than a year in
telecoms.

- Jonathan Morton
d***@reed.com
2017-12-13 18:08:14 UTC
Permalink
Just to be clear, I have built and operated a whole range of network platforms, as well as diagnosing problems and planning deployments of systems that include digital packet delivery in real contexts where cost and performance matter, for nearly 40 years now. So this isn't only some kind of radical opinion, but hard-won knowledge across my entire career. I also have a very strong theoretical background in queueing theory and control theory -- enough to teach a graduate seminar, anyway.
That said, there are lots of folks out there who have opinions different than mine. But far too many (such as those who think big buffers are "good", who brought us bufferbloat) are not aware of how networks are really used or the practical effects of their poor models of usage.

If it comforts you to think that I am just stating an "opinion", which must be wrong because it is not the "conventional wisdom" in the circles where you travel, fine. You are entitled to dismiss any ideas you don't like. But I would suggest you get data about your assumptions.

I don't know if I'm being trolled, but a couple of comments on the recent comments:

1. Statistical multiplexing viewed as averaging/smoothing is, in my personal opinion and experience measuring real network behavior, a description of a theoretical phenomenon that is not real (e.g. "consider a spherical cow") but that is amenable to theoretical analysis. Such theoretical analysis can make some gross estimates, but it breaks down quickly. The same thing is true of common economic theory that models practical markets by linear models (linear systems of differential equations are common) and gaussian probability distributions (gaussians are easily analyzed, but wrong). You can read the popular books by Nassim Taleb for an entertaining and enlightening deeper understanding of the economic problems with such modeling.

One of the features well observed in real measurements of real systems is that packet flows are "fractal", which means that there is a self-similarity of rate variability at all time scales from micro to macro. As you look at smaller and smaller time scales, or larger and larger time scales, the packet request density per unit time never smooths out due to "averaging over sources". That is, there's no practical "statistical multiplexing" effect. There's also significant correlation among many packet arrivals - assuming they are statistically independent (which is required for the "law of large numbers" to apply) is often far from the real situation - flows that are assumed to be independent are usually strongly coupled.
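This is easy to see in a few lines of simulation - a sketch with synthetic traffic, not data from any real network: average Poisson counts over larger and larger windows and the variance collapses like 1/m; drive the load with heavy-tailed bursts and it barely collapses at all, which is the self-similar signature.

# Variance-time comparison: memoryless vs heavy-tailed traffic.
import numpy as np

rng = np.random.default_rng(0)
T = 2**20                                   # per-ms packet counts

poisson = rng.poisson(5.0, T)

heavy = np.zeros(T)                         # Pareto on-periods, alpha < 2
t = 0
while t < T:
    burst = int(rng.pareto(1.2) + 1)        # infinite-variance burst length
    heavy[t:t + burst] = 10.0
    t += burst + int(rng.exponential(20))   # quiet gap

for name, x in (("poisson", poisson), ("heavy-tailed", heavy)):
    print(name)
    for m in (1, 16, 256, 4096):
        blocks = x[: (len(x) // m) * m].reshape(-1, m).mean(axis=1)
        print(f"  m={m:5d}  var={blocks.var():.4f}")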

The one exception where flows average out at a constant rate is when there is a "bottleneck". Then, there being no more capacity, the constant rate is forced, not by statistical averaging but by a very different process. One that is almost never desirable.

This is just what is observed in case after case. Designers may imagine that their networks have "smooth averaging" properties. There's a strong thread in networking literature that makes this pretty-much-always-false assumption the basis of protocol designs, thinking about "Quality of Service" and other sorts of things. You can teach graduate students about a reality that does not exist, and get papers accepted in conferences where the reviewers have been trained in the same tradition of unreal assumptions.

2. I work every day with "datacenter" networking and distributed systems on 10 GigE and faster Ethernet fabrics with switches and trunking. I see the packet flows driven by distributed computing in real systems. Whenever the sustained peak load on a switch path reaches 100%, that's not "good", that's not "efficient" resource usage. That is a situation where computing is experiencing huge wasted capacity due to network congestion that is dramatically slowing down the desired workload.

Again this is because *real workloads* in distributed computation don't have smooth or averageable rates over interconnects. Latency is everything in that application too!

Yes, because one buys switches from vendors who don't know how to build or operate a server or a database at all, you see vendors trying to demonstrate their amazing throughput, but the people who build these systems (me, for example) are not looking at throughput or statistical multiplexing at all! We use "throughput" as a proxy for "latency under load" (and it is a poor proxy, because vendors throw in big buffers, causing bufferbloat - see Arista Networks' attempts to justify their huge buffers as a "good thing", when it is just a case of something you have to design around by clocking the packets so they never accumulate in a buffer).
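The "clocking the packets" idea reduces to pacing at the bottleneck rate - something like this sketch, where the rate and the send callback are hypothetical placeholders, not any particular stack's implementation:

# Pace transmissions so a downstream buffer never accumulates.
import time

def paced_send(packets, rate_bps, send):
    """Hold each packet back until the bottleneck has drained the last."""
    next_slot = time.monotonic()
    for pkt in packets:
        now = time.monotonic()
        if now < next_slot:
            time.sleep(next_slot - now)     # wait instead of queueing
        send(pkt)
        next_slot = max(now, next_slot) + len(pkt) * 8 / rate_bps

# e.g. paced_send([b"x" * 1500] * 100, 20_000_000, send=lambda p: None)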

So, yes, the peak transfer rate matters, of course. And sometimes it is utilized for very good reason (when the latency of a file transfer as a whole is the latency that matters). But to be clear, just because as a user I want to download a Linux distro update as quickly as possible when it happens does NOT imply that the average load at any time scale is "statistically averaged" for residential networking. Quite the opposite! I buy Gigabit service to my house because I cannot predict when I will need it, but I almost never need it. My average rate (except once a month or so) is minuscule. This is true even though my house is a heavy user of Netflix.

The way that Gigabit residential service affects my "quality of service" is almost entirely that I get good "response time" to unpredictable demands. How quickly a Netflix stream can fill its play buffer is the measure. The data rate of any Netflix stream is, on average, much, much less than a Gigabit. Buffers in the network would ruin my Netflix experience, because the buffering is best done at the "edge" as the End-to-End argument usually suggests. It's certainly NOT because of statistical multiplexing.

So when you are tempted to talk about "statistical multiplexing" smoothing out traffic flow, take a pause and think about whether that really makes sense as a description of reality.

fq_codel is a good thing because it handles the awkward behavior at "peak load". It smooths out the impact of running out of resources. But that impact is still undesirable - if many Netflix flows are adding up to peak load, a new Netflix flow can't start very quickly. That results in terrible QoS from a Netflix user's point of view.
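The "new flow can't start quickly" concern is exactly what the sparse-flow boost in fq_codel-style schedulers tries to soften. A rough sketch of that mechanism - a loose reconstruction of the DRR new/old-list idea, not the actual fq_codel code:

# Flows with no backlog re-enter as "new" and are served first;
# flows that keep a backlog rotate on the old list with a quantum.
from collections import deque

QUANTUM = 1514
flows, deficit = {}, {}
new_list, old_list = deque(), deque()

def enqueue(fid, size):
    q = flows.setdefault(fid, deque())
    if not q and fid not in new_list and fid not in old_list:
        new_list.append(fid)            # sparse flow: gets priority
        deficit[fid] = QUANTUM
    q.append(size)

def dequeue():
    while new_list or old_list:
        lst = new_list if new_list else old_list
        fid, q = lst[0], flows[lst[0]]
        if q and deficit[fid] >= q[0]:
            deficit[fid] -= q[0]
            return fid, q.popleft()
        lst.popleft()
        if q:                           # still backlogged: now a bulk flow
            deficit[fid] += QUANTUM
            old_list.append(fid)
    return None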
Post by Neil Davies
Have you considered what this means for the economics of the operation of networks? What other industry that “moves things around” (i.e. logistics or similar) builds a system with 10x as much infrastructure as its peak requirement?
Ten times peak demand? No.
Ten times average demand estimated at time of deployment, and struggling badly with peak demand a decade later, yes. And this is the transportation industry, where a decade is a *short* time - like less than a year in telecoms.
- Jonathan Morton


Neil Davies
2017-12-13 19:55:29 UTC
Permalink
Please - my email was not intended to troll; I wanted to establish a dialogue. I am sorry if I’ve offended.
Post by d***@reed.com
Just to be clear, I have built and operated a whole range of network platforms, as well as diagnosing problems and planning deployments of systems that include digital packet delivery in real contexts where cost and performance matter, for nearly 40 years now. So this isn't only some kind of radical opinion, but hard-won knowledge across my entire career. I also have a very strong theoretical background in queueing theory and control theory -- enough to teach a graduate seminar, anyway.
I accept that - if we are laying out bona fides, I have acted as thesis advisor to people working in this area for over 20 years, and I continue to work with network operators, system designers and research organisations (mainly in the EU) in this area.
Post by d***@reed.com
That said, there are lots of folks out there who have opinions different than mine. But far too many (such as those who think big buffers are "good", who brought us bufferbloat) are not aware of how networks are really used or the practical effects of their poor models of usage.
If it comforts you to think that I am just stating an "opinion", which must be wrong because it is not the "conventional wisdom" in the circles where you travel, fine. You are entitled to dismiss any ideas you don't like. But I would suggest you get data about your assumptions.
1. Statistical multiplexing viewed as averaging/smoothing is, in my personal opinion and experience measuring real network behavior, a description of a theoretical phenomenon that is not real (e.g. "consider a spherical cow") but that is amenable to theoretical analysis. Such theoretical analysis can make some gross estimates, but it breaks down quickly. The same thing is true of common economic theory that models practical markets by linear models (linear systems of differential equations are common) and gaussian probability distributions (gaussians are easily analyzed, but wrong). You can read the popular books by Nassim Taleb for an entertaining and enlightening deeper understanding of the economic problems with such modeling.
I would fully accept that seeing statistical (or perhaps, better named, stochastic) multiplexing as an averaging process is a vast oversimplification of the complexity. However, I see the underlying mathematics as capturing much richer descriptions, for example of the transient behaviour - queueing theory (in its usual undergraduate formulation) tends to gloss over the edge / extreme conditions, as well as non-stationary arrival phenomena (such as can occur in the presence of adaptive protocols).

For example - one approach to solve the underlying Markov Chain systems (as the operational semantic representation of a queueing system) is to represent them as transition matrices and then “solve” those matrices for steady state [as you probably know - think of that as backstory for the interested reader].

We’ve used such transition matrices to examine “relaxation times” of queueing / scheduling algorithms - i.e. given that a buffer has filled, how quickly will the system relax back towards “steady state”. There are assumptions behind this, of course, but viewing the buffer state as a probability distribution and seeing how that distribution evolves after, say, an impulse change in load helps a lot to generate new approaches.
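For concreteness, a minimal sketch of that style of analysis, assuming a plain M/M/1/K birth-death chain with made-up rates: build the generator matrix, solve for the steady-state buffer distribution, and read a relaxation time off the spectral gap.

# M/M/1/K buffer as a Markov chain: steady state and relaxation time.
import numpy as np

lam, mu, K = 0.8, 1.0, 10            # arrival rate, service rate, buffer
Q = np.zeros((K + 1, K + 1))
for n in range(K + 1):
    if n < K: Q[n, n + 1] = lam      # arrival: n -> n+1
    if n > 0: Q[n, n - 1] = mu       # departure: n -> n-1
    Q[n, n] = -Q[n].sum()            # rows of a generator sum to zero

# steady state: pi Q = 0 with probabilities summing to 1
A = np.vstack([Q.T, np.ones(K + 1)])
b = np.zeros(K + 2); b[-1] = 1.0
pi = np.linalg.lstsq(A, b, rcond=None)[0]
print("P(buffer full) =", pi[-1])

# relaxation time ~ 1 / spectral gap (second-largest eigenvalue)
ev = np.sort(np.linalg.eigvals(Q).real)[::-1]
print("relaxation time ~", 1.0 / abs(ev[1]))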

Cards on the table - I don’t see networks as a (purely) natural phenomenon (as, say, chemistry or physics) but as a more mathematical one. Queueing systems are (relatively) simple automata being pushed through their states by arrivals (non-stationary, but broadly characterisable in stochastic terms) and departures (which are less stochastically varied, since they are tied to the actual packet sizes). There are rules to that mathematical game imposed by real-world physics, but there are other ways of constructing (and configuring) the actions of those automata to create “better” solutions (for various types of “better”).
Post by d***@reed.com
One of the features well observed in real measurements of real systems is that packet flows are "fractal", which means that there is a self-similarity of rate variability at all time scales from micro to macro. As you look at smaller and smaller time scales, or larger and larger time scales, the packet request density per unit time never smooths out due to "averaging over sources". That is, there's no practical "statistical multiplexing" effect. There's also significant correlation among many packet arrivals - assuming they are statistically independent (which is required for the "law of large numbers" to apply) is often far from the real situation - flows that are assumed to be independent are usually strongly coupled.
I remember this debate and its evolution, Hurst parameters and all that. I also understand that a collection of on/off Poisson sources looks fractal - I found that “the universe is fractal - live with it” ethos of limited practical use (except to help people say it was not solvable). When I saw those results, the question I asked myself (because I do not see them as a “natural” phenomenon) was: “what is the right way to interact with the traffic patterns to regain acceptable levels of mathematical understanding?” - i.e. what is the right intervention.

I agree that flows become coupled - every time two flows share a common path/resource they have that potential, the strength of that coupling and how to decouple them is what is useful to understand. It does not take much “randomness” (i.e perturbation of streams arrival patterns) to radically reduce that coupling - thankfully such randomness tends to occur due to issues of differential path length (hence delay).

Must admit I like randomness (in limited amounts) - it is very useful - CDMA is just one example of such.
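A tiny illustration of that decoupling effect - a toy with invented numbers, not a measurement: two identical periodic flows collide on every transmission slot until a little random phase jitter is added.

# Two periodic flows on shared slots, with and without phase jitter.
import random

random.seed(3)
PERIOD, N = 10, 10_000

def collisions(jitter):
    slots, hits = {}, 0
    for flow in range(2):
        for k in range(N):
            t = k * PERIOD + (random.randrange(jitter) if jitter else 0)
            hits += slots.get(t, 0)        # someone already in this slot?
            slots[t] = slots.get(t, 0) + 1
    return hits

print("no jitter:  ", collisions(0))       # every packet collides
print("with jitter:", collisions(5))       # roughly 1/5 as many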
Post by d***@reed.com
The one exception where flows average out at a constant rate is when there is a "bottleneck". Then, there being no more capacity, the constant rate is forced, not by statistical averaging but by a very different process. One that is almost never desirable.
This is just what is observed in case after case. Designers may imagine that their networks have "smooth averaging" properties. There's a strong thread in networking literature that makes this pretty-much-always-false assumption the basis of protocol designs, thinking about "Quality of Service" and other sorts of things. You can teach graduate students about a reality that does not exist, and get papers accepted in conferences where the reviewers have been trained in the same tradition of unreal assumptions.
Agreed - there is a massive disconnect between a lot of the literature (and the people who make their living generating it - [to those people, please don’t take offence: queueing theory is really useful, it is just that the real world is a lot more non-stationary than you model]) and reality.
Post by d***@reed.com
2. I work every day with "datacenter" networking and distributed systems on 10 GigE and faster Ethernet fabrics with switches and trunking. I see the packet flows driven by distributed computing in real systems. Whenever the sustained peak load on a switch path reaches 100%, that's not "good", that's not "efficient" resource usage. That is a situation where computing is experiencing huge wasted capacity due to network congestion that is dramatically slowing down the desired workload.
Imagine that there were two flows - one that required low latency (e.g. a real-time response that is part of a large distributed computation) and other flows that could make useful progress even if they suffered the delay (and, to some extent, the loss effects) of the other traffic.

If the operational scenario you are working in consists of a “mono service” (as you describe above) then there is no room for any differential service - but I would contend that (as important as datacentre-style systems are) they are not a universal phenomenon.

It is my understanding that Google uses this two-tier notion to get high utilisation from their network interconnects while still preserving the performance of their services. I see large-scale (i.e. public) internets not as a mono-service but as a “poly service” - there are multiple demands for timeliness etc. that exist out there for “real services”.
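In miniature, the two-tier idea is just this (a sketch of the general technique, not Google's actual design): the latency-sensitive class never waits behind bulk, and bulk soaks up whatever capacity is left, which is how high utilisation and good real-time behaviour can coexist.

# Strict-priority dequeue over two traffic classes.
from collections import deque

class TwoTier:
    def __init__(self):
        self.rt = deque()      # latency-sensitive packets
        self.bulk = deque()    # throughput-oriented packets

    def enqueue(self, pkt, realtime=False):
        (self.rt if realtime else self.bulk).append(pkt)

    def dequeue(self):
        if self.rt:
            return self.rt.popleft()   # real-time never queues behind bulk
        if self.bulk:
            return self.bulk.popleft() # bulk gets all remaining capacity
        return None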
Post by d***@reed.com
Again this is because *real workloads* in distributed computation don't have smooth or averageable rates over interconnects. Latency is everything in that application too!
Yep - understand that - designed and built large scale message passing supercomputers in the ‘80s and ‘90s - even wrote a book on how to construct, measure and analyse their interconnects. Still have 70+ Inmos transputers (and the cross-bar switching infrastructure) in the garage.
Post by d***@reed.com
Yes, because one buys switches from vendors who don't know how to build or operate a server or a database at all, you see vendors trying to demonstrate their amazing throughput, but the people who build these systems (me, for example) are not looking at throughput or statistical multiplexing at all! We use "throughput" as a proxy for "latency under load" (and it is a poor proxy, because vendors throw in big buffers, causing bufferbloat - see Arista Networks' attempts to justify their huge buffers as a "good thing", when it is just a case of something you have to design around by clocking the packets so they never accumulate in a buffer).
Again - we are in violent agreement - this is the (misguided) belief of product managers that “more is better”, so they put more and more buffering into their systems
Post by d***@reed.com
So, yes, the peak transfer rate matters, of course. And sometimes it is utilized for very good reason (when the latency of a file transfer as a whole is the latency that matters). But to be clear, just because as a user I want to download a Linux distro update as quickly as possible when it happens does NOT imply that the average load at any time scale is "statistically averaged" for residential networking. Quite the opposite! I buy Gigabit service to my house because I cannot predict when I will need it, but I almost never need it. My average rate (except once a month or so) is minuscule. This is true even though my house is a heavy user of Netflix.
Again - violent agreement - what matters is “the outcome”; bulk data transport is just one case (and, unfortunately, the one that appears most frequently in those papers mentioned above); what the Netflix user is interested in is “probability of a buffering event per watched hour” or “time to first frame being displayed”

Take heart - you are really not alone here; there are plenty of people in the telecoms industry who understand this (engineering, not marketing or senior management). What has happened is that people have been sold “top speed”, and others (like the Googles and Netflixes of this world) are _extremely_ worried that if the transport quality of their data suffers, their business models disappear.

Capacity planning like this is difficult - understanding the behavioural dynamics of (application-level) demand is what is needed. This is a large weakness in the planning of the digital supply chains of today.
Post by d***@reed.com
The way that Gigabit residential service affects my "quality of service" is almost entirely that I get good "response time" to unpredictable demands. How quickly a Netflix stream can fill its play buffer is the measure. The data rate of any Netflix stream is, on average, much, much less than a Gigabit. Buffers in the network would ruin my Netflix experience, because the buffering is best done at the "edge" as the End-to-End argument usually suggests. It's certainly NOT because of statistical multiplexing.
Not quite as violent agreement here - Netflix (once streaming) is not that sensitive to delay - a burst of 100ms-500ms for a second or so does not put their key outcome (assuring that the playout buffer does not empty) at too much risk.
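A back-of-envelope version of why, with illustrative numbers only: playback drains one second of video per second, delivery refills slightly faster, and only a long outage can exhaust a few seconds of buffer.

# Does a sequence of (quiet seconds, stall seconds) empty the buffer?
def rebuffers(buffer_s, stalls, refill_ratio=1.1):
    level = buffer_s
    for quiet_s, stall_s in stalls:
        level = min(buffer_s, level + quiet_s * (refill_ratio - 1.0))
        level -= stall_s               # stall: drain with no refill
        if level <= 0:
            return True
    return False

print(rebuffers(8.0, [(10, 0.5), (10, 0.5)]))   # short bursts: False
print(rebuffers(8.0, [(10, 9.0)]))              # long outage: True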

We’ve worked with people who have created risks for Netflix delivery (accidentally, I might add - they thought they were doing “the right thing”) by increasing their network infrastructure to 100G delivery everywhere. That change (combined with others made by CDN people - TCP offload engines) created so much non-stationarity in the load as to cause delay and loss spikes that *did* cause VoD playout buffers to empty. This is an example of where “more capacity” produced worse outcomes.

This is still a pretty young industry - plenty of room for new original research out there (but for those paper writers reading this - step away from the TCP bulk streams; they are not the really interesting thing, the dynamic behavioural aspects are much more interesting to mine for new papers)
Post by d***@reed.com
So when you are tempted to talk about "statistical multiplexing" smoothing out traffic flow, take a pause and think about whether that really makes sense as a description of reality.
I see “trad” statistical multiplexing as the way that the industry has conned itself into creating (probably) unsustainable delivery models - it has put itself on a “keep building bigger” approach just to stand still - all because it doesn’t face up to managing “delay and loss” coherently: the two inherent degrees of freedom, and the fact that such attenuation is conserved.
Post by d***@reed.com
fq_codel is a good thing because it handles the awkward behavior at "peak load". It smooths out the impact of running out of resources. But that impact is still undesirable - if many Netflix flows are adding up to peak load, a new Netflix flow can't start very quickly. That results in terrible QoS from a Netflix user's point of view.
I would suggest that there are other ways of dealing with the impact of “peak” (i.e. where instantaneous demand exceeds supply over a long enough timescale to start affecting the most delay/loss-sensitive application in the collective multiplexed stream). I would also agree that if all the streams have the same “bound on delay and loss” requirements (i.e. *all* Netflix) and load reaches 100%+ of supply (over, again, the appropriate timescale - which for streaming Netflix VoD is about 20s to 30s), then end-user disappointment is the only thing that can occur.

Again, not intended to troll - I think we are agreeing that the current approaches (as per most literature / received wisdom) have just about run their course - my assertion is that the mathematics needed is out there (it is _not_ traditional queueing theory - but it does spring from similar roots).

Cheers

Neil
Jonathan Morton
2017-12-13 21:06:15 UTC
Permalink
(Awesome development - I have a computer with a sane e-mail client again. One that doesn’t assume I want to top-post if I quote anything at all, *and* lets me type with an actual keyboard. Luxury!)
Post by d***@reed.com
One of the features well observed in real measurements of real systems is that packet flows are "fractal", which means that there is a self-similarity of rate variability at all time scales from micro to macro.
I remember this debate and its evolution, Hurst parameters and all that. I also understand that a collection of on/off Poisson sources looks fractal - I found that “the universe is fractal - live with it” ethos of limited practical use (except to help people say it was not solvable).
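
The classic construction behind that observation (Willinger et al.) is
easy to sketch: sum many on/off sources whose on/off period lengths
are heavy-tailed, and the aggregate stays bursty as the averaging
bucket grows, instead of smoothing out the way short-memory traffic
would. A toy simulation, with purely illustrative parameters:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double pareto(double alpha)       /* heavy-tailed period length */
{
    return pow(1.0 - drand48(), -1.0 / alpha);
}

int main(void)
{
    enum { SOURCES = 100, TICKS = 100000 };
    static double load[TICKS];
    srand48(1);

    for (int s = 0; s < SOURCES; s++) {
        int t = 0, on = s & 1;           /* half start in the on state */
        while (t < TICKS) {
            int len = (int)pareto(1.5);  /* alpha in (1,2): infinite variance */
            for (int i = 0; i < len && t < TICKS; i++, t++)
                load[t] += on;
            on = !on;                    /* toggle between on and off */
        }
    }

    /* variance of the per-bucket mean load at two averaging scales;
     * for Poisson-like traffic it would drop ~100x between the two */
    for (int scale = 10; scale <= 1000; scale *= 100) {
        int buckets = TICKS / scale;
        double sum = 0, sq = 0;
        for (int b = 0; b < buckets; b++) {
            double v = 0;
            for (int i = 0; i < scale; i++)
                v += load[b * scale + i];
            v /= scale;
            sum += v; sq += v * v;
        }
        double mean = sum / buckets;
        printf("bucket %4d ticks: mean %.1f variance %.2f\n",
               scale, mean, sq / buckets - mean * mean);
    }
    return 0;
}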
Post by d***@reed.com
Designers may imagine that their networks have "smooth averaging" properties. There's a strong thread in networking literature that makes this pretty-much-always-false assumption the basis of protocol designs, thinking about "Quality of Service" and other sorts of things. You can teach graduate students about a reality that does not exist, and get papers accepted in conferences where the reviewers have been trained in the same tradition of unreal assumptions.
Agreed - there is a massive disconnect between a lot of the literature
(and the people who make their living generating it - [to those
people, please don’t take offence, queueing theory is really useful,
it is just that the real world is a lot more non-stationary than you
model]) and reality.
Probably a lot of theoreticians would be horrified at the extent to
which I ignored mathematics and relied on intuition (and observations
of real traffic, i.e. eating my own dogfood) while building Cake.

That approach, however, led me to some novel algorithms and combinations thereof which seem to work well in practice, as well as to some practical observations about the present state of the Internet. I’ve also used some contributions from others, but only where they made sense at an intuitive level.

However, Cake isn’t designed to work in the datacentre. Nor is it likely to work optimally in an ISP’s core networks. The combination of features in Cake is not optimised for those environments, rather for last-mile links which are typically the bottlenecks experienced by ordinary Internet users. Some of Cake's algorithms could reasonably be reused in a different combination for a different environment.
I see large scale (i.e. public internets) not as a mono-service but as a “poly service” - there are multiple demands for timeliness etc that exist out there for “real services”.
This is definitely true. However, the biggest problem I’ve noticed is
with distinguishing these traffic types from each other. In some cases
there are heuristics which are accurate enough to be useful. In
others, there are not. Rarely is the distinction *explicitly* marked
in any way, and some common protocols explicitly obscure themselves
due to historical mistreatment.

Diffserv is very hard to use in practice. There’s a controversial fact
for you to chew on.
We’ve worked with people who have created risks for Netflix delivery
(accidentally, I might add - they thought they were doing “the right
thing”) by increasing their network infrastructure to 100G delivery
everywhere. That change (combined with others made by CDN people - TCP
offload engines) created so much non-stationarity in the load as to
cause delay and loss spikes that *did* cause VoD playout buffers to
empty. This is an example of where “more capacity” produced worse
outcomes.
That’s an interesting and counter-intuitive result. I’ll hazard a
guess that it had something to do with burst loss in dumb tail-drop
FIFOs? Offload engines tend to produce extremely bursty traffic which
- with a nod to another thread presently ongoing - makes a mockery of
any ack-clocking or pacing which TCP designers normally assume is in
effect.

One of the things that fq_codel and Cake can do really well is to take
a deep queue full of consecutive line-rate bursts and turn them into
interleaved packet streams, which are at least slightly better “paced”
than the originals. They also specifically try to avoid burst loss and
(at least in Cake’s case) tail loss.

It is of course regrettable that this behaviour conflicts with the
assumptions of most network acceleration hardware, and that maximum
throughput might therefore be compromised. The *qualitative* behaviour
is however improved.
I would suggest that there are other ways of dealing with the impact of “peak” (i.e. where instantaneous demand exceeds supply over a long enough timescale to start affecting the most delay/loss-sensitive application in the collective multiplexed stream).
Such as signalling to the traffic that congestion exists, and to please slow down a bit to make room? ECN and AQM are great ways of doing that, especially in combination with flow isolation - the latter shares out the capacity fairly on short timescales, *and* avoids the need to signal congestion to flows which are already using less than their fair share.
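
A toy sketch of the flow-isolation half of that behaviour - not Cake’s
actual code, which adds a DRR deficit, set-associative hashing and
CoDel per queue - just the bit where hashing packets into per-flow
FIFOs and dequeuing round-robin makes a line-rate burst from one flow
drain interleaved with everyone else’s packets (all names and sizes
here are illustrative):

#include <stdint.h>
#include <stddef.h>

#define NQUEUES 64
#define QDEPTH  256

struct pkt { uint32_t flow_hash; /* precomputed 5-tuple hash */ };

struct fifo {
    struct pkt *slot[QDEPTH];
    int head, tail, len;
};

static struct fifo queues[NQUEUES];
static int rr_next;                  /* round-robin cursor */

void enqueue(struct pkt *p)
{
    struct fifo *q = &queues[p->flow_hash % NQUEUES];
    if (q->len == QDEPTH)
        return;                      /* full: a real qdisc would drop/mark */
    q->slot[q->tail] = p;
    q->tail = (q->tail + 1) % QDEPTH;
    q->len++;
}

struct pkt *dequeue(void)
{
    for (int i = 0; i < NQUEUES; i++) {
        struct fifo *q = &queues[(rr_next + i) % NQUEUES];
        if (q->len > 0) {
            rr_next = (rr_next + i + 1) % NQUEUES;  /* resume after this queue */
            struct pkt *p = q->slot[q->head];
            q->head = (q->head + 1) % QDEPTH;
            q->len--;
            return p;
        }
    }
    return NULL;                     /* all queues empty */
}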
I would also agree that if all the streams have the same “bound on delay and loss” requirements (i.e. *all* Netflix), and the load exceeds 100% over the appropriate timescale (which for Netflix VoD streaming is about 20s to 30s), then end-user disappointment is the only thing that can occur.
I think the importance of measurement timescales is consistently
underrated in the industry and in academia alike. An hour-long bucket
of traffic tells you about a very different set of characteristics
than a millisecond-long bucket, and there are several timescales
between those extremes of great practical interest.

- Jonathan Morton
Mikael Abrahamsson
2017-12-14 08:22:20 UTC
Permalink
Post by Jonathan Morton
Ten times average demand estimated at time of deployment, and struggling
badly with peak demand a decade later, yes. And this is the
transportation industry, where a decade is a *short* time - like less
than a year in telecoms.
I've worked in ISPs since 1999 or so. I've been at startups and I've
been at established ISPs.

It's kind of an S curve when it comes to traffic growth: when you're
adding customers you can easily see 100%-300% growth per year (or
more). Then, after the market becomes saturated, growth comes from
per-customer increased usage, and for the past 20 years or so this has
been in the neighbourhood of 20-30% per year.

Running a network that congests during parts of the day, it's hard to
tell what "Quality of Experience" your customers will have. I've heard
horror stories from the 90s where a then-large US ISP was running an
OC3 (155 megabit/s) full most of the day. So someone said "oh, we need
to upgrade this", and after a while they did, to 2xOC3. Great, right?
No, after that upgrade both OC3:s were completely congested. Ok, then
upgrade to OC12 (622 megabit/s). After that upgrade, evidently that
link was not congested a few hours of the day, and of course needed
more upgrades.

So at the places I've been, I've advocated for planning rules that say
that when the link is peaking at 5 minute averages of more than 50% of
link capacity, then an upgrade needs to be ordered. This 50% number
can be larger if the link aggregates a larger number of customers,
because typically your "statistical overbooking" varies less the more
customers participate.

These devices do not do per-flow anything. They might have a 10G or
100G link to/from them with many, many millions of flows, and it's all
NPU forwarding. Typically they might do Diffserv-based queueing and
WRED to mitigate excessive buffering. Today, they typically don't even
do ECN marking (which I have advocated for, but there is not much
support from other ISPs in this mission).

Now, on the customer access line it's a completely different matter.
Typically people build with BRAS or similar, where (tens of) thousands
of customers might sit on a (very expensive) access card with hundreds
of thousands of queues per NPU. This still leaves just a few queues
per customer, unfortunately. So these do not do per-flow anything
either. This is where PIE comes in, because devices like these can do
PIE in the NPU fairly easily, as it's kind of like WRED.

So back to the capacity issue. Since these devices typically aren't
good at assuring per-customer access to the shared medium (backbone
links), it's easier to just make sure the backbone links are not
regularly full. This doesn't mean you're going to have 10x capacity
all the time; it probably means you're going to be bouncing between
25-70% utilization of your links (for the normal case, because you
need spare capacity to handle events that increase traffic
temporarily, plus handle loss of capacity in case of a link fault).
The upgrade might be to add another link, or a higher tier speed
interface, bringing down the utilization to typically half or a
quarter of what you had before.
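
The 50%-of-peak planning rule a few paragraphs up is mechanical enough
to sketch; the threshold below is that rule of thumb (to be raised for
links aggregating more customers), and the sample values are made up:

#include <stdio.h>

/* flag an upgrade once the peak of the 5-minute averages exceeds a
 * threshold share of link capacity */
int needs_upgrade(const double *avg_bps, size_t n, double capacity_bps,
                  double threshold /* e.g. 0.5 */)
{
    double peak = 0.0;
    for (size_t i = 0; i < n; i++)
        if (avg_bps[i] > peak)
            peak = avg_bps[i];
    return peak > capacity_bps * threshold;
}

int main(void)
{
    /* a full day of 5-minute samples would be 288 values; three
     * illustrative ones shown here */
    double samples[] = { 3.1e9, 5.6e9, 4.9e9 };
    if (needs_upgrade(samples, 3, 10e9, 0.5))
        printf("order the upgrade\n");
    return 0;
}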
--
Mikael Abrahamsson email: ***@swm.pp.se
Benjamin Cronce
2017-12-17 21:37:28 UTC
Permalink
This is an interesting topic to me. Over the past 5+ years, I've been
reading about GPON fiber aggregators (GPON chassis, for lack of a
proper term) with 400Gb-1Tb/s of uplink, 1-2Tb/s line-cards, and
enough GPON ports for several thousand customers.

When my current ISP started rolling out fiber (all of it underground,
no above-ground fiber), I called support during a graveyard hour on
the weekend, and I got a senior network admin answering the phone
instead of normal tech support. When talking to him, I asked him what
they meant by "guaranteed" bandwidth. I guess I should mention that my
ISP claims dedicated bandwidth for everyone. He told me that they
played with over-subscription for a while, but it just resulted in
complex situations that caused customers to complain. Complaining
customers are expensive because they eat up support phone time. They
eventually went to a non-oversubscribed flat model. He told me that
the GPON chassis plugs straight into the core router. I asked him
about GPON port shared bandwidth and the GPON uplink. He said they
will not over-subscribe a GPON port, so all ONTs on the port can use
100% of their provisioned rate, and they will not place more
provisioned bandwidth on a single GPON chassis than what the uplink
can support.

For the longest time, their max sold bandwidth was 50Mb/s. After some
time, they were having some issues resulting in packet loss during
peak hours. It turned out their old core router could not hold all of
the new customers in the ARP cache and was causing massive amounts of
broadcast packets. I actually helped them solve this issue. They had
me work with a hired consulting service that was having issues
diagnosing the problem, much because of the older hardware not
supporting modern diagnostic features. They fixed the problem by
upgrading the core router. Because I was already in contact with them
during this issue, I was made privy to the fact that their new core
router could handle about 10Tb/s, with a lot of room for 100Gb+ ports.
No exact details, but I was told their slowest internal link was now
100Gb.

Their core router actually has traffic shaping and an AQM built in.
They switched from using ONT rate limiting for provisioning to letting
the core router handle provisioning. I can actually see 1Gb bursts, as
their shaping seems to be like a sliding window over a few tens of ms.
I have actually tested their AQM a bit via a DOS testing service. At
the time, I had a 100Mb/100Mb service, and externally flooding my
connection with 110Mb/s resulted in about 10% packet loss, but my ping
stayed under 20ms. I tried 200Mb/s for about 20 seconds, which
resulted in about 50% loss and still ~20ms pings. For about 10 seconds
I tested a 1Gb/s DOS and had about 90% loss (not a long time to
sample, but it was sampled at a rate of 10pps against their speed-test
server), but 20-40ms pings. I tested this during off hours, like 1am.

A few months after the upgrade, I got upgraded to a 100Mb connection
with no change in price, and several new higher tiers were added, all
the way up to 1Gb/s. I asked them about this. Yes, the 1Gb tier was
also not over-subscribed. I'm not sure if some lone customer pretty
much got their own GPON port or they had some WDM-PON linecards.

I'm currently paying about $40/m for 150/150 for a "dedicated"
connection. I'm currently getting about 1ms +- 0.1ms pings to my ISP's
speedtest server 24/7. If I do a ping flood, I can get my avg ping
down near 0.12ms. I assume this is because of GPON scheduling. Of
course, I only test this against their speedtest server and during off
hours.

As for the trunk, I've also talked to them about that, at least in the
past, and I can't speak for more current times. They had 3 trunks, 2
to Level 3 Chicago and one to Global Crossing Minnesota. I was told
each link was a paired link for immediate fail-over. I was told that
in some cases they've bonded the links, primarily due to DDOS attacks,
to quickly double their bandwidth. Their GX link was their fail-over
and the two Chicago Level 3 links were the load-balanced primaries.
Based on trace-routes, they seemed to be load-balanced by some lower
bits in the IP address. This gave a total of 6 links. The network
admin told me that any given link had enough bandwidth provisioned
that if all 5 other links were down, that one link would have a 95th
percentile below 80% during peak hours, and customers should be
completely unaffected.

They've been advertising guaranteed dedicated bandwidth for over 15
years now. They recently had a marketing campaign against the local
incumbent where they poked fun at them for only selling "up to"
bandwidth. This went on for at least a year. They openly advertised
that their bandwidth was not "up to", but that customers will always
get all of their bandwidth all the time. In the small print it said
"to their transit provider". In short, my ISP is claiming I should
always get my provisioned bandwidth to Level 3, 24/7. As far as I have
cared to measure, this is true. At one point I had a month-long ping
of 2 pps against AWS Frankfurt: ~140ms avg, ~139ms min, ~3ms std-dev,
a max ping of ~160ms, and fewer than 100 lost packets. 6-12ms to
Chicago, depending on which link, 30-35ms to New York City depending
on the link, 90ms to London, and 110ms to Paris. Interesting note: AWS
Frankfurt was only about 6 hops from Midwest USA. That's impressive.

Back when I was load testing my 100Mb connection, I queued up a bunch
of well-seeded large Linux ISOs and downloaded to my SSDs. Between my
traffic shaping via pfSense and my ISP's unknown AQM, I averaged
99.8Mb/s, sampled over a 1.5 hour window from 8:30p to 10p, with the
1-minute slices (as reported by pfSense) ranging between 99.5Mb/s and
99.7Mb/s. 0 ping packets were lost to my ISP, with no ping more than
~10ms, and the avg/std-dev was identical to idle to within 0.1ms. When
doing the DOS tests, pfSense reported exactly 100.0Mb/s hitting the
WAN with zero dips.

In short, if I wanted to, I could purchase a 500/500 "dedicated"
connection for $110/m, plus tax but no other fees, free install, a
passive point-to-point self-healing ring back to the CO from my house,
and a /29 static block for an additional $10/m. I was told I can do
web-hosting, but with no SLA, even though I get near-perfect
connectivity and single-digit minutes of yearly downtime (in the 1a-2a
window).

This is all from a local private ISP that openly brags that they do
not accept any government grants, loans, or other subsidies. My ISP is
about 120 years old and started off as a telegraph service. I've
gotten the feeling that fast dedicated bandwidth is cheap and easy,
assuming you're an established ISP that doesn't have to fight through
red tape. We've got farmers with 1Gb/1Gb dedicated fiber connections,
all without government support.

About 3 years ago I was reading about petabit core routers with 1Tb/s
ports and single-fiber ~40Tb/s multiplexers. Recently I heard that
100Gb PON with 2.5Tb/s of bandwidth is already partially working in
labs, with an expected cost not much more than current-day XG2-PON,
which is what... 300Gb/s or so split among 32 customers? As far as I
can tell, last-mile bandwidth is a solved problem short of
incompetence, greed, or extreme circumstances.

Ahh yes. Statistical over-subscription was the topic. This works well
for backbone providers where they have many peering links with a heavy
mix of flows. Level 3 has a blog where they were showing off a 10Gb
link where, below the 95th percentile, the link had zero total packets
lost and a queuing delay of less than 0.1ms. But above 80%
utilization, loss and jitter suddenly went up with a hockey-stick
curve. Then they showed a 400Gb link. It was at 98% utilization for
the 95th percentile, and it had zero total packets lost and a max
queuing delay of 0.01ms with an average of 0.00ms.

There was a major European IX that had a blog about bandwidth planning
and over-provisioning. They had a 95th percentile in the many
terabits, and they said they could always predict peak bandwidth to
within 1% for any given day. Given a large mix of flow types,
statistics is very good.

On a slightly different topic, I wonder what trunk providers are using
for AQMs. My ISP was under a massive DDOS some time in the past year,
and I used a Level 3 looking glass from Chicago, which showed only a
40ms delta between the pre-hop and hitting my ISP, where it was
normally about 11ms for that link. You could say about 30ms of
buffering was going on. The really interesting thing is that I was
only getting about 5-10Mb/s, which means there was virtually zero free
bandwidth, but I had almost no packet loss. I called my ISP shortly
after the issue started, and that's when they told me they were under
a DDOS and were at 100% trunk, and they said they were going to have
their trunk bandwidth increased shortly. 5 minutes later, the issue
was gone. About 30 minutes later I was called back and told the DDOS
was still on-going; they had just upgraded to enough bandwidth to soak
it all. I found it very interesting that a DDOS large enough to
effectively kill 95% of my provisioned bandwidth and increase my ping
30ms over normal did not seem to affect packet loss almost at all. It
was well under 0.1%. Is this due to the statistical nature of large
links, or did Level 3 have an AQM to my ISP?
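
For reference, the conventional 95th-percentile computation used in
discussions like the Level 3 one above is just: take the 5-minute
samples, sort ascending, and discard the top 5% of busiest intervals.
A minimal sketch, with made-up sample values:

#include <stdio.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const const double *)b == *(const double *)b ? *(const double *)b : *(const double *)b;
    (void)y;
    double yy = *(const double *)b;
    return (x > yy) - (x < yy);
}

/* sort ascending and return the value below which 95% of samples fall */
double percentile95(double *samples, size_t n)
{
    qsort(samples, n, sizeof(double), cmp_double);
    size_t idx = (size_t)(0.95 * n);
    if (idx >= n)
        idx = n - 1;
    return samples[idx];
}

int main(void)
{
    double mbps[] = { 120, 340, 310, 95, 400, 385, 290, 330, 360, 380 };
    printf("95th percentile: %.0f Mb/s\n",
           percentile95(mbps, sizeof(mbps) / sizeof(mbps[0])));
    return 0;
}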
Mikael Abrahamsson
2017-12-18 08:11:40 UTC
Permalink
Post by Benjamin Cronce
This is an interesting topic to me. Over the past 5+ years, I've been
reading about GPON fiber aggregators(GPON chassis for lack of a proper
term) with 400Gb-1Tb/s of uplink, 1-2Tb/s line-cards, and enough GPON
ports for several thousand customers.
Yep, they're available if people want to pay for them.
Post by Benjamin Cronce
because they eat up support phone time. They eventually went to a
non-oversubscribed flat model. He told me that the GPON chassis plugs
strait into the core router. I asked him about GPON port shared bandwidth
and the GPON uplink. He said they will not over-subscribe a GPON port, so
all ONTs on the port can use 100% of their provisioned rate, and they will
not place more provision bandwidth on a single GPON chassis than what they
uplink can support.
Yes, it makes sense to have no or very small oversubscription of the
GPON port. The aggregation uplink is another beast. In order to do
statistical oversubscription, you need lots of customers, so the
statistics turn into an average that doesn't fluctuate too much.
Post by Benjamin Cronce
loss(not a long time to sample, but was sampled at a rate of 10pps against
their speed-test server), but 20-40ms pings. I tested this during off
hours, like 1am.
Yep, sounds like they configured their queues correctly then.

Historically we had 600ms of buffering in backbone platforms at the
end of the 90s and the beginning of the 2000s; this has slowly eroded
over time, so now the "gold standard" is down to 30-50ms of buffer.
There are platforms that have basically no buffer at all, such as
64-port 100G equipment with 12-24 megabytes of buffer. These are
typically for DC deployment, where RTTs are extremely low and special
congestion algorithms are used. They don't work very well for ISP
deployments.
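
The conversion behind those buffer figures is just buffer = rate x
delay / 8; a quick sketch (numbers computed from that formula, not
quoted from any vendor):

#include <stdio.h>

/* bytes of buffer needed to absorb delay_s worth of traffic at rate_bps */
double buffer_bytes(double rate_bps, double delay_s)
{
    return rate_bps * delay_s / 8.0;
}

int main(void)
{
    printf("50ms @ 10G  = %.0f MB\n", buffer_bytes(10e9, 0.050) / 1e6);
    printf("50ms @ 100G = %.0f MB\n", buffer_bytes(100e9, 0.050) / 1e6);
    /* and the other direction: 12MB shared at 100G line rate */
    printf("12MB @ 100G = %.3f ms\n", 12e6 * 8.0 / 100e9 * 1e3);
    return 0;
}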
Post by Benjamin Cronce
to 1Gb/s. I asked them about this. Yes, the 1Gb tier was also not
over-subscribed. I'm not sure if some lone customer pretty much got their
own GPON port or they had some WDM-PON linecards.
Sounds like it; not oversubscribing gig customers on GPON sounds
expensive.
Post by Benjamin Cronce
I'm currently paying about $40/m for 150/150 for a "dedicated"
connection. I'm currently getting about 1ms+-0.1ms pings to my ISP's
speedtest server 24/7. If I do a ping flood, I can get my avg ping down
near 0.12ms. I assume this is because of GPON scheduling. Of course I
only test this against their speedtest server and during off hours.
Yes, sounds like the GPON scheduler indeed.
Post by Benjamin Cronce
to be load-balanced by some lower-bits in the IP address. This gave a total
Load-balancing can be done a lot of different ways. Typically, in
ISP-speak this is called "hashing"; some do it on L3 information, some
on L3/L4, and some go even deeper into the packet.
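
A toy example of the L3/L4 variant - the hash mixer here is
illustrative, not any particular vendor's implementation. The point is
that all packets of a flow land on the same link (preserving
ordering), while distinct flows spread across the links:

#include <stdint.h>

struct five_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  proto;
};

unsigned pick_link(const struct five_tuple *ft, unsigned n_links)
{
    uint32_t h = ft->src_ip ^ ft->dst_ip ^ ft->proto;
    h ^= (uint32_t)ft->src_port << 16 | ft->dst_port;
    /* final avalanche step borrowed from common integer hash mixers */
    h ^= h >> 16;
    h *= 0x45d9f3b;
    h ^= h >> 16;
    return h % n_links;
}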
Post by Benjamin Cronce
of 6 links. The network admin told me that any given link had enough
bandwidth provisioned, that if all 5 other links were down, that one link
would have a 95th percentile below 80% during peak hours, and customers
should be completely unaffected.
That's a hefty margin. I'd say it's prudent to make sure you can
handle a single fault without customer degradation. However, transit
isn't that expensive, so it might make sense.
Post by Benjamin Cronce
what... 300Gb/s or so split among 32 customers?. As far as I can tell,
last mile bandwidth is a solved problem short of incompetence, greed, or
extreme circumstances.
Sure, it's not a technical problem. We have the technology. It's a money,
politics, law, regulation and will problem. So yes, what you said.
Post by Benjamin Cronce
Ahh yes. Statistical over-subscription was the topic. This works well for
backbone providers where they have many peering links with a heavy mix of
flows. Level 3 has a blog where they were showing off a 10Gb link where
below the 95th percentile, the link had zero total packets lost and a
queuing delay of less than 0.1ms. But above 80%, suddenly loss and jitter
went up with a hockey-stick curve. Then they showed a 400Gb link. It was at
98% utilization for the 95th percentile and it had zero total packets lost
and a max queuing delay of 0.01ms with an average of 0.00ms.
Yes, this is still true today, as it was in 2002 when this
presentation was done:

https://internetdagarna.se/arkiv/2002/15-bygganat/id2002-peter-lothberg.pdf

(slides 44-47). The speeds have only increased, but the premise is
still the same.
Post by Benjamin Cronce
There was a major European IX that had a blog about bandwidth planning and
over-provisioning. They had a 95th percentile in the many-terabits, and
they said they said they could always predict peak bandwidth to within 1%
for any given day. Given a large mix of flow types, statistics is very good.
Indeed: the bigger the aggregation, the more the statistics show the
same behaviour every day.
Post by Benjamin Cronce
On a slightly different topic, I wonder what trunk providers are using for
AQMs. My ISP was under a massive DDOS some time in the past year and I use
a Level 3 looking glass from Chicago, which showed only a 40ms delta
between the pre-hop and hitting my ISP, where it was normally about 11ms
for that link. You could say about 30ms of buffering was going on. The
really interesting thing is I was only getting about 5-10Mb/s, which means
there was virtually zero free bandwidth. but I had almost no packet-loss. I
called my ISP shortly after the issue started and that's when they told me
they were under a DDOS and were at 100% trunk, and they said they were
going to have their trunk bandwidth increased shortly. 5 minutes later, the
issue was gone. About 30 minutes later I was called back and told the DDOS
was still on-going, they just upgraded to enough bandwidth to soak it all.
I found it very interesting that a DDOS large enough to effectively kill
95% of my provisioned bandwidth and increase my ping 30ms over normal, did
not seem to affect packet-loss almost at all. It was well under 0.1%. Is
this due to the statistical nature of large links or did Level 3 have an
AQM to my ISP?
This is interesting. I thought about this for several minutes, but
can't come up with an explanation for this behaviour, at least not
from the typical kind of DDOS that's going around. If there was some
kind of DDOS mitigation equipment put into the mix, that might explain
what you were seeing.
--
Mikael Abrahamsson email: ***@swm.pp.se
Matthias Tafelmeier
2017-12-17 11:52:23 UTC
Permalink
Post by Dave Taht
I tend not to care as much about how long things take when they do not
have the R/T deadlines that humans and steering wheels do.
Propagation delay, while ultimately bound by the speed of light, is
also affected by the wires wrapping indirectly around the earth - much
slower:
https://arxiv.org/pdf/1505.03449.pdf
Enchanting that somebody actually quantified this intricacy!

*My* Addendum/Errata:

The alternative to a 'fast lane' backbone is not necessarily a
mast-based microwave link as stated, which is probably
infeasible/inflexible etc.

They were mentioning 'weather-balloons' as well - which actually boils
down to - I'm presuming this - the probably ongoing airborne platform
internet extension attempts by Google/Lockheed Martin etc., you call
them ...

These attempts are not based on balloons, but airships ('dirigible'
balloons, so to speak), able to stay aloft for potentially years.
That's widely known, so let's set that aside.

NB. Airships are quite impressive in their own right and therefore
worth researching!!!

What I actually wanted to posit in relation to that is that one could
sooner get a c-capable backbone sibling by marrying two ideas: the
ongoing airborne concept as outlined, plus what NASA is planning to
bring about for the space backbone, e.g. [1][2]. It's laser based
instead of directed radio-wave only. Sure, both are in the speed range
of c; apparently, laser transmission has in addition a significantly
higher bandwidth to offer: "10 to 100 times as much data at a time as
radio-frequency systems"[3]. Attenuation of photons in clean
atmospheric air is negligible (a few m/s - refractive index of about
1.0003), so actually a negligible slowdown - easily competing with
top-notch fibres (99.7% the vacuum speed of light). Sure, that's the
ideal case; though, if cleverly done from the procurement-of-platforms
and overall system-steering perspective, it might be feasible.

[1] https://www.nasa.gov/feature/goddard/2017/nasa-taking-first-steps-toward-high-speed-space-internet

[2] https://www.nasa.gov/feature/new-solar-system-internet-technology-debuts-on-the-international-space-station

[3] https://www.nasa.gov/feature/goddard/2017/tdrs-an-era-of-continuous-space-communications
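
The arithmetic behind the refractive-index claim, for anyone who wants
to check it (indices assumed here: n ~= 1.0003 for clean air, n ~=
1.468 for silica fiber):

#include <stdio.h>

int main(void)
{
    const double c = 299792.458;      /* km/s in vacuum */
    const double dist = 1000.0;       /* km of path */
    const double n[] = { 1.0, 1.0003, 1.468 };
    const char *medium[] = { "vacuum", "air", "fiber" };

    /* one-way propagation delay per medium: dist * n / c */
    for (int i = 0; i < 3; i++)
        printf("%-6s: %.3f ms per %.0f km\n",
               medium[i], dist * n[i] / c * 1000.0, dist);
    return 0;
}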
--
Best regards

Matthias Tafelmeier
Mikael Abrahamsson
2017-12-18 07:50:10 UTC
Permalink
Post by Matthias Tafelmeier
What I actually wanted to posit in relation to that is that one could
sooner get a c-capable backbone sibling by marrying two ideas: the
ongoing airborne concept as outlined, plus what NASA is planning to
bring about for the space backbone, e.g. [1][2]. It's laser based
instead of directed radio-wave only. Sure, both are in the speed range
of c; apparently, laser transmission has in addition a significantly
higher bandwidth to offer: "10 to 100 times as much data at a time as
radio-frequency systems"[3]. Attenuation of photons in clean
atmospheric air is negligible (a few m/s - refractive index of about
1.0003), so actually a negligible slowdown - easily competing with
top-notch fibres (99.7% the vacuum speed of light). Sure, that's the
ideal case; though, if cleverly done from the procurement-of-platforms
and overall system-steering perspective, it might be feasible.
Today's laser links are in the few-km-per-hop range, which is easily
at least one order of magnitude shorter than radio-based equivalents.

I don't know the physics behind it, but people who have better insight
than I do tell me "it's hard" to run longer hops (if one wants any
kind of high bitrate).
--
Mikael Abrahamsson email: ***@swm.pp.se
Matthias Tafelmeier
2017-12-19 17:55:18 UTC
Permalink
Post by Mikael Abrahamsson
Post by Matthias Tafelmeier
What I actually wanted to posit in relation to that is that one could
sooner get a c-capable backbone sibling by marrying two ideas: the
ongoing airborne concept as outlined, plus what NASA is planning to
bring about for the space backbone, e.g. [1][2]. It's laser based
instead of directed radio-wave only. Sure, both are in the speed range
of c; apparently, laser transmission has in addition a significantly
higher bandwidth to offer: "10 to 100 times as much data at a time as
radio-frequency systems"[3]. Attenuation of photons in clean
atmospheric air is negligible (a few m/s - refractive index of about
1.0003), so actually a negligible slowdown - easily competing with
top-notch fibres (99.7% the vacuum speed of light). Sure, that's the
ideal case; though, if cleverly done from the procurement-of-platforms
and overall system-steering perspective, it might be feasible.
Today's laser links are in the few-km-per-hop range, which is easily
at least one order of magnitude shorter than radio-based equivalents.
Hold on! This is a severe oversimplification, isn't it? The devices
you're probably referring to are in the low-end segment, amateurishly
and maybe terrestrially operated only - to mention a few conceivable
limiting factors.

Certainly, there are range-limiting factors when fully submerged in
the near-ground atmospheric ranges. E.g. in the darkest snow storm,
one cannot expect optics to be working reliably - admitting that.
Notwithstanding, recent research[1] showed astounding achievements of
FSOs even in harsh atmospheric conditions - "up to 10 gigabits per
second" while in rapid movement, in heavy fog ... for a single-path
laser.

90% of the mass of the atmosphere is below 16 km (52,000 ft), and
therefore also most of its randomness[2]. Meaning, one only has to
surpass this altitude to more decently unfold the capabilities of an
airborne backbone. Therefore, a hierarchy of airborne vessels might be
necessary: smaller, more numerous ones gatewaying the optics out of
the dense parts of the atmosphere to the actual backbone-net borne
lasers, perhaps doing this relaying by means other than laser beams.
Far more mitigation techniques are conceivable. From there on, the
shortcomings appear controllable.
Post by Mikael Abrahamsson
I don't know the physics behind it, but people who have better insight
than I do tell me "it's hard" to run longer hops (if one wants any
kind of high bitrate).
If one looks at what is achievable in space - where the conditions
shouldn't be too different from the Earth's atmosphere above 16 km -
thousands of kilometres are possible for a single hop, single path.
Now imagine a decent degree of multipathing.

Physical intricacies are certainly a headache in this topic, though
they shouldn't be decisive. I'd dare to say the largest complexity
compartment of such a system is the algorithmics for steering,
converging or stabilizing the airborne components and directing the
optics properly and in time - the overall automatic or even autonomic
operations, to abstract it.

Probably, me writing up some papers on this would be worthwhile.

[1] https://phys.org/news/2017-08-high-bandwidth-capability-ships.html

[2] https://arxiv.org/pdf/1705.10630.pdf
--
Best regards

Matthias Tafelmeier
Matthias Tafelmeier
2017-12-27 15:15:25 UTC
Permalink
Post by Matthias Tafelmeier
Probably, me writing up some papers on this would be worthwhile.
[1] https://phys.org/news/2017-08-high-bandwidth-capability-ships.html
[2] https://arxiv.org/pdf/1705.10630.pdf
I couldn't refrain. Currently, I have no affiliation rights, therefore
I'm not eligible to push up to arxiv.org or equivalent: this is why I
went for a blog article here [1] instead of a paper. It's mediocre,
thin, yes ... nevertheless, I was astounded during my research that
the idea is actually already ~15 years old [2]. Back then, this was
really an 'intellectual step'.

[1] https://matthias0tafelmeier.wordpress.com/2017/12/26/speed-of-light-in-the-air-towards-an-airborne-internet-backbone

[2] https://www.researchgate.net/publication/224793937_Stratospheric_Optical_Inter-Platform_Links_for_High_Altitude_Platforms
--
Best regards

Matthias Tafelmeier
Joel Wirāmu Pauling
2018-01-20 11:55:04 UTC
Permalink
As I am writing up my slide-pack for LCA2018, this reminded me to test
out the irtt sleep bench against my running system.

Seems the Skylake parts in combination with current kernels are much
better at this than what you were running on - what is the kernel of
the x86 result?
---
***@kiorewha:~/go/bin$ inxi -C
CPU: Quad core Intel Core i7-7700K (-HT-MCP-) cache: 8192 KB
clock speeds: max: 4700 MHz 1: 4603 MHz 2: 4616 MHz 3: 4600 MHz
4: 4602 MHz 5: 4601 MHz 6: 4612 MHz
7: 4601 MHz 8: 4600 MHz
***@kiorewha:~/go/bin$ uname -a
Linux kiorewha 4.15.0-rc7+ #12 SMP Tue Jan 16 20:16:35 NZDT 2018 x86_64
x86_64 x86_64 GNU/Linux
***@kiorewha:~/go/bin$ irtt sleep
Testing sleep accuracy...

Sleep Duration Mean Error % Error
1ns 245ns 24586.2
10ns 234ns 2345.1
100ns 10.272µs 10272.9
1µs 52.995µs 5299.6
10µs 53.189µs 531.9
100µs 53.926µs 53.9
1ms 80.973µs 8.1
10ms 86.933µs 0.9
100ms 86.563µs 0.1
200ms 66.967µs 0.0
500ms 64.883µs 0.0
Mikael Abrahamsson
2017-12-04 12:41:28 UTC
Permalink
Post by Joel Wirāmu Pauling
How to deliver a switch, when the wiring and port standard isn't
actually workable?
Not workable?
Post by Joel Wirāmu Pauling
10GBase-T is out of Voltage Spec with SFP+ ; you can get copper SFP+
Yep, the "Cu SFP" was a luxury for a while. Physics is a harsh
mistress, though.
Post by Joel Wirāmu Pauling
but they are out of spec... 10GbaseT doesn't really work over Cat5e
more than a couple of meters (if you are lucky) and even Cat6 is only
rated at 30M... there is a reason no-one is producing home copper
switches and it's not just the NIC silicon cost (that was a factor
until recently obviously, but only part of the equation).
I have CAT6 in my home, and not more than 30 meters anywhere, so it
would work for me. You need CAT6A for 100m, so anyone doing new
installs should use that. Stiff cable, though.
Post by Joel Wirāmu Pauling
Right now I am typing this via a 40gbit network, comprised of the
cheap and readily available Tb3 port - it's daisy chained and limited
to 6 ports, but right now it's easily the cheapest and most effective
port. Pitty that the fabled optical tb3 cables are damn expensive...
so you're limited to daisy-chains of 2m. They seem to have screwed the
pooch on the USB-C network standard quite badly - which looked so
promising, so for the moment Tb3 it is for me at least.
With that distance, you could probably run 10GE over CAT3 wiring. So
there is a reason 10GE requires more for longer distances: the cable
is bad, so you need lots of power and DSPs to figure out what's going
on.
--
Mikael Abrahamsson email: ***@swm.pp.se
Jesper Dangaard Brouer
2017-12-04 10:56:51 UTC
Permalink
Post by Dave Taht
Changing the topic, adding bloat.
Adding netdev, and also adjusting the topic to be a rant on how the
Linux kernel network stack is actually damn fast, and if you need
something faster then XDP can solve your needs...
Post by Dave Taht
Just from a Telco/Industry perspective slant.
Everything in DC has moved to SFP28 interfaces at 25Gbit as the server
port of interconnect. Everything TOR wise is now QSFP28 - 100Gbit.
Mellanox X5 cards are the current hotness, and their offload
enhancements (ASAP2 - which is sorta like DPDK on steroids) allows for
OVS flow rules programming into the card. We have a lot of customers
chomping at the bit for that feature (disclaimer I work for Nuage
Networks, and we are working on enhanced OVS to do just that) for NFV
workloads.
What Jesper's been working on for ages has been to try and get linux's
PPS up for small packets, which last I heard was hovering at about 4Gbits.
I hope you made a typo here Dave; the normal Linux kernel is
definitely way beyond 4Gbit/s, you must have misunderstood something -
maybe you meant 40Gbit/s? (which is also too low)

Scaling up to more CPUs and TCP streams, Tariq[1] and I have shown
that the Linux kernel network stack scales to 94Gbit/s (linerate minus
overhead). But when the driver's page-recycler fails, we hit
bottlenecks in the page-allocator that cause negative scaling, down to
around 43Gbit/s.

[1] http://lkml.kernel.org/r/cef85936-10b2-5d76-9f97-***@mellanox.com

Linux has for a _long_ time been doing 10Gbit/s TCP streams easily, on
a SINGLE CPU. This is mostly thanks to TSO/GRO aggregating packets,
but over the last couple of years the network stack has been optimized
(with UDP workloads), and as a result we can do 10G without TSO/GRO on
a single CPU. This is "only" 812Kpps with MTU size frames.

It is important to NOTICE that I'm mostly talking about SINGLE-CPU
performance. But the Linux kernel scales very well to more CPUs, and
you can scale this up, although we are starting to hit scalability
issues in MM-land[1].

I've also demonstrated that the netdev community has optimized the
kernel's per-CPU processing power to around 2Mpps. What does this
really mean... well, with MTU size packets 812Kpps was 10Gbit/s, thus
25Gbit/s should be around 2Mpps... That implies Linux can do 25Gbit/s
on a single CPU without GRO (MTU size frames). Do you need more, I
ask?
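
The pps-to-bits arithmetic, for anyone checking at home, assumes the
standard Ethernet on-the-wire overhead of 38 bytes per frame (18B
header/FCS + 8B preamble + 12B inter-frame gap) on top of the
1500-byte MTU:

#include <stdio.h>

int main(void)
{
    const double wire_bytes = 1500 + 18 + 8 + 12;   /* 1538B per frame */
    const double rates[] = { 10e9, 25e9 };

    /* pps = line rate / bits per frame on the wire */
    for (int i = 0; i < 2; i++)
        printf("%2.0fG => %.0f Kpps at MTU size\n",
               rates[i] / 1e9, rates[i] / (wire_bytes * 8) / 1e3);
    return 0;
}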
Post by Dave Taht
The route table lookup also really expensive on the main cpu.
Well, it used to be very expensive. Vincent Bernat wrote some
excellent blog posts[2][3] on the recent improvements over kernel
versions, and gave due credit to the people involved.

[2] https://vincent.bernat.im/en/blog/2017-performance-progression-ipv4-route-lookup-linux
[3] https://vincent.bernat.im/en/blog/2017-performance-progression-ipv6-route-lookup-linux

He measured a cost of around 25 to 35 nanoseconds per route lookup. My
own recent measurement of fib_table_lookup was 36.9 ns.
Post by Dave Taht
Does this stuff offload the route table lookup also?
If you have not heard, the netdev community has worked on something
called XDP (eXpress Data Path). This is a new layer in the network
stack that basically operates at the same "layer"/level as DPDK.
Thus, surprise, we get the same performance numbers as DPDK. E.g. I
can do 13.4 Mpps forwarding with ixgbe on a single CPU (more CPUs =
14.6 Mpps).

We can actually use XDP for (software) offloading of the Linux routing
table. There are two methods we are experimenting with:

(1) externally monitor route changes from userspace and update BPF
maps to reflect this. That approach is already accepted upstream[4][5].
I'm measuring 9,513,746 pps per CPU with that approach.

(2) add a bpf helper to simply call fib_table_lookup() from the XDP
hook. These are still experimental patches (credit to David Ahern),
and I've measured 9,350,160 pps with this approach on a single CPU.
Using more CPUs we hit 14.6Mpps (only used 3 CPUs in that test).

[4] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_user.c
[5] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_kern.c
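
A minimal sketch in the spirit of approach (1) - not the upstream
sample itself; the map name, key layout and the omission of MAC
rewriting are illustrative simplifications:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct route_key {
    __u32 prefixlen;            /* LPM-trie keys must start with this */
    __u32 daddr;                /* destination IPv4 address */
};

struct {
    __uint(type, BPF_MAP_TYPE_LPM_TRIE);
    __type(key, struct route_key);
    __type(value, __u32);       /* egress ifindex */
    __uint(max_entries, 1024);
    __uint(map_flags, BPF_F_NO_PREALLOC);
} routes SEC(".maps");

SEC("xdp")
int xdp_route(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;        /* let the stack handle non-IPv4 */

    struct iphdr *iph = (void *)(eth + 1);
    if ((void *)(iph + 1) > data_end)
        return XDP_DROP;

    /* longest-prefix match against the userspace-maintained FIB copy */
    struct route_key key = { .prefixlen = 32, .daddr = iph->daddr };
    __u32 *ifindex = bpf_map_lookup_elem(&routes, &key);
    if (!ifindex)
        return XDP_PASS;        /* no route in the map: fall back */

    return bpf_redirect(*ifindex, 0);
}

char _license[] SEC("license") = "GPL";

Loadable with e.g. "ip link set dev eth0 xdp obj prog.o sec xdp", with
a small userspace daemon listening for netlink route updates and
keeping the map in sync.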
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Dave Taht
2017-12-04 17:00:41 UTC
Permalink
Jesper:

I have a tendency to deal with netdev by itself and never cross-post
there, as the bufferbloat.net servers (primarily to combat spam)
mandate starttls and vger doesn't support it at all, thus leading to
raising davem's blood pressure, which I'd rather not do.

But moving on...

On Mon, Dec 4, 2017 at 2:56 AM, Jesper Dangaard Brouer
Post by Jesper Dangaard Brouer
Post by Dave Taht
Changing the topic, adding bloat.
Adding netdev, and also adjust the topic to be a rant on that the Linux
kernel network stack is actually damn fast, and if you need something
faster then XDP can solved your needs...
Post by Dave Taht
Just from a Telco/Industry perspective slant.
Everything in DC has moved to SFP28 interfaces at 25Gbit as the server
port of interconnect. Everything TOR wise is now QSFP28 - 100Gbit.
Mellanox X5 cards are the current hotness, and their offload
enhancements (ASAP2 - which is sorta like DPDK on steroids) allows for
OVS flow rules programming into the card. We have a lot of customers
chomping at the bit for that feature (disclaimer I work for Nuage
Networks, and we are working on enhanced OVS to do just that) for NFV
workloads.
What Jesper's been working on for ages has been to try and get linux's
PPS up for small packets, which last I heard was hovering at about 4Gbits.
I hope you made a typo here Dave, the normal Linux kernel is definitely
way beyond 4Gbit/s, you must have misunderstood something, maybe you
meant 40Gbit/s? (which is also too low)
The context here was PPS for *non-gro'd* TCP ack packets, in the
further context of the increasingly epic "benefits of ack filtering"
thread on the bloat list, where for 50x1 end-user asymmetry we were
seeing 90% fewer acks with the new sch_cake ack-filter code, and
double the throughput...

The kind of return traffic you see from data sent outside the DC, with
tons of flows.

What's that number?
Post by Jesper Dangaard Brouer
Scaling up to more CPUs and TCP-stream, Tariq[1] and I have showed the
Linux kernel network stack scales to 94Gbit/s (linerate minus overhead).
But when the drivers page-recycler fails, we hit bottlenecks in the
page-allocator, that cause negative scaling to around 43Gbit/s.
So I take 94/22 and get ~4Gbit for acks. Or I look at PPS * 66. Or?
Post by Jesper Dangaard Brouer
Linux have for a _long_ time been doing 10Gbit/s TCP-stream easily, on
a SINGLE CPU. This is mostly thanks to TSO/GRO aggregating packets,
but last couple of years the network stack have been optimized (with
UDP workloads), and as a result we can do 10G without TSO/GRO on a
single-CPU. This is "only" 812Kpps with MTU size frames.
acks.
Post by Jesper Dangaard Brouer
It is important to NOTICE that I'm mostly talking about SINGLE-CPU
performance. But the Linux kernel scales very well to more CPUs, and
you can scale this up, although we are starting to hit scalability
issues in MM-land[1].
I've also demonstrated that netdev-community have optimized the kernels
per-CPU processing power to around 2Mpps. What does this really
mean... well with MTU size packets 812Kpps was 10Gbit/s, thus 25Gbit/s
should be around 2Mpps.... That implies Linux can do 25Gbit/s on a
single CPU without GRO (MTU size frames). Do you need more I ask?
The benchmark I had in mind was, say, 100k flows going out over the internet,
and the characteristics of the ack flows on the return path.
Post by Jesper Dangaard Brouer
Post by Dave Taht
The route table lookup also really expensive on the main cpu.
To clarify the context here: I was asking specifically if the X5
Mellanox card did routing table offload or only switching.
Post by Jesper Dangaard Brouer
Well, it used-to-be very expensive. Vincent Bernat wrote some excellent
blogposts[2][3] on the recent improvements over kernel versions, and
gave due credit to people involved.
[2] https://vincent.bernat.im/en/blog/2017-performance-progression-ipv4-route-lookup-linux
[3] https://vincent.bernat.im/en/blog/2017-performance-progression-ipv6-route-lookup-linux
He measured around 25 to 35 nanosec cost of route lookups. My own
recent measurements were 36.9 ns cost of fib_table_lookup.
On Intel hw.
Post by Jesper Dangaard Brouer
Post by Dave Taht
Does this stuff offload the route table lookup also?
If you have not heard, the netdev-community have worked on something
called XDP (eXpress Data Path). This is a new layer in the network
stack, that basically operates a the same "layer"/level as DPDK.
Thus, surprise we get the same performance numbers as DPDK. E.g. I can
do 13.4 Mpps forwarding with ixgbe on a single CPU (more CPUs=14.6Mps)
We can actually use XDP for (software) offloading the Linux routing
(1) externally monitor route changes from userspace and update BPF-maps
to reflect this. That approach is already accepted upstream[4][5]. I'm
measuring 9,513,746 pps per CPU with that approach.
(2) add a bpf helper to simply call fib_table_lookup() from the XDP hook.
This is still experimental patches (credit to David Ahern), and I've
measured 9,350,160 pps with this approach in a single CPU. Using more
CPUs we hit 14.6Mpps (only used 3 CPUs in that test)
Neat. Perhaps trying XDP on the itty-bitty routers I usually work on
would be a win. Quad ARM cores are increasingly common there.
Post by Jesper Dangaard Brouer
[4] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_user.c
[5] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_kern.c
thx very much for the update.
Post by Jesper Dangaard Brouer
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
--
Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619
Joel Wirāmu Pauling
2017-12-04 20:49:43 UTC
Permalink
Post by Dave Taht
Post by Dave Taht
The route table lookup also really expensive on the main cpu.
To clarify the context here, I was asking specifically if the X5 mellonox card
did routing table offlload or only switching.
To clarify, as far as I know the X5, using its smart offload engine,
CAN do L3 offload into the NIC - the X4 can't.

So for the Nuage OVS -> Eswitch (what Mellanox calls the flow
programming) magic to happen and be useful, we are going to need the
X5.

Mark Iskra gave a talk at OpenStack Summit, which can be found here:

https://www.openstack.org/videos/sydney-2017/warp-speed-openvswitch-turbo-charge-vnfs-to-100gbps-in-nextgen-sdnnfv-datacenter

Slides here:

https://www.openstack.org/assets/presentation-media/OSS-Nov-2017-Warp-speed-Openvswitch-v6.pptx

Mark is local to you (Mountain View) and is a nice guy; he's probably
the better person to answer specifics.

-Joel
Jesper Dangaard Brouer
2017-12-07 08:43:05 UTC
Permalink
On Mon, 4 Dec 2017 09:00:41 -0800
Post by Dave Taht
I have a tendency to deal with netdev by itself and never cross post
there, as the bufferbloat.net servers (primarily to combat spam)
mandate starttls and vger doesn't support it at all, thus leading to
raising davem blood pressure which I'd rather not do.
Sorry, I didn't know. I've removed the bloat lists from the reply I
just gave to Matthias on netdev:

http://lkml.kernel.org/r/***@redhat.com

And I'll refrain from cross-posting between these lists in the future.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Jesper Dangaard Brouer
2017-12-07 08:49:49 UTC
Permalink
(Removed netdev list)
Post by Dave Taht
Post by Jesper Dangaard Brouer
If you have not heard, the netdev-community have worked on something
called XDP (eXpress Data Path). This is a new layer in the network
stack, that basically operates a the same "layer"/level as DPDK.
Thus, surprise we get the same performance numbers as DPDK. E.g. I can
do 13.4 Mpps forwarding with ixgbe on a single CPU (more CPUs=14.6Mps)
We can actually use XDP for (software) offloading the Linux routing
(1) externally monitor route changes from userspace and update BPF-maps
to reflect this. That approach is already accepted upstream[4][5]. I'm
measuring 9,513,746 pps per CPU with that approach.
(2) add a bpf helper to simply call fib_table_lookup() from the XDP hook.
This is still experimental patches (credit to David Ahern), and I've
measured 9,350,160 pps with this approach in a single CPU. Using more
CPUs we hit 14.6Mpps (only used 3 CPUs in that test)
Neat. Perhaps trying xdp on the itty bitty routers I usually work on
would be a win.
Definitely. It will be a huge win for small routers. This is part of
my grand scheme. We/I just need to implement XDP in one of these small
routers' drivers.

That said, XDP skips many layers and features of the network stack
that you likely need on these small routers, e.g. NAT...
Post by Dave Taht
Post by Jesper Dangaard Brouer
[4] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_user.c
[5] https://github.com/torvalds/linux/blob/master/samples/bpf/xdp_router_ipv4_kern.c
thx very much for the update.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
Matthias Tafelmeier
2017-12-04 17:19:09 UTC
Permalink
Hello,
Post by Jesper Dangaard Brouer
Scaling up to more CPUs and TCP streams, Tariq[1] and I have shown
that the Linux kernel network stack scales to 94Gbit/s (linerate minus
overhead). But when the driver's page-recycler fails, we hit
bottlenecks in the page-allocator that cause negative scaling, down to
around 43Gbit/s.
Linux has for a _long_ time been doing 10Gbit/s TCP streams easily, on
a SINGLE CPU. This is mostly thanks to TSO/GRO aggregating packets,
but over the last couple of years the network stack has been optimized
(with UDP workloads), and as a result we can do 10G without TSO/GRO on
a single CPU. This is "only" 812Kpps with MTU size frames.
I cannot find the reference anymore, but there was once a workshop
held by you during some netdev conference where you stated that you
were practically in rigorous exchange with NIC vendors about having
them tremendously increase the number of RX/TX rings (queues).
Further, that there were hardly any limits to that number other than
FPGA magic/physical HW - "up to millions is viable" was coined back
then. May I ask where this ended up? Wouldn't that be key for massive
parallelization too - having a queue (producer) and a CPU (consumer) -
or vice versa - per flow at the extreme? Did this end up in this
smart-NIC thingummy? The latter is rather targeted at XDP, no?
--
Best regards

Matthias Tafelmeier