Discussion:
one benefit of turning off shaping + fq_codel
Dave Taht
2018-11-13 16:54:15 UTC
It turns out we are contributing to global warming.

https://community.ubnt.com/t5/UniFi-Routing-Switching/USG-temperature/m-p/2547046/highlight/true#M115060
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Mikael Abrahamsson
2018-11-14 17:09:07 UTC
Post by Dave Taht
It turns out we are contributing to global warming.
https://community.ubnt.com/t5/UniFi-Routing-Switching/USG-temperature/m-p/2547046/highlight/true#M115060
There is a reason vendors have packet accelerators. It's more efficient
compared to doing everything in CPU.
--
Mikael Abrahamsson email: ***@swm.pp.se
David Lang
2018-11-15 00:56:10 UTC
Post by Dave Taht
Post by Dave Taht
It turns out we are contributing to global warming.
https://community.ubnt.com/t5/UniFi-Routing-Switching/USG-temperature/m-p/2547046/highlight/true#M115060
so how much power is wasted in re-transmitting packets due to bloat?
Dave Taht
2018-11-15 03:44:38 UTC
Post by David Lang
Post by Dave Taht
Post by Dave Taht
It turns out we are contributing to global warming.
https://community.ubnt.com/t5/UniFi-Routing-Switching/USG-temperature/m-p/2547046/highlight/true#M115060
so how much power is wasted in re-transmitting packets due to bloat?
That might be a good way to look at it also. It seems possible
to make the calculation in time for midnight of March 31st.
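The shape of that calculation might look like the sketch below; every number in it is an illustrative assumption, not a measurement, and getting a defensible joules-per-byte figure is of course the hard part:

```python
# Rough sketch: energy wasted on bloat-induced retransmissions.
# All numbers below are placeholder assumptions, not measurements.

def wasted_retransmit_energy_kwh(total_bytes, retransmit_fraction,
                                 joules_per_byte):
    """Energy spent moving bytes that only exist because earlier
    copies were dropped or timed out in overlong queues."""
    wasted_joules = total_bytes * retransmit_fraction * joules_per_byte
    return wasted_joules / 3.6e6  # joules per kWh

# Assumed: 1 PB of traffic, 2% retransmits, 1e-7 J per delivered byte.
kwh = wasted_retransmit_energy_kwh(1e15, 0.02, 1e-7)  # ~0.56 kWh
```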
Post by David Lang
_______________________________________________
Bloat mailing list
https://lists.bufferbloat.net/listinfo/bloat
Pete Heist
2018-11-23 11:47:57 UTC
Post by Dave Taht
It turns out we are contributing to global warming.
https://community.ubnt.com/t5/UniFi-Routing-Switching/USG-temperature/m-p/2547046/highlight/true#M115060
Would it be right to say that the biggest opportunity for reducing consumption is to avoid shaping, i.e. by adding BQL-like functionality to all classes of device drivers, and/or by deploying congestion control globally that avoids the need for it?

Other ideas: move queue management into hardware, power network equipment with renewables, or just use the Internet less. :)

Pete

(I noticed an audience member brought this up in Toke’s thesis defense)
Dave Taht
2018-11-23 16:26:43 UTC
Post by Dave Taht
It turns out we are contributing to global warming.
https://community.ubnt.com/t5/UniFi-Routing-Switching/USG-temperature/m-p/2547046/highlight/true#M115060
Would it be right to say that the biggest opportunity for reducing
consumption is to avoid shaping, i.e. by adding BQL-like functionality
to all classes of device drivers
Shaping outbound with BQL's support for a dynamic interrupt would be
*free*. A few ethernet chips already have that. Basically you set a
register saying "you are really a 200Mbit interface, return a completion
interrupt after the equivalent of that amount of time has passed".

I can neither remember which chips can do this already, nor the name of
the BQL feature that does it, this morning.

But it's a register you twiddle and a simple divider circuit.
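As a sketch of the arithmetic such a divider would implement (the rates here are hypothetical, and the real mechanism is a hardware register, not software):

```python
def completion_delay_ns(packet_bytes, pretend_rate_bps):
    """The divider: wire time for a packet as if the link really ran
    at pretend_rate_bps (say, 200 Mbit/s on a GigE port). The NIC
    would hold the TX completion interrupt for this long."""
    return packet_bytes * 8 * 1e9 / pretend_rate_bps

# A 1500-byte packet at a pretend 200 Mbit/s: 60,000 ns = 60 us.
delay = completion_delay_ns(1500, 200e6)
```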

But outbound is not the problem for us from a heat generation standpoint...
Post by Dave Taht
and/or by deploying congestion control globally that avoids the need for it?
I think it would be interesting to compare energy per byte successfully
delivered across various technologies. Driving fiber lines is pretty
high energy, though, and I think (without a back of the envelope handy)
that that would be far more expensive than shaping currently is.

still, adding 6 °C to everybody's home router to shape inbound under
heavy load is pretty costly, both in energy and in reduced service life.
Post by Dave Taht
Other ideas: move queue management into hardware
I have increasingly high hopes for P4 and other forms of hardware to
finally do shaping and queue management right.

https://github.com/ralfkundel/p4-codel/issues/2

Back in the day, I was a huge fan of async logic, which I first
encountered via Caltech's CPU and later the AMULET.

https://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU

This reduces power consumption enormously. The Caltech logic design
system is now open source, and I'd looked it over a few years ago hoping
I could use it to resurrect my ancient skills in this department. I
can't find it this morning, either. There's coffee around here
somewhere... My *big* interest in this tech was that it essentially
eliminates clock noise, so you can build a much more sensitive wireless
receiver with it. I got bitten by DRAMs being "too loud" on several occasions.

Fulcrum (before they got bought by Intel) used async logic in their switch chips.

I think (but am not sure) that the technique is undergoing a
renaissance in the AI chips. The big IBM chip uses it, and it just
totally makes sense, if you have zillions of small CPUs doing neural
networks, to only power them up when needed. No crazy P1, P2, P3 etc.
clock states are needed; the chip just speeds up or slows down as a
function of heat.

I've never really understood why it didn't take off. I think, in part,
it doesn't scale to wide buses well, and centrally clocked designs
are how most engineers, FPGAs, and code got designed since. Anything
with delay built into it seems hard for EEs to grasp... but I wish I
knew why, or had the time to go play with circuits again at a reasonable
scale.
Post by Dave Taht
power network
equipment with renewables, or just use the Internet less. :)
I am glad to see more of the former happening. A recent data center
design in Singapore basically needed its own nuclear power plant.

In my case I've always wanted the computing to take place under the
user's fingers; I do not like the centralization trend we are in today at
all. I like that Apple seems to be leading the way in putting all
these cool new AI tools in your own hands.

As for the latter... I'm using browsers less now (emacs rocks), and
seem to be getting more done.
Post by Dave Taht
Pete
(I noticed an audience member brought this up in Toke’s thesis
defense)
I sadly slept through that. I hope it was recorded.
Jonathan Morton
2018-11-23 16:43:02 UTC
Post by Dave Taht
Post by Dave Taht
(I noticed an audience member brought this up in Toke’s thesis
defense)
I sadly slept through that. I hope it was recorded.
As it was streamed on YT, YT archived it.

- Jonathan Morton
Dave Taht
2018-11-23 16:48:43 UTC
Ahhh.... good. My morning has improved. I found the coffee, and I have
something way more interesting than CNN on.


Post by Jonathan Morton
Post by Dave Taht
Post by Dave Taht
(I noticed an audience member brought this up in Toke’s thesis
defense)
I sadly slept through that. I hope it was recorded.
As it was streamed on YT, YT archived it.
- Jonathan Morton
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Luca Muscariello
2018-11-23 17:16:54 UTC
Yes there was some discussion about that.
Moving things to hardware should fix that.

Even traffic management in NPU-based routers makes use of hardware-based
polling for shaping. These are trade-offs one has to face all the time.

There has been a discussion at the defense about hardware vs software,
hardware + software, when one, when the other.


BTW Toke is Doctor Toke now :-)
Post by Dave Taht
Ahhh.... good. My morning has improved. I found the coffee, and I have
something way more interesting that CNN on.
http://youtu.be/upvx6rpSLSw
Post by Jonathan Morton
Post by Dave Taht
Post by Pete Heist
(I noticed an audience member brought this up in Toke’s thesis defense)
I sadly slept through that. I hope it was recorded.
As it was streamed on YT, YT archived it.
- Jonathan Morton
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Dave Taht
2018-11-23 17:27:42 UTC
On Fri, Nov 23, 2018 at 9:17 AM Luca Muscariello
Post by Luca Muscariello
Yes there was some discussion about that.
Moving things to hardware should fix that.
Evens traffic management in NPU based routers makes use of hardware based polling for shaping. These are trade offs one has to face all the time.
There has been a discussion at the defense about hardware vs software, hardware + software, when one, when the other.
I'm still listening/watching.
Post by Luca Muscariello
BTW Toke is Doctor Toke now :-)
I hope you made him sweat, at least a little. :)
Post by Luca Muscariello
Post by Dave Taht
Ahhh.... good. My morning has improved. I found the coffee, and I have
something way more interesting that CNN on.
http://youtu.be/upvx6rpSLSw
Post by Jonathan Morton
Post by Dave Taht
Post by Dave Taht
(I noticed an audience member brought this up in Toke’s thesis
defense)
I sadly slept through that. I hope it was recorded.
As it was streamed on YT, YT archived it.
- Jonathan Morton
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Luca Muscariello
2018-11-23 17:32:38 UTC
Post by Dave Taht
On Fri, Nov 23, 2018 at 9:17 AM Luca Muscariello
Post by Luca Muscariello
Yes there was some discussion about that.
Moving things to hardware should fix that.
Evens traffic management in NPU based routers makes use of hardware
based polling for shaping. These are trade offs one has to face all the
time.
Post by Luca Muscariello
There has been a discussion at the defense about hardware vs software,
hardware + software, when one, when the other.
I'm still listening/watching.
Post by Luca Muscariello
BTW Toke is Doctor Toke now :-)
I hope you made him sweat, at least a little. :)
Difficult to sweat here. It's freezing, but I'm leaving already. Toke, I
think, has already started to eat and drink too much...
Post by Dave Taht
Post by Luca Muscariello
Post by Dave Taht
Ahhh.... good. My morning has improved. I found the coffee, and I have
something way more interesting that CNN on.
http://youtu.be/upvx6rpSLSw
Post by Jonathan Morton
Post by Dave Taht
Post by Pete Heist
(I noticed an audience member brought this up in Toke’s thesis
defense)
I sadly slept through that. I hope it was recorded.
As it was streamed on YT, YT archived it.
- Jonathan Morton
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Toke Høiland-Jørgensen
2018-11-25 21:14:17 UTC
Post by Dave Taht
On Fri, Nov 23, 2018 at 9:17 AM Luca Muscariello
Post by Luca Muscariello
Yes there was some discussion about that.
Moving things to hardware should fix that.
Evens traffic management in NPU based routers makes use of hardware
based polling for shaping. These are trade offs one has to face all the
time.
Post by Luca Muscariello
There has been a discussion at the defense about hardware vs software,
hardware + software, when one, when the other.
I'm still listening/watching.
Post by Luca Muscariello
BTW Toke is Doctor Toke now :-)
I hope you made him sweat, at least a little. :)
Difficult to sweet here. It’s freezing, but I’m leaving already. Toke I
think has already started to eat and drink too much...
Haha, indeed. Only catching up to email now... :D

Thanks for a fun discussion, Luca, and to everyone who listened in. I
had a tremendously fun three hours, hope everyone else did as well! :)

-Toke
Pete Heist
2018-11-26 12:52:39 UTC
Post by Toke Høiland-Jørgensen
Post by Luca Muscariello
Post by Dave Taht
On Fri, Nov 23, 2018 at 9:17 AM Luca Muscariello
Post by Luca Muscariello
BTW Toke is Doctor Toke now :-)
I hope you made him sweat, at least a little. :)
Difficult to sweet here. It’s freezing, but I’m leaving already. Toke I
think has already started to eat and drink too much...
Haha, indeed. Only catching up to email now... :D
Thanks for a fun discussion, Luca, and to everyone who listened in. I
had a tremendously fun three hours, hope everyone else did as well! :)
I did! Great work, Toke, and congratulations on the result! Wish there were more interesting discussions like that. :)

Pete
Dave Taht
2018-11-26 12:54:38 UTC
Post by Dave Taht
On Fri, Nov 23, 2018 at 9:17 AM Luca Muscariello
BTW Toke is Doctor Toke now :-)
I hope you made him sweat, at least a little. :)
Difficult to sweet here. It’s freezing, but I’m leaving already. Toke I
think has already started to eat and drink too much...
Haha, indeed. Only catching up to email now... :D
Thanks for a fun discussion, Luca, and to everyone who listened in. I
had a tremendously fun three hours, hope everyone else did as well! :)
I did- great work Toke and congratulations on the result! Wish there were more interesting discussions like that. :)
I had great difficulty making it out. Would it be possible to get a transcript?
Post by Dave Taht
Pete
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Toke Høiland-Jørgensen
2018-11-26 13:30:14 UTC
Post by Dave Taht
Post by Dave Taht
On Fri, Nov 23, 2018 at 9:17 AM Luca Muscariello
BTW Toke is Doctor Toke now :-)
I hope you made him sweat, at least a little. :)
Difficult to sweet here. It’s freezing, but I’m leaving already. Toke I
think has already started to eat and drink too much...
Haha, indeed. Only catching up to email now... :D
Thanks for a fun discussion, Luca, and to everyone who listened in. I
had a tremendously fun three hours, hope everyone else did as well! :)
I did- great work Toke and congratulations on the result! Wish there
were more interesting discussions like that. :)
I had great difficulty making it out. Would it be possible to get a transcript?
Yeah, we only had the one mic, unfortunately. Youtube does an automatic
transcription that you can find if you press the three dots beneath the
video (or just turn on subtitles); but I don't think we have any
volunteers to do a manual one...

-Toke
Toke Høiland-Jørgensen
2018-11-26 13:26:57 UTC
Post by Pete Heist
Post by Toke Høiland-Jørgensen
Post by Dave Taht
On Fri, Nov 23, 2018 at 9:17 AM Luca Muscariello
Post by Luca Muscariello
BTW Toke is Doctor Toke now :-)
I hope you made him sweat, at least a little. :)
Difficult to sweet here. It’s freezing, but I’m leaving already. Toke I
think has already started to eat and drink too much...
Haha, indeed. Only catching up to email now... :D
Thanks for a fun discussion, Luca, and to everyone who listened in. I
had a tremendously fun three hours, hope everyone else did as well! :)
I did- great work Toke and congratulations on the result! Wish there
were more interesting discussions like that. :)
Thanks! :)

-Toke
Pete Heist
2018-11-24 11:49:49 UTC
Post by Dave Taht
Post by Pete Heist
Would it be right to say that the biggest opportunity for reducing
consumption is to avoid shaping, i.e. by adding BQL-like functionality
to all classes of device drivers
Shaping outbound with BQL's support for a dynamic interrupt would be
*free*. A few ethernet chips already have that. Basically you set a
register saying "you are really a 200Mbit interface, return a completion
interrupt after the equivalent of that amount of time has passed”.
Ok, for Intel I see something called “Interrupt Rate Limiting” on the XL710, which sets the number of microseconds between interrupts (section 4.2 in https://www.intel.com/content/dam/www/public/us/en/documents/reference-guides/xl710-x710-performance-tuning-linux-guide.pdf). I don’t think that’s exactly it though.

I also wanted to suggest that something “BQL-like” be added to WiFi (I already saw discussion of that in make-wifi-fast), to ADSL (I guess that’s mostly proprietary stuff though?), and to other techs where it’s needed, so that we stop shaping whenever possible; as Toke mentioned in his defense, shaping is really a workaround anyway. I feel guilty now when shaping.
Post by Dave Taht
I can neither remember what chips can do this already, or the name of
the bql feature that does it, this morning.
But it's a register you twiddle and a simple divider circuit.
It sounds like there won’t always be fine-grained control over the rate.
Post by Dave Taht
But outbound is not the problem for us from a heat generation standpoint

Actually, why is inbound shaping that much harder on the CPU than outbound?
Post by Dave Taht
Post by Pete Heist
and/or by deploying congestion control globally that avoids the need for it?
I think it would be interesting to compare energy per byte successfully
delivered across various technologies. Driving fiber lines is pretty
high energy, though, and I think (without a back of envelope handy),
that that would be far more expensive than shaping currently is.
still, adding 6 C to everybody's home router to shape inbound under
heavy load is pretty costly both in energy and reduced service life.
I’m intrigued, and care about this topic. A few watts on millions of devices might at least make some difference. Analyzing where we stand in terms of energy per byte for different techs, and also what shaping does to this, might be a place to start.
Post by Dave Taht
Post by Pete Heist
Other ideas: move queue management into hardware
I have increasingly high hopes for P4 and other forms of hardware to
finally do shaping and queue management right.
https://github.com/ralfkundel/p4-codel/issues/2
Back in the day, I was a huge fan of async logic, which I first
encountered via caltech's cpu and later the amulet.
https://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU
This reduces power consumption enormously.
There are things that make so much sense as to seem that they must eventually happen, and this is one of those things.
Post by Dave Taht
Post by Pete Heist
power network
equipment with renewables, or just use the Internet less. :)
I am glad to see more of the former happening. A recent data center
design in singapore basically needed it's own nuclear power plant.
At least it doesn’t emit CO2. :) I’m in the process of trying to build a low-cost solar/battery setup for my home equipment. It seems that a system that works ~100% of the time can be much more expensive than one that works ~95% of the time, especially in central Europe’s winters, where cloudy streaks can last for weeks, so I’ll probably accept the grid as a backup.
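For anyone sizing a similar system, the battery arithmetic that drives that cost difference can be sketched like this (the load, autonomy, and depth-of-discharge figures are assumptions for illustration):

```python
def battery_kwh(daily_load_kwh, autonomy_days, usable_fraction=0.8):
    """Battery capacity needed to ride out autonomy_days without sun,
    given the fraction of nameplate capacity you can safely cycle."""
    return daily_load_kwh * autonomy_days / usable_fraction

# Assumed 0.5 kWh/day of network gear. Riding out a 2-day cloudy
# streak vs a 14-day central-European one is a 7x storage difference:
small = battery_kwh(0.5, 2)    # ~1.25 kWh
large = battery_kwh(0.5, 14)   # ~8.75 kWh
```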
Post by Dave Taht
In my case I've always wanted the computing to take place under the
users fingers, I do not like the centralization trend we are in today at
all. I like that apple seems to be leading the way to be putting all
these cool new AI tools in your own hands.
As for the latter... I'm using browsers less now (emacs rocks), and
seem to be getting more done.
I’m with you, only ‘:s/emacs/vim/g’, but we won’t start that. :)
Holland, Jake
2018-11-27 18:14:01 UTC
On 2018-11-23, 08:33, "Dave Taht" <***@taht.net> wrote:
Back in the day, I was a huge fan of async logic, which I first
encountered via caltech's cpu and later the amulet.

https://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU

...

I've never really understood why it didn't take off, I think, in part,
it doesn't scale to wide busses well, and that centrally clocked designs
are how most engineers and fpgas and code got designed since. Anything
with delay built into it seems hard for EEs to grasp.... but I wish I
knew why, or had the time to go play with circuits again at a reasonable
scale.

At the time, I was told the objections they got were that it uses about 2x the space for the same functionality, that space usage is approximately linear with chip cost, and that under load you still need reasonable cooling, so it was considered worthwhile only for some narrow use cases.

I don't really know enough to confirm or deny the claim, and the use cases may have gotten a lot closer to a good match by now, but this was the opinion of at least some of the people involved with the work, IIRC.
Stephen Hemminger
2018-11-27 18:31:14 UTC
On Tue, 27 Nov 2018 18:14:01 +0000
Post by Dave Taht
Back in the day, I was a huge fan of async logic, which I first
encountered via caltech's cpu and later the amulet.
https://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU
...
I've never really understood why it didn't take off, I think, in part,
it doesn't scale to wide busses well, and that centrally clocked designs
are how most engineers and fpgas and code got designed since. Anything
with delay built into it seems hard for EEs to grasp.... but I wish I
knew why, or had the time to go play with circuits again at a reasonable
scale.
At the time, I was told the objections they got were that it uses about 2x the space for the same functionality, and space usage is approximately linear with the chip cost, and when under load you still need reasonable cooling, so it was only considered maybe worthwhile for some narrow use cases.
I don't really know enough to confirm or deny the claim, and the use cases may have gotten a lot closer to a good match by now, but this was the opinion of at least some of the people involved with the work, IIRC.
With asynchronous circuits there is too much unpredictability and instability.
I seem to remember there are even cases where two inputs arrive at once and the output is non-deterministic.
Dave Taht
2018-11-27 19:09:44 UTC
On Tue, Nov 27, 2018 at 10:31 AM Stephen Hemminger
Post by Stephen Hemminger
On Tue, 27 Nov 2018 18:14:01 +0000
Post by Dave Taht
Back in the day, I was a huge fan of async logic, which I first
encountered via caltech's cpu and later the amulet.
https://en.wikipedia.org/wiki/Asynchronous_circuit#Asynchronous_CPU
...
I've never really understood why it didn't take off, I think, in part,
it doesn't scale to wide busses well, and that centrally clocked designs
are how most engineers and fpgas and code got designed since. Anything
with delay built into it seems hard for EEs to grasp.... but I wish I
knew why, or had the time to go play with circuits again at a reasonable
scale.
At the time, I was told the objections they got were that it uses about 2x the space for the same functionality, and space usage is approximately linear with the chip cost, and when under load you still need reasonable cooling, so it was only considered maybe worthwhile for some narrow use cases.
And the ultimate cost here was unpredictable and numerous power
states, hyperthreading (which is looking likely to die post-Spectre), and
things like DPDK, which spins processors madly to keep up. I always
liked things like what Fulcrum did; I wish I knew more about their
switch designs...

Everybody knows I'm a fan of the Mill CPU, which has lots of little
optimizations close to each functional unit (among many other things,
using virtual memory internally for everything, and separating out the
PLB (protection level buffer) from the TLB). I would really like to
bring back an era where CPUs could context- or security-level switch
in 5 clocks.

Someday something like that will be built. Till then, the closest chip
to something I'd like to be working on for networks is how the XMOS is
designed: https://en.wikipedia.org/wiki/XMOS#xCORE_multicore_microcontrollers
- or https://www.xmos.com/developer/silicon/xcore200-ethernet which
has 1 MByte of single-clock SRAM on it.

"The xCORE architecture delivers, in hardware, many of the elements
that are usually seen in a real-time operating system (RTOS). This
includes the task scheduler, timers, I/O operations, and channel
communication. By eliminating sources of timing uncertainty
(interrupts, caches, buses and other shared resources), xCORE can
provide deterministic and predictable performance for many
applications. A task can typically respond in nanoseconds to events
such as external I/O or timers. This makes it possible to program
xCORE devices to perform hard real-time tasks that would otherwise
require dedicated hardware."

Nobody else's ethernet controllers work this way.
Post by Stephen Hemminger
Post by Dave Taht
I don't really know enough to confirm or deny the claim, and the use cases may have gotten a lot closer to a good match by now, but this was the opinion of at least some of the people involved with the work, IIRC.
With asynchronous circuits there is too much unpredictablity and instability.
Seem to remember there are even cases where two inputs arrive at once and output is non-determistic.
Yes, that was a big problem... in the 90s... but CPUs that didn't do
that *were* successfully designed.

I am the sort of character that is totally willing to toss out decades
of evolution in chip design in order to get better SNR for wireless.
:)

I wish I knew of a mailing list where I could get a definitive answer
on "modern problems with async circuits", or an update on the kind of
techniques the new AI chips were using to keep their power consumption
so low. I'll keep googling.
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Pete Heist
2018-11-27 22:07:12 UTC
Post by Dave Taht
I wish I knew of a mailing list where I could get a definitive answer
on "modern problems with async circuits", or an update on the kind of
techniques the new AI chips were using to keep their power consumption
so low. I'll keep googling.
I’d be interested in knowing this as well. This gives some examples of async circuits: https://web.stanford.edu/class/archive/ee/ee371/ee371.1066/lectures/lect_12.pdf

Page 43, “Bottom Line” mentions that asynchronous design has “some delay matching / overhead issues”. Apparently delay matching means getting the signal outputs on two separate paths to arrive at the same time(?) Presumably overhead refers to the 2x space on the die previously mentioned, for completion detection. Pages 23-25 on “data-bundling constraints” might also highlight some other challenges. Some more current material would be interesting though...
Jonathan Morton
2018-11-27 22:33:45 UTC
Post by Dave Taht
I wish I knew of a mailing list where I could get a definitive answer
on "modern problems with async circuits", or an update on the kind of
techniques the new AI chips were using to keep their power consumption
so low. I'll keep googling.
I’d be interested in knowing this as well. This gives some examples of async circuits: https://web.stanford.edu/class/archive/ee/ee371/ee371.1066/lectures/lect_12.pdf
Page 43, “Bottom Line” mentions that asynchronous design has “some delay matching / overhead issues”. Apparently delay matching means getting the signal outputs on two separate paths to arrive at the same time(?) Presumably overhead refers to the 2x space on the die previously mentioned, for completion detection. Pages 23-25 on “data-bundling constraints” might also highlight some other challenges. Some more current material would be interesting though...
The area overhead is at least partly mitigated by the major advantage of not having to distribute and gate a coherent clock signal across the entire chip. I half-remember seeing a quote that distributing the clock represents about 30% of the area and/or power consumption of a modern deep-sub-micron design. This is area and power that is not directly contributing to functionality.

Generally there are two major styles of asynchronous logic:

1: Standard combinatorial logic stages accompanied by self-timing circuits with a matched delay, generally known as "bundled data". This style has little overhead (probably less than the clock distribution it replaces) but requires local timing closure (the timing circuit must have strictly *more* delay than the logic it accompanies) to assure correct functionality. I suspect that achieving local timing closure is easier than the global timing closure required by conventional synchronous logic.

2: Dual-rail QDI logic, in which completion is explicitly signalled by the arrival of a result. This almost completely eliminates timing closure from the logic correctness equation, but the area overhead can be substantial. Achieving maximum performance in this style can also be challenging, but suitable approaches do exist, eg:

https://brej.org/papers/mapld.pdf

Both styles can inherently adapt timings to thermal and voltage conditions within a design range without much explicit provisioning, and typically have much cleaner power load and EMI characteristics than synchronous logic. But as you can see from the above, the downsides typically associated with async logic tend to apply to one or the other of the styles, not to both at once.
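The local timing-closure requirement for the bundled-data style can be expressed as a one-line check; the delays and margin below are hypothetical numbers, just to make the constraint concrete:

```python
def bundled_data_closure_ok(worst_logic_delay_ns, matched_delay_ns,
                            margin_ns=0.2):
    """Bundled-data correctness: the matched delay line that signals
    'done' must be strictly slower than the worst-case combinational
    path it accompanies, plus a safety margin."""
    return matched_delay_ns > worst_logic_delay_ns + margin_ns

ok = bundled_data_closure_ok(worst_logic_delay_ns=1.5,
                             matched_delay_ns=2.0)  # True
```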

- Jonathan Morton
Jonathan Morton
2018-11-27 22:36:25 UTC
Just to add - I think the biggest impediment to experimentation in asynchronous logic is the complete absence of convenient Muller C-element gates in the 74-series logic family. If you want to build some, I recommend using NAND and OR gates as inputs to active-low SR flipflops.
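A behavioral sketch of that construction, in Python rather than hardware, just to show the hold semantics (this models the logic, not gate-level timing):

```python
def c_element(a, b, q_prev):
    """Muller C-element via the suggested construction: a NAND gate
    drives the active-low set input, and an OR gate drives the
    active-low reset input, of an SR flipflop."""
    n_set = int(not (a and b))  # low (asserted) only when a = b = 1
    n_rst = int(a or b)         # low (asserted) only when a = b = 0
    if n_set == 0:
        return 1                # both inputs high -> output goes high
    if n_rst == 0:
        return 0                # both inputs low -> output goes low
    return q_prev               # inputs disagree -> hold state

# The output only moves when the inputs agree:
q = 0
q = c_element(1, 0, q)  # hold: q stays 0
q = c_element(1, 1, q)  # both high: q -> 1
q = c_element(0, 1, q)  # hold: q stays 1
```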

- Jonathan Morton
Dave Taht
2018-11-28 07:23:55 UTC
Post by Jonathan Morton
Just to add - I think the biggest impediment to experimentation in
asynchronous logic is the complete absence of convenient Muller
C-element gates in the 74-series logic family. If you want to build
some, I recommend using NAND and OR gates as inputs to active-low SR
flipflops.
Need millions of transistors, not dozens. :)

To me the biggest barrier is in tools. I'm still looking for the Caltech
tool and language which really helped in thinking this way; I did find
it on github once, and it still seemed to be under development....

And the field is not entirely dead, after all. I keep meaning to pick up
one of the new RISC-V boards. Here's an async design of a RISC-V... in
Go of all things. (I also really hate the universal adoption of Java
amongst the circuit design folk... and I really loved the prospects of
Chisel, except for the JVM dependency):

https://www.inf.pucrs.br/~calazans/publications/2017_MarcosSartori_EoTW.pdf

In the RISC-V world, well, it's still trundling forward.

https://www.lowrisc.org/about/

This is pretty neat - standby is 2uA:

https://greenwaves-technologies.com/en/gap8-product/

And pulp is pretty neat.

https://pulp-platform.org//

Still, I liked XMOS's stuff... REX Computing hasn't surfaced in a while.

In the last weird embedded-hardware news of the day, you can get an old
Intel Compute Stick for 34 dollars on eBay.

https://www.ebay.com/p/Intel-Compute-Stick-STCK1A8LFC-Intel-Atom-Z3735F-1-33GHz-8GB-PC-Stick-BOXSTCK1A8LFC/11020833331?iid=153273128090&chn=ps

they were painfully slow but fit on your keychain. The most modern
version of this design is
https://www.amazon.com/Intel-Compute-Computer-processor-BOXSTK2m3W64CC/dp/B01AZC4IKK/ref=sr_1_4?s=electronics&ie=UTF8&qid=1543389564&sr=1-4&keywords=intel+compute+stick

2 cores, 4MB of cache, 64GB of flash... on your keychain.

I rather miss VGA, in that it would be better to be able to screw these in...
Holland, Jake
2018-11-27 19:11:00 UTC
On 2018-11-27, 10:31, "Stephen Hemminger" <***@networkplumber.org> wrote:
With asynchronous circuits there is too much unpredictablity and instability.
Seem to remember there are even cases where two inputs arrive at once and output is non-determistic.

IIRC they talked about that some too. I think maybe some papers were going back and forth. But last I heard, they proved that this is not a real objection, in that:
1. you can quantify the probability of failure and ensure a design keeps it under threshold when operating within specified conditions (e.g. normal temperature and voltage thresholds),
2. you can work around the issues where it's critical by adding failure detection and fault handling, and
3. you have the exact same fundamental theoretical problem with synchronous circuits, particularly in registers that can keep a value through a clock cycle, but it hasn't stopped them from being useful.

I'm not an expert and this was all a long time ago for me, but the QDI wiki page doesn't disagree with what I'm remembering here, and has some good references on the topic:
https://en.wikipedia.org/wiki/Quasi-delay-insensitive_circuit#Stability_and_non-interference
https://en.wikipedia.org/wiki/Quasi-delay-insensitive_circuit#Timing
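Quantifying the probability of failure (point 1 above, and the same issue synchronous registers face in point 3) is usually done with the classic synchronizer MTBF estimate; here is a sketch, where the time constant and window are process-dependent placeholder values:

```python
import math

def synchronizer_mtbf_s(t_resolve_s, tau_s, t_window_s,
                        f_clock_hz, f_data_hz):
    """Classic mean-time-between-failures estimate for a flipflop
    sampling an asynchronous input:
        MTBF = exp(t_resolve / tau) / (T_w * f_clk * f_data)
    tau (regeneration time constant) and T_w (metastability window)
    are process-dependent; the values below are placeholders."""
    return (math.exp(t_resolve_s / tau_s)
            / (t_window_s * f_clock_hz * f_data_hz))

# Assumed: 2 ns settling budget, tau = 50 ps, 20 ps window,
# 500 MHz clock, 10 MHz of asynchronous events.
mtbf = synchronizer_mtbf_s(2e-9, 50e-12, 20e-12, 500e6, 10e6)
```

The exponential dependence on settling time is why a small extra resolution budget drives the failure rate below any threshold you care about.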