Discussion:
[Bloat] known buffer sizes on switches
Dave Taht
2018-11-24 23:29:08 UTC
https://people.ucsc.edu/~warner/buffer.html
--
Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-205-9740
Mikael Abrahamsson
2018-11-25 06:44:33 UTC
Post by Dave Taht
https://people.ucsc.edu/~warner/buffer.html
Nice resource, thanks.

If someone wonders why things look the way they do, it's all about
on-die and off-die memory. Either you use off-die or on-die memory, often
SRAM which requires 6 gates per bit. So spending half a billion gates
gives you ~10MB buffer on-die. If you're doing off-die memory (DRAM or
similar) then you'll get the gigabytes of memory seen in some equipment.
There basically is nothing in between. As soon as you go off-die you might
as well put at least 2-6 GB in there.
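
As a rough check of the arithmetic above, here is a minimal Python sketch. It reads "6 gates per bit" as a six-transistor SRAM cell and uses decimal megabytes; the 100 Gbit/s port speed and the function names are assumptions made for illustration, not figures from the post. It also converts both buffer classes into the time a single port would need to drain them, which shows how wide the gap between on-die and off-die buffering is.

```python
# Back-of-the-envelope check of the figures in the post above.
# Assumptions (not from the thread): decimal megabytes, a 100 Gbit/s port.

TRANSISTORS_PER_BIT = 6  # the post's "6 gates per bit", read as a 6T SRAM cell


def sram_bytes(transistor_budget: int) -> float:
    """Bytes of SRAM that a given transistor budget buys (cell array only)."""
    return transistor_budget / TRANSISTORS_PER_BIT / 8


def drain_time_ms(buffer_bytes: float, port_bits_per_s: float) -> float:
    """Time to drain a full buffer through one port, in milliseconds."""
    return buffer_bytes * 8 / port_bits_per_s * 1000


on_die = sram_bytes(500_000_000)  # "half a billion"
off_die = 2e9                     # low end of the "2-6 GB" range
port = 100e9                      # hypothetical 100 Gbit/s port

print(f"on-die SRAM: {on_die / 1e6:.1f} MB")
print(f"on-die drain time:  {drain_time_ms(on_die, port):.2f} ms")
print(f"off-die drain time: {drain_time_ms(off_die, port):.0f} ms")
```

Under these assumptions, half a billion transistors buys about 10.4 MB, which a 100 Gbit/s port drains in under a millisecond, while 2 GB takes 160 ms.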

Also, off-die memory takes I/O capacity. A forwarding chip might have 4
"sides" with sets of I/O lanes. If you put it in a 1RU device with no
off-die buffer, you can connect ports to all of the lanes. This gives you
a device with very high port density, a small buffer and a very good
price point.

Now, if you want more buffer and more route memory (taking one "side"
each) plus connecting it to a backplane (another side), you now only have
a single "side" left for ports. This is why high route-count, high-buffer,
modular switches are so much more expensive compared to low-route,
low-buffer, fixed-configuration ones.
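
The trade-off above can be sketched as a toy side budget. This is only an illustration of the principle: the function and parameter names are hypothetical, and real chips mix these uses in finer-grained ways.

```python
# Toy model of the "four sides" budget described above.
# All names here are hypothetical; only the side counts come from the post.

TOTAL_SIDES = 4


def port_sides(buffer: bool, route_memory: bool, backplane: bool) -> int:
    """Sides left for front-panel ports after each other use takes one side."""
    used = sum([buffer, route_memory, backplane])
    return TOTAL_SIDES - used


fixed_1ru = port_sides(buffer=False, route_memory=False, backplane=False)
modular = port_sides(buffer=True, route_memory=True, backplane=True)

print(f"fixed 1RU, no off-die memory: {fixed_1ru} sides of ports")
print(f"modular, big buffer and routes: {modular} side of ports")
print(f"port-density ratio: {fixed_1ru // modular}x")
```

In this simplified picture the same forwarding chip exposes four times as many ports in the fixed-configuration box, which is one way to see where the price difference comes from.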

The above is the principle; there are of course combinations and
optimizations to be made, so not all devices adhere exactly to it.
--
Mikael Abrahamsson email: ***@swm.pp.se
Bruno George Moraes
2018-11-28 16:32:10 UTC
Post by Mikael Abrahamsson
Nice resource, thanks.
If someone wonders why things look the way they do, it's all about
on-die and off-die memory. Either you use off-die or on-die memory, often
SRAM which requires 6 gates per bit. So spending half a billion gates
gives you ~10MB buffer on-die. If you're doing off-die memory (DRAM or
similar) then you'll get the gigabytes of memory seen in some equipment.
There basically is nothing in between. As soon as you go off-die you might
as well put at least 2-6 GB in there.
There is some research on new memory devices with unexpected results...
https://ieeexplore.ieee.org/document/8533260

The HMC memory allows improvements in execution time and consumed energy.
In some situations, this memory type permits removing the L2 cache from the
memory hierarchy.

HMC parts start at 2GB
Dave Taht
2018-11-28 16:55:17 UTC
Post by Mikael Abrahamsson
Nice resource, thanks.
If someone wonders why things look the way they do, it's all about
on-die and off-die memory. Either you use off-die or on-die memory, often
SRAM which requires 6 gates per bit. So spending half a billion gates
gives you ~10MB buffer on-die. If you're doing off-die memory (DRAM or
similar) then you'll get the gigabytes of memory seen in some equipment.
There basically is nothing in between. As soon as you go off-die you might
as well put at least 2-6 GB in there.
There is some research on new memory devices with unexpected
results...
https://ieeexplore.ieee.org/document/8533260
The HMC memory allows improvements in execution time and consumed
energy. In some situations, this memory type permits removing the
L2 cache from the memory hierarchy.
HMC parts start at 2GB
Thank you for that. I do have a long standing dream of a single chip
wifi router, with the lowest SNR possible, and the minimum number of
pins coming off of it. I'd settle for 32MB of (static?) ram on chip as
that has proven sufficient to date to drive 802.11n....

which would let you get rid of both the L2 and L1 cache. That said, I
think the cost of 32MB of on-chip static RAM remains a bit high, and
plugging it into a MIPS CPU would be kind of silly. Someday there will be
a case for just doing everything on a single chip, but...
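
To put a rough number on that cost, here is a small Python sketch. It assumes the 6-per-bit SRAM figure quoted earlier in the thread, decimal megabytes, and counts only the cell array (no decoders, sense amplifiers or redundancy); the function name is made up for the example.

```python
# Rough transistor count for 32MB of on-chip SRAM.
# Assumptions: 6 transistors per bit (the "6 gates per bit" figure quoted
# earlier in the thread), decimal megabytes, cell array only.

TRANSISTORS_PER_BIT = 6


def sram_transistors(megabytes: float) -> int:
    """Transistors in the cell array for the given SRAM size."""
    bits = megabytes * 1_000_000 * 8
    return round(bits * TRANSISTORS_PER_BIT)


count = sram_transistors(32)
print(f"32 MB of SRAM: about {count / 1e9:.2f} billion transistors")
print(f"relative to a half-billion budget: {count / 500_000_000:.1f}x")
```

That works out to roughly 1.5 billion transistors for the array alone, about three times the half-billion budget mentioned for a ~10MB switch buffer.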
Post by Mikael Abrahamsson
_______________________________________________
Bloat mailing list
https://lists.bufferbloat.net/listinfo/bloat
David Collier-Brown
2018-11-28 18:26:06 UTC
Post by Dave Taht
Thank you for that. I do have a long standing dream of a single chip
wifi router, with the lowest SNR possible, and the minimum number of
pins coming off of it. I'd settle for 32MB of (static?) ram on chip as
that has proven sufficient to date to drive 802.11n....
which would let you get rid of both the L2 and L1 cache. That said, I
think the cost of 32MB of on-chip static RAM remains a bit high, and
plugging it into a MIPS CPU would be kind of silly. Someday there will be
a case for just doing everything on a single chip, but...
I could see 32MB or more of fast memory on-chip as being attractive when
one is fighting with diminishing returns in CPU speed and program
parallelizability.

In the past that might have excited MIPS, but these days less so. Maybe
ARM? IBM?

--dave
--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
***@spamcop.net | -- Mark Twain
Dave Taht
2018-11-28 19:02:01 UTC
I really don't know a whole heck of a lot about where mips is going.
Certainly they remain strong in the embedded market (I do like the
edgerouter X a lot), but as for their current direction or future
product lines, not a clue.

I used to know someone over there, maybe he's restored new directions.
Last I recall he was busy obsoleting a whole lot of instruction space
in order to make room for "new stuff". He'd even asked me if adding an
invsqrt to the instruction set would help, and I sadly replied that
that bit of codel was totally invisible on a trace.....

I really like(d) MIPS: a ton of registers, a better instruction set than
ARM (IMHO), no foolish processor extensions.
David Collier-Brown
2018-11-28 20:34:53 UTC
That would be really cool: I loved the Mips we had at YorkU.ca

--dave
Dave Taht
2018-11-29 02:33:38 UTC
Post by Dave Taht
Post by Mikael Abrahamsson
Nice resource, thanks.
If someone wonders why things look the way they do, it's all about
on-die and off-die memory. Either you use off-die or on-die memory, often
SRAM which requires 6 gates per bit. So spending half a billion gates
gives you ~10MB buffer on-die. If you're doing off-die memory (DRAM or
similar) then you'll get the gigabytes of memory seen in some equipment.
There basically is nothing in between. As soon as you go off-die you might
as well put at least 2-6 GB in there.
There is some research on new memory devices with unexpected results...
https://ieeexplore.ieee.org/document/8533260
The HMC memory allows improvements in execution time and consumed
energy. In some situations, this memory type permits removing the
L2 cache from the memory hierarchy.
HMC parts start at 2GB
That effort actually looks pretty promising. I liked the support for
offloaded atomic ops too. There are also so many useful operations
that I'd like to see offloaded to RAM, like zeroing memory regions, as
one example.

http://www.hybridmemorycube.org/

Will probably run hot. But: grump: I still don't "get" why the
traditional division between memory and cpu makers hasn't collapsed
yet. A package like that with a cpu *in it*, and we're done. 4GB "ought
to be enough for everybody".

27? years ago, back when I was attempting to write a SF novel, I had
an idea for a more efficient way to pack cores and memory together.
Basically: shrink the cray 1 design down to about the size of a nickel
(or dime!).

The cray had that rough shape for optimum routing and cooling, but...
the overall shape of the package becomes a hexagon
(https://en.wikipedia.org/wiki/Hexagon) cylinder. That gives you 6 or
12 vertical flat surfaces to mount chips on (or just let them stand in
slots on the package). There's one natural crossbar bus at the center,
connecting the 6 "core" chips more rapidly than the edges. Top, bottom
and sides of the package can be used for I/O, power and so on, and
each hexagonal component can be wedged tightly against its neighbors
(instead of today's north-south, east-west architectures you get 2 more
dimensions horizontally).

Fill the package with some sort of coolant. Seal it up tight. Test the
module as a whole and ship 'em in palletloads. I'm pretty sure the
heat circulates from the center out naturally, in every orientation,
but what the heck, stick some MEMS fans in there to keep things
pumping along.

That design naturally led to 2 CPU chips and 4 memories. Or 4 CPU
chips and 2 memories. Or 2 CPUs, 2 memories and 2 I/Os. And that's before
you start coming up with things to do with the outer 6 sides.

I never thought separating RAM from CPU by more than a millimeter
was a good idea.....

It's quite a jump to envision going from the Cray-1 (115 kW!!!) down
to the size of a nickel!

But everybody has a cray-1 now. They just run too hot. And are often
not suited to task, just like the cray was.

https://en.wikipedia.org/wiki/Cray-1

Don't know if anyone's ever tried to pattern any circuits on a cylinder though!

We are certainly seeing a lot of multi-package modules now (like in
EPYC), but I'd like 'em to be taller and not need so many darn pins. A
full-blown wifi router on a chip wouldn't need more than... oh... this
many pins:

https://www.amazon.com/Makerfocus-ESP8266-Wireless-Transceiver-Compatible/dp/B01EA3UJJ4/ref=asc_df_B01EA3UJJ4/?tag=hyprod-20&linkCode=df0&hvadid=309773039951&hvpos=1o1&hvnetw=g&hvrand=15072864816819105911&hvpone=&hvptwo=&hvqmt=&hvdev=c&hvdvcmdl=&hvlocint=&hvlocphy=9032156&hvtargid=pla-599566692924&psc=1