Discussion:
[Bloat] Reasons to prefer netperf vs iperf?
Rich Brown
8 years ago
As I browse the web, I see several sets of performance measurement using either netperf or iperf, and never know if either offers an advantage.

I know Flent uses netperf by default: what are the reason(s) for selecting it? Thanks.

Rich
Dave Taht
8 years ago
Post by Rich Brown
As I browse the web, I see several sets of performance measurement using either netperf or iperf, and never know if either offers an advantage.
I know Flent uses netperf by default: what are the reason(s) for selecting it? Thanks.
* Netperf

+ netperf is the preferred network stress tool of the linux kernel devs.
+ the maintainer is responsive and capable
+ the code is very fast, with nearly no compromises on speed or accuracy;
we've successfully used it up to 40GigE
+ the code is also very portable
+ one explicitly versioned codebase. When you use netperf, you know you
are using netperf.

- netperf has a pre-OSI (1993) license, which makes default inclusion
in Debian impossible and inclusion elsewhere sometimes dicey
- netperf does not have a way to send timestamps within flows
- it is very hard to add new tests to netperf
- its test negotiation protocol is less than fully documented and can
break between releases (and I'm being kind here)
- it could use better real-time support

* iperf
+ More widely available
- "Academic" code, often with papers not citing the specific version used
- I have generally not trusted the results published either - but
Aaron finding that major bug in iperf's UDP measurements explains a
LOT of that, I think.
- Has, like, 3-8 non-interoperable versions.
- is available in Java, for example

there *might* be an iperf version worth adopting but I have no idea
which one it would be.

I started speccing out a Flent-specific netperf/iperf replacement
*years* ago, (twd), but the enormous amount of money/effort required
to do it right caused me to dump the project. Also, at the time
(because of the need for reliable high speed measurements AND for
measurements on obscure, weak, cpus) my preferred language was going
to be C, and that too raised the time/money metric into the
stratosphere.

I had some hope of leveraging owamp one day, but I have more hope now
of leveraging the rapidly maturing infrastructure around go, http2,
and quic.

there are other tools (ndt, for example), but getting past that high
speed and weird low end cpu requirement was a showstopper, and remains
so.

I've been fiddling with esr's new "loccount" tool - both to teach
myself go and to deeply understand the (deeply flawed) COCOMO software
development model - and according to it, replacing netperf would cost:

***@nemesis:~/git/netperf$ loccount -c .
all 50968 (100.00%) in 112 files
c 41739 (81.89%) in 48 files
shell 5125 (10.06%) in 22 files
python 2376 (4.66%) in 5 files
m4 895 (1.76%) in 11 files
autotools 767 (1.50%) in 9 files
awk 66 (0.13%) in 2 files
Total Physical Source Lines of Code (SLOC) = 50968
Development Effort Estimate, Person-Years (Person-Months) = 12.41 (148.89)
(Basic COCOMO model, Person-Months = 2.40 * (KSLOC**1.05))
Schedule Estimate, Years (Months) = 1.39 (16.74)
(Basic COCOMO model, Months = 2.50 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule) = 8.90
Total Estimated Cost to Develop = $1798148
(average salary = $60384/year, overhead = 2.40).

...
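Incidentally, the Basic COCOMO arithmetic behind those numbers is
tiny. Here's a minimal C sketch of it, using the constants straight
from the loccount output above (the cost line, as in loccount, is
person-years times average salary times overhead):

#include <math.h>
#include <stdio.h>

int main(void)
{
    double sloc   = 50968.0;                    /* from loccount */
    double ksloc  = sloc / 1000.0;
    double pm     = 2.40 * pow(ksloc, 1.05);    /* effort, person-months */
    double months = 2.50 * pow(pm, 0.38);       /* schedule, months */
    double devs   = pm / months;                /* average developer count */
    double cost   = (pm / 12.0) * 60384.0 * 2.40; /* salary * overhead */

    printf("effort:   %.2f person-months (%.2f person-years)\n", pm, pm / 12.0);
    printf("schedule: %.2f months\n", months);
    printf("devs:     %.2f\n", devs);
    printf("cost:     $%.0f\n", cost);
    return 0;
}

Compile with -lm; modulo rounding it reproduces the 148.89
person-months and ~$1.8M figures above.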

loccount is a remarkable improvement in speed over "sloccount" (aside
from I/O, the code is "embarrassingly parallel" and scales beautifully
with the number of cores), and has thus far been quite useful for me
in finally beginning to grok go.

Get it at:

git clone https://gitlab.com/esr/loccount

And I hope the effort of actually understanding the COCOMO model will
one day pay off in a model that more accurately captures theory costs,
development time, and maintenance and refactoring costs.

(That said, if anyone out there is aware of the state of the art in
this - and has code - I'd appreciate a pointer.) What I'd wanted to do
was begin to leverage the oft-published LWN stats on kernel
development (and churn) and try to see what that costs.

(I'm mostly amusing myself by applying age-old techniques, in go, to
speed up loccount, rather than worrying about the model. Everybody
needs to just relax and do something like this once in a while.)
--
Dave Täht
Let's go make home routers and wifi faster! With better software!
http://blog.cerowrt.org
Aaron Wood
8 years ago
Post by Dave Taht
Post by Rich Brown
As I browse the web, I see several sets of performance measurement using
either netperf or iperf, and never know if either offers an advantage.
I know Flent uses netperf by default: what are the reason(s) for
selecting it? Thanks.
* netperf
+ supports multiple tests in parallel on the same server
Post by Dave Taht
* iperf
+ More widely available
Sort of... Given the variants, less so. But iperf3 is coded to be pretty
portable, and so it's pretty widely available.

It has a pretty good JSON format for the client results, but the server
results are returned in plain text. And it doesn't report anything
finer-grained than 100ms.

Post by Dave Taht
- I have generally not trusted the results published either - but
Aaron finding that major bug in iperf's UDP measurements explains a
LOT of that, I think.
I've found something else with it, that I need to write up: with UDP, the
application-layer pacing and the fq socket pacing cause it to report a lot
of invalid packet loss. The application pacing is focused on layer-6
good-put, but the fq pacing appears to be enforcing wire rates (or
calculated ethernet rates), and so with small packets (64-byte payloads)
it's typical to see 40% packet loss (as the fq layer discards UDP
frames to cut, say, 100Mbps of application-layer rate down to 100Mbps of
wire rate). I need to actually do the tcpdump analysis of that and get it
written up.
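
To show where a number like 40% could plausibly come from, here's a
back-of-the-envelope C sketch. The overhead accounting is an
assumption (UDP + IPv4 headers plus ethernet header/FCS, no
preamble/IFG or VLAN tag); what fq actually charges per packet may
differ, which is what the tcpdump analysis would settle:

#include <stdio.h>

int main(void)
{
    double payload  = 64.0;         /* small UDP payload, bytes */
    double overhead = 8 + 20 + 18;  /* UDP + IPv4 + ethernet hdr/FCS */
    double wire     = payload + overhead;
    double drop     = 1.0 - payload / wire; /* fraction fq must shed to fit
                                               the app rate into the wire rate */

    printf("wire bytes/packet: %.0f\n", wire);
    printf("implied drop rate: %.0f%%\n", drop * 100.0);
    return 0;
}

With 64-byte payloads that works out to roughly a 42% implied drop
rate, in the same ballpark as the 40% observed above.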
Post by Dave Taht
- Has, like, 3-8 non-interoperable versions.
- is available in Java, for example
there *might* be an iperf version worth adopting but I have no idea
which one it would be.
Part of the issue with iperf is that there are two main variants: iperf
and iperf3. iperf3 is currently maintained by the ESnet folks, and their
use-case is wildly different from ours:

- Very high bandwidth (>=100Gbps)
- Latency insensitive (long-running bulk data transfers)
- private networks (jumbo frames are an assumed use)

I'm also happy to take the fork of it that I have
(https://github.com/woody77/iperf) and tune it for our uses. There
are certain aspects that I wouldn't want to dive into changing at the
moment (like the single-threaded nature of the server). But I can easily
bang on the corners, get its defaults better suited to our uses, and
make it behave better in the face of running without offloads. On my test
boxes, it starts to get I/O limited around 4-5 Gbps when using 1400-byte
UDP payloads. With TCP and TSO, it merrily runs at 9.4Gbps of good-put
over a 10Gbps NIC.

But the application-level and kernel-level "packet rates" are really
quite different at that point. By default it's dropping 128KB blocks into
the TCP send buffer and letting the TCP stack and offloads do their thing.

At the higher-performance end of things, I think it would benefit from
using sendmmsg()/recvmmsg() on platforms that support them. I think that
would let it better work with fq pacing at rates of 5Gbps and up.
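
For reference, a minimal stand-alone C sketch of what a sendmmsg(2)
batch looks like on Linux; this is illustrative code, not iperf3's
actual send path, and the destination address/port are made up:

#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define BATCH   32
#define PAYLOAD 64

int main(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in dst;
    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(9999);          /* made-up port */
    inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);
    connect(fd, (struct sockaddr *)&dst, sizeof(dst));

    static char bufs[BATCH][PAYLOAD];
    struct iovec iov[BATCH];
    struct mmsghdr msgs[BATCH];
    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        memset(bufs[i], 'x', PAYLOAD);   /* dummy payload */
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = PAYLOAD;
        msgs[i].msg_hdr.msg_iov    = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* One syscall submits the whole batch; with fq socket pacing the
       kernel still paces individual datagrams onto the wire. */
    int sent = sendmmsg(fd, msgs, BATCH, 0);
    printf("queued %d of %d datagrams\n", sent, BATCH);
    close(fd);
    return 0;
}

recvmmsg() is the mirror image on the receive side; the win in both
cases is amortizing the syscall cost across the batch.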

Post by Dave Taht
I started speccing out a Flent-specific netperf/iperf replacement
*years* ago, (twd), but the enormous amount of money/effort required
to do it right caused me to dump the project. Also, at the time
(because of the need for reliable high speed measurements AND for
measurements on obscure, weak, cpus) my preferred language was going
to be C, and that too raised the time/money metric into the
stratosphere.
iperf3's internal structure might be useful for bootstrapping a project
like that. It already has all the application logic infrastructure, and
has a central timer heap (which can be made more efficient/exact), and it
already has a notion of different kinds of workloads. It wouldn't be too
much work to make its tests more modular.

-Aaron
