[HPGMG Forum] More thoughts on the Kaviats

Jed Brown jed at jedbrown.org
Mon Jun 9 12:36:32 UTC 2014

Theodore Omtzigt <theo at stillwater-sc.com> writes:

> As always, depending on the question you need to answer, you want to see
> different Kiviat graphs. As a hardware designer, I would like to see how
> well a particular resource allocation is helping the performance of an
> application. As all machines have a common collection of resources, the
> Kiviats that I am interested in would capture those, in particular this list:
> FPU sp peak throughput
> FPU dp peak  (big difference in silicon allocation)
> IU peak throughput
> L1 bw peak (latency is governed by core clock, but bw is parallelism on
> the cache ports)
> L2 bw peak (latency becomes an issue further away, but most L2s are
> supporting fast L1 refresh)
> L3 bw peak
> L3 size
> FSB bw peak
> Memory latency (for low-concurrency kernels this is the bottleneck)
> Memory bw peak
> Network latency
> Network bw peak
> Reduction peak throughput
> Broadcast peak throughput
> The beauty of this list is that it is uniform across all machines,
> including non-von Neumann. 

This list seems to assume a particular cache hierarchy (L1/L2/L3 plus
an FSB), which may not be universal.  To reword your request, I think
we would like to see the DERIVATIVE of application performance with
respect to each of these attributes.  I wrote exactly these words in
emails this winter, but I don't know how to measure this derivative.
Does there exist a hardware platform or simulator capable of
incrementally crippling one attribute at a time?
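
To make "derivative" concrete, here is a minimal sketch of the
finite-difference estimate I have in mind: cripple one attribute by a
small factor, rerun, and compare on a log-log scale.  Everything named
here is an assumption for illustration only.  toy_performance is a
hypothetical roofline-style stand-in for a real measurement, the
attribute names and numbers are made up, and on a real machine or
simulator perf() would have to run the application with that one
attribute degraded.

```python
import math

def toy_performance(attrs):
    """Hypothetical stand-in for a measurement: a roofline-style model
    in which the rate is limited by the slower of two resources."""
    flops_per_byte = 0.25  # assumed arithmetic intensity of the kernel
    return min(attrs["fpu_dp_peak"], attrs["mem_bw_peak"] * flops_per_byte)

def sensitivities(perf, attrs, cripple=0.9):
    """Estimate d log(perf) / d log(attribute) for each attribute by
    scaling one attribute at a time by `cripple` (one-sided difference)."""
    base = perf(attrs)
    result = {}
    for name, value in attrs.items():
        crippled = dict(attrs)            # copy so only one attribute changes
        crippled[name] = value * cripple
        result[name] = (math.log(perf(crippled)) - math.log(base)) / math.log(cripple)
    return result

# Made-up machine: 500 GF/s double-precision peak, 100 GB/s memory bandwidth.
machine = {"fpu_dp_peak": 500.0, "mem_bw_peak": 100.0}
print(sensitivities(toy_performance, machine))
```

A sensitivity near 1 means performance scales directly with that
attribute, and a value near 0 means the attribute is not the
bottleneck.  In this toy the kernel is memory-bound, so the memory
bandwidth entry comes out near 1 and the FPU entry near 0.  One such
number per attribute is what would go on each axis of the graph.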

If we can measure these quantities for the suite of apps and benchmarks
on different machines, we could make a very useful interactive
JavaScript presentation.
