[HPGMG Forum] More thoughts on the Kiviats

Theodore Omtzigt theo at stillwater-sc.com
Mon Jun 9 12:10:20 UTC 2014


Barry:

  The pro for normalization would be that it is empirical and thus
captures a complex mixture of software and hardware interactions. The
resulting curves would indeed be perfect for comparisons of scaling. But
one could also argue that the empirical nature of that normalization is a
con, as it does not provide any insight into which resource in the
machine is causing more or less scaling capability.
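A minimal Python sketch of the empirical normalization under discussion,
assuming each machine's curve is a list of (run size, aggregate
performance) pairs. The names normalize_curve and scaling_efficiency are
hypothetical and not part of HPGMG:

```python
from typing import List, Tuple

# One scaling curve: (run size, aggregate performance) pairs, with run size
# in whatever unit the machine is plotted in (nodes, sockets, ...).
Curve = List[Tuple[float, float]]


def normalize_curve(curve: Curve) -> Curve:
    """Rescale a curve so its smallest run sits at (1, 1).

    Both coordinates are divided by those of the smallest run, so curves
    from different machines start at the same point and only their
    scaling shape remains visible.
    """
    points = sorted(curve)
    x0, p0 = points[0]
    return [(x / x0, p / p0) for x, p in points]


def scaling_efficiency(curve: Curve) -> List[float]:
    """Normalized performance divided by normalized run size.

    1.0 means aggregate performance grew in proportion to run size,
    relative to the smallest run; lower values show how much scaling
    is lost as the run grows.
    """
    return [p / x for x, p in normalize_curve(curve)]
```

With made-up points [(1, 2.0), (8, 15.0), (64, 100.0)] this yields
efficiencies of 1.0, 0.9375 and 0.78125. It reports how much scaling is
lost, not which resource is responsible, and dividing by the first point
discards the absolute performance level of each curve.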

As always, depending on the question you need to answer, you want to see
different Kiviat graphs. As a hardware designer, I would like to see how
well a particular resource allocation is helping the performance of an
application. As all machines have a common collection of resources, the
Kiviats that I am interested in would capture those, in particular this list:

FPU sp peak throughput
FPU dp peak (big difference in silicon allocation)
IU peak throughput
L1 bw peak (latency is governed by core clock, but bw is parallelism on the cache ports)
L2 bw peak (latency becomes an issue further away, but most L2s are supporting fast L1 refresh)
L3 bw peak
L3 size
FSB bw peak
Memory latency (for low-concurrency kernels this is the bottleneck)
Memory bw peak
Network latency
Network bw peak
Reduction peak throughput
Broadcast peak throughput

The beauty of this list is that it is uniform across all machines,
including non-von Neumann architectures. It focuses on the actual
hardware that delivers computation. Most hardware is designed from this
core outward: you set a particular FPU throughput rate, and from that
point on you try to remove resource bottlenecks until you can reach that
throughput rate on the key kernels you selected.
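That design loop can be sketched roofline-style: give the kernel a demand
per flop on each resource and each resource a peak rate, and the smallest
peak/demand ratio marks the bottleneck. This is a minimal Python sketch
under those assumptions (perfect overlap, latency ignored); the function
name, resource names and numbers are hypothetical:

```python
from typing import Dict, Tuple


def attainable_rate(fpu_peak: float,
                    resource_peaks: Dict[str, float],
                    demand_per_flop: Dict[str, float]) -> Tuple[float, str]:
    """Roofline-style upper bound on a kernel's flop rate.

    fpu_peak: target FPU throughput in flop/s.
    resource_peaks: peak rate of each other resource, e.g. bytes/s.
    demand_per_flop: how much of each resource the kernel uses per flop,
        e.g. bytes moved per flop. Zero or missing demand never binds.
    Each resource caps the flop rate at peak / demand; the smallest cap wins.
    Assumes perfect overlap and ignores latency, so the bound is optimistic.
    Returns (attainable flop/s, name of the limiting resource).
    """
    best_rate, limiter = fpu_peak, "FPU"
    for name, demand in demand_per_flop.items():
        if demand <= 0:
            continue
        cap = resource_peaks[name] / demand
        if cap < best_rate:
            best_rate, limiter = cap, name
    return best_rate, limiter


if __name__ == "__main__":
    # Hypothetical machine and kernel numbers, for illustration only.
    peaks = {"L1 bw": 800e9, "L2 bw": 400e9, "Memory bw": 50e9}
    demand = {"L1 bw": 2.0, "L2 bw": 1.0, "Memory bw": 0.5}
    print(attainable_rate(200e9, peaks, demand))
```

With these made-up numbers, memory bandwidth caps the kernel at 1e11
flop/s, half the FPU peak; raising memory bandwidth to 200e9 bytes/s
moves the limit back to the FPU.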

These Kiviats would also be helpful for the application community, as
they show the resource efficiency of an algorithm on a particular
machine.
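To turn such a Kiviat into numbers, one could divide each sustained rate
by the corresponding machine peak. A minimal Python sketch, assuming the
achieved and peak rates have already been measured per axis; the axis
labels reuse a few entries from the list above and the function name is
hypothetical:

```python
from typing import Dict, List, Sequence, Tuple

# Illustrative subset of the resource axes listed above.
AXES = ("FPU dp peak", "L1 bw peak", "L2 bw peak",
        "Memory bw peak", "Network bw peak")


def kiviat_profile(achieved: Dict[str, float],
                   peak: Dict[str, float],
                   axes: Sequence[str] = AXES) -> List[Tuple[str, float]]:
    """Fraction of each resource's peak that an application sustains.

    achieved and peak must use the same unit on each axis (flop/s, bytes/s).
    Unmeasured axes are reported as 0.0. Values are clipped to [0, 1] so
    every spoke of the radar plot shares one scale; this hides any
    measurement that exceeds the nominal peak.
    """
    profile = []
    for axis in axes:
        fraction = achieved.get(axis, 0.0) / peak[axis]
        profile.append((axis, min(max(fraction, 0.0), 1.0)))
    return profile
```

Plotted as spokes, these fractions give the per-machine resource
efficiency picture: a kernel near 1.0 on memory bandwidth but near 0.1 on
the FPU is bandwidth bound on that machine.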

Theo
 
On 6/9/2014 7:30 AM, hpgmg-forum-request at hpgmg.org wrote:
> Today's Topics:
>
>    1. Re:  HPGMG release v0.1 (Barry Smith)
>    2. Re:  HPGMG release v0.1 (Jed Brown)
>    3. Re:  HPGMG release v0.1 (Jed Brown)
>    4. Re:  HPGMG release v0.1 (Jed Brown)
>    5.  Reuse of Kiviat diagram? (Karl Rupp)
>    6. Re:  Reuse of Kiviat diagram? (Jed Brown)
>    7. Re:  HPGMG release v0.1 (Mark Adams)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Sun, 8 Jun 2014 21:01:51 -0500
> From: Barry Smith <bsmith at mcs.anl.gov>
> To: Mark Adams <mfadams at lbl.gov>
> Cc: HPGMG Forum <hpgmg-forum at hpgmg.org>
> Subject: Re: [HPGMG Forum] HPGMG release v0.1
> Message-ID: <A89223F3-D0E8-41A7-9FA8-2E33E82D5A50 at mcs.anl.gov>
> Content-Type: text/plain; charset="windows-1252"
>
>
>   Mark,
>
>    Absolutely right, and as I noted in my first email, the problem is that people cannot stop themselves from making the comparison even when they know that it is wrong. An extreme response might be to normalize the curves from each machine so that they all start at the same point and then the only visible information would be the scaling for each machine, not that one curve is consistently above another curve (because of fatter nodes or whatever). Hmm, maybe that is not a bad idea?
>
>    Barry
>
> On Jun 8, 2014, at 8:46 PM, Mark Adams <mfadams at lbl.gov> wrote:
>
>> This is a great discussion.  
>>
>> As a more immediate matter, we should make clear that these are superimposed plots and a "socket" is not comparable in general (Edison's and Peregrine's in fact are).  These plots do not imply that a Cray XC30/Aries is a better HPGMG machine than BG/Q or K, for instance, but you can see that Aries is scaling noticeably better than Gemini and InfiniBand on HPGMG (and HPL would probably not distinguish these interconnects).  We should try to make that clear in presentations, and perhaps we (i.e., Jed) could add a little disclaimer to this effect on this web page, seeing as it has gotten so much attention here.
>>
>> Mark
>>
>>
>> On Sun, Jun 8, 2014 at 3:12 PM, Brian Van Straalen <bvstraalen at lbl.gov> wrote:
>> Since "nodes" seems to be the standard unit of allocation on compute platforms, I still prefer nodes for scaling. When I start getting charged by the joule, I will want an unambiguous way of measuring joules to make sure users are treated fairly.
>>
>> I know that my own host site NERSC uses hours*nodes*cores/node, which would seem to indicate people are core-counting, but perhaps Edison is the last of the truly fat-core platforms we will see and we will go back to allocation awards being in units of nodes.
>>
>> To get around core/node/socket/accelerator/etc., you would probably have to drop all the way down to transistors, or better, die area. Even that can be complicated by how you sum up SoC real estate.
>>
>> If the various computing centers can figure out how to normalize their users across platforms then we should use the same normalization.
>>
>> Brian
>>
>> On Jun 8, 2014, at 7:54 AM, Constantinos Evangelinos <cevange at us.ibm.com> wrote:
>>
>>> In my mind, at least, users in most cases ask a queuing system for nodes, as node sharing is discouraged for obvious reasons. So nodes seem to me to be the most useful x-axis choice. Cores are problematic because (a) they get added in large block increments and (b) they stretch the axes a lot, given the relatively wimpy cores in BG/Q and Xeon Phi, even without thinking of GPUs.
>>>
>>> Constantinos 
>>>
>>>> On Jun 7, 2014, at 7:01 PM, "Sam Williams" <swwilliams at lbl.gov> wrote:
>>>>
>>>> Nominally, there would be a paragraph describing the setup for a figure. For this data, the x-axis is what is colloquially defined today as a NUMA node on the Cray machines. There is one process per NUMA node. Thus, for all of these machines, there is one process per chip.
>>>>
>>>> K = 1 process per compute node, 8 threads per process
>>>> BGQ = 1 process per compute node, 64 threads per process
>>>> Edison = 2 processes per compute node, 12 threads per process
>>>> Peregrine = 2 processes per compute node, 12 threads per process
>>>> Hopper = 4 processes per compute node, 6 threads per process
>>>>
>>>>
>>>>
>>>>
>>>> On Jun 7, 2014, at 3:51 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>>>  I submit that even "nodes" or "sockets" is actually not completely unambiguous.
>>>>>
>>>>> On Jun 7, 2014, at 5:39 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>>>>
>>>>>> On Sat, Jun 7, 2014 at 3:35 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>>>> On Jun 7, 2014, at 5:31 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>>>>>>
>>>>>>>> On Sat, Jun 7, 2014 at 3:26 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>>>>>> The use of multicore processor == sockets as the independent variable in the plot of aggregate performance seems arbitrary. Though people should not use this kind of plot to compare machines, they will. Now just change sockets to nodes and boom, suddenly the machines compare very differently (since some systems have two sockets per node and some one). Should cores be used instead? Or hardware threads? Or cores scaled by their clock speed? Or hardware floating point units (scaled by clock speed?)? Or number of instruction decoders? Power usage? Cost? Etc., etc. Maybe have a dynamic plot where one can switch the independent variable by selecting from a menu or moving the mouse over choices...
>>>>>>>  Yes, but how do we measure power? The actual amount being pulled from the "wall socket"? Is that possible? As with the various hardware features you mention, I wouldn't trust anything the vendor says about power.
>>>>>> Assuming you run on more than one node, just use the total machine
>>>>>> power that is used by Green500.  Granted, that is not ideal since it
>>>>>> won't be measured for the same code, but at least there is a
>>>>>> well-defined procedure for measuring it and hopefully it is at least
>>>>>> roughly comparable between systems.
>>>>>>
>>>>>> But I agree that power is nearly as hard to get exactly right as
>>>>>> anything else besides counting nodes.  That is about the only
>>>>>> independent variable that seems unambiguous.
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>>>> The last suggestion is obviously the best one since it is the most
>>>>>>>> general, but I think power is the best choice of independent variable.
>>>>>>>> Most of the other hardware features are bad choices because it is very
>>>>>>>> hard to determine some of these.  What is the clock speed of an Intel
>>>>>>>> socket that does dynamic frequency scaling?  How do you count cores on
>>>>>>>> a GPU?  NVIDIA's core-counting methodology is complete nonsense...
>>>>>>>>
>>>>>>>> Best,
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Jun 7, 2014, at 4:27 PM, Jed Brown <jed at jedbrown.org> wrote:
>>>>>>>>>
>>>>>>>>>> Mark Adams <mfadams at lbl.gov> writes:
>>>>>>>>>>> We are pleased to announce that hpgmg.org and the associated mailing
>>>>>>>>>>> list hpgmg-forum at hpgmg.org are officially available.
>>>>>>>>>> Thanks, Mark.  To help kick off the discussion, I would like to call
>>>>>>>>>> attention to our recent blog posts describing "results".
>>>>>>>>>>
>>>>>>>>>> The most recent announces the v0.1 release and includes a Kiviat diagram
>>>>>>>>>> comparing the on-node performance characteristics of CORAL apps and
>>>>>>>>>> several benchmarks running on Blue Gene/Q.
>>>>>>>>>>
>>>>>>>>>> https://hpgmg.org/2014/06/06/hpgmg-01/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> This earlier post shows performance on a variety of top machines:
>>>>>>>>>>
>>>>>>>>>> https://hpgmg.org/2014/05/15/fv-results/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> We are interested in better ways to collect and present the comparison
>>>>>>>>>> data as well as any characteristics that you think are important.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In addition to the general principles on the front page, some further
>>>>>>>>>> rationale is given at:
>>>>>>>>>>
>>>>>>>>>> https://hpgmg.org/why/
>>>>>>>>>>
>>>>>>>>>> None of this is set in stone and we would be happy to discuss any
>>>>>>>>>> questions or comments on this list.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please encourage any interested colleagues to subscribe to this list:
>>>>>>>>>>
>>>>>>>>>> https://hpgmg.org/lists/listinfo/hpgmg-forum
>>>>>>>>>> _______________________________________________
>>>>>>>>>> HPGMG-Forum mailing list
>>>>>>>>>> HPGMG-Forum at hpgmg.org
>>>>>>>>>> https://hpgmg.org/lists/listinfo/hpgmg-forum
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Hammond
>>>>>>>> jeff.science at gmail.com
>>>>>>>> http://jeffhammond.github.io/
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jeff Hammond
>>>>>> jeff.science at gmail.com
>>>>>> http://jeffhammond.github.io/
>> Brian Van Straalen         Lawrence Berkeley Lab
>> BVStraalen at lbl.gov         Computational Research
>> (510) 486-4976             Division (crd.lbl.gov)
>>
>>
>>
>>
>>
>
>
> ------------------------------
>
> Message: 2
> Date: Mon, 09 Jun 2014 08:46:42 +0200
> From: Jed Brown <jed at jedbrown.org>
> To: Brian Van Straalen <bvstraalen at lbl.gov>, Constantinos Evangelinos
> 	<cevange at us.ibm.com>
> Cc: HPGMG Forum <hpgmg-forum at hpgmg.org>
> Subject: Re: [HPGMG Forum] HPGMG release v0.1
> Message-ID: <87r42ya8bx.fsf at jedbrown.org>
> Content-Type: text/plain; charset="us-ascii"
>
> Brian Van Straalen <bvstraalen at lbl.gov> writes:
>> I know that my own host site NERSC uses hours*nodes*cores/node which
>> would seem to indicate people are core-counting, but perhaps Edison is
>> the last of the truly fat-core platforms we will see and we will go
>> back to allocation awards being in units of nodes.
> NERSC has a different charge factor for Edison versus Hopper (and
> different factors for different queues).  The x axis is arbitrary,
> serving only to quantify run size in units that can be interpreted
> separately for each machine.  Slope and maximum value are the quantities
> that are meaningful to compare.
>
> Sam, I think that counting NUMA nodes, while principled and relevant to
> the implementation, is ultimately confusing and prone to
> misinterpretation.  Would you mind regenerating this figure with x axis
> representing compute nodes (the unit in which users of the machine
> count)?
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: not available
> Type: application/pgp-signature
> Size: 818 bytes
> Desc: not available
> URL: <https://hpgmg.org/lists/archives/hpgmg-forum/attachments/20140609/d818bb44/attachment-0001.bin>
>
> ------------------------------
>
> Message: 3
> Date: Mon, 09 Jun 2014 09:00:59 +0200
> From: Jed Brown <jed at jedbrown.org>
> To: Mark Adams <mfadams at lbl.gov>, Brian Van Straalen
> 	<bvstraalen at lbl.gov>
> Cc: HPGMG Forum <hpgmg-forum at hpgmg.org>
> Subject: Re: [HPGMG Forum] HPGMG release v0.1
> Message-ID: <87oay2a7o4.fsf at jedbrown.org>
> Content-Type: text/plain; charset="us-ascii"
>
> Mark Adams <mfadams at lbl.gov> writes:
>
>> This is a great discussion.
>>
>> As a more immediate matter, we should make clear that these are
>> superimposed plots and a "socket" is not comparable in general (Edison's and
>> Peregrine's in fact are).  These plots do not imply that a Cray XC30/Aries is
>> a better HPGMG machine than BG/Q or K, for instance, but you can see that
>> Aries is scaling noticeably better than Gemini and InfiniBand on HPGMG (and
>> HPL would probably not distinguish these interconnects).  We should try to
>> make that clear in presentations, and perhaps we (i.e., Jed) could add a
>> little disclaimer to this effect on this web page, seeing as it has gotten
>> so much attention here.
> I added an update which I hope clarifies matters.
>
> https://bitbucket.org/hpgmg/hpgmg.org/commits/b820cd0771699f46cdf65912a47f58330ba9d0bb
>
> All, feel free to change further.  (Submit a pull request or send a
> patch to the mailing list if you don't have write access to this
> repository.)
>
> ------------------------------
>
> Message: 4
> Date: Mon, 09 Jun 2014 09:02:07 +0200
> From: Jed Brown <jed at jedbrown.org>
> To: Barry Smith <bsmith at mcs.anl.gov>, Mark Adams <mfadams at lbl.gov>
> Cc: HPGMG Forum <hpgmg-forum at hpgmg.org>
> Subject: Re: [HPGMG Forum] HPGMG release v0.1
> Message-ID: <87lht6a7m8.fsf at jedbrown.org>
> Content-Type: text/plain; charset="us-ascii"
>
> Barry Smith <bsmith at mcs.anl.gov> writes:
>> An extreme response might be to normalize the curves from each machine
>> so that they all start at the same point and then the only visible
>> information would be the scaling for each machine, not that one curve
>> is consistently above another curve (because of fatter nodes or
>> whatever).
> This would prevent comparison of the maximum values.
>
> ------------------------------
>
> Message: 5
> Date: Mon, 9 Jun 2014 11:36:10 +0200
> From: Karl Rupp <rupp at iue.tuwien.ac.at>
> To: <hpgmg-forum at hpgmg.org>
> Subject: [HPGMG Forum] Reuse of Kiviat diagram?
> Message-ID: <5395800A.1020307 at iue.tuwien.ac.at>
> Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
>
> Hi guys,
>
> Congratulations on HPGMG; this is a great piece of work, as it addresses
> the shortcomings of the 'big benchmarks' in use. I'd like to write a
> blog post summarizing the better balance of HPGMG, for which I plan to
> adapt your Kiviat diagram [1] a little (adding a few arrows and labels).
> Do I have your permission to alter the graph and use it in my
> blog? References and links will of course be set appropriately.
>
> Thanks and best regards,
> Karli
>
> [1] https://hpgmg.org/images/hpgmg-kiviat-20140606.png
>
>
> ------------------------------
>
> Message: 6
> Date: Mon, 09 Jun 2014 11:58:55 +0200
> From: Jed Brown <jed at jedbrown.org>
> To: Karl Rupp <rupp at iue.tuwien.ac.at>, hpgmg-forum at hpgmg.org
> Subject: Re: [HPGMG Forum] Reuse of Kiviat diagram?
> Message-ID: <87y4x68kv4.fsf at jedbrown.org>
> Content-Type: text/plain; charset="us-ascii"
>
> Ian and Bert, do I understand correctly that you got approval for public
> release of the spreadsheet (not just to those of us at LBNL and ANL)?
>
> Mark, I think you added the most recent data.  Can you add it to the
> website repository (place in content/static/ and link from the blog
> post) or send it to me and I'll do it.
>
> ------------------------------
>
> Message: 7
> Date: Mon, 9 Jun 2014 04:30:09 -0700
> From: Mark Adams <mfadams at lbl.gov>
> To: Barry Smith <bsmith at mcs.anl.gov>
> Cc: HPGMG Forum <hpgmg-forum at hpgmg.org>
> Subject: Re: [HPGMG Forum] HPGMG release v0.1
> Message-ID:
> 	<CADOhEh6uLq+y69CPh_Lz1QbpQA3RUm4b0m_yys05PqMv3qY1Ew at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> On Sun, Jun 8, 2014 at 7:01 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>>   Mark,
>>
>>    Absolutely right, and as I noted in my first email, the problem is that
>> people cannot stop themselves from making the comparison even when they
>> know that it is wrong.
>
> Probably, but this might be a benign exercise, like debugging code that you
> see in a presentation, at least for people who understand that sockets
> have different costs.
>
>
>> An extreme response might be to normalize the curves from each machine so
>> that they all start at the same point and then the only visible information
>> would be the scaling for each machine, not that one curve is consistently
>> above another curve (because of fatter nodes or whatever). Hmm, maybe that
>> is not a bad idea?
>>
> This would make it easier to distinguish data/trends that a log-log plot
> smothers. The "socket" data is useful in that it is "raw" data, and the
> difference in per-socket performance is useful, even if incomplete without
> some measure of cost.  The plasma PIC codes that I work with, for instance,
> generate plots like this (per socket).  I could compare and see how much
> faster IVB is than BG/Q nodes on these PIC codes and on HPGMG. The ratio
> (Cray/IBM)_PIC / (Cray/IBM)_HPGMG might be interesting.
>
> ------------------------------
>
> End of HPGMG-Forum Digest, Vol 3, Issue 6
> *****************************************
>

-- 
*Dr. E. Theodore L. Omtzigt*
CEO and Founder
Stillwater Supercomputing, Inc.
office US: EST (617) 314 6424, PST (415) 738 7387
mobile US: +1 916 296-7901
mobile EU: +31 6 292 000 50
3941 Park Drive, Suite 20-354
El Dorado Hills, CA 95762
USA

