[HPGMG Forum] Do we want the benchmark to go into intrinsics?

Sam Williams swwilliams at lbl.gov
Wed Apr 30 13:37:58 UTC 2014

For HPL, the website just has the reference implementation but links to optimized BLAS.  The reality is it's very unlikely that every vendor optimizing for HPGMG will contribute to the repo and give up IP ownership.  As such, "optimized" becomes a sliding scale.  There is the optimized code in the repo and there is the optimized code from the vendor.

I also think that, like BLAS, the optimized (architecture-specific) HPGMG code needs to be self-contained and not sprinkled throughout.  I don't want porting to a new architecture to become a search for every instance of __bgq__, __x86__, __mic__ ... to make sure you've found every routine.
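
A minimal sketch of that kind of layout, not code from the HPGMG repo.  Every name here is an assumption made for illustration: the axpy kernel, select_axpy(), and the HPGMG_NO_OPT macro (a stand-in for the "no-opt" switch John suggests below).  The only real identifiers are the compiler-defined __AVX__ macro and the AVX intrinsics from <immintrin.h>.  The point is that callers never test an architecture macro; the architecture-specific code and the one place that selects it sit together.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel signature: y[i] += a * x[i] for i in [0, n). */
typedef void (*axpy_fn)(size_t n, double a, const double *x, double *y);

/* Portable reference kernel: always compiled, on every architecture. */
void axpy_reference(size_t n, double a, const double *x, double *y) {
  assert(x != NULL && y != NULL);
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* ---- Architecture-specific section: the only place arch macros appear. ----
 * HPGMG_NO_OPT is a hypothetical macro; defining it at compile time
 * removes every optimized kernel and leaves only the reference code. */
#if !defined(HPGMG_NO_OPT) && defined(__AVX__)
#include <immintrin.h>

void axpy_avx(size_t n, double a, const double *x, double *y) {
  assert(x != NULL && y != NULL);
  const __m256d va = _mm256_set1_pd(a);
  size_t i = 0;
  /* Four doubles per iteration, unaligned loads and stores. */
  for (; i + 4 <= n; i += 4) {
    const __m256d vx = _mm256_loadu_pd(x + i);
    const __m256d vy = _mm256_loadu_pd(y + i);
    _mm256_storeu_pd(y + i, _mm256_add_pd(vy, _mm256_mul_pd(va, vx)));
  }
  /* Scalar remainder loop for the last n % 4 elements. */
  for (; i < n; i++) y[i] += a * x[i];
}
#define AXPY_OPTIMIZED axpy_avx
#endif
/* ---- End of architecture-specific section. ---- */

/* Single selection point: a new port adds a kernel above and nothing else. */
axpy_fn select_axpy(void) {
#ifdef AXPY_OPTIMIZED
  return AXPY_OPTIMIZED;
#else
  return axpy_reference;
#endif
}
```

Under these assumptions, a caller would write select_axpy()(n, a, x, y) and stay unaware of which kernel runs.  Building with -DHPGMG_NO_OPT falls back to the reference loop, so the compiler's own output can be measured directly.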

On Apr 29, 2014, at 10:38 PM, John Shalf <JShalf at lbl.gov> wrote:

> How about having a “no-opt” compile option (or you could call it “compiler shaming mode”) that disables the architecture-specific directives.  That way you can always go back to the reference implementation and properly shame bad compilers.
> However, I agree with Jed’s motivation to have the optimized versions all in one code base as much as possible so that you have performant code out-of-the-box for all architectures.  (I don’t want to have one optimized version per architecture… that would be a revision control nightmare, and the average benchmarker would likely fail to choose the correct optimized version.)
> -john
> On Apr 29, 2014, at 7:58 PM, Jed Brown <jed at jedbrown.org> wrote:
>> Sam Williams <swwilliams at lbl.gov> writes:
>>> The kiviats were constructed with code as is.  HPCG defenders promise
>>> better performance with optimization.  
>> This is largely fantasy because of the memory bandwidth limitations.
>> This is clear from the performance model.
>>> Showing HPGMG optimized performance and performance characteristics
>>> looks better than vanilla code/HPCG doesn't address the argument.  It
>>> only says optimization is beneficial.
>> Optimization will inevitably be more beneficial for HPGMG because it
>> exercises much more than memory bandwidth.  This reflects real apps
>> (almost all of which benefit from optimization).
>>> I think the reference implementation should be "good".  I'm much more
>>> curious as to why xlc fails so badly on HPGMG-FE while it does pretty
>>> well on HPGMG-FV.
>> It has some tight local loops with nontrivial indexing.  It is easier
>> for compilers to optimize for x86 because the chips are so much smarter
>> (out-of-order, branch prediction, etc), but those compilers have bigger
>> markets and tend to be better (not to mention standards compliance
>> and miscompilation rate).
>> I know a lot of applications that see 5x lower efficiency when running
>> on BG/Q than on Edison.
>> _______________________________________________
>> HPGMG-Forum mailing list
>> HPGMG-Forum at hpgmg.org
>> https://hpgmg.org/lists/listinfo/hpgmg-forum
