[HPGMG Forum] Do we want the benchmark to go into intrinsics?

Sam Williams swwilliams at lbl.gov
Tue Apr 29 16:33:50 UTC 2014


I think there is a difference between saying
- we need a high-quality implementation to tease out the Kiviat characteristics in order to showcase the compute requirements of FEM, and
- we need a high-quality reference implementation that's been manually optimized for Mira.


On Apr 29, 2014, at 9:22 AM, Jed Brown <jed at jedbrown.org> wrote:

> Brian Van Straalen <bvstraalen at lbl.gov> writes:
> 
>> 7b171e1  Jed Brown  28 Apr 2014  fe: loop optimizations to TensorContract_QPX
>> 339691b  Jed Brown  28 Apr 2014  fe: initial QPX version of tensor contraction
>> b1189a3  Jed Brown  28 Apr 2014  make: remove redundant link flags
>> 
>> This seems like a pretty unportable benchmark idea.  Does HPL do this
>> for the download version?  Or are these commits to the research
>> branch?
> 
> We need a "high-quality" implementation.  The XL compiler is amazingly
> terrible at producing decent code for the small tensor contractions
> (versus gcc on x86, which does quite well).  Consequently, any
> performance counter data on BG/Q is entirely measuring the compiler (by
> about an order of magnitude).  The fact that code is easier to optimize
> on Intel than BG/Q is well-known, but we can't have a credible benchmark
> without decent code-gen there.  I don't want private vendor
> implementations to be an order of magnitude faster than what people can
> run for themselves.  (A modest difference is unavoidable.)
> 
> Note that this stuff is not compiled when running on other
> architectures, so does not impact portability.  I also have Intel
> AVX/FMA intrinsics which are easier to work with and produce nearly 2x
> speedup over vanilla C (and GCC produces better code than ICC).
> _______________________________________________
> HPGMG-Forum mailing list
> HPGMG-Forum at hpgmg.org
> https://hpgmg.org/lists/listinfo/hpgmg-forum
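
The portability point in the quoted message (architecture-specific code is simply not compiled elsewhere) can be sketched roughly as below. This is a minimal illustration in C, not the HPGMG source: the function names, the NE = 4 lane layout, and the HAVE_AVX_FMA macro are all hypothetical. It assumes a small contraction y[i][e] = sum_j B[i][j] * x[j][e] with four elements packed one per vector lane, and guards the AVX/FMA path with the compiler's predefined __AVX__ and __FMA__ macros so that any other target falls back to plain C.

```c
#include <stddef.h>

/* Only pull in the intrinsics header when the target supports AVX and FMA.
   __AVX__ and __FMA__ are predefined by GCC/Clang/ICC under -mavx -mfma
   (or a suitable -march); HAVE_AVX_FMA is a hypothetical name. */
#if defined(__AVX__) && defined(__FMA__)
#include <immintrin.h>
#define HAVE_AVX_FMA 1
#else
#define HAVE_AVX_FMA 0
#endif

enum { NE = 4 }; /* elements handled together, one per vector lane (assumed) */

/* Portable reference: y[i*NE+e] = sum_j B[i*Q+j] * x[j*NE+e] */
static void contract_ref(size_t P, size_t Q, const double *B,
                         const double *x, double *y) {
  for (size_t i = 0; i < P; i++) {
    for (size_t e = 0; e < NE; e++) {
      double sum = 0.0;
      for (size_t j = 0; j < Q; j++) sum += B[i * Q + j] * x[j * NE + e];
      y[i * NE + e] = sum;
    }
  }
}

#if HAVE_AVX_FMA
/* Same contraction, with the NE = 4 elements held in one 256-bit register. */
static void contract_avx(size_t P, size_t Q, const double *B,
                         const double *x, double *y) {
  for (size_t i = 0; i < P; i++) {
    __m256d acc = _mm256_setzero_pd();
    for (size_t j = 0; j < Q; j++) {
      __m256d b = _mm256_set1_pd(B[i * Q + j]);  /* broadcast B[i][j] */
      __m256d xv = _mm256_loadu_pd(&x[j * NE]);  /* x[j][0..3] */
      acc = _mm256_fmadd_pd(b, xv, acc);         /* acc += b * xv */
    }
    _mm256_storeu_pd(&y[i * NE], acc);
  }
}
#endif

/* Single entry point: the choice is made at compile time, so other
   architectures never see the intrinsics. */
static void contract(size_t P, size_t Q, const double *B,
                     const double *x, double *y) {
#if HAVE_AVX_FMA
  contract_avx(P, Q, B, x, y);
#else
  contract_ref(P, Q, B, x, y);
#endif
}

/* Compare the dispatched kernel with the reference on a 3x3 case and
   return the largest absolute difference. */
static double contract_max_diff(void) {
  enum { P = 3, Q = 3 };
  double B[P * Q], x[Q * NE], y[P * NE], yref[P * NE];
  for (size_t k = 0; k < (size_t)(P * Q); k++) B[k] = 0.5 + 0.25 * (double)k;
  for (size_t k = 0; k < (size_t)(Q * NE); k++) x[k] = 1.0 - 0.125 * (double)k;
  contract(P, Q, B, x, y);
  contract_ref(P, Q, B, x, yref);
  double maxdiff = 0.0;
  for (size_t k = 0; k < (size_t)(P * NE); k++) {
    double d = y[k] - yref[k];
    if (d < 0.0) d = -d;
    if (d > maxdiff) maxdiff = d;
  }
  return maxdiff;
}
```

Built with something like "gcc -O2 -mavx -mfma" the intrinsics path is used; without those flags, or on a non-x86 target, only the plain C loop is compiled. A QPX variant would presumably sit behind a similar guard, but that is not shown here.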


