[HPGMG Forum] Do we want the benchmark to go into intrinsics?

Jed Brown jed at jedbrown.org
Tue Apr 29 16:22:13 UTC 2014

Brian Van Straalen <bvstraalen at lbl.gov> writes:

> Jed Brown  7b171e1  fe: loop optimizations to TensorContract_QPX   (28 Apr 2014)
> Jed Brown  339691b  fe: initial QPX version of tensor contraction  (28 Apr 2014)
> Jed Brown  b1189a3  make: remove redundant link flags              (28 Apr 2014)
>
> This seems like a pretty unportable benchmark idea.  Does HPL do this
> for the download version?  Or are these commits to the research
> branch?

We need a "high-quality" implementation.  The XL compiler is amazingly
bad at producing decent code for the small tensor contractions (gcc on
x86, by contrast, does quite well).  Consequently, any performance
counter data on BG/Q mostly measures the compiler rather than the
machine, by about an order of magnitude.  That code is easier to
optimize on Intel than on BG/Q is well known, but we can't have a
credible benchmark without decent code-gen there.  I don't want private
vendor implementations to be an order of magnitude faster than what
people can run for themselves.  (A modest difference is unavoidable.)

Note that this code is not compiled when building for other
architectures, so it does not affect portability.  I also have an Intel
AVX/FMA intrinsics version, which is easier to work with and gives
nearly a 2x speedup over vanilla C (and GCC produces better code than
ICC).
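
To illustrate the "not compiled on other architectures" point, here is a
minimal sketch, not the actual HPGMG code.  The function name contract4,
its data layout (rows of 4 doubles), and the choice of guard macros are
assumptions made for this example.  The intrinsics path is only built
when the compiler advertises AVX and FMA (e.g. gcc -mavx -mfma);
everywhere else the plain C loop is used, and a QPX branch could be
guarded the same way.

```c
#include <assert.h>

#if defined(__AVX__) && defined(__FMA__)
#  include <immintrin.h>
#endif

/* Hypothetical kernel: y[i][l] += sum_j B[i][j] * x[j][l], l = 0..3.
 * B is a small P-by-Q matrix (row-major); x has Q rows and y has P rows
 * of 4 doubles each. */
static void contract4(int P, int Q, const double *B,
                      const double *x, double *y)
{
  assert(P > 0 && Q > 0);
#if defined(__AVX__) && defined(__FMA__)
  /* Intrinsics path: one 4-wide accumulator per output row. */
  for (int i = 0; i < P; i++) {
    __m256d acc = _mm256_loadu_pd(&y[i*4]);
    for (int j = 0; j < Q; j++) {
      __m256d b  = _mm256_set1_pd(B[i*Q + j]);   /* broadcast B[i][j] */
      __m256d xv = _mm256_loadu_pd(&x[j*4]);     /* row j of x */
      acc = _mm256_fmadd_pd(b, xv, acc);         /* acc += b * xv */
    }
    _mm256_storeu_pd(&y[i*4], acc);
  }
#else
  /* Portable fallback: what every other architecture compiles. */
  for (int i = 0; i < P; i++)
    for (int j = 0; j < Q; j++)
      for (int l = 0; l < 4; l++)
        y[i*4 + l] += B[i*Q + j] * x[j*4 + l];
#endif
}
```

Both branches compute the same contraction, so callers never need to
know which one was built; only the generated code differs.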