[HPGMG Forum] Do we want the benchmark to go into intrinsics?

Jed Brown jed at jedbrown.org
Wed Apr 30 14:05:17 UTC 2014

Sam Williams <swwilliams at lbl.gov> writes:

> For HPL, the website just has the reference implementation but links
> to optimized BLAS.  The reality is it's very unlikely every vendor
> optimizing for HPGMG will contribute to the repo and give up IP
> ownership.  As such, optimized becomes a sliding scale.  There is
> optimized in the repo and there's the optimized from the vendor.

I don't foresee "standard" libraries of HPGMG spice, but perhaps that is
a viable mechanism.  I would like there to be a way for users to
reproduce vendor-provided numbers _after_ The List is released.
(Otherwise people have to blindly trust the committee.  I'd rather
empower the public to "trust but verify".)  Mark was interested in a way
to make the source code accessible eventually.  I wonder if there is a
way to swing that (perhaps with a special license).

> I also think that like BLAS, the optimized (architecture-specific)
> HPGMG code needs to be self-contained and not sprinkled throughout.  I
> don't want porting to a new architecture becoming a search for every
> instance of __bgq__, __x86__, __mic__ ... to make sure you've found
> every routine.
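
To illustrate the self-contained layout described above, here is a
minimal sketch in C.  It is not taken from the HPGMG source: the names
KernelOps, kernel_table, and kernel_lookup are hypothetical, and
axpy_unrolled is only a stand-in for a vendor-tuned routine.  The point
is that the architecture choice is made in one place, through a table,
instead of through preprocessor tests scattered across the code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical kernel table entry: one per architecture.  In a real
 * layout each entry would be defined in its own source file. */
typedef struct {
  const char *arch;
  void (*axpy)(size_t n, double a, const double *x, double *y);
} KernelOps;

/* Portable reference kernel: y += a*x */
static void axpy_ref(size_t n, double a, const double *x, double *y) {
  for (size_t i = 0; i < n; i++) y[i] += a * x[i];
}

/* Stand-in for an architecture-specific kernel (unrolled by 4). */
static void axpy_unrolled(size_t n, double a, const double *x, double *y) {
  size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    y[i + 0] += a * x[i + 0];
    y[i + 1] += a * x[i + 1];
    y[i + 2] += a * x[i + 2];
    y[i + 3] += a * x[i + 3];
  }
  for (; i < n; i++) y[i] += a * x[i]; /* remainder */
}

/* The single place that knows which variants exist. */
static const KernelOps kernel_table[] = {
  {"reference", axpy_ref},
  {"unrolled", axpy_unrolled},
};

/* Return the entry for `arch`, falling back to the reference kernels. */
const KernelOps *kernel_lookup(const char *arch) {
  size_t n = sizeof(kernel_table) / sizeof(kernel_table[0]);
  for (size_t i = 0; i < n; i++) {
    if (strcmp(kernel_table[i].arch, arch) == 0) return &kernel_table[i];
  }
  return &kernel_table[0];
}
```

Porting to a new architecture would then mean adding one file and one
table entry, and an unknown name quietly falls back to the reference
code.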

In HPGMG-FE, restriction and prolongation have been pretty good
without further trickery, so the two parts that need optimization are
tensor contraction and the pointwise element kernel.  The main reason
tensor contraction is "generic" now is so that it can be reused between
different operators, but once we pin down an operator, there is no
reason not to group it back into one file.  My intent was that, in the
end, there would be one file per architecture containing all of the
optimizations for that architecture.  It is already set up so that new
source files are automatically registered.
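
For readers unfamiliar with the term, a "generic" tensor contraction can
be sketched roughly as follows.  This is an illustrative sketch, not the
HPGMG-FE code: the function names, argument order, and row-major layout
are assumptions.  A P-by-Q matrix B is applied along one dimension of an
array viewed as [pre][Q][post], and three such passes apply B along
every dimension of a Q*Q*Q element.

```c
/* Apply B (P x Q, row-major) along the middle dimension of x.
 * x is viewed as [pre][Q][post], y as [pre][P][post]:
 *   y[a][i][c] = sum_l B[i][l] * x[a][l][c]
 * All names and the layout are assumptions made for this sketch. */
void tensor_contract(int pre, int P, int Q, int post,
                     const double *B, const double *x, double *y) {
  for (int a = 0; a < pre; a++) {
    for (int i = 0; i < P; i++) {
      for (int c = 0; c < post; c++) {
        double sum = 0.0;
        for (int l = 0; l < Q; l++) {
          sum += B[i * Q + l] * x[(a * Q + l) * post + c];
        }
        y[(a * P + i) * post + c] = sum;
      }
    }
  }
}

/* Apply B along all three dimensions of a Q*Q*Q array, giving P*P*P.
 * The caller provides scratch space: t1 holds P*Q*Q doubles and
 * t2 holds P*P*Q doubles. */
void tensor_apply_3d(int P, int Q, const double *B, const double *x,
                     double *t1, double *t2, double *y) {
  tensor_contract(1, P, Q, Q * Q, B, x, t1); /* [Q][Q][Q] -> [P][Q][Q] */
  tensor_contract(P, P, Q, Q, B, t1, t2);    /* [P][Q][Q] -> [P][P][Q] */
  tensor_contract(P * P, P, Q, 1, B, t2, y); /* [P][P][Q] -> [P][P][P] */
}
```

Because the sizes are runtime arguments, one routine like this can serve
several operators.  Once P and Q are fixed for a specific operator, the
loops can be specialized and vectorized per architecture.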
