[HPGMG Forum] Do we want the benchmark to go into intrinsics?

Sam Williams swwilliams at lbl.gov
Wed Apr 30 16:29:14 UTC 2014


I wasn't suggesting we need BLAS.  Rather, I was hoping to mirror the way the HPL webpage presents a reference implementation and then mentions/links to vendor-tuned implementations (the BLAS): we would have a reference HPGMG implementation and then perhaps links (even if they are housed in the same repo) to tuned HPGMG operators.


On Apr 30, 2014, at 8:25 AM, Brian Van Straalen <bvstraalen at lbl.gov> wrote:

> ugh, you guys really get all over the map  :-)
> 
> I think we shouldn’t drift too far from the benchmark we are looking to follow, which is really HPL.  Our hope is that HPCG is eventually buried once the primary actors have gotten their publications.
> 
>> "HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers"
>> 
>> 
>> 
>> "The HPL software package requires the availibility on your system of an implementation of the Message Passing Interface MPI (1.1 compliant). An implementation of either the Basic Linear Algebra Subprograms BLAS or the Vector Signal Image Processing Library VSIPL is also needed. Machine-specific as well as generic implementations of MPI, the BLAS and VSIPL are available for a large variety of systems."
> 
> 
> I think we should have as simple a primary designation as possible.  OpenMP 3.1 seems reasonable to add now, on top of the MPI-1.1 standard.  I don’t think we need BLAS anywhere.  I suspect that vendors are already very well motivated to deliver a competent BLAS, and if not, ATLAS can find it for a platform.  But a killer BLAS doesn’t help us very much.  I don’t think putting in a VSIPL helps us much either.  We can be even more portable than HPL, really.  Everyone has a C compiler, I think.
> 
> I would be open to adding Vectors if we can abstract the Vector programming requirements in HPGMG with a unified, if small, API.  For this I think we look at the superset of operations for several current and recent past platforms.  So, VEC2FMA, VEC2ADD, VEC4FMA as macros, and people can help us populate this HPGMGVector.h header file with their platform specifics.  Default implementations are just regular operations in C.  Adding a new benchmark intrinsic means adding the default C implementation and as many vendor-specific intrinsics as the developer knows.  For instance, we might want to also put ARM VFP instructions in here if we can get an ARM VFP processor in house to play with.  We probably want SSE, AVX, QPX, VFP, C.  A configuration parameter might let us create a compiler benchmark by making all our vector code drop to C and watching how the native compiler handles things.
> 
> I would think every time a new machine and students show up, they will try several compilers natively, and our vectors, and show up at a conference and tell us about it, thus creating the correct public shaming motivation while still allowing the same team to augment our vector set and show how well the machine can do.
> 
> I’m wondering where users can insert heterogeneity in the execution…which seems a big coming issue.
> 
> Brian
> 
> 
>  
> On Apr 30, 2014, at 9:05 AM, Jed Brown <jed at jedbrown.org> wrote:
> 
>> Sam Williams <swwilliams at lbl.gov> writes:
>> 
>>> For HPL, the website just has the reference implementation but links
>>> to optimized BLAS.  The reality is it's very unlikely every vendor
>>> optimizing for HPGMG will contribute to the repo and give up IP
>>> ownership.  As such, optimized becomes a sliding scale.  There is
>>> optimized in the repo and there's the optimized from the vendor.
>> 
>> I don't foresee "standard" libraries of HPGMG spice, but perhaps that is
>> a viable mechanism.  I would like there to be a way for users to
>> reproduce vendor-provided numbers _after_ The List is released.
>> (Otherwise people have to blindly trust the committee.  I'd rather
>> empower the public to "trust but verify".)  Mark was interested in a way
>> to make the source code accessible eventually.  I wonder if there is a
>> way to swing that (perhaps with a special license).
>> 
>>> I also think that like BLAS, the optimized (architecture-specific)
>>> HPGMG code needs to be self-contained and not sprinkled throughout.  I
>>> don't want porting to a new architecture becoming a search for every
>>> instance of __bgq__, __x86__, __mic__ ... to make sure you've found
>>> every routine.
>> 
>> In HPGMG-FE, the restriction and prolongation have been pretty good
>> without further trickery, so the two parts that need optimization are
>> tensor contraction and the pointwise element kernel.  The main reason
>> tensor contraction is "generic" now is so that it can be reused between
>> different operators, but once we pin down an operator, there is no
>> reason not to group it back into one file.  My intent was that at the
>> end of the day, there would be one file containing optimization for each
>> architecture.  It is already set up so that new source files are
>> automatically registered.
>> _______________________________________________
>> HPGMG-Forum mailing list
>> HPGMG-Forum at hpgmg.org
>> https://hpgmg.org/lists/listinfo/hpgmg-forum
> 
> Brian Van Straalen         Lawrence Berkeley Lab
> BVStraalen at lbl.gov         Computational Research
> (510) 486-4976             Division (crd.lbl.gov)
> 
> 
> 
> 
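
The HPGMGVector.h idea in Brian's message quoted above could look roughly like the sketch below. Only the macro names (VEC2ADD, VEC2FMA, VEC4FMA) and the header name come from the thread; everything else is an assumption made for illustration: that the macros operate on arrays of doubles, that FMA means d = a*b + c, the HPGMG_VECTOR_FORCE_C switch, the HPGMG_VECTOR_BACKEND string, and the vec_fma_inplace helper. It is not taken from the HPGMG source.

```c
/* Sketch of an HPGMGVector.h-style header. Operand semantics, the
   HPGMG_VECTOR_FORCE_C switch, and all helper names are assumptions. */
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch: define HPGMG_VECTOR_FORCE_C to drop every macro to
   plain C, so the build measures the compiler's own vectorizer instead. */
#if !defined(HPGMG_VECTOR_FORCE_C) && defined(__SSE2__)
#include <emmintrin.h>
#define HPGMG_VECTOR_BACKEND "SSE2"
/* d[0..1] = a[0..1] + b[0..1] */
#define VEC2ADD(d, a, b) \
  _mm_storeu_pd((d), _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)))
/* d[0..1] = a[0..1] * b[0..1] + c[0..1] (multiply then add; SSE2 has no fused op) */
#define VEC2FMA(d, a, b, c)                                            \
  _mm_storeu_pd((d), _mm_add_pd(_mm_mul_pd(_mm_loadu_pd(a),            \
                                           _mm_loadu_pd(b)),           \
                                _mm_loadu_pd(c)))
#endif

#if !defined(HPGMG_VECTOR_FORCE_C) && defined(__AVX__)
#include <immintrin.h>
#undef HPGMG_VECTOR_BACKEND
#define HPGMG_VECTOR_BACKEND "AVX"
/* d[0..3] = a[0..3] * b[0..3] + c[0..3] */
#define VEC4FMA(d, a, b, c)                                              \
  _mm256_storeu_pd((d), _mm256_add_pd(_mm256_mul_pd(_mm256_loadu_pd(a),  \
                                                    _mm256_loadu_pd(b)), \
                                      _mm256_loadu_pd(c)))
#endif

/* Default implementations: regular C, used wherever no intrinsic exists. */
#ifndef HPGMG_VECTOR_BACKEND
#define HPGMG_VECTOR_BACKEND "C"
#endif
#ifndef VEC2ADD
#define VEC2ADD(d, a, b) \
  do { for (int i_ = 0; i_ < 2; i_++) (d)[i_] = (a)[i_] + (b)[i_]; } while (0)
#endif
#ifndef VEC2FMA
#define VEC2FMA(d, a, b, c)                                    \
  do { for (int i_ = 0; i_ < 2; i_++)                          \
         (d)[i_] = (a)[i_] * (b)[i_] + (c)[i_]; } while (0)
#endif
#ifndef VEC4FMA
#define VEC4FMA(d, a, b, c)                                    \
  do { for (int i_ = 0; i_ < 4; i_++)                          \
         (d)[i_] = (a)[i_] * (b)[i_] + (c)[i_]; } while (0)
#endif

/* Hypothetical kernel written only against the macros:
   y[i] = a[i] * x[i] + y[i] for i in [0, n). */
static void vec_fma_inplace(double *y, const double *a, const double *x,
                            size_t n) {
  assert(n == 0 || (y && a && x));
  size_t i = 0;
  for (; i + 4 <= n; i += 4) VEC4FMA(y + i, a + i, x + i, y + i);
  for (; i + 2 <= n; i += 2) VEC2FMA(y + i, a + i, x + i, y + i);
  for (; i < n; i++) y[i] = a[i] * x[i] + y[i]; /* scalar remainder */
}
```

Kernel code such as vec_fma_inplace never names an architecture, so all platform tests stay in the one header, which also speaks to Sam's concern about hunting for __bgq__ or __x86__ across the source. Building once normally and once with -DHPGMG_VECTOR_FORCE_C would give the "compiler benchmark" comparison Brian describes. A QPX or VFP port would add one more guarded block above the defaults.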


