[HPGMG Forum] latest HPGMG-FV release transposes the data structure to facilitate OpenMP4/OpenACC/CUDA implementations

Jeff Hammond jeff.science at gmail.com
Thu Jan 22 05:43:52 UTC 2015

Does this change the performance on systems that aren't designed to
maximize programmer pain?  It would be a shame if this interfered with
performance on systems where the big compute and the big memory are


On Wed, Jan 21, 2015 at 10:59 PM, Sam Williams <swwilliams at lbl.gov> wrote:
> FYI, the latest HPGMG-FV release transposes the internal data layout to facilitate any potential OpenMP4/OpenACC/CUDA implementations.  This should be completely transparent and does not affect any existing operators, as the code previously did not assume data was contiguous.  It does, however, create an additional set of pointers (one per vector) in level_type that point to the union of FP data across all boxes on that level for a specific vector.  Hopefully, this will facilitate future OpenMP4/OpenACC/CUDA implementations, as it allows one to perform one bulk copy (a single omp target update/acc copyin/cudaMemcpy statement) for *all* boxes on a level instead of one update/copyin/cudaMemcpy for *each* of a variable number of boxes.
> At a high level, the figure below highlights the change.
> _______________________________________________
> HPGMG-Forum mailing list
> HPGMG-Forum at hpgmg.org
> https://hpgmg.org/lists/listinfo/hpgmg-forum
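
For readers who want to picture the layout change described in the quoted message, here is a minimal sketch in plain C. It is not HPGMG's actual code: level_sketch, box_sketch, and all function names are hypothetical, and memcpy stands in for the omp target update/acc copyin/cudaMemcpy call. It only assumes what the message states, namely that each vector gets one level-wide buffer and that per-box pointers refer into it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified types; NOT HPGMG's real box_type/level_type. */
typedef struct {
    double **vectors;      /* vectors[v] points into the level-wide buffer */
} box_sketch;

typedef struct {
    int num_boxes;
    int num_vectors;
    size_t box_volume;     /* doubles per box per vector (assumed uniform) */
    double **vectors;      /* vectors[v]: contiguous data for ALL boxes */
    box_sketch *boxes;
} level_sketch;

/* Allocate one contiguous buffer per vector, then aim each box at its slice. */
void level_create(level_sketch *L, int num_boxes, int num_vectors,
                  size_t box_volume) {
    L->num_boxes = num_boxes;
    L->num_vectors = num_vectors;
    L->box_volume = box_volume;
    L->vectors = malloc((size_t)num_vectors * sizeof *L->vectors);
    L->boxes = malloc((size_t)num_boxes * sizeof *L->boxes);
    assert(L->vectors && L->boxes);
    for (int v = 0; v < num_vectors; v++) {
        L->vectors[v] = calloc((size_t)num_boxes * box_volume, sizeof(double));
        assert(L->vectors[v]);
    }
    for (int b = 0; b < num_boxes; b++) {
        L->boxes[b].vectors =
            malloc((size_t)num_vectors * sizeof *L->boxes[b].vectors);
        assert(L->boxes[b].vectors);
        for (int v = 0; v < num_vectors; v++)
            L->boxes[b].vectors[v] = L->vectors[v] + (size_t)b * box_volume;
    }
}

void level_destroy(level_sketch *L) {
    for (int b = 0; b < L->num_boxes; b++) free(L->boxes[b].vectors);
    for (int v = 0; v < L->num_vectors; v++) free(L->vectors[v]);
    free(L->boxes);
    free(L->vectors);
}

/* One transfer for all boxes; memcpy is a stand-in for a device copy.
   Returns the number of transfer calls issued. */
int copy_vector_bulk(const level_sketch *L, int v, double *dst) {
    memcpy(dst, L->vectors[v],
           (size_t)L->num_boxes * L->box_volume * sizeof(double));
    return 1;
}

/* One transfer per box, as a box-by-box layout would require. */
int copy_vector_per_box(const level_sketch *L, int v, double *dst) {
    int calls = 0;
    for (int b = 0; b < L->num_boxes; b++) {
        memcpy(dst + (size_t)b * L->box_volume, L->boxes[b].vectors[v],
               L->box_volume * sizeof(double));
        calls++;
    }
    return calls;
}
```

Both copy routines should produce identical destination buffers; the difference is only the number of transfer calls, which is the point of the transposed layout when each call carries host-to-device launch overhead.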

Jeff Hammond
jeff.science at gmail.com
