[HPGMG Forum] Solver options used for FV results?

Sam Williams swwilliams at lbl.gov
Tue Aug 19 00:47:09 UTC 2014

Sorry, but the figures on the website are a bit out of date.  The data was collected in the March-April timeframe and then posted in May.

Edison, Hopper, and Peregrine used icc, Mira used mpixlc_r, and K used gcc (I think).

Each machine used one process per NUMA node (64, 12, 6, 12, and 8 threads for Mira, Edison, Hopper, Peregrine, and K, respectively) and one 128^3 box per process (./run 7 1).
I then ran with process counts equal to cubes of integers (1^3, 2^3, 3^3, 4^3, 5^3, ... -> 1, 8, 27, 64, 125, ...).  At large scales, I used a stride of 4 (16^3, 20^3, 24^3, 28^3, 32^3, 36^3, 40^3, ...).
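
The sequence of process counts described above can be reproduced with a small shell helper. This is only a sketch: cube_counts is a hypothetical function written for illustration and is not part of HPGMG or of any job scripts used for the posted results.

```shell
# cube_counts FIRST LAST [STRIDE]
# Print n^3 for n = FIRST, FIRST+STRIDE, ... up to LAST (hypothetical helper).
cube_counts() {
    n=$1
    last=$2
    stride=${3:-1}
    out=""
    while [ "$n" -le "$last" ]; do
        out="$out $((n * n * n))"
        n=$((n + stride))
    done
    # Unquoted on purpose so the leading space is dropped.
    echo $out
}

cube_counts 1 5       # small scales:  1 8 27 64 125
cube_counts 16 40 4   # strided by 4:  4096 8000 13824 21952 32768 46656 64000
```

Each printed value is a total MPI process count; how it maps onto nodes depends on the number of NUMA nodes per node on the machine in question.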

Be careful, as the GNU runtime seems significantly more sensitive to the potential for nested parallelism (even if it never occurs) than the Intel or IBM runtimes.

At the time, we were experimenting with GSRB vs Chebyshev.  The data online likely use -DUSE_FCYCLES, -DUSE_BICGSTAB, and -DUSE_GSRB.  Since then we have replaced -DUSE_GSRB with -DUSE_CHEBY as the default and standard.  This should not significantly affect performance or scalability; just error, and even then only slightly.

Nevertheless, since March/April (the first runs at full machine scale), I have made changes to the structure of agglomeration in the v-cycle to improve scalability and eliminate most spikes.  Additionally, I changed the timing: instead of running the solve ~10 times and reporting the average, it now warms up for 10 seconds, then runs for another 10 seconds and reports the average solver performance over only those last 10 seconds.  This helps damp some performance variability when you don't have exclusive access to the machine.
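
The warm-up-then-measure idea can be illustrated outside the benchmark with a shell loop. This is a rough sketch only: the real timing lives inside the HPGMG C code, timed_average is a hypothetical name, and date +%s (assumed to be available) only has one-second resolution.

```shell
# timed_average WARMUP_SECS MEASURE_SECS CMD [ARGS...]
# Run CMD repeatedly for WARMUP_SECS and discard those runs, then run it
# repeatedly for MEASURE_SECS and report the average over that window only.
# Hypothetical helper; not part of HPGMG.
timed_average() {
    warmup=$1
    measure=$2
    shift 2

    # Warm-up phase: results are ignored.
    start=$(date +%s)
    while [ $(( $(date +%s) - start )) -lt "$warmup" ]; do
        "$@" > /dev/null
    done

    # Measurement phase: count completed runs.
    start=$(date +%s)
    count=0
    while [ $(( $(date +%s) - start )) -lt "$measure" ]; do
        "$@" > /dev/null
        count=$((count + 1))
    done
    elapsed=$(( $(date +%s) - start ))

    awk -v e="$elapsed" -v c="$count" 'BEGIN {
        if (c > 0) printf "%d runs in %d s, %.3f s per run\n", c, e, e / c
        else       printf "0 runs in %d s\n", e
    }'
}

# Example (not executed here): timed_average 10 10 some_command
```

Because only the second window is averaged, one-time costs such as first-touch page placement or cold caches do not skew the reported number.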

Thus, updated data is something like...

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hpgmg-fv.pdf
Type: application/pdf
Size: 46762 bytes
Desc: not available
URL: <https://hpgmg.org/lists/archives/hpgmg-forum/attachments/20140818/12ceb8a4/attachment-0001.pdf>
-------------- next part --------------

For Edison, I either use...

cc -Ofast -fopenmp level.c operators.7pt.c mg.c solvers.c hpgmg.c timers.c -DUSE_MPI  -DUSE_SUBCOMM -DUSE_FCYCLES -DUSE_CHEBY -DUSE_BICGSTAB -DSTENCIL_FUSE_BC -DSTENCIL_FUSE_DINV  -o run.edison

aprun ... ./run.edison 7 1
to run 128^3 per process with 1 process per NUMA node.


cc -Ofast level.c operators.7pt.c mg.c solvers.c hpgmg.c timers.c -DUSE_MPI  -DUSE_SUBCOMM -DUSE_FCYCLES -DUSE_CHEBY -DUSE_BICGSTAB -DSTENCIL_FUSE_BC -DSTENCIL_FUSE_DINV  -o run.edison.flat

aprun ... ./run.edison.flat 6 1
to run 64^3 per process with *8* processes per NUMA node, i.e., the same working size per NUMA node as MPI+OpenMP, but I leave 4 cores idle.
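
The equivalence of the two configurations can be checked with a little arithmetic. The sketch below assumes, based only on the two invocations above (7 -> 128^3, 6 -> 64^3), that the first argument to ./run is the log2 of the box dimension; cells_per_numa is a hypothetical helper, not part of HPGMG.

```shell
# cells_per_numa LOG2_DIM PROCS_PER_NUMA
# Total cells per NUMA node, assuming one (2^LOG2_DIM)^3 box per process.
cells_per_numa() {
    dim=$((1 << $1))
    echo $((dim * dim * dim * $2))
}

cells_per_numa 7 1   # MPI+OpenMP: one 128^3 box     -> 2097152
cells_per_numa 6 8   # flat MPI:   eight 64^3 boxes  -> 2097152
```

With 12 cores per NUMA node on Edison, 8 flat MPI processes leave 12 - 8 = 4 cores idle, which matches the note above.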

On Aug 18, 2014, at 5:14 PM, Richard Mills <richardtmills at gmail.com> wrote:

> Sam (or anyone else who knows),
> Can you please let me know the solver options that were used to generate the FV results that are on the web page at
>   https://hpgmg.org/2014/05/15/fv-results/
> I want to experiment with your code on some of the compute resources that we have at Intel, and I'd like to use the same settings so that I can compare to the results you have posted.
> Thanks,
> Richard