[HPGMG Forum] Solver options used for FV results?
cevange at us.ibm.com
Tue Aug 19 13:36:45 UTC 2014
Out of curiosity, does the OpenMP runtime on the K machine employ the
processor's fast synchronization primitives, the ones used for VISIMPACT?
I see no reason why not, but you never know.
From: Sam Williams <swwilliams at lbl.gov>
To: Jeff Hammond <jeff.science at gmail.com>
Cc: "hpgmg-forum at hpgmg.org" <hpgmg-forum at hpgmg.org>
Date: 08/19/2014 09:31 AM
Subject: Re: [HPGMG Forum] Solver options used for FV results?
Sent by: "HPGMG-Forum" <hpgmg-forum-bounces at hpgmg.org>
sorry, K always used mpifccpx.
On Aug 18, 2014, at 9:09 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> I wasn't aware that K supported GCC, but in any case I'd imagine it's
> dramatically worse than the Fujitsu C compiler. Do you need C99?
> Sent from my iPhone
>> On Aug 18, 2014, at 5:47 PM, Sam Williams <swwilliams at lbl.gov> wrote:
>> Sorry, but the figures on the website are a bit out of date. The data
>> was collected in the March-April timeframe and then posted in May.
>> Edison, Hopper, and Peregrine used icc, Mira used mpixlc_r, and K used
>> gcc (I think).
>> Each machine used one process per NUMA node (64, 12, 6, 12, 8 threads for
>> Mira, Edison, Hopper, Peregrine, K) and one 128^3 box per process (./run 7 1).
>> The runs then use process counts equal to cubes of integers
>> (1^3, 2^3, 3^3, 4^3, 5^3 -> 1, 8, 27, 64, 125, ...). At large scales, I
>> strided by 4 (16^3, 20^3, 24^3, 28^3, 32^3, 36^3, 40^3, ...).
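
[Editorial sketch] The process counts described above (cubes of integers, with the side length stepping by 4 at large scale) can be illustrated with a few lines of Python. The helper name `cube_counts` is hypothetical and not part of HPGMG. The two ranges below are only the values listed in the message; where the dense sequence ends and the strided one begins is not stated there.

```python
def cube_counts(sides):
    """Return the process count s^3 for each side length s."""
    return [s ** 3 for s in sides]

# Small scales: every integer side length listed in the message (1..5).
small = cube_counts(range(1, 6))
# Large scales: side lengths 16..40 in steps of 4, as listed in the message.
large = cube_counts(range(16, 41, 4))

print(small)  # [1, 8, 27, 64, 125]
print(large)  # [4096, 8000, 13824, 21952, 32768, 46656, 64000]
```
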
>> Be careful, as the GNU runtime seems significantly more sensitive to the
>> potential for nested parallelism (even if it never occurs) than the Intel
>> or IBM runtimes.
>> At the time, we were experimenting with GSRB vs Chebyshev. The data
>> online likely use -DUSE_FCYCLES, -DUSE_BICGSTAB, and -DUSE_GSRB. Since
>> then we have replaced -DUSE_GSRB with -DUSE_CHEBY as the default and
>> standard. This should not significantly affect performance or scalability,
>> only the error, and even then only slightly.
>> Nevertheless, since March/April (first runs at full machine scales), I
>> made changes to the structure of agglomeration in the v-cycle to improve
>> scalability and eliminate most spikes. Additionally, I changed the timing
>> so that instead of running the solve ~10 times and reporting the average,
>> it warms up for 10 seconds, then runs for another 10 seconds, reporting the
>> average solver performance for only the last 10 seconds. This helps damp
>> some performance variability when you don't have exclusive access to the
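
[Editorial sketch] The warm-up-then-measure timing scheme described above might look roughly like the following in Python. This is only an illustration under stated assumptions: HPGMG itself is written in C and its actual timing code may differ, and `average_solve_time` and the dummy solve are hypothetical names introduced here.

```python
import time

def average_solve_time(solve, warmup_seconds=10.0, measure_seconds=10.0):
    """Run solve() repeatedly; return (average seconds per solve, solve count).

    Solves executed during the warm-up window are discarded. Only solves in
    the measurement window contribute to the reported average.
    """
    # Warm-up phase: keep solving until the warm-up window has elapsed.
    start = time.perf_counter()
    while time.perf_counter() - start < warmup_seconds:
        solve()

    # Measurement phase: always time at least one solve, then stop once the
    # measurement window has elapsed.
    count = 0
    start = time.perf_counter()
    while True:
        solve()
        count += 1
        elapsed = time.perf_counter() - start
        if elapsed >= measure_seconds:
            break
    return elapsed / count, count

# Usage with a stand-in workload and short windows so the example runs quickly.
def dummy_solve():
    sum(i * i for i in range(1000))

avg, n = average_solve_time(dummy_solve, warmup_seconds=0.05, measure_seconds=0.05)
print(f"{n} solves, {avg:.6f} s per solve on average")
```

Measuring over a fixed time window rather than a fixed number of solves means short solves are repeated many more times, which tends to smooth out run-to-run noise.
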
>> Thus, updated data is something like...
>> For Edison, I either use...
>> cc -Ofast -fopenmp level.c operators.7pt.c mg.c solvers.c hpgmg.c \
>>    timers.c -DUSE_MPI -DUSE_SUBCOMM -DUSE_FCYCLES -DUSE_CHEBY -DUSE_BICGSTAB \
>>    -DSTENCIL_FUSE_BC -DSTENCIL_FUSE_DINV -o run.edison
>> aprun ... ./run.edison 7 1
>> to run 128^3 per process with 1 process per NUMA node.
>> cc -Ofast level.c operators.7pt.c mg.c solvers.c hpgmg.c timers.c \
>>    -DUSE_MPI -DUSE_SUBCOMM -DUSE_FCYCLES -DUSE_CHEBY -DUSE_BICGSTAB \
>>    -DSTENCIL_FUSE_BC -DSTENCIL_FUSE_DINV -o run.edison.flat
>> aprun ... ./run.edison.flat 6 1
>> to run 64^3 per process with *8* processes per NUMA node, i.e. the same
>> working size as MPI+OpenMP, but I leave 4 cores idle.
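
[Editorial sketch] The claim that both Edison configurations have the same working size per NUMA node can be checked with a little arithmetic. The sketch below assumes that the first argument to `./run` is the log2 of the box dimension (consistent with `7` giving 128^3 and `6` giving 64^3 in the message) and that the second is the number of boxes per process. The function name `cells_per_numa_node` is hypothetical.

```python
def cells_per_numa_node(log2_box_dim, boxes_per_process, processes_per_numa_node):
    """Total grid cells resident on one NUMA node."""
    box_dim = 2 ** log2_box_dim  # assumed meaning of the first ./run argument
    return processes_per_numa_node * boxes_per_process * box_dim ** 3

# MPI+OpenMP: ./run.edison 7 1 with 1 process per NUMA node.
hybrid = cells_per_numa_node(7, 1, 1)
# Flat MPI: ./run.edison.flat 6 1 with 8 processes per NUMA node.
flat = cells_per_numa_node(6, 1, 8)

# The message lists 12 threads per NUMA node for Edison.
edison_cores_per_numa_node = 12
idle_cores = edison_cores_per_numa_node - 8

print(hybrid, flat, idle_cores)  # 2097152 2097152 4
```
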
>>> On Aug 18, 2014, at 5:14 PM, Richard Mills <richardtmills at gmail.com> wrote:
>>> Sam (or anyone else who knows),
>>> Can you please let me know the solver options that were used to
>>> generate the FV results that are on the web page at
>>> I want to experiment with your code on some of the compute resources
>>> that we have at Intel, and I'd like to use the same settings so that I can
>>> compare to the results you have posted.
>>> HPGMG-Forum mailing list
>>> HPGMG-Forum at hpgmg.org