[HPGMG Forum] Solver options used for FV results?

Mark Adams mfadams at lbl.gov
Tue Aug 19 02:57:55 UTC 2014


"it warms up for 10 seconds, then runs for another 10 seconds "  ... this
design might need some attention.  Modulo scales it sounds more like a
lover than a benchmark :)


On Mon, Aug 18, 2014 at 8:47 PM, Sam Williams <swwilliams at lbl.gov> wrote:

> Sorry, but the figures on the website are a bit out of date.  The data was
> collected in the March-April timeframe and then posted in May.
>
> Edison, Hopper, and Peregrine used icc, Mira used mpixlc_r, and K used gcc
> (I think).
>
> Each machine used one process per NUMA node (64, 12, 6, 12, and 8 threads
> for Mira, Edison, Hopper, Peregrine, and K, respectively) and one 128^3 box
> per process (./run 7 1).  It then runs with the number of processes equal to
> cubes of integers (1^3, 2^3, 3^3, 4^3, 5^3 -> 1, 8, 27, 64, 125, ...).  At
> large scales, I strided the cube root by 4 (16^3, 20^3, 24^3, 28^3, 32^3,
> 36^3, 40^3, ...).
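>
> (As a rough illustration only: the shell loop below is a hypothetical
> job-script sketch for generating those cubic process counts, not something
> shipped with the benchmark, and the exact crossover point between the two
> patterns is illustrative.)
>
>   # consecutive cubes at small scale, then stride the cube root by 4
>   for n in $(seq 1 15) $(seq 16 4 40); do
>     procs=$((n * n * n))
>     echo "submit: aprun -n ${procs} ... ./run 7 1"   # placeholder launch line
>   done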
>
> Be careful, as the GNU OpenMP runtime seems significantly more sensitive to
> the potential for nested parallelism (even if it never actually occurs) than
> the Intel or IBM runtimes.
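>
> (A general OpenMP suggestion, not a setting recorded from the posted runs:
> explicitly disabling nested parallelism in the environment may help the GNU
> runtime, e.g.)
>
>   export OMP_NESTED=false          # forbid nested parallel regions
>   export OMP_MAX_ACTIVE_LEVELS=1   # allow only one active level of threading
>   export OMP_NUM_THREADS=12        # e.g., 12 threads per process on Edison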
>
> At the time, we were experimenting with GSRB vs. Chebyshev.  The data
> online likely uses -DUSE_FCYCLES, -DUSE_BICGSTAB, and -DUSE_GSRB.  Since
> then we have replaced -DUSE_GSRB with -DUSE_CHEBY as the default and
> standard.  This should not significantly affect performance or scalability,
> just the error, and even then only slightly.
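>
> (If you want to match the posted data as closely as possible, the likely
> change is just swapping -DUSE_CHEBY for -DUSE_GSRB in the Edison compile line
> given below; the output name here is arbitrary, and I can't guarantee the
> stencil-fusion flags are identical to the original runs.)
>
>   cc -Ofast -fopenmp level.c operators.7pt.c mg.c solvers.c hpgmg.c timers.c \
>     -DUSE_MPI -DUSE_SUBCOMM -DUSE_FCYCLES -DUSE_GSRB -DUSE_BICGSTAB \
>     -DSTENCIL_FUSE_BC -DSTENCIL_FUSE_DINV -o run.edison.gsrb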
>
>
>
> Nevertheless, since March/April (the first runs at full machine scale), I
> made changes to the structure of agglomeration in the v-cycle to improve
> scalability and eliminate most spikes.  Additionally, I changed the timing
> so that instead of running the solve ~10 times and reporting the average,
> it warms up for 10 seconds, then runs for another 10 seconds and reports
> the average solver performance over only those last 10 seconds.  This helps
> damp some of the performance variability you see when you don't have
> exclusive access to the machine.
>
>
> Thus, the updated data looks something like...
>
> [the results that followed here are not present in the archived plain-text
> copy of this message]
>
> For Edison, I either use...
>
> cc -Ofast -fopenmp level.c operators.7pt.c mg.c solvers.c hpgmg.c timers.c
> -DUSE_MPI  -DUSE_SUBCOMM -DUSE_FCYCLES -DUSE_CHEBY -DUSE_BICGSTAB
> -DSTENCIL_FUSE_BC -DSTENCIL_FUSE_DINV  -o run.edison
>
> with
> aprun ... ./run.edison 7 1
> to run 128^3 per process with 1 process per NUMA node.
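>
> (Purely for illustration, the full launch on Edison looks something like the
> following; the process count is a placeholder, and -S 1 / -d 12 request one
> process per NUMA node with 12 threads each.)
>
>   export OMP_NUM_THREADS=12
>   aprun -n 64 -S 1 -d 12 ./run.edison 7 1   # e.g., 64 processes = 4^3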
>
> or
>
> cc -Ofast level.c operators.7pt.c mg.c solvers.c hpgmg.c timers.c
> -DUSE_MPI  -DUSE_SUBCOMM -DUSE_FCYCLES -DUSE_CHEBY -DUSE_BICGSTAB
> -DSTENCIL_FUSE_BC -DSTENCIL_FUSE_DINV  -o run.edison.flat
>
> with
> aprun ... ./run.edison.flat 6 1
> to run 64^3 per process with *8* processes per NUMA node, i.e., the same
> working set per NUMA node as the MPI+OpenMP configuration, but with 4 cores
> left idle.
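>
> (Again just an illustrative placeholder for the flat-MPI launch: -S 8
> requests eight single-threaded processes per NUMA node, leaving 4 of the 12
> cores idle.)
>
>   aprun -n 512 -S 8 -d 1 ./run.edison.flat 6 1   # e.g., 512 processes = 8^3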
>
> On Aug 18, 2014, at 5:14 PM, Richard Mills <richardtmills at gmail.com>
> wrote:
>
> > Sam (or anyone else who knows),
> >
> > Can you please let me know the solver options that were used to generate
> > the FV results that are on the web page at
> >
> >   https://hpgmg.org/2014/05/15/fv-results/
> >
> > I want to experiment with your code on some of the compute resources
> > that we have at Intel, and I'd like to use the same settings so that I can
> > compare to the results you have posted.
> >
> > Thanks,
> > Richard
>
>
> _______________________________________________
> HPGMG-Forum mailing list
> HPGMG-Forum at hpgmg.org
> https://hpgmg.org/lists/listinfo/hpgmg-forum
>
>