[HPGMG Forum] [EXTERNAL] Re: Acceptable rounding errors

Mark Adams mfadams at lbl.gov
Mon Aug 3 17:30:42 UTC 2015

On Mon, Aug 3, 2015 at 9:58 AM, Sam Williams <swwilliams at lbl.gov> wrote:

> I think scalable implementations *should* reduce the bottom problem size
> to a single process.  Nevertheless, I don't think we specify that the
> bottom solve *must* be run on a single process.

This does violate one of our two (or is it just one) rules: HPGMG must be
scale and architecture free.  No processes, no caches, no main memory, etc.

I just realized that since we do not specify the coarse grid solver nor its
size, someone could just call the whole solve the coarse grid and do
whatever they like ...

> I think we only specify the restrictions on the dimensions of the bottom
> solve.

How so?  Your implementation has some restrictions, but it's not in the

If we specify restriction and prolongation, as we do now (I think), then I
guess we can only refine by powers of 2, but we could define it more
generally (e.g., P_0 & P_1) and that would allow for odd coarsening to use MG
all the way to one cell.  One could, at a coarse grid, put a nice grid over
the ugly (e.g., (2*13)^3) grid and use more general interpolation to get to
a nice grid.  We do not have to worry about it this round, but it is
something to think about.
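
To make the power-of-2 point concrete, here is a minimal sketch (not taken
from any HPGMG implementation) of how far factor-2 coarsening can go for a
given per-dimension grid size.  The function name coarsening_levels is
hypothetical, and the sketch assumes the only coarsening allowed is halving
an even dimension.

```python
def coarsening_levels(n):
    """Per-dimension grid sizes from the fine grid down to the bottom grid.

    Assumption: coarsening is only by a factor of 2, so it stops at the
    first odd size.
    """
    if n < 1:
        raise ValueError("grid dimension must be positive")
    levels = [n]
    while n % 2 == 0:
        n //= 2
        levels.append(n)
    return levels


# A pure power of 2 reaches one cell; sizes with an odd factor stop early.
for n in (64, 11 * 2**3, 2 * 13):
    levels = coarsening_levels(n)
    print(f"{n}^3 -> bottom {levels[-1]}^3 after {len(levels) - 1} coarsenings")
```

With halving only, 64^3 gets down to 1^3, while (11*2^3)^3 bottoms out at
11^3 and (2*13)^3 at 13^3.  Going below that would need the more general
transfer operators mentioned above.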

> This flexibility could be beneficial for odd problem sizes like (11*2^k)^3
> whose bottom solve may be 11^3.  Even if running on a single process, there
> can still be multiple threads, cores, or processors (e.g. single process
> spans a multisocket node) performing the bottom solve.  How one chooses to
> thread such an operation is an implementation choice.
>
> For the FV method, you are also required to sum the flux surfaces for each
> cell.  Obviously this is 6 terms that must be summed and not 1000, but I
> don't think we should mandate the order of that summation.  There is a
> deeper question as to whether adjacent cells must calculate common flux
> surfaces with the same order of operations.
>
> At this point, I'm not inclined to mandate reproducibility, but am curious
> to see quantitative/numerical data where it made a difference in HPGMG-FV.
>
> On Aug 3, 2015, at 9:20 AM, Jed Brown <jed at jedbrown.org> wrote:
> > Brian Van Straalen <bvstraalen at lbl.gov> writes:
> >
> >> Once you reach the bottom solver for full multigrid you should not be
> >> in the network at all anymore.  You would be computing a dot product
> >> over a few hundred doubles in L1 memory.
> >
> > And the order of those doubles doesn't depend on the
> > network/fine-grid/number of processors (at least not in FE), so it's
> > reproducible in any precision and thus I don't know why we're having
> > this particular discussion about dot products that occur only on the
> > coarsest (serial) grid.
> >
> > The summation order during restriction might not be reproducible
> > (depends on implementation details including whether you use
> > MPI_Waitsome versus MPI_Waitall), but can be made reproducible and
> > parallel-invariant if desired.
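
A small illustration of the summation-order point, as a sketch only: it uses
no MPI, and the names accumulate, accumulate_fixed_order and the sample
values are hypothetical.  It assumes each rank contributes one double to the
same coarse value.  Adding contributions in arrival order can round
differently from run to run, whereas buffering them and adding in ascending
rank order gives the same result for any arrival order.

```python
def accumulate(contributions, order):
    """Add one contribution per rank, visiting ranks in the given order."""
    total = 0.0
    for rank in order:
        total += contributions[rank]
    return total


def accumulate_fixed_order(contributions, arrival_order):
    """Buffer everything that arrived, then add in ascending rank order."""
    return accumulate(contributions, sorted(arrival_order))


# Hypothetical contributions from three ranks to one coarse-grid value.
contributions = {0: 0.1, 1: 0.2, 2: 0.3}
arrival_a = [0, 1, 2]  # one possible message arrival order
arrival_b = [1, 2, 0]  # another possible arrival order

# Arrival-order accumulation rounds differently for the two orders.
print(accumulate(contributions, arrival_a) == accumulate(contributions, arrival_b))

# Fixed-order accumulation does not depend on the arrival order.
print(accumulate_fixed_order(contributions, arrival_a)
      == accumulate_fixed_order(contributions, arrival_b))
```

The first comparison prints False and the second prints True.  The fixed
order costs a buffer per neighbour and a wait for all messages, which is the
trade-off behind choosing between the two behaviours.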
> > _______________________________________________
> > HPGMG-Forum mailing list
> > HPGMG-Forum at hpgmg.org
> > https://hpgmg.org/lists/listinfo/hpgmg-forum