<div dir="ltr">You said "we could probably implement a reproducible dot product".  As luck would have it, I have actually implemented and extensively tested dot products in 128b precision (which is what Mark proposed).  My ~50x number is the measured difference between 64b and 128b dot products on IA.  In my tests, the data are invariant to GCC vs Intel compilers and to Fortran vs C implementations.<div><br></div><div>I have also implemented Kahan summation for dot products, but haven't spent enough time on the experiments to report performance data that I would consider reliable enough for decision-making.  Looking at the code, I would estimate it is roughly 10x slower than straight DP.</div><div><br></div><div>Marat Dukhan has done some nice work on DDP that has been reported on other public email lists.</div><div><br></div><div>Anyway, I have no position in this debate.  I am merely trying to provide data that I have measured to inform the debate you all are having.</div><div><div><br></div><div>Jeff<br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 31, 2015 at 5:04 PM, Brian Van Straalen <span dir="ltr"><<a href="mailto:bvstraalen@lbl.gov" target="_blank">bvstraalen@lbl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Once you reach the bottom solver for full multigrid you should not be in the network at all anymore.  You would be computing a dot product over a few hundred doubles in L1 memory.  I’m just guessing here, but I don’t think a reproducible dot product would be that expensive to implement in software as a compensated sum or distillation.  There was a talk at ISC this year showing some results.  From DRAM, compensated summation costs about the same as uncompensated summation, and it gets relatively more expensive in faster memory levels.  The impact was never 50x, though. 
<div><div class="h5"><div><br><div><br><div><br><div><blockquote type="cite"><div>On Jul 31, 2015, at 4:38 PM, Jeff Hammond <<a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a>> wrote:</div><br><div><div dir="ltr">If by 128b floats, you mean IEEE 754 quad precision implemented in software, then the associated dot product will run ~50x slower on conventional hardware (that is, hardware that does not support QP).<div><br></div><div>It should be possible to implement DDP or some form of compensated summation more efficiently.<br><div><br></div><div>Jeff<br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Jul 31, 2015 at 4:18 PM, Brian Van Straalen <span dir="ltr"><<a href="mailto:bvstraalen@lbl.gov" target="_blank">bvstraalen@lbl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><br></div><div>I would think that we could probably implement a reproducible dot product in the Krylov code, since it only happens on the coarse grid, which should be small enough.</div><div><br></div><div>HPGMG uses max norms, so we should be OK for that part.</div><span><font color="#888888"><div><br></div><div>Brian</div></font></span><div><div><div><br></div><br><div><blockquote type="cite"><div>On Jul 31, 2015, at 3:27 PM, Hoemmen, Mark <<a href="mailto:mhoemme@sandia.gov" target="_blank">mhoemme@sandia.gov</a>> wrote:</div><br><div><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span 
style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">On 7/31/15, 3:45 PM, "Jed Brown" <</span><a href="mailto:jed@jedbrown.org" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" target="_blank">jed@jedbrown.org</a><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">> wrote:</span><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><blockquote type="cite" style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px">Brian Van Straalen <<a href="mailto:bvstraalen@lbl.gov" target="_blank">bvstraalen@lbl.gov</a>> writes:<br><blockquote type="cite">The concern is not trivial.  
I've spent some time re-reading the<br>Precimonious paper (<a href="http://eecs.berkeley.edu/~rubio/includes/sc13.pdf" target="_blank">eecs.berkeley.edu/~rubio/includes/sc13.pdf</a>) and I realize<br>that it would not be hard to make a faster version of FMG using mixed<br>precision.  <br></blockquote><br>Just a quick comment now.  I think there's not as much fat to trim as<br>you think.  In general, the precision needs to be as accurate as the<br>discretization.  Most flops occur on fine grids where the discretization<br>is more accurate than single precision.  I challenge you to speed up<br>HPGMG by more than, say, 15%, while maintaining order of accuracy on<br>fine grids.<br><br><blockquote type="cite">There have been papers over the last few years using 4-byte AMG as a<br>preconditioner<span> </span><br></blockquote><br>So much fat already.  Then you have a Krylov method and full-accuracy<br>residuals, but HPGMG solves at the cost of a few residual evaluations.<br>Also, these low-accuracy preconditioners are usually used for problems<br>that are only modestly ill-conditioned.  
Try it with an operator with<br>condition number 10^{12} like you see in solid mechanics or geodynamics<br>and it doesn't look so hot any more.<br></blockquote><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">It could be fun to use such a tool to find out the best places to put</span><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">128-bit floating-point arithmetic.  
That could help with some really hard</span><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">problems, or at least avoid some reproducibility issues.</span><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><br style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><span style="font-family:Helvetica;font-size:14px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important">mfh</span></div></blockquote></div><br></div></div><span><div>
<span style="border-collapse:separate;font-family:Helvetica;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px"><div><div><font face="'Courier New'">Brian Van Straalen         Lawrence Berkeley Lab</font></div><div><font face="'Courier New'"><a href="mailto:BVStraalen@lbl.gov" target="_blank">BVStraalen@lbl.gov</a>         Computational Research</font></div><div><font face="'Courier New'">(510) 486-4976             Division (<a href="http://crd.lbl.gov/" target="_blank">crd.lbl.gov</a>)</font></div></div><div><br></div><div><br></div></span><br>
</div>
<br></span></div><br>_______________________________________________<br>
HPGMG-Forum mailing list<br>
<a href="mailto:HPGMG-Forum@hpgmg.org" target="_blank">HPGMG-Forum@hpgmg.org</a><br>
<a href="https://hpgmg.org/lists/listinfo/hpgmg-forum" rel="noreferrer" target="_blank">https://hpgmg.org/lists/listinfo/hpgmg-forum</a><br>
<br></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>
</div></div></div></div>
</div></blockquote></div><br><div>
</div>
<br></div></div></div></div></div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div>
</div></div></div></div>