[HPGMG Forum] new 4th order Full Multigrid HPGMG-FV implementation

Sam Williams swwilliams at lbl.gov
Wed Apr 15 02:06:01 UTC 2015


Included in the latest release is a 4th order Full Multigrid implementation of HPGMG-FV.  This version is currently included as a research vehicle to differentiate the performance characteristics of high-order finite volume from those of the 2nd order method, as well as to differentiate high-order FE from FV.  We are therefore asking the community for feedback on the value of the 4th order implementation compared to the 2nd order one.  That is, is 4th order HPGMG-FV a better benchmark than 2nd order HPGMG-FV?


*For now, we will continue to rank systems based on the 2nd order HPGMG-FV implementation.  As such, submissions must conform to the guidelines described on the website.*


Unlike the 2nd order implementation, the 4th order version uses a different driver (hpgmg-fv.c) and a different operator file (operators.fv4.c).  Currently, for 4th order, it is highly recommended that one use GSRB rather than the Chebyshev or Jacobi smoothers.  By default, most smoothers make 6 passes through the data (instead of 4) in order to sufficiently reduce the error and residual.  This has the effect of increasing the time spent in smoothing by at least 50%.

Nominally, for a fixed problem size, compared to the 2nd order method, the 4th order smoother ...
- performs 4x the flops
- sends 4x the MPI messages
- moves 2x the MPI data
- moves 1x the DRAM data (i.e., the same amount)
- provides 4 more bits of accuracy for every 8x increase in the problem size (instead of 2 bits)
These characteristics tend to push the solver into the compute or network limited regime depending on the relative balance of Flop/s, DRAM capacity, DRAM bandwidth, and network performance.


In order to compile the 4th order version on Edison, one may use the following command line...

cc  -Ofast -fopenmp level.c operators.fv4.c  mg.c solvers.c hpgmg-fv.c timers.c -DUSE_MPI  -DUSE_SUBCOMM -DUSE_FCYCLES -DUSE_GSRB -DUSE_BICGSTAB -o run.edison

Although it uses a different driver, the benchmark is run the same way as the 2nd order method.  However, in order to measure error for a cell-averaged finite volume scheme (instead of a cell-centered one), it will run two additional, coarser problems at grid spacings 2h and 4h.  Thus, the benchmark will build the MG hierarchy (calculating some values more precisely), warm up the system by running 10 solves, benchmark for 30s, print the timing results, and finally perform 3 additional solves to verify 4th order convergence.


More information about the HPGMG-Forum mailing list