RESULTS
| SUKDMIG2D | Configuration | Model Size | Cores | Elapsed Time | Speed up |
|---|---|---|---|---|---|
| CPU Only (Baseline) | 2x E5-2698 v3 2.30GHz | 2301 x 751 | 1 | 218 | 1.00 |

mig2d:
Parallel Directives
#pragma
Parallelize for loops
Vectorize
Compiler vectorizes inner loops
restrict on pointers limits aliasing
www.wikipedia.org/wiki/Restrict
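A minimal sketch of the restrict point above, assuming a hypothetical kernel (`saxpy_restrict` and its parameters are illustrative, not taken from SUKDMIG2D). With restrict on every pointer, the compiler may assume the arrays do not overlap, which removes the aliasing hazard that can otherwise block vectorization. A compiler without OpenACC support ignores the pragma and runs the loop serially.

```c
#include <stddef.h>

/* Hypothetical kernel (not from SUKDMIG2D): out[i] = a * x[i] + y[i].
 * The restrict qualifiers promise that x, y, and out never overlap,
 * so the compiler is free to vectorize or offload the loop. */
void saxpy_restrict(size_t n, float a,
                    const float *restrict x,
                    const float *restrict y,
                    float *restrict out)
{
    /* Data clauses give the array extents, which a bare pointer does not carry. */
    #pragma acc parallel loop copyin(x[0:n], y[0:n]) copyout(out[0:n])
    for (size_t i = 0; i < n; ++i) {
        out[i] = a * x[i] + y[i];
    }
}
```

The caller must keep the promise: passing overlapping arrays to a function declared this way is undefined behavior.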
Parallel Directives
#pragma
Parallelize outer for loops
Compiler parallelizes inner loop
Resolve Errors!
Parallel Directives
#pragma
Parallelize outer for loop
Parallelize inner loops
Resolve loop carried dependence
Add acc loop directive
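The last two steps can be sketched as follows, assuming a hypothetical loop nest (`fill_table`, its formula, and the flat array layout are illustrative, not the SUKDMIG2D source). Each (ix, is) pair writes its own element, so any dependence the compiler reports here is only apparent. Inside a parallel region, an explicit acc loop directive on the inner loop states that its iterations are independent.

```c
#include <stddef.h>

/* Hypothetical stand-in for a traveltime-style loop nest.
 * t is a flat nx * ns array; element (ix, is) lives at t[ix * ns + is]. */
void fill_table(int nx, int ns, float dx, float ds, float *restrict t)
{
    /* Outer loop: iterations are spread across gangs. */
    #pragma acc parallel loop gang copyout(t[0:nx*ns])
    for (int ix = 0; ix < nx; ++ix) {
        /* Inner loop: the explicit loop directive asserts independence,
         * so the compiler does not have to prove it. */
        #pragma acc loop vector
        for (int is = 0; is < ns; ++is) {
            t[(size_t)ix * ns + is] = ix * dx + is * ds;
        }
    }
}
```

If an inner loop carries a true recurrence (iteration is reads what iteration is-1 wrote), the directive would be wrong and the loop needs restructuring instead.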
Resit (managed):
    537, Accelerator kernel generated
         Generating Tesla code
        538, #pragma acc loop gang /* blockIdx.x */
        553, #pragma acc loop vector(128) /* threadIdx.x */
    540, Loop carried dependence of t->-> prevents parallelization
         Loop carried backward dependence of t->-> prevents vectorization
Resit:
#pragma acc parallel loop
for (ix=0; ix<nx; ++ix)
{
    #pragma acc loop
    for (is=0; is<ns; ++is)
    { . . .