RESULTS
| ==55246== Profiling result: |
|---|
Compile
pgcc -acc -ta=tesla:managed
Profile
pgprof <managed binary>
| main: |
|---|
Managed Compile
Verbose output
Guided enhancements
Targeted changes
Common Optimizations
Data Movement
Copy, copyin, copyout
Create, delete
Update
Loop Collapse
| main: |
|---|
Data Movement
Analyze data flow in application
Explicitly use data directives
Move data directive to main
Use create alone (no copy) when possible
Copyin to move data to GPU
Update to move data to host
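The points above can be sketched as a single data region in the driver. This is a minimal illustration, not the application's code: `run_steps`, `step`, `a`, and `tmp` are hypothetical names, and the arithmetic is a placeholder. The input is moved once with `copyin`, the scratch array is allocated with `create` and never transferred, and `update host` brings the result back.

```c
#include <stdlib.h>

/* Hypothetical kernel: both arrays are expected to be on the device already. */
static void step(float *restrict a, float *restrict tmp, int n)
{
    #pragma acc parallel loop present(a[0:n], tmp[0:n])
    for (int i = 0; i < n; i++)
        tmp[i] = 2.0f * a[i];

    #pragma acc parallel loop present(a[0:n], tmp[0:n])
    for (int i = 0; i < n; i++)
        a[i] = tmp[i] + 1.0f;
}

/* Hypothetical driver standing in for main.
   Returns a[0] after nsteps, or -1 on allocation failure. */
float run_steps(int n, int nsteps)
{
    float *a = malloc((size_t)n * sizeof *a);
    float *tmp = malloc((size_t)n * sizeof *tmp);
    if (!a || !tmp) { free(a); free(tmp); return -1.0f; }
    for (int i = 0; i < n; i++)
        a[i] = 1.0f;

    /* One data region around all steps: a is copied in once,
       tmp is only allocated on the device (no transfer). */
    #pragma acc data copyin(a[0:n]) create(tmp[0:n])
    {
        for (int s = 0; s < nsteps; s++)
            step(a, tmp, n);

        /* copyin does not copy back on exit, so fetch the result explicitly. */
        #pragma acc update host(a[0:n])
    }

    float result = a[0];
    free(a);
    free(tmp);
    return result;
}
```

Without an OpenACC compiler the pragmas are ignored and the code runs on the host with the same result.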
sum2: (managed)
571, Generating copyin(t2[:nx][:nz],t1[:nx][:nz])
Generating copyout(t[:nx][:nz])
Data Movement
Explicitly use present for data already on GPU!
Collapse
Increase the threads to nx*nz
Present
Data is already on the GPU
Prevent data movement
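A sum2-style loop nest with both changes applied might look like the sketch below. It assumes the routine adds two fields element by element (suggested by the compiler feedback, not confirmed by the source); `add2` and `add2_demo` are hypothetical names, and flat indexing is used for simplicity.

```c
#include <stdlib.h>

/* Hypothetical sum2-like kernel: t = t1 + t2 over an nx-by-nz grid.
   present() asserts the data is already on the device, so no transfer
   is generated here; collapse(2) exposes nx*nz iterations to the GPU
   instead of only nx. */
static void add2(float *restrict t, const float *restrict t1,
                 const float *restrict t2, int nx, int nz)
{
    #pragma acc parallel loop collapse(2) \
        present(t[0:nx*nz], t1[0:nx*nz], t2[0:nx*nz])
    for (int i = 0; i < nx; i++)
        for (int k = 0; k < nz; k++)
            t[i * nz + k] = t1[i * nz + k] + t2[i * nz + k];
}

/* Hypothetical caller that owns the data movement.
   Returns the last element of t, or -1 on allocation failure. */
float add2_demo(int nx, int nz)
{
    int n = nx * nz;
    float *t  = malloc((size_t)n * sizeof *t);
    float *t1 = malloc((size_t)n * sizeof *t1);
    float *t2 = malloc((size_t)n * sizeof *t2);
    if (!t || !t1 || !t2) { free(t); free(t1); free(t2); return -1.0f; }

    for (int j = 0; j < n; j++) {
        t1[j] = (float)j;
        t2[j] = 2.0f * (float)j;
    }

    /* Transfers happen once, at the caller's data region. */
    #pragma acc data copyin(t1[0:n], t2[0:n]) copyout(t[0:n])
    {
        add2(t, t1, t2, nx, nz);
    }

    float last = t[n - 1];
    free(t); free(t1); free(t2);
    return last;
}
```

If `add2` were called without an enclosing data region, the `present` clause would fail at run time on a GPU build, which is a useful check that the data really is resident.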
Data Movement
Move large data transfers to main, i.e. mig, mig1
Minimize Copyin, Copyout
Maximize Create, Present
Prevents data transfers
Use Copyin, Copyout, Copy only when data changes!
Delete happens when leaving scope
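The pattern can be sketched as follows, assuming a per-call input, a large resident image, and a scratch buffer. All names (`add_trace`, `add_trace_demo`, `img`, `scratch`) and the arithmetic are hypothetical placeholders, not the application's routine.

```c
#include <stdlib.h>

/* Hypothetical per-trace routine:
   - trace changes on every call, so it is copied in;
   - img is large and already resident, so it is marked present;
   - scratch is only used on the device, so it is created, never copied. */
static void add_trace(const float *restrict trace, float *restrict img,
                      int nx, int nz)
{
    float *scratch = malloc((size_t)nz * sizeof *scratch);
    if (!scratch) return;

    #pragma acc data copyin(trace[0:nz]) present(img[0:nx*nz]) \
        create(scratch[0:nz])
    {
        #pragma acc parallel loop
        for (int k = 0; k < nz; k++)
            scratch[k] = 0.5f * trace[k];

        #pragma acc parallel loop collapse(2)
        for (int i = 0; i < nx; i++)
            for (int k = 0; k < nz; k++)
                img[i * nz + k] += scratch[k];
    } /* device copies of trace and scratch are deleted here */

    free(scratch);
}

/* Hypothetical driver: img moves to the device once and back once.
   Returns the last element of img, or -1 on allocation failure. */
float add_trace_demo(int nx, int nz, int ntraces)
{
    int n = nx * nz;
    float *img = calloc((size_t)n, sizeof *img);
    float *trace = malloc((size_t)nz * sizeof *trace);
    if (!img || !trace) { free(img); free(trace); return -1.0f; }
    for (int k = 0; k < nz; k++)
        trace[k] = 2.0f * (float)k;

    #pragma acc data copy(img[0:n])
    {
        for (int s = 0; s < ntraces; s++)
            add_trace(trace, img, nx, nz);
    }

    float last = img[n - 1];
    free(img); free(trace);
    return last;
}
```

Only `trace` crosses the bus on each call; the large array is transferred twice in total, regardless of the number of traces.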
| mig2d: |
|---|
void mig2d(float * restrict trace, int nt, float ft,...)
{ ...
#pragma acc data copyin(trace[0:nz],trf[0:nt+2*mtmax]) \
    present(mig, mig1, tb,tsum,tvsum,cssum,pb,... \
    create(tmt[0:nxt][0:nzt], ampt[0:nxt][0:nzt],...
{
| resit: (managed) |
|---|
Data Movement
| resit: |
|---|
Use present for data already on GPU!
Collapse
Increase the threads to nx*ns
Present
Data is already on the GPU
Prevent data movement
Data Movement
mig, mig1 data large
Move to main
Copyin at start
Mark as present
Copyout for snapshots
Minimize Copyin, Copyout
Use create
Prevents copy in/out
Delete happens when leaving scope
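One way to take snapshots without leaving the data region is the update directive, sketched below. This is an assumption about how the snapshot copy could be done, not the author's code: `run_with_snapshots` is a hypothetical function, `mig` here is a small stand-in array, and the per-shot work is a placeholder increment.

```c
#include <stdlib.h>

/* Hypothetical shot loop. mig is copied in once at the start and stays
   on the device; every `every` shots (every must be > 0) the host copy
   is refreshed for a snapshot.
   Returns the sum of mig[0] over all snapshots, or -1 on allocation failure. */
float run_with_snapshots(int n, int nshots, int every)
{
    float *mig = calloc((size_t)n, sizeof *mig);
    if (!mig) return -1.0f;

    float snapshot_sum = 0.0f;

    #pragma acc data copyin(mig[0:n])
    {
        for (int shot = 1; shot <= nshots; shot++) {
            /* Placeholder for the real per-shot kernels. */
            #pragma acc parallel loop present(mig[0:n])
            for (int i = 0; i < n; i++)
                mig[i] += 1.0f;

            if (shot % every == 0) {
                /* Only now does data travel back to the host. */
                #pragma acc update host(mig[0:n])
                snapshot_sum += mig[0];
            }
        }
    }

    free(mig);
    return snapshot_sum;
}
```

With 6 shots and a snapshot every 2 shots, the array is transferred to the host three times instead of after every kernel.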
| ==2242== Profiling result: |
|---|
Compile
pgcc -acc -ta=tesla
Profile
pgprof <tesla binary>
mig2d and sum2 take about the same time.
- cuMemAllocManaged (11s) removed.
- cuMemFree (11.5s) reduced to milliseconds.
No longer compiling with :managed