RESULTS

==55246== Profiling result:

Compile

pgcc -acc -ta=tesla:managed

Profile

pgprof <managed binary>

main:

Managed Compile

Verbose output

Guided enhancements

Targeted changes

Common Optimizations

Data Movement

Copy, copyin, copyout

Create, delete

Update

Loop Collapse

main:

Data Movement

Analyze data flow in application

Explicitly use data directives

Move data directive to main

Create only when possible

Copyin to move data to GPU

Update to move data to host
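The points above can be sketched in a few lines. This is a minimal illustration, not the application's code: `double_image`, `run_steps`, `img` and `scratch` are hypothetical names. One data region in the driver holds the arrays on the GPU for all steps, `create` is used for the temporary that the host never reads, and `update self` brings results back to the host.

```c
#include <stddef.h>

/* Hypothetical kernel: doubles img, using scratch as a device temporary.
   present() states that the caller already placed both arrays on the GPU. */
void double_image(float *img, float *scratch, size_t n)
{
    #pragma acc parallel loop present(img[0:n], scratch[0:n])
    for (size_t i = 0; i < n; i++) {
        scratch[i] = 2.0f * img[i];
        img[i] = scratch[i];
    }
}

/* Hypothetical driver playing the role of main: a single data region
   surrounds every step, so img is copied to the GPU once on entry. */
float run_steps(float *img, float *scratch, size_t n, int nsteps)
{
    float checksum = 0.0f;

    /* copyin: host values are needed on the GPU.
       create: scratch is allocated on the GPU, never transferred. */
    #pragma acc data copyin(img[0:n]) create(scratch[0:n])
    {
        for (int step = 0; step < nsteps; step++)
            double_image(img, scratch, n);

        /* copyin does not copy back on exit, so fetch results explicitly */
        #pragma acc update self(img[0:n])
    }

    /* Host-side check of the data that was brought back */
    for (size_t i = 0; i < n; i++)
        checksum += img[i];
    return checksum;
}
```

Without an OpenACC compiler the pragmas are ignored and the same code runs on the host, which makes it easy to compare results before and after adding the directives.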

sum2: (managed)

Data Movement

571, Generating copyin(t2[:nx][:nz],t1[:nx][:nz])
     Generating copyout(t[:nx][:nz])

Explicitly use present for data already on GPU!

Collapse

Increase the threads to nx*nz

Present

Data is already on the GPU

Prevent data movement
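As a rough sketch of collapse combined with present, assuming a routine that forms a weighted sum of two nx-by-nz tables: the name `weighted_sum` and its signature are hypothetical stand-ins, not the actual sum2 source. `collapse(2)` merges the two loops into one iteration space of nx*nz, and `present` tells the compiler not to generate the copyin/copyout shown in the compiler feedback.

```c
/* Hypothetical stand-in for a routine like sum2: t = a1*t1 + a2*t2.
   The arrays are assumed to be on the GPU already, placed there by an
   enclosing data region in the caller. */
void weighted_sum(int nx, int nz, float a1, float a2,
                  float **t1, float **t2, float **t)
{
    /* collapse(2): parallelize over all nx*nz points, not just nx rows.
       present: no data transfer is generated for these arrays. */
    #pragma acc parallel loop collapse(2) \
        present(t1[0:nx][0:nz], t2[0:nx][0:nz], t[0:nx][0:nz])
    for (int ix = 0; ix < nx; ix++)
        for (int iz = 0; iz < nz; iz++)
            t[ix][iz] = a1 * t1[ix][iz] + a2 * t2[ix][iz];
}
```

When built with OpenACC, calling this without the arrays on the device fails at run time, so the caller's data region has to cover t1, t2 and t first.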

Data Movement

Move large data transfers to main, e.g. mig, mig1

Minimize Copyin, Copyout

Maximize Create, Present

Prevents data transfers

Use Copyin, Copyout, Copy only when data changes!

Delete happens when leaving scope

mig2d:

void mig2d(float * restrict trace, int nt, float ft,...)

{ ...

#pragma acc data copyin(trace[0:nz],trf[0:nt+2*mtmax]) \
    present(mig, mig1, tb,tsum,tvsum,cssum,pb,... \
    create(tmt[0:nxt][0:nzt], ampt[0:nxt][0:nzt],...

{

resit: (managed)

Data Movement

resit:

Use present for data already on GPU!

Collapse

Increase the threads to nx*ns

Present

Data is already on the GPU

Prevent data movement

Data Movement

mig, mig1 data is large

Move to main

Copyin at start

Mark as present

Copyout for snapshots

Minimize Copyin, Copyout

Use create

Prevents copy in/out

Delete happens when leaving scope

==2242== Profiling result:

Compile

pgcc -acc -ta=tesla

Profile

pgprof <tesla binary>

mig2d and sum2 about the same.

  • cuMemAllocManaged (11s) removed.
  • cuMemFree (11.5s) reduced to milliseconds.

No longer compiling with :managed
