Copy/Compute Overlap
When we perform GPU computation, we usually copy the input data to the GPU, run the computation, and copy the results back to the CPU. These steps do not have to run strictly one after another: the GPU can start computing on the part of the data that has already arrived while the rest is still being copied.
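In CUDA, copy/compute overlap requires pinned (page-locked) host memory, because cudaMemcpyAsync can only overlap with other work when the host buffer is page-locked. The copies and kernels also have to be issued in non-default streams so that work in different streams can run concurrently.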
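
The idea can be sketched as follows. This is a minimal illustration, not a tuned implementation: the kernel `scale`, the helper `run_overlap`, and the chunk count are hypothetical names and values chosen for the example, and it assumes the array length divides evenly by the number of chunks. The data is split into chunks, and each chunk gets its own stream, so the copy for one chunk can overlap with the kernel working on another.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration: doubles each element in place.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Hypothetical helper: processes n floats in num_chunks chunks, one stream each.
// Assumes n is divisible by num_chunks. Returns true if all results are correct.
bool run_overlap(int n, int num_chunks) {
    const int chunk = n / num_chunks;
    const size_t chunk_bytes = chunk * sizeof(float);
    float *h_data = nullptr;
    float *d_data = nullptr;

    // Pinned (page-locked) host memory, needed for truly asynchronous copies.
    if (cudaMallocHost(&h_data, n * sizeof(float)) != cudaSuccess) return false;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        cudaFreeHost(h_data);
        return false;
    }
    for (int i = 0; i < n; ++i) h_data[i] = static_cast<float>(i);

    std::vector<cudaStream_t> streams(num_chunks);
    for (int c = 0; c < num_chunks; ++c) cudaStreamCreate(&streams[c]);

    // Within one stream the three operations run in order; operations in
    // different streams are allowed to overlap.
    for (int c = 0; c < num_chunks; ++c) {
        float *h = h_data + c * chunk;
        float *d = d_data + c * chunk;
        cudaMemcpyAsync(d, h, chunk_bytes, cudaMemcpyHostToDevice, streams[c]);
        scale<<<(chunk + 255) / 256, 256, 0, streams[c]>>>(d, chunk);
        cudaMemcpyAsync(h, d, chunk_bytes, cudaMemcpyDeviceToHost, streams[c]);
    }

    // Wait for all streams to finish and pick up any asynchronous error.
    bool ok = (cudaDeviceSynchronize() == cudaSuccess);

    // Check that every element was doubled.
    for (int i = 0; ok && i < n; ++i) {
        if (h_data[i] != 2.0f * static_cast<float>(i)) ok = false;
    }

    for (int c = 0; c < num_chunks; ++c) cudaStreamDestroy(streams[c]);
    cudaFree(d_data);
    cudaFreeHost(h_data);
    return ok;
}

int main() {
    bool ok = run_overlap(1 << 20, 4);
    printf("%s\n", ok ? "ok" : "mismatch");
    return ok ? 0 : 1;
}
```

Whether the copies and kernels actually overlap depends on the device (it needs at least one copy engine that can run alongside kernels) and on the chunk size, so it is worth checking the timeline in a profiler rather than assuming the overlap happens.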