A stream is a sequence of operations that execute in the order they were issued. Operations in different streams may overlap, which is what allows multiple CUDA kernels to run concurrently.
There is always one default stream.
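With the legacy default stream (the standard behavior), no work in a non-default stream can execute at the same time as work in the default stream. In other words, the default stream cannot overlap with non-default streams. The exceptions are streams created with cudaStreamCreateWithFlags and the cudaStreamNonBlocking flag, and code compiled with nvcc --default-stream per-thread.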
Many CUDA runtime functions take a stream argument, which defaults to 0 (the default stream).
Kernels can be launched in a non-default stream by passing it as the 4th launch configuration argument (the 3rd is the dynamic shared memory size in bytes):
kernel<<<grid, block, shared_memory, stream>>>
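As a minimal sketch of the launch syntax above, the following program launches the same kernel in two non-default streams so the two launches are allowed to overlap. The kernel add_value, the helper run_two_streams, and the buffer sizes are hypothetical names chosen for illustration; whether the kernels actually overlap depends on the device and on available resources.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds `value` to every element of `data`.
__global__ void add_value(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

// Hypothetical helper: returns 0 if both kernels produced the expected result.
int run_two_streams() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr;
    if (cudaMalloc(&a, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&b, bytes) != cudaSuccess) { cudaFree(a); return 1; }
    cudaMemset(a, 0, bytes);
    cudaMemset(b, 0, bytes);

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    const int block = 256;
    const int grid = (n + block - 1) / block;
    // 3rd argument: dynamic shared memory in bytes (none here).
    // 4th argument: the stream the kernel is issued to.
    add_value<<<grid, block, 0, s1>>>(a, n, 1.0f);
    add_value<<<grid, block, 0, s2>>>(b, n, 2.0f);

    // Wait for each stream to finish before reading results.
    cudaError_t e1 = cudaStreamSynchronize(s1);
    cudaError_t e2 = cudaStreamSynchronize(s2);

    float ha = 0.0f, hb = 0.0f;
    cudaMemcpy(&ha, a, sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(&hb, b, sizeof(float), cudaMemcpyDeviceToHost);

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);

    bool ok = (e1 == cudaSuccess) && (e2 == cudaSuccess) &&
              ha == 1.0f && hb == 2.0f;
    return ok ? 0 : 1;
}

int main() {
    int rc = run_two_streams();
    printf(rc == 0 ? "ok\n" : "failed\n");
    return rc;
}
```

Within each stream the work still runs in issue order; only work issued to different streams is free to overlap.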
API
Creation
cudaStream_t stream;
cudaStreamCreate(&stream);
Destruction
cudaStreamDestroy(stream);
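Putting creation, use, and destruction together, a stream's lifecycle might look like the sketch below. It issues an asynchronous copy, a kernel, and a copy back to the same stream, then synchronizes before destroying it. The kernel scale and the helper run_stream_lifecycle are hypothetical; the host buffer is allocated with cudaMallocHost (pinned memory) because that is what lets cudaMemcpyAsync run asynchronously with respect to the host.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies every element of `data` by `factor`.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Hypothetical helper: returns 0 if the round trip produced the expected values.
int run_stream_lifecycle() {
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float *host = nullptr, *dev = nullptr;
    if (cudaMallocHost(&host, bytes) != cudaSuccess) return 1;  // pinned host memory
    if (cudaMalloc(&dev, bytes) != cudaSuccess) { cudaFreeHost(host); return 1; }
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    // Creation
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // All three operations go to the same stream, so they run in issue order.
    cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, n, 3.0f);
    cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);

    // Block the host until everything issued to the stream has finished.
    cudaError_t err = cudaStreamSynchronize(stream);
    bool ok = (err == cudaSuccess) && host[0] == 3.0f && host[n - 1] == 3.0f;

    // Destruction
    cudaStreamDestroy(stream);
    cudaFree(dev);
    cudaFreeHost(host);
    return ok ? 0 : 1;
}

int main() {
    int rc = run_stream_lifecycle();
    printf(rc == 0 ? "ok\n" : "failed\n");
    return rc;
}
```

The explicit cudaStreamSynchronize is there so the host can safely read the results; it is not required before cudaStreamDestroy, which returns immediately and releases the stream once its pending work has completed.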