The execution resources of a CUDA device are organized into streaming multiprocessors (SMs). Multiple thread blocks can be assigned to each SM.

Each SM contains multiple streaming processors (SPs).

[Figure: SM.jpg]

Each device limits the number of blocks that can be assigned to each SM at the same time.
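To make the effect of this limit concrete, here is a minimal sketch in plain C++ (host-side arithmetic only, no CUDA calls). The `Device` struct, the function names, and any numbers passed in are hypothetical and only for illustration. The sketch assumes the per-SM block limit is the only constraint; on real hardware, limits on threads, registers, and shared memory per SM can reduce the number of resident blocks further.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical device description. The values are supplied by the caller;
// nothing here is queried from real hardware.
struct Device {
    int numSMs;          // number of streaming multiprocessors on the device
    int maxBlocksPerSM;  // per-SM limit on simultaneously assigned blocks
};

// Number of thread blocks that can be assigned to SMs at the same time,
// assuming the per-SM block limit is the only constraint.
int residentBlocks(const Device& dev, int gridBlocks) {
    assert(dev.numSMs > 0 && dev.maxBlocksPerSM > 0 && gridBlocks >= 0);
    return std::min(gridBlocks, dev.numSMs * dev.maxBlocksPerSM);
}

// Number of blocks that have to wait until an already assigned block
// finishes and frees its slot on an SM.
int waitingBlocks(const Device& dev, int gridBlocks) {
    return gridBlocks - residentBlocks(dev, gridBlocks);
}
```

For example, with a hypothetical device of 4 SMs and a limit of 8 blocks per SM, a grid of 100 blocks would have 32 blocks assigned at once and 68 waiting. On an actual device, the SM count can be read from the `multiProcessorCount` field of `cudaDeviceProp` (filled in by `cudaGetDeviceProperties`); recent CUDA toolkits also expose a `maxBlocksPerMultiProcessor` field, though its availability depends on the toolkit version.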
