The execution resources of a CUDA device are organized into streaming multiprocessors (SMs). Multiple thread blocks can be assigned to each SM.
One SM contains multiple streaming processors (SPs).
Each device sets a limit on the number of blocks that can be assigned to each SM.
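To make the per-SM limit concrete, here is a minimal host-side sketch of how the number of resident blocks per SM could be estimated. It assumes that, besides the block limit, the device also caps the number of threads per SM, and it ignores register and shared-memory usage. The names `SmLimits` and `residentBlocksPerSm` are hypothetical, and the numbers in the usage note are illustrative rather than taken from a specific device.

```cpp
#include <algorithm>

// Hypothetical container for two per-SM limits (illustrative, not a CUDA API type).
struct SmLimits {
    int maxBlocksPerSm;   // device limit on blocks assigned to one SM
    int maxThreadsPerSm;  // device limit on threads assigned to one SM
};

// Estimate how many blocks can be assigned to one SM at the same time.
// Assumption: only the block and thread limits matter; registers and
// shared memory, which can lower the result further, are ignored.
int residentBlocksPerSm(const SmLimits& limits, int threadsPerBlock) {
    // A block that is empty or larger than the thread limit cannot be placed.
    if (threadsPerBlock <= 0 || threadsPerBlock > limits.maxThreadsPerSm) {
        return 0;
    }
    // How many whole blocks fit under the thread limit.
    int byThreads = limits.maxThreadsPerSm / threadsPerBlock;
    // The tighter of the two limits decides.
    return std::min(limits.maxBlocksPerSm, byThreads);
}
```

For example, with an assumed limit of 8 blocks and 1536 threads per SM, blocks of 256 threads are capped by the thread limit (6 blocks), while blocks of 64 threads are capped by the block limit (8 blocks). On a real device the actual limits can be read from the `cudaDeviceProp` structure filled in by `cudaGetDeviceProperties`.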
Reference
- Programming Massively Parallel Processors, Chapter 3