A GPU has a fixed amount of each hardware resource. For example, on the NVIDIA G80 each SM has 8K (8,192) registers. An SM can therefore hold either 768 threads using 10 registers each, or 512 threads using 15 registers each; both configurations consume 7,680 registers.

A performance cliff occurs when a slight increase in the usage of one resource pushes a kernel past a hardware limit, causing a dramatic reduction in parallelism and performance.
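The cliff can be made concrete with a small calculation. The sketch below is only an illustration: it takes the 8,192-register file and the 768-thread figure from the G80 example, and assumes a block size of 256 threads and that threads are scheduled onto an SM in whole blocks. The names `resident_threads` and `THREADS_PER_BLOCK` are hypothetical, and other limits (shared memory, maximum blocks per SM, register allocation granularity) are ignored.

```python
# Sketch: resident threads per SM as a function of registers per thread.
# Only the register file size and the 768-thread figure come from the G80
# example; the block size is an assumption made for illustration.
REGISTERS_PER_SM = 8192      # "8K registers on each SM"
MAX_THREADS_PER_SM = 768     # thread limit per SM in the G80 example
THREADS_PER_BLOCK = 256      # assumed block size


def resident_threads(regs_per_thread: int) -> int:
    """Threads that fit on one SM when only whole blocks are scheduled."""
    regs_per_block = regs_per_thread * THREADS_PER_BLOCK
    blocks_by_registers = REGISTERS_PER_SM // regs_per_block
    blocks_by_threads = MAX_THREADS_PER_SM // THREADS_PER_BLOCK
    return min(blocks_by_registers, blocks_by_threads) * THREADS_PER_BLOCK


if __name__ == "__main__":
    for regs in range(8, 18):
        print(f"{regs:2d} registers/thread -> {resident_threads(regs):3d} threads")
```

Under these assumptions, going from 10 to 11 registers per thread drops an SM from 768 to 512 resident threads: three blocks would need 8,448 registers, so only two fit. That is a one-third loss of parallelism for a one-register increase.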

See Also

  • Occupancy - ratio of active warps on an SM to the maximum number of warps the SM supports