Occupancy is the ratio of the active warps (wavefronts on AMD) on a multiprocessor (compute unit on AMD) to the maximum number of warps that the multiprocessor can support.
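
As a worked example of this ratio, with hypothetical numbers that are not taken from any specific device: if 32 warps are active where at most 64 could be resident, the occupancy is 50%.

```latex
\[
\text{occupancy} = \frac{\text{active warps}}{\text{maximum warps}},
\qquad \text{e.g.}\quad \frac{32}{64} = 0.5
\]
```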

Low occupancy means that fewer threads are running at the same time, which results in less latency hiding. On the other hand, high occupancy does not always result in better performance if reaching it puts more pressure on other resources (for example, registers per thread or cache capacity).

  • performance cliff - when a slight increase in the usage of one resource (for example, registers per thread) leads to a dramatic reduction in parallelism and performance
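
A performance cliff can be illustrated with a deliberately simplified occupancy model. The sketch below assumes that only registers and warp slots limit how many thread blocks fit on a multiprocessor. The function name `occupancy` and the default limits (65536 registers and 64 warps per multiprocessor, 32 threads per warp) are assumptions chosen for illustration. Real devices also have allocation granularity, shared memory limits, and a cap on resident blocks, none of which are modelled here.

```python
def occupancy(regs_per_thread, threads_per_block,
              regs_per_sm=65536, max_warps_per_sm=64, warp_size=32):
    """Theoretical occupancy under a simplified, hypothetical model.

    Only two limits are considered: the register file size and the number
    of warp slots per multiprocessor. The default values are assumptions.
    """
    # Warps needed by one block (ceiling division).
    warps_per_block = -(-threads_per_block // warp_size)
    # Registers consumed by one block; whole warps are allocated.
    regs_per_block = regs_per_thread * warps_per_block * warp_size

    # Whole blocks that fit under each limit; the tighter one wins.
    blocks_by_registers = regs_per_sm // regs_per_block
    blocks_by_warps = max_warps_per_sm // warps_per_block
    resident_blocks = min(blocks_by_registers, blocks_by_warps)

    active_warps = resident_blocks * warps_per_block
    return active_warps / max_warps_per_sm


# One extra register per thread changes how many whole blocks fit.
for regs in (31, 32, 33):
    print(f"{regs} regs/thread -> occupancy {occupancy(regs, 1024):.2f}")
```

Under these assumed limits and 1024 threads per block, going from 32 to 33 registers per thread means only one block fits instead of two, so the occupancy drops from 1.0 to 0.5. Because blocks are placed as whole units, a small change in per-thread usage can cross such a threshold.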

References