CUDA GPUs have two levels of cache, L1 and L2.

  • The L1 cache is shared by all threads running on a single SM. On most architectures it uses the same on-chip memory as shared memory.
  • The L2 cache is shared across all SMs, so every thread on the device can access it.
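
The sizes behind these two levels can be inspected at run time. The following is a minimal sketch, assuming a single GPU at device index 0 (an assumption, not something stated above). It prints the device-wide L2 size and the per-SM shared memory size from cudaDeviceProp, then asks the runtime to favour L1 over shared memory. That request is only a hint, and devices with a fixed split may ignore it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const int device = 0;  // assumption: query the first GPU in the system

    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    std::printf("Device:                    %s\n", prop.name);
    std::printf("Number of SMs:             %d\n", prop.multiProcessorCount);
    // L2 is one cache for the whole device, shared by every SM.
    std::printf("L2 cache size:             %d bytes\n", prop.l2CacheSize);
    // Shared memory is reported per SM; where L1 and shared memory share
    // storage, this is the part of that storage usable as shared memory.
    std::printf("Shared memory per SM:      %zu bytes\n",
                prop.sharedMemPerMultiprocessor);
    std::printf("Global loads cached in L1: %s\n",
                prop.globalL1CacheSupported ? "yes" : "no");

    // Hint that kernels would rather have a larger L1 than more shared memory.
    // The runtime is free to ignore this preference.
    err = cudaDeviceSetCacheConfig(cudaFuncCachePreferL1);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaDeviceSetCacheConfig failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Compile with nvcc (for example, nvcc query_cache.cu -o query_cache, where the file name is hypothetical). The printed values depend on the GPU, so no expected output is shown.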

See Also