Branches that the CPU cannot predict effectively are costly: each misprediction flushes the pipeline and can cause a long stall.
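
As a rough illustration, here is a minimal sketch in C of the same loop written with and without a data-dependent branch. The function names and the counting task are hypothetical, chosen only for this example. Whether the branchless form is actually faster depends on the data and on the compiler, which may already emit branch-free code for the first version.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy version: the `if` depends on the data, so it is hard to
   predict when the values are effectively random. */
size_t count_above_branchy(const int32_t *data, size_t n, int32_t threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (data[i] > threshold) {
            count++;
        }
    }
    return count;
}

/* Branchless version: the comparison evaluates to 0 or 1, and that
   value is added directly, so the loop body has no conditional jump
   at the source level. */
size_t count_above_branchless(const int32_t *data, size_t n, int32_t threshold) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (size_t)(data[i] > threshold);
    }
    return count;
}
```

Both functions return the same result for any input; only the way the condition is expressed differs.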
Related to Data Parallel Programming
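- Branchless programming is essential in SIMD code: all lanes execute the same instruction, so per-element conditions must be expressed with masks and select (blend) operations rather than branches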
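- Branchless code also matters in GPU programming, where branch divergence (threads in the same warp taking different paths) forces the paths to execute one after another and hurts performance significantly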
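
The mask-and-select idea behind both points can be sketched in plain scalar C. This is only an emulation under stated assumptions: real SIMD code would use the target's compare and blend instructions, and a GPU compiler may apply predication on its own. The function name `max_select` and the element-wise maximum task are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the larger of a[i] and b[i] into out[i] without branching on
   the comparison. Every element goes through the same instruction
   sequence, which is the shape SIMD lanes and GPU warps favor. */
void max_select(const uint32_t *a, const uint32_t *b, uint32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        /* mask is all ones when a[i] > b[i], otherwise all zeros:
           unsigned 0 - 1 wraps around to 0xFFFFFFFF. */
        uint32_t mask = (uint32_t)0 - (uint32_t)(a[i] > b[i]);
        /* Keep the bits of a[i] where the mask is set, b[i] elsewhere. */
        out[i] = (a[i] & mask) | (b[i] & ~mask);
    }
}
```

Both candidate values are always computed and the mask picks one of them. That trade (a little extra arithmetic in exchange for no divergent control flow) is the usual cost of going branchless.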