How benchmarks can go wrong
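The computation got optimized out
If the result of the benchmarked computation is never used, or its inputs are compile-time constants, the compiler may delete or precompute the work. The benchmark then times something close to an empty loop.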
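This can happen even in an interpreted language. The sketch below assumes CPython, whose compiler folds constant expressions; other implementations may behave differently. The helper `has_runtime_arithmetic` and the variable name `x` are hypothetical, introduced only for illustration. It checks whether the arithmetic survives compilation before timing it.

```python
import dis
import timeit


def has_runtime_arithmetic(expr: str) -> bool:
    """Return True if the compiled expression still contains a BINARY_* opcode."""
    code = compile(expr, "<bench>", "eval")
    return any(ins.opname.startswith("BINARY_") for ins in dis.get_instructions(code))


# A literal expression is precomputed by the CPython compiler, so timing it
# measures loading a constant, not an exponentiation.
FOLDED = "2 ** 10"
# Reading the base from a variable keeps the arithmetic in the bytecode.
# `x` is a hypothetical name supplied through timeit's setup string.
LIVE = "x ** 10"

if __name__ == "__main__":
    print("folded still has arithmetic:", has_runtime_arithmetic(FOLDED))
    print("live still has arithmetic:  ", has_runtime_arithmetic(LIVE))
    print("folded:", timeit.timeit(FOLDED, number=1_000_000))
    print("live:  ", timeit.timeit(LIVE, setup="x = 2", number=1_000_000))
```

Inspecting the generated code (bytecode here, assembly for compiled languages) is a cheap way to confirm that the work being timed still exists.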
Benchmarking the wrong code
Related to the previous point: the benchmark can introduce code that is unrelated to what is actually being benchmarked.
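A common instance is setup work leaking into the timed region. The following is a minimal sketch using Python's standard `timeit` module; the function names and the set-lookup workload are assumptions chosen for illustration, not taken from these notes. The intent is to measure a set membership test, but the second variant also times building the set on every iteration.

```python
import timeit

# Build the data once, outside the timed statement.
SETUP = "data = set(range(10_000))"


def time_lookup_only(number: int = 200) -> float:
    """Time just the membership test; the set is built once in setup."""
    return timeit.timeit("9_999 in data", setup=SETUP, number=number)


def time_lookup_with_construction(number: int = 200) -> float:
    """Accidentally time set construction together with the lookup."""
    return timeit.timeit("9_999 in set(range(10_000))", number=number)


if __name__ == "__main__":
    print("lookup only:           ", time_lookup_only())
    print("lookup + construction: ", time_lookup_with_construction())
```

The second number is dominated by constructing the set, so it says almost nothing about lookup cost.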
Throughput vs latency
See latency vs throughput in benchmarking
Cold cache
See code cache
Sensitivity to function and branch alignment, link order, etc.
Cache
Benchmarks often have a small cache footprint. Real-world applications often use much more cache.