Glossary
TOPS
Tera-Operations Per Second — a marketing-friendly metric for AI accelerator throughput. Higher is better, but the operation precision (INT8, FP16, FP4) and sparsity assumptions matter as much as the number.
A TOPS figure counts how many trillions of operations, usually the multiplies and adds inside matrix math, a chip can execute per second. Vendors quote it for NPUs, GPUs, and SoCs.
The asterisks
- Precision. 40 TOPS at INT8 ≠ 40 TOPS at FP16. Lower precision yields more TOPS on the same silicon, so compare figures only at the same precision.
- Sparsity. "Sparsity-on" figures typically double theoretical TOPS by skipping zero weights, but real models rarely reach the sparsity the figure assumes.
- Sustained vs peak. Most quoted figures are peak; sustained throughput under thermal limits is lower.
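To make the three asterisks concrete, here is a minimal sketch of how a headline figure can be put together. It assumes the common convention of counting one multiply-accumulate (MAC) as two operations. The MAC count, clock speed, 2x sparsity factor, and 60% utilization are hypothetical values chosen for illustration, not any vendor's specification.

```python
def peak_tops(mac_units: int, clock_ghz: float, sparsity_factor: float = 1.0) -> float:
    """Theoretical peak TOPS for a hypothetical accelerator.

    Assumes each MAC unit completes one multiply-accumulate per cycle and
    that one MAC counts as 2 operations (a common convention, assumed here).
    """
    ops_per_second = 2 * mac_units * clock_ghz * 1e9 * sparsity_factor
    return ops_per_second / 1e12


def sustained_tops(peak: float, utilization: float) -> float:
    """Scale a peak figure by the fraction actually achieved (0.0 to 1.0)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return peak * utilization


# Hypothetical NPU: 16,384 INT8 MAC units clocked at 1.25 GHz.
dense = peak_tops(16_384, 1.25)
sparse = peak_tops(16_384, 1.25, sparsity_factor=2.0)  # "sparsity-on" marketing figure
realistic = sustained_tops(dense, 0.6)                 # assumed 60% under thermal limits

print(f"dense peak:     {dense:.2f} TOPS")
print(f"sparse peak:    {sparse:.2f} TOPS")
print(f"sustained est.: {realistic:.2f} TOPS")
```

The same silicon produces three quite different numbers depending on which assumptions are switched on, which is why a bare TOPS figure needs its footnotes.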
Why it matters in 2026
Windows Copilot+ PCs require an NPU rated at 40 TOPS or more for on-device Copilot features. Apple's Neural Engine sits around 35 TOPS. NVIDIA's mobile RTX GPUs dwarf both at 200+ TOPS, but at much higher power.
For LLM inference, total memory bandwidth often bottlenecks before TOPS does.
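A rough back-of-the-envelope check shows why. The sketch below uses two common rules of thumb for single-stream decoding of a dense model: every generated token reads all the weights from memory once, and costs about 2 operations per parameter. The 7B model size, 120 GB/s bandwidth, and 40 TOPS figure are hypothetical inputs, and the function name is illustrative.

```python
def decode_limits(params_billion: float, bytes_per_param: float,
                  bandwidth_gb_s: float, tops: float) -> tuple[float, float]:
    """Return (bandwidth_bound, compute_bound) in tokens per second.

    Rule-of-thumb assumptions (batch size 1, dense model, no caching tricks):
      - each token streams every weight from memory once
      - each token costs about 2 operations per parameter
    """
    params = params_billion * 1e9
    model_bytes = params * bytes_per_param
    bandwidth_bound = bandwidth_gb_s * 1e9 / model_bytes
    compute_bound = tops * 1e12 / (2 * params)
    return bandwidth_bound, compute_bound


# Hypothetical laptop: 7B-parameter model at INT8 (1 byte per weight),
# 120 GB/s memory bandwidth, 40 TOPS NPU.
bw_limit, compute_limit = decode_limits(7, 1.0, 120, 40)
bottleneck = "memory bandwidth" if bw_limit < compute_limit else "compute"

print(f"bandwidth-bound: {bw_limit:.1f} tokens/s")
print(f"compute-bound:   {compute_limit:.1f} tokens/s")
print(f"bottleneck:      {bottleneck}")
```

Under these assumptions the bandwidth ceiling is orders of magnitude below the compute ceiling, so adding TOPS would not speed up generation; faster memory or a smaller (more heavily quantized) model would.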