Glossary

TOPS

Tera-Operations Per Second — a marketing-friendly metric for AI accelerator throughput. Higher is better, but the operation precision (INT8, FP16, FP4) and sparsity assumptions matter as much as the number.

TOPS measures how many trillions of AI operations a chip can execute per second. Vendors quote it for NPUs, GPUs, and SoCs.
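As a rough illustration of where a headline number can come from, here is a minimal sketch of one common convention: multiply the count of multiply-accumulate (MAC) units by the clock rate, and count each MAC as two operations. The function name and the example NPU figures are hypothetical, and vendors may count operations differently.

```python
def peak_tops(mac_units: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Theoretical peak throughput in tera-operations per second.

    Assumption: every MAC unit completes one multiply-accumulate per cycle,
    and each MAC counts as `ops_per_mac` operations (2 by common convention).
    """
    return mac_units * clock_hz * ops_per_mac / 1e12


# Hypothetical NPU: 16,384 MAC units running at 1.25 GHz.
example = peak_tops(mac_units=16_384, clock_hz=1.25e9)
print(f"{example:.2f} TOPS")  # 40.96 TOPS
```

This is a peak figure by construction: it assumes every unit is busy on every cycle, which real workloads seldom achieve.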

The asterisks

Precision. 40 TOPS at INT8 ≠ 40 TOPS at FP16. The same silicon typically posts a higher figure at lower precision, so compare numbers only at matching precision.
Sparsity. "Sparsity-on" doubles theoretical TOPS by skipping zero weights, but real models rarely hit theoretical sparsity.
Sustained vs peak. Most quoted figures are peak; sustained throughput under thermal limits is lower.
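The three caveats above can be turned into a small sanity check before comparing two spec sheets. This is a sketch under stated assumptions, not a standard method: the `TopsQuote` type and helper names are hypothetical, sparsity is assumed to exactly double the quoted figure, and the sustained fraction is a number you would have to measure or estimate yourself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopsQuote:
    tops: float           # figure as quoted by the vendor
    precision: str        # e.g. "INT8", "FP16", "FP4"
    sparse: bool = False  # True if the figure was quoted with sparsity on


def dense_tops(quote: TopsQuote) -> float:
    """Undo the sparsity bonus. Assumes sparsity-on exactly doubles the figure."""
    return quote.tops / 2 if quote.sparse else quote.tops


def sustained_tops(quote: TopsQuote, sustained_fraction: float) -> float:
    """Scale the dense peak by a user-supplied sustained/peak ratio (0 to 1]."""
    if not 0 < sustained_fraction <= 1:
        raise ValueError("sustained_fraction must be in (0, 1]")
    return dense_tops(quote) * sustained_fraction


def comparable(a: TopsQuote, b: TopsQuote) -> bool:
    """Two quotes are only directly comparable at the same precision."""
    return a.precision.upper() == b.precision.upper()


# Hypothetical spec-sheet entries.
chip_a = TopsQuote(tops=80, precision="INT8", sparse=True)
chip_b = TopsQuote(tops=40, precision="INT8")

print(comparable(chip_a, chip_b))    # True
print(dense_tops(chip_a))            # 40.0
print(sustained_tops(chip_b, 0.75))  # 30.0
```

In this made-up case the "80 TOPS" chip and the "40 TOPS" chip land on the same dense INT8 figure once the sparsity assumption is removed.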

Why it matters in 2026

Windows Copilot+ PCs require an NPU rated at 40 TOPS or more for on-device Copilot features. Apple's Neural Engine sits around 35 TOPS. NVIDIA's mobile RTX GPUs dwarf both at 200+ TOPS, but at much higher power.

For LLM inference, memory bandwidth often becomes the bottleneck before TOPS does.
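A back-of-the-envelope estimate shows why. The sketch below assumes single-stream (batch size 1) token generation, roughly two operations per model parameter per token, and one full read of the weights from memory per token; it ignores the KV cache, activations, and any overlap or caching effects. The function name and the example figures are hypothetical.

```python
def decode_tokens_per_second(
    params: float,
    bytes_per_param: float,
    tops: float,
    bandwidth_gb_s: float,
) -> tuple[float, str]:
    """Upper bound on tokens/s and which resource sets it.

    Assumptions: batch size 1, ~2 operations per parameter per token,
    and every weight is read from memory once per generated token.
    """
    compute_bound = tops * 1e12 / (2 * params)
    memory_bound = bandwidth_gb_s * 1e9 / (params * bytes_per_param)
    limiter = "memory" if memory_bound < compute_bound else "compute"
    return min(compute_bound, memory_bound), limiter


# Hypothetical setup: 7B-parameter model in INT8 (1 byte per parameter),
# a 40 TOPS NPU, and 100 GB/s of memory bandwidth.
rate, limiter = decode_tokens_per_second(
    params=7e9, bytes_per_param=1.0, tops=40.0, bandwidth_gb_s=100.0
)
print(f"{rate:.1f} tokens/s, limited by {limiter}")  # 14.3 tokens/s, limited by memory
```

Under these assumptions the compute ceiling is in the thousands of tokens per second while the memory ceiling is about 14, so adding TOPS without adding bandwidth would not speed up generation.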

Where this matters

Categories that use TOPS

Laptops, Smartphones, Graphics Cards
