Glossary
Tensor cores
Specialized matrix-multiply units on NVIDIA RTX GPUs, optimized for the half-precision math used in deep learning. Power DLSS, frame generation, and on-device AI inference.
Tensor cores execute fused multiply-accumulate operations on small matrices (typically 4×4 FP16/BF16/INT8) far faster than CUDA cores can. The architectural payoff: hundreds of TOPS on consumer cards.
What uses them
- DLSS — neural upscaling and frame generation.
- NVIDIA Broadcast — background blur, noise removal.
- Local LLMs — Llama, Mistral inference at usable speed.
- Stable Diffusion — image generation 5–10× faster than CUDA-only.
Generations
- Turing — 1st gen.
- Ampere — 3rd gen, added sparsity.
- Ada — 4th gen, added FP8.
- Blackwell — 5th gen, added FP4 + much higher throughput.
Where this matters
Categories that use tensor cores
Continue reading