
Tensor cores

Specialized matrix-multiply units on NVIDIA RTX GPUs, optimized for the reduced-precision math used in deep learning. They power DLSS, frame generation, and on-device AI inference.

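Tensor cores execute fused multiply-accumulate operations (D = A×B + C) on small matrix tiles, such as 4×4 blocks, far faster than general-purpose CUDA cores can. Supported input formats include FP16, BF16, and INT8, depending on the generation. The architectural payoff: hundreds of TOPS (trillions of operations per second) of low-precision throughput on consumer cards.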
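
To make the multiply-accumulate concrete, here is a minimal pure-Python sketch of D = A×B + C on a single tile. It is only an emulation of the arithmetic, not how the hardware is programmed: the helper names (`to_fp16`, `to_fp32`, `tile_mma`) are hypothetical, and the choice of FP16 inputs with an FP32 accumulator is an assumption based on one commonly used mixed-precision mode. Rounding after every addition is also a simplification; real hardware may round differently.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tile_mma(a, b, c):
    """Compute D = A x B + C on one square tile.

    Assumption: inputs are rounded to FP16 and the running sum is kept
    in FP32, mimicking a mixed-precision accumulate mode.
    """
    n = len(a)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = to_fp32(c[i][j])  # start from the accumulator tile
            for k in range(n):
                product = to_fp16(a[i][k]) * to_fp16(b[k][j])
                acc = to_fp32(acc + product)  # accumulate in FP32
            d[i][j] = acc
    return d


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b = [[0.5 * (i + j) for j in range(4)] for i in range(4)]
    c = [[1.0] * 4 for _ in range(4)]
    for row in tile_mma(identity, b, c):
        print(row)
```

A tensor core performs the whole tile operation as a single hardware step, whereas the triple loop above is roughly what a general-purpose core has to work through one multiply-add at a time.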

What uses them

  • DLSS — neural upscaling and frame generation.
  • NVIDIA Broadcast — background blur, noise removal.
  • Local LLMs — Llama, Mistral inference at usable speed.
  • Stable Diffusion — image generation 5–10× faster than CUDA-only.

Generations

  • Turing — 2nd gen, the first on consumer RTX cards (the 1st gen shipped in Volta).
  • Ampere — 3rd gen, added sparsity.
  • Ada — 4th gen, added FP8.
  • Blackwell — 5th gen, added FP4 + much higher throughput.
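
The sparsity support added in Ampere is usually described as a 2:4 structured pattern: in every group of four consecutive weights, two are zero, so the hardware can skip them. The sketch below illustrates only the pruning step, under the assumption that the two largest-magnitude weights in each group are kept. The function name `prune_2_of_4` is hypothetical, and real toolchains typically fine-tune the model after pruning to recover accuracy.

```python
def prune_2_of_4(weights):
    """Zero the two smallest-magnitude values in each group of four."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes in this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned


if __name__ == "__main__":
    print(prune_2_of_4([0.1, -0.9, 0.3, 0.05, 2.0, 0.0, -1.0, 0.5]))
```

Because the zero positions follow a fixed pattern, the pruned weights can be stored compactly alongside small index metadata, which is what lets the hardware skip the zeroed multiplications.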