bfloat16
Bfloat16 (Brain Floating Point) is a 16-bit numerical format (1 sign bit, 8-bit exponent, 7-bit mantissa) developed by Google Brain to accelerate AI/ML training while preserving FP32's dynamic range.
As a 16-bit format, bfloat16 halves memory use and bandwidth relative to 32-bit float (FP32). Its design is deliberate: it keeps FP32's full 8-bit exponent, preserving the wide dynamic range that prevents overflow and underflow during training, and trades away mantissa bits, leaving only 7. That loss of precision is acceptable for most AI workloads, where range matters more than fine-grained accuracy. Bfloat16 is now an industry standard, natively supported on major accelerators (Google TPUs, Intel Xeon CPUs, NVIDIA GPUs), where it significantly boosts throughput and enables larger models and batch sizes.
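Because bfloat16 is simply the top 16 bits of an FP32 value, conversion is a bit-level truncation. The sketch below, using only Python's standard library, illustrates this (the function names are illustrative, not from any particular library; hardware typically applies round-to-nearest-even before truncating, as shown here):

```python
import struct

def float_to_bfloat16_bits(x: float) -> int:
    """Convert a Python float to the 16-bit bfloat16 bit pattern."""
    # Reinterpret the FP32 bit pattern as a 32-bit unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round to nearest even: add a bias that depends on the lowest kept bit,
    # then drop the low 16 mantissa bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bfloat16_bits_to_float(b: int) -> float:
    """Expand a bfloat16 bit pattern back to FP32 (zero-fill low bits)."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x
```

A round trip makes the trade-off concrete: pi (3.14159265) comes back as 3.140625, since only about 8 significant mantissa bits survive, while a large value like 1e38, which would overflow FP16, remains finite because the 8-bit exponent is intact.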