November 19, 2024 · Singapore
NVIDIA INT8 ML Training
Using INT8 Tensor Cores on NVIDIA GPUs to train transformers, reaching speedups of up to 70% on an RTX 4090 and 40% on an A100 over BF16, with minimal accuracy loss.
Overview
This talk explores using the INT8 Tensor Cores in recent NVIDIA GPUs to accelerate transformer training. Benchmarks show speedups of up to 70% on an RTX 4090 and 40% on an A100 compared with BF16 training, with minimal accuracy degradation.
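To make the idea concrete, here is a pure-Python emulation of a single INT8 matrix multiply of the kind that replaces a BF16 matmul. It is a minimal sketch, not the talk's implementation. It assumes symmetric row-wise scaling (one scale per row of the left operand, one per column of the right operand), which is one common scheme. The helper names (`quantize_rows`, `int8_matmul`, `matmul`) are hypothetical. On a GPU the integer products would run on INT8 Tensor Cores with INT32 accumulation; here plain Python integers stand in for that step.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows(x: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric per-row INT8 quantization (assumed scheme): q = round(x / s), s = max|x| / 127."""
    q, scales = [], []
    for row in x:
        amax = max(abs(v) for v in row)
        s = amax / 127 if amax > 0 else 1.0  # avoid dividing by zero for all-zero rows
        q.append([max(-127, min(127, round(v / s))) for v in row])
        scales.append(s)
    return q, scales


def transpose(x: Matrix) -> Matrix:
    return [list(col) for col in zip(*x)]


def int8_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Approximate a @ b: quantize both operands, multiply-accumulate in integers, then rescale."""
    qa, sa = quantize_rows(a)              # one scale per row of A (M x K)
    qbt, sb = quantize_rows(transpose(b))  # one scale per column of B (K x N)
    out = []
    for i, row in enumerate(qa):
        out_row = []
        for j, col in enumerate(qbt):
            acc = sum(x * y for x, y in zip(row, col))  # integer accumulation (INT32 on a GPU)
            out_row.append(acc * sa[i] * sb[j])         # dequantize with both scales
        out.append(out_row)
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Full-precision reference matmul for comparison."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(r, c)) for c in bt] for r in a]


a = [[0.5, -1.2, 3.0], [2.0, 0.1, -0.7]]
b = [[1.0, 0.3], [-0.4, 2.2], [0.9, -1.5]]
approx = int8_matmul(a, b)
exact = matmul(a, b)
```

Scaling per row and per column keeps one outlier from inflating the scale of the whole tensor, so each entry of the result loses at most a few hundredths here, while the expensive inner products stay in cheap integer arithmetic.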
Links
PyTorch `torchao`: prototype support for memory- and speed-efficient INT8 and BitNet quantized training, using stochastic rounding.
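Stochastic rounding matters once the weights themselves are stored in INT8. An update smaller than half a quantization step is erased by round-to-nearest, but it survives on average if the value is rounded up with probability equal to its fractional part. The sketch below is illustrative only. It assumes a single weight with a fixed scale of 0.05, a constant update of -0.01 and 500 steps, and its helpers (`round_nearest`, `round_stochastic`, `train_int8_weight`) are hypothetical names, not `torchao` APIs.

```python
import math
import random


def round_nearest(x: float, rng: random.Random) -> int:
    """Round to the nearest integer (ties up). rng is unused; it keeps both rounders' signatures identical."""
    return math.floor(x + 0.5)


def round_stochastic(x: float, rng: random.Random) -> int:
    """Round up with probability equal to the fractional part, so E[result] == x."""
    lo = math.floor(x)
    return lo + (1 if rng.random() < x - lo else 0)


def train_int8_weight(rounding, w0=5.0, update=-0.01, steps=500, scale=0.05, seed=0) -> float:
    """Apply `steps` identical updates to one weight stored as an INT8 code q, where w = q * scale.

    Hypothetical toy setup: fixed scale, constant update, no optimizer state.
    """
    rng = random.Random(seed)
    q = max(-127, min(127, round_nearest(w0 / scale, rng)))  # initial INT8 code
    for _ in range(steps):
        target = q + update / scale                        # desired code after the update, as a float
        q = max(-127, min(127, rounding(target, rng)))     # requantize and clamp to the INT8 range
    return q * scale


nearest = train_int8_weight(round_nearest)    # stays at 5.0: every -0.2-code update rounds away
sr = train_int8_weight(round_stochastic)      # ideal result is 0.0; lands near it on average
```

Each update moves the code by only -0.2, so round-to-nearest always snaps back to the old value and training stalls. Stochastic rounding instead drops the code by one step about 20% of the time, which matches the intended drift in expectation.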