NVIDIA INT8 ML Training

Use INT8 Tensor Cores on NVIDIA GPUs to train transformers, achieving up to 70% speedup on the RTX 4090 and 40% on the A100 over BF16, with minimal accuracy loss.

Overview

This talk explores using the INT8 Tensor Cores in recent NVIDIA GPUs to accelerate the training of transformer models. Benchmark results show speedups over BF16 of up to 70% on the RTX 4090 and 40% on the A100, with minimal accuracy degradation.
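
To make the idea concrete, here is a minimal sketch of the arithmetic behind an INT8 matrix multiply: quantize both operands to 8-bit integers, accumulate the products as integers, then rescale the result to floating point. The overview does not say which quantization scheme the talk uses, so the symmetric per-row and per-column scaling below is an assumption, and the names quantize_row and int8_matmul are hypothetical. The code runs on the CPU in plain Python and only illustrates the numerics, not the Tensor Core kernels or the measured speedups.

```python
import random

def quantize_row(row):
    """Symmetric INT8 quantization of one vector: (ints in [-127, 127], scale)."""
    absmax = max(abs(v) for v in row)
    # Map the largest magnitude to 127; an all-zero vector keeps scale 1.0.
    scale = absmax / 127.0 if absmax > 0 else 1.0
    return [max(-127, min(127, round(v / scale))) for v in row], scale

def int8_matmul(a, b):
    """Approximate a @ b using INT8 operands and integer accumulation."""
    a_q = [quantize_row(row) for row in a]        # one scale per row of a
    b_q = [quantize_row(col) for col in zip(*b)]  # one scale per column of b
    out = []
    for qa, sa in a_q:
        out_row = []
        for qb, sb in b_q:
            # Integer dot product (hardware would accumulate in INT32).
            acc = sum(x * y for x, y in zip(qa, qb))
            # Undo both scales to return to floating point.
            out_row.append(acc * sa * sb)
        out.append(out_row)
    return out

# Compare against an exact floating-point product on random data.
random.seed(0)
a = [[random.gauss(0, 1) for _ in range(64)] for _ in range(8)]
b = [[random.gauss(0, 1) for _ in range(4)] for _ in range(64)]
exact = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
approx = int8_matmul(a, b)

# Relative error in the Frobenius norm.
num = sum((p - q) ** 2 for rp, rq in zip(approx, exact) for p, q in zip(rp, rq))
den = sum(q ** 2 for rq in exact for q in rq)
rel_err = (num / den) ** 0.5
print(f"relative error: {rel_err:.4f}")
```

The relative error printed at the end gives a rough sense of how much precision the 8-bit round trip costs for a single matrix product under these assumptions.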