
ONNX Runtime

ONNX Runtime: The cross-platform, high-performance engine for accelerating machine learning model inferencing and training across diverse hardware.

This is your core engine for optimized ML deployment. ONNX Runtime (ORT) is a cross-platform accelerator that significantly boosts performance for both inferencing and training workloads. It achieves this by applying graph optimizations and leveraging hardware-specific Execution Providers (like CUDA, TensorRT, or CoreML) across CPU, GPU, and NPU targets. ORT supports models from major frameworks (PyTorch, TensorFlow, scikit-learn) converted to the ONNX format. Microsoft relies on it heavily: it powers AI in products like Office, Azure, and Bing, handling over 1 trillion daily inferences and offering proven speedups (e.g., 11% throughput gain over PyTorch for BERT-L pre-training).

https://onnxruntime.ai
6 projects · 7 cities
