
Triton Inference Server

Deploy AI models from any major framework (TensorFlow, PyTorch, ONNX) with optimized performance: Triton handles concurrent model execution and dynamic batching to maximize throughput on NVIDIA GPUs and CPUs.

Triton Inference Server is a dedicated, open-source engine for high-performance AI deployment. It streamlines production inference by supporting major frameworks (TensorRT, PyTorch, TensorFlow, and ONNX) across diverse hardware, including NVIDIA GPUs and x86 and ARM CPUs. The server's core strengths are its optimization features: dynamic batching, concurrent model execution, and ensemble pipelines. For instance, teams have used dynamic batching to go from 2 RPS to roughly 15 RPS per GPU on large computer vision models, turning previously impractical projects into viable ones. Use Triton to manage a model repository and serve real-time, batched, or streaming requests over HTTP/REST or gRPC with consistently low latency and high hardware utilization.
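
To make the batching and concurrency features concrete, here is a minimal sketch of how they are typically enabled in a model's config.pbtxt. The model name (resnet50_onnx), tensor names, and shapes are illustrative assumptions, not taken from any specific project; instance_group and dynamic_batching are the standard Triton settings for concurrent model execution and dynamic batching.

```protobuf
# Hypothetical model repository layout (names are assumptions):
# model_repository/
#   resnet50_onnx/
#     config.pbtxt        <- this file
#     1/
#       model.onnx
name: "resnet50_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 32

input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]

# Concurrent model execution: run two copies of the model per GPU.
instance_group [ { count: 2, kind: KIND_GPU } ]

# Dynamic batching: let Triton merge individual requests into larger
# batches, waiting at most 100 microseconds to fill one.
dynamic_batching {
  preferred_batch_size: [ 8, 16, 32 ]
  max_queue_delay_microseconds: 100
}
```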
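
On the client side, the sketch below sends one inference request over HTTP/REST using the official tritonclient Python package; the model name, tensor names, and shapes assume the illustrative config above, and the server is assumed to be running locally on Triton's default HTTP port 8000.

```python
# Minimal HTTP inference client sketch; assumes Triton is serving the
# illustrative "resnet50_onnx" model above at localhost:8000.
# Install with: pip install numpy tritonclient[http]
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a single-image batch; name, shape, and dtype must match config.pbtxt.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Under dynamic batching, Triton may merge this request with others
# arriving within the same queue-delay window before running the model.
result = client.infer(
    model_name="resnet50_onnx",
    inputs=[infer_input],
    outputs=[httpclient.InferRequestedOutput("output")],
)
print(result.as_numpy("output").shape)  # e.g. (1, 1000)
```

The tritonclient.grpc module exposes the same client calls over gRPC (default port 8001) for lower-overhead transport.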

https://github.com/triton-inference-server/server
