TensorFlow Serving
TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments and exposing both REST and gRPC interfaces.
Engineered for high-throughput production workloads, TensorFlow Serving manages the model lifecycle from versioning through inference. Teams can deploy new algorithms and experiments without changing client code or stopping the server: the server monitors each model's storage path, loads new versions as they appear, and swaps them in without dropping in-flight requests. It serves multiple models, and multiple versions of the same model, out of the box over its gRPC and REST APIs, and can take advantage of hardware acceleration (GPUs and TPUs) to keep inference latency low even for large deep learning architectures.
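As a rough sketch of the workflow described above, the snippet below exports a toy model (modeled on the half_plus_two example from the TensorFlow Serving documentation) into the numbered-version directory layout the server watches, then queries it through the REST predict endpoint. The paths, the port (8501 is the default REST port), and the assumption that a model server is already running are illustrative, not part of the original description.

```python
# Sketch: export a trivial model in the layout TensorFlow Serving expects,
# then query a running server over REST. Assumes the `requests` package and
# a model server started separately, e.g.:
#   tensorflow_model_server --rest_api_port=8501 \
#       --model_name=half_plus_two --model_base_path=/tmp/models/half_plus_two
import json

import requests
import tensorflow as tf


class HalfPlusTwo(tf.Module):
    """A toy model computing 0.5 * x + 2, mirroring the official demo model."""

    @tf.function(input_signature=[tf.TensorSpec([None], tf.float32)])
    def __call__(self, x):
        return 0.5 * x + 2.0


module = HalfPlusTwo()

# TensorFlow Serving watches the base path for numbered version directories
# (.../half_plus_two/1, .../half_plus_two/2, ...) and hot-swaps to the
# highest version it finds -- this is how new versions ship without a restart.
tf.saved_model.save(
    module,
    "/tmp/models/half_plus_two/1",
    signatures={"serving_default": module.__call__},
)

# The REST predict endpoint follows /v1/models/<name>[/versions/<n>]:predict.
url = "http://localhost:8501/v1/models/half_plus_two:predict"
payload = json.dumps({"instances": [1.0, 2.0, 5.0]})
response = requests.post(url, data=payload)
response.raise_for_status()
print(response.json()["predictions"])  # -> [2.5, 3.0, 4.5]
```

The same exported model is also reachable over gRPC (port 8500 by default) through the PredictionService API, so clients can choose whichever transport suits their latency and tooling requirements.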