
AI inference

AI inference is the deployment phase: a trained machine learning model processes new, unseen data to generate predictions or real-time outputs.

Inference is where a model's value is realized: it is the process of running a trained model, such as a Large Language Model (LLM) or a computer vision system, to produce an actionable result. It is a compute-intensive operation optimized for low latency and high throughput. Hardware accelerators are key here: specialized chips like the NVIDIA H100 GPU and Google's Edge TPU are built for this workload, often paired with techniques like quantization, which reduces the numerical precision of model weights to speed up computation. Real-world applications include autonomous vehicles making millisecond decisions, real-time language translation, and generative AI services like ChatGPT responding to user prompts.
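The quantization mentioned above can be illustrated with a minimal sketch: symmetric post-training quantization that maps float32 weights to int8 using a single scale factor. This is a simplified, hypothetical example using NumPy; production toolchains (TensorRT, ONNX Runtime, etc.) typically use per-channel scales and calibration data.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 with one symmetric scale per tensor.

    Toy sketch for illustration -- real quantizers are more sophisticated.
    """
    scale = np.abs(weights).max() / 127.0   # largest weight maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 weights from the int8 representation."""
    return q.astype(np.float32) * scale

# Quantize a toy weight matrix and measure the rounding error introduced.
w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.abs(w - w_hat).max()   # bounded by scale / 2
```

The int8 tensor is a quarter the size of the float32 original, and integer arithmetic is what accelerators like the Edge TPU execute natively; the price is a small, bounded rounding error per weight.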

https://www.ibm.com/topics/ai-inference