

Context Cache API

Context Caching reduces latency and costs by persisting frequently used model context across multiple API requests.

Google's Context Cache API optimizes long-context workflows by storing large payloads (such as 100,000 lines of code or an hour-long video) in a temporary cache for Gemini models. Instead of re-sending and re-processing the same tokens on every query, developers pay a per-hour storage fee (typically $0.01 per 1M tokens per hour) to keep the context warm. This architecture reduces Time to First Token (TTFT) and cuts input-token costs by up to 90% for repetitive tasks like document analysis or multi-turn chat sessions.
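The trade-off above can be sketched as a back-of-the-envelope cost model: caching pays off once the per-query discount on cached tokens outweighs the hourly storage fee. The rates below are illustrative assumptions taken from the figures in this paragraph (a $0.01 per 1M tokens per hour storage fee and a roughly 90% discount on cached input), not actual Gemini pricing, which varies by model.

```python
# Illustrative cost model for context caching. All rates are
# assumptions for the sake of the example, not real pricing.

INPUT_RATE = 0.15 / 1_000_000    # hypothetical $/token for uncached input
CACHED_RATE = INPUT_RATE * 0.10  # cached tokens billed at ~10% (a 90% cut)
STORAGE_RATE = 0.01 / 1_000_000  # $ per cached token per hour (from above)

def session_cost(tokens: int, queries: int, hours: float, cached: bool) -> float:
    """Total input-side cost of `queries` requests over one shared context."""
    if not cached:
        # Without caching, the full context is re-sent on every request.
        return tokens * INPUT_RATE * queries
    # With caching: pay to keep the context warm, then read it at a discount.
    storage = tokens * STORAGE_RATE * hours
    return storage + tokens * CACHED_RATE * queries

ctx = 1_000_000  # e.g. a large codebase tokenized to ~1M tokens
print(f"uncached: ${session_cost(ctx, queries=20, hours=1.0, cached=False):.2f}")
print(f"cached:   ${session_cost(ctx, queries=20, hours=1.0, cached=True):.2f}")
```

Under these assumed rates, twenty queries over a one-million-token context cost an order of magnitude less with caching, and the gap widens with every additional query against the same cache.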

https://ai.google.dev/gemini-api/docs/caching