
Technology

Cactus Compute Runtime (on‑device inference engine for FunctionGemma)

High-performance on-device inference engine built to execute FunctionGemma models with sub-150ms latency for local tool calling.

Cactus Compute Runtime serves as the dedicated execution layer for FunctionGemma, a model family optimized for precise API orchestration. It leverages hardware acceleration (specifically Vulkan and Metal) to run 2B and 7B parameter models directly on mobile chipsets. Through 4-bit quantization and tight memory management, the runtime enables complex intent recognition and local device actions (such as smart home control or calendar scheduling) without cloud round-trips.
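The runtime's API is not publicly documented, so the following is only a minimal sketch of the local tool-calling step it enables: the model emits a structured function call, and a dispatcher executes the matching device action entirely on-device. The JSON call format, tool names, and handler functions here are all illustrative assumptions, not the actual FunctionGemma schema.

```python
import json

# Hypothetical local actions standing in for real device integrations.
def set_thermostat(temperature: float) -> str:
    return f"thermostat set to {temperature}"

def create_event(title: str, start: str) -> str:
    return f"event '{title}' scheduled at {start}"

# Registry mapping tool names to local handlers.
TOOLS = {
    "set_thermostat": set_thermostat,
    "create_event": create_event,
}

def dispatch(model_output: str) -> str:
    """Parse an assumed JSON function call from the model and run it locally."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Example: the model resolved the intent "set the heat to 21 degrees" into a call.
print(dispatch('{"name": "set_thermostat", "arguments": {"temperature": 21.0}}'))
```

Because both the parsing and the handler run locally, no network round-trip is needed between intent recognition and the device action.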

https://huggingface.co/google/function-gemma-7b

Recent Talks & Demos


No public projects found for this technology yet.