Bhumi: Outpacing Native AI Inference
Learn how Bhumi's Rust-powered client delivers faster AI inference than native libraries and HTTP calls, supporting OpenAI, Anthropic, and Gemini with parallel processing.
Bhumi is a high-performance AI inference client designed to be faster than any other library, including native implementations and direct HTTP calls. Built in Rust with Python bindings, it optimizes request handling, reduces latency, and significantly improves throughput. Supporting OpenAI, Anthropic, and Gemini, Bhumi provides seamless multi-model switching while being 2-3x faster than LiteLLM and other alternatives.
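Much of the throughput gain described above comes from issuing provider requests concurrently rather than one at a time. The pattern can be sketched in plain Python with `asyncio`; note this is an illustration of concurrent fan-out, not Bhumi's actual API, and `fake_completion` is a hypothetical stand-in for a real provider call:

```python
import asyncio
import time

# Hypothetical stand-in for a provider HTTP call; a real client
# (Bhumi, LiteLLM, raw HTTP) would await a network response here.
async def fake_completion(provider: str, prompt: str, delay: float = 0.1) -> str:
    await asyncio.sleep(delay)  # simulate network latency
    return f"{provider}: echo({prompt})"

async def fan_out(prompt: str, providers: list[str]) -> list[str]:
    # Launch every provider call concurrently, so total wall time is
    # roughly max(latency) instead of sum(latency).
    tasks = [fake_completion(p, prompt) for p in providers]
    return await asyncio.gather(*tasks)  # results keep input order

start = time.perf_counter()
results = asyncio.run(fan_out("hello", ["openai", "anthropic", "gemini"]))
elapsed = time.perf_counter() - start

print(results)
print(f"3 calls finished in ~{elapsed:.2f}s")
```

With a 0.1 s simulated latency per call, the three concurrent calls complete in about 0.1 s total, where a sequential loop would take about 0.3 s.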
Bhumi: Rust-built Python AI inference client for fast LLM inference.
- Rust: a statically-typed systems language engineered for performance and reliability, rivaling C/C++ in speed. Its core innovation is the ownership model and borrow checker, which enforce strict memory and thread safety at compile time, eliminating data races and null-pointer dereferences without a conventional garbage collector. Rust achieves near-native speed through zero-cost abstractions, letting high-level features compile into highly optimized code. Major industry players, including Microsoft and Cloudflare, use Rust for critical infrastructure, and it is now officially supported for development in the Linux kernel.
- Python: a high-level, general-purpose language that prioritizes clear, readable syntax (via significant indentation), enabling rapid development. Its ecosystem is massive: robust web development with frameworks like Django and Flask, and data science with libraries such as Pandas and NumPy. The Python Package Index (PyPI) hosts thousands of community-contributed modules, offering ready solutions for tasks from network programming to GUI creation. The language is actively maintained by the Python Software Foundation (PSF), with the stable release at Python 3.14.0 as of November 2025.
- OpenAI API: authenticated, programmatic access to a suite of generative AI models. Developers use REST endpoints and official libraries (Python, Node.js) to integrate capabilities such as advanced text generation (GPT-4o), image creation (DALL-E 3), and speech-to-text transcription (Whisper). The platform is engineered for scale, supporting millions of daily requests for tasks from complex reasoning to real-time customer support agents.
- Anthropic API: programmatic access to the Claude model family (Opus, Sonnet, Haiku) for complex reasoning, vision, and tool-use applications. The Messages API handles conversational tasks, with Claude 3.5 Sonnet for balanced performance and Claude 3 Opus for complex analysis. Key features include tool use (function calling), vision capabilities for image analysis, and a 200K-token context window for extensive document processing.
- Gemini: Google's natively multimodal AI model, engineered from the ground up to understand and combine information across text, code, audio, image, and video. It is optimized for flexibility, running efficiently on everything from data centers to mobile devices, and ships in three sizes: Ultra (highly complex tasks), Pro (broad scaling), and Nano (efficient on-device tasks). Developers access it via the Gemini API.
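The three provider APIs above accept requests in slightly different shapes, so a unified client like Bhumi has to normalize them. A minimal sketch of that normalization, following the providers' public chat/message request formats (the dispatch function and the model names in the usage line are illustrative, not Bhumi's actual code):

```python
def build_payload(provider: str, model: str, prompt: str) -> dict:
    """Map one user prompt onto each provider's request body.

    Illustrative only: real requests also need auth headers,
    endpoint URLs, and streaming/options fields.
    """
    if provider == "openai":
        # OpenAI Chat Completions: a messages list with roles.
        return {"model": model,
                "messages": [{"role": "user", "content": prompt}]}
    if provider == "anthropic":
        # Anthropic Messages API: same messages idea, but
        # max_tokens is a required field.
        return {"model": model, "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}]}
    if provider == "gemini":
        # Gemini generateContent: contents -> parts -> text.
        return {"contents": [{"parts": [{"text": prompt}]}]}
    raise ValueError(f"unknown provider: {provider}")

payload = build_payload("anthropic", "claude-3-5-sonnet-latest", "hi")
print(payload["max_tokens"])
```

Keeping the per-provider quirks in one dispatch point is what makes "seamless multi-model switching" possible: callers pass the same prompt regardless of backend.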
Related projects
Introduction: Groq - world's fastest AI inference
Singapore
Learn how Groq's LPU hardware and software platform provides high‑speed, energy‑efficient AI inference, offers free cloud compute for…
Run Local, open source AI
Singapore
Learn how to run open-source models like Llama3, Mistral, and Gemma locally using Jan.ai and Cortex.so, with practical…
Artecon - A hotspot for AI
Seattle
Learn how to run CPU‑based ML models with low latency, using small public models and post‑processing, then bundle…
Big Models, Small Machines: Run Full-Precision LLMs on Low Memory
London
Learn how to run full‑precision LLMs on low‑memory devices using a custom inference strategy, demonstrated with a 1.7B…
How we built one of the most accurate computer use agents, and how we are scaling it
Singapore
The talk covers Iris, a computer use agent capable of browsing, reading files, and connecting to MCP servers,…
Ho Jiak Bo: a food recommender app based on top food blogs
Singapore
Build an AI food recommender by crawling Singapore blogs, extracting JSON via LLMs, adding map data, and blending…