Bhumi: Outpacing Native AI Inference
Learn how Bhumi's Rust-powered client delivers faster AI inference than native libraries and HTTP calls, supporting OpenAI, Anthropic, and Gemini with parallel processing.
Bhumi is a high-performance AI inference client designed to be faster than any other library, including native implementations and direct HTTP calls. Built in Rust with Python bindings, it optimizes request handling, reduces latency, and significantly improves throughput. Supporting OpenAI, Anthropic, and Gemini, Bhumi provides seamless multi-model switching while being 2-3x faster than LiteLLM and other alternatives.
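Much of the throughput gain described above comes from issuing provider requests concurrently rather than one at a time. The pattern can be sketched in plain Python with `asyncio`; note this is an illustration of concurrent fan-out, not Bhumi's actual API, and `fake_completion` is a hypothetical stand-in for a real provider call:

```python
import asyncio
import time

# Hypothetical stand-in for a provider HTTP call; a real client
# (Bhumi, LiteLLM, raw HTTP) would await a network response here.
async def fake_completion(provider: str, prompt: str, delay: float = 0.1) -> str:
    await asyncio.sleep(delay)  # simulate network latency
    return f"{provider}: echo({prompt})"

async def fan_out(prompt: str, providers: list[str]) -> list[str]:
    # Launch every provider call concurrently, so total wall time is
    # roughly max(latency) instead of sum(latency).
    tasks = [fake_completion(p, prompt) for p in providers]
    return await asyncio.gather(*tasks)  # results keep input order

start = time.perf_counter()
results = asyncio.run(fan_out("hello", ["openai", "anthropic", "gemini"]))
elapsed = time.perf_counter() - start

print(results)
print(f"3 calls finished in ~{elapsed:.2f}s")
```

With a 0.1 s simulated latency per call, the three concurrent calls complete in about 0.1 s total, where a sequential loop would take about 0.3 s.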
Bhumi: Rust-built Python AI inference client for fast LLM inference.
- Rust: a statically-typed systems language engineered for performance and reliability, rivaling C/C++ in speed. Its core innovation is the ownership model and borrow checker, which enforce strict memory and thread safety at compile time, eliminating data races and null-pointer dereferences without a conventional garbage collector. Rust achieves near-native speed through zero-cost abstractions, letting high-level features compile into highly optimized code. Major industry players, including Microsoft and Cloudflare, use Rust for critical infrastructure, and it is now officially supported for development in the Linux kernel.
- Python: a high-level, general-purpose language that prioritizes clear, readable syntax (via significant indentation), enabling rapid development. Its ecosystem is massive: robust web development with frameworks like Django and Flask, and data science with libraries such as Pandas and NumPy. The Python Package Index (PyPI) hosts thousands of community-contributed modules, offering ready solutions for tasks from network programming to GUI creation. The language is actively maintained by the Python Software Foundation (PSF), with the stable release at Python 3.14.0 as of November 2025.
- OpenAI API: authenticated, programmatic access to a suite of generative AI models. Developers use REST endpoints and official libraries (Python, Node.js) to integrate capabilities such as advanced text generation (GPT-4o), image creation (DALL-E 3), and speech-to-text transcription (Whisper). The platform is engineered for scale, supporting millions of daily requests for tasks from complex reasoning to real-time customer support agents.
- Anthropic API: programmatic access to the Claude model family (Opus, Sonnet, Haiku) for complex reasoning, vision, and tool-use applications. The Messages API handles conversational tasks, with Claude 3.5 Sonnet for balanced performance and Claude 3 Opus for complex analysis. Key features include tool use (function calling), vision capabilities for image analysis, and a 200K-token context window for extensive document processing.
- Gemini: Google's natively multimodal AI model, engineered from the ground up to understand and combine information across text, code, audio, image, and video. It is optimized for flexibility, running efficiently on everything from data centers to mobile devices, and ships in three sizes: Ultra (highly complex tasks), Pro (broad scaling), and Nano (efficient on-device tasks). Developers access it via the Gemini API.
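The three provider APIs above accept requests in slightly different shapes, so a unified client like Bhumi has to normalize them. A minimal sketch of that normalization, following the providers' public chat/message request formats (the dispatch function and the model names in the usage line are illustrative, not Bhumi's actual code):

```python
def build_payload(provider: str, model: str, prompt: str) -> dict:
    """Map one user prompt onto each provider's request body.

    Illustrative only: real requests also need auth headers,
    endpoint URLs, and streaming/options fields.
    """
    if provider == "openai":
        # OpenAI Chat Completions: a messages list with roles.
        return {"model": model,
                "messages": [{"role": "user", "content": prompt}]}
    if provider == "anthropic":
        # Anthropic Messages API: same messages idea, but
        # max_tokens is a required field.
        return {"model": model, "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}]}
    if provider == "gemini":
        # Gemini generateContent: contents -> parts -> text.
        return {"contents": [{"parts": [{"text": prompt}]}]}
    raise ValueError(f"unknown provider: {provider}")

payload = build_payload("anthropic", "claude-3-5-sonnet-latest", "hi")
print(payload["max_tokens"])
```

Keeping the per-provider quirks in one dispatch point is what makes "seamless multi-model switching" possible: callers pass the same prompt regardless of backend.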
Related projects
Introduction: Groq - world's fastest AI inference
Singapore
Learn how Groq's LPU hardware and software platform provides high‑speed, energy‑efficient AI inference, offers free cloud compute for…
Run Local, open source AI
Singapore
Learn how to run open-source models like Llama3, Mistral, and Gemma locally using Jan.ai and Cortex.so, with practical…
Artecon - A hotspot for AI
Seattle
Learn how to run CPU‑based ML models with low latency, using small public models and post‑processing, then bundle…
Big Models, Small Machines: Run Full-Precision LLMs on Low Memory
London
Learn how to run full‑precision LLMs on low‑memory devices using a custom inference strategy, demonstrated with a 1.7B…
How we built one of the most accurate computer use agents, and how we are scaling it
Singapore
The talk covers Iris, a computer use agent capable of browsing, reading files, and connecting to MCP servers,…
Ho Jiak Bo: a food recommender app based on top food blogs
Singapore
Build an AI food recommender by crawling Singapore blogs, extracting JSON via LLMs, adding map data, and blending…