Open-WebUI
Learn how to use Open‑WebUI to integrate multiple local or API LLMs with a RAG pipeline, reducing costs and enabling multi‑user, web‑based deployment.
How to leverage multiple LLMs (local or API endpoints) within a polished framework
A live demo of the arXiv RAG pipeline I've been developing, using a Pipelines-based solution
Self-hosted AI platform supporting Ollama, OpenAI, RAG, and Docker deployment.
This LlamaIndex RAG pipeline retrieves full papers using PGVector and Mistral.
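Open-WebUI's Pipelines feature loads custom Python classes that sit between the chat UI and the model. The sketch below, a minimal illustration rather than the speaker's actual pipeline, assumes the commonly documented shape (a `Pipeline` class exposing a `pipe(user_message, model_id, messages, body)` method); the in-memory `retrieve` helper is a hypothetical stand-in for a real PGVector lookup over arXiv papers.

```python
# Minimal sketch of an Open-WebUI-style pipeline class.
# Assumptions: Pipelines discovers a class named `Pipeline` with a `pipe`
# method of this signature; `retrieve` is a toy stand-in for a vector-store
# query (e.g. PGVector over arXiv paper chunks).
from typing import List


class Pipeline:
    def __init__(self):
        self.name = "arXiv RAG (sketch)"

    def retrieve(self, query: str) -> List[str]:
        # Placeholder retrieval step; a real pipeline would embed the query
        # and search a vector store for relevant paper chunks here.
        corpus = {
            "attention": "Transformers rely on self-attention (arXiv:1706.03762).",
            "retrieval": "RAG augments prompts with retrieved context.",
        }
        return [text for key, text in corpus.items() if key in query.lower()]

    def pipe(self, user_message: str, model_id: str,
             messages: List[dict], body: dict) -> str:
        # Prepend retrieved context so the downstream LLM answers from it;
        # this sketch returns the augmented prompt instead of calling a model.
        context = "\n".join(self.retrieve(user_message))
        return f"Context:\n{context}\n\nQuestion: {user_message}"
```

A real deployment would replace the return statement with a call to the selected model, but the retrieve-then-augment shape is the core of the demo.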
- OpenWebUI: OpenWebUI is the extensible, self-hosted AI platform (Docker/Kubernetes) that unifies Ollama and OpenAI-compatible APIs for a powerful, privacy-first LLM experience. OpenWebUI delivers a robust, self-hosted AI platform, ensuring a privacy-first approach for your large language models. Deployment is straightforward: use Docker or Kubernetes for seamless setup (supports :ollama and :cuda images). It acts as a universal frontend, supporting both local LLM runners like Ollama and external OpenAI-compatible APIs (e.g., GroqCloud, LMStudio). Key features include a built-in Retrieval Augmented Generation (RAG) engine for document interaction, granular Role-Based Access Control (RBAC) for multi-user security, and native Python function-calling for advanced agent creation. This architecture provides a single, user-friendly interface for managing diverse LLM workloads efficiently.
- RAG: RAG (Retrieval-Augmented Generation) is the GenAI framework that grounds LLMs (like GPT-4) on external, verified data, drastically reducing model hallucinations and providing verifiable sources. RAG is a critical GenAI architecture: it solves the LLM 'hallucination' problem by inserting a retrieval step before generation. A user query is vectorized, then used to query an external knowledge base (e.g., a Pinecone vector database) for relevant document chunks (typically 512-token segments). These retrieved facts augment the original prompt, providing the LLM (e.g., Gemini or Llama 3) the specific, current, or proprietary context required. This process ensures the final response is accurate and grounded in domain-specific data, avoiding the high cost and latency of full model retraining.
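The retrieve-then-augment flow described above can be sketched without any vector database. The toy in-memory example below ranks documents by bag-of-words cosine similarity; a real system would use learned embeddings and a store such as Pinecone or PGVector, but the prompt-assembly step is the same.

```python
import math
from collections import Counter


def embed(text):
    # Toy "embedding": bag-of-words term counts (stand-in for a real model).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query, docs, k=1):
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def augment_prompt(query, docs):
    # Retrieved chunks are prepended so the LLM answers from grounded context.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nAnswer using only the context.\nQ: {query}"


docs = [
    "The warranty covers parts for two years.",
    "Shipping takes five business days.",
]
prompt = augment_prompt("how long is the warranty", docs)
```

Note that the generation model never changes; only its prompt does, which is why RAG avoids the cost of retraining.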
- API: The Application Programming Interface (API) is the digital contract that allows two separate software systems to communicate and exchange data, typically JSON, securely over a network. An API is the essential communication layer: it defines the methods (GET, POST, DELETE) and the data structures (often JSON) for two distinct software applications to interact. This interface acts as a secure intermediary, managing authentication (via API keys or OAuth 2.0) and ensuring only authorized data is exchanged between the client and server. For example, the Stripe API handles billions of dollars in payments by exposing a single endpoint for a charge request, while the Google Maps API allows a third-party application to request and display complex map data, saving millions of development hours and enabling rapid feature deployment across the modern web.
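To make the "contract" idea concrete, here is a toy in-process dispatcher for a REST-style items API (no real network; the `/items` routes, the `X-API-Key` header, and the response fields are all illustrative, not from any real service). It shows the three pieces the definition names: methods, JSON bodies, and key-based authentication.

```python
import json

# In-memory "server" state for a toy items API.
ITEMS = {}
NEXT_ID = 1


def handle(method, path, body=None, headers=None):
    """Dispatch a request against a toy REST contract: GET/POST/DELETE on /items."""
    global NEXT_ID
    # Authentication: the contract requires an API key header on every call.
    if not (headers or {}).get("X-API-Key"):
        return 401, json.dumps({"error": "missing API key"})
    if method == "POST" and path == "/items":
        item = json.loads(body)          # JSON in
        item_id, NEXT_ID = NEXT_ID, NEXT_ID + 1
        ITEMS[item_id] = item
        return 201, json.dumps({"id": item_id, **item})  # JSON out
    if method == "GET" and path == "/items":
        return 200, json.dumps(list(ITEMS.values()))
    if method == "DELETE" and path.startswith("/items/"):
        ITEMS.pop(int(path.rsplit("/", 1)[1]), None)
        return 204, ""
    return 404, json.dumps({"error": "not found"})
```

The client never touches `ITEMS` directly; it only knows the method, path, and JSON shapes, which is exactly what makes the contract a stable boundary between systems.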
- On-Premises: Full control: your hardware, software, and data reside on your company's physical premises, managed entirely by your internal IT team. On-Premises is the traditional IT deployment model: your organization owns, installs, and manages the entire technology stack—hardware, software, and security—within its own data center. This setup guarantees maximum control over data sovereignty and compliance, which is essential for regulated industries like finance or healthcare. Expect a significant upfront capital expenditure (CapEx) for server racks and perpetual software licenses. Your dedicated IT staff handles all operations: patching, maintenance, and disaster recovery. The trade-off is clear: you get predictable, low-latency performance and total physical access, but scaling capacity requires manual procurement and installation of new physical assets.
Related projects
BetaTester - Bot that tests the UI / UX of your web app
San Francisco
This talk covers BetaTester, an open-source tool using LLMs and Playwright to automatically test web app UI/UX flows,…
AI ∞ UI: A Versatile Web Interface for Seamless Interaction with LLM APIs
Seattle
A walkthrough of AI ∞ UI, showing model switching, system message and temperature controls, conversation management, markdown/LaTeX support, and layered…
Controllable AI Video Generation: Wan 2.1 & ComfyUI
Los Angeles
Learn how Wan 2.1’s control‑code system and ComfyUI integration enable precise, multimodal video generation and collaborative prototyping for…
Building an LLM Email Assistant
Orange County
Learn how to build an email assistant using OpenAI's LLM: system architecture, prompt design, integration steps, and a…
CorpusKeeper - Talk To Data
Seattle
Demonstrating how to integrate LLMs, RAG, and function calls to create an agency‑style interface that designs, scripts, and…
LLM drives a web browser
New York City
This talk demonstrates an open-source interface that enables large language models to interact with web pages through a…