VOZY: Voice AI Sales Agent
Demonstration of VOZY, a voice‑enabled AI receptionist that handles inbound sales calls, answers FAQs, gathers lead information, and logs contacts in real time.
The agent acts as a virtual assistant that answers phone calls from prospects and potential customers, responds to FAQs about the company, and captures each caller's interest and contact details.
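The call-handling loop described above can be sketched as follows. This is a hypothetical, runnable outline, not VOZY's actual implementation: `transcribe`, `answer`, and the logging step are stubs standing in for real STT, LLM/dialogue, and CRM integrations, and the returned reply would feed a TTS engine in production.

```python
# Hypothetical sketch of an inbound-call handling loop (assumed structure,
# not VOZY's published code). Each stub marks where a real service plugs in.

FAQ = {
    "hours": "We are open 9am-5pm, Monday to Friday.",
    "pricing": "Plans start at $49/month.",
}

def transcribe(audio_chunk: str) -> str:
    # Stub STT: a real system would stream audio to an ASR service.
    return audio_chunk.lower()

def answer(utterance: str) -> str:
    # Stub dialogue logic: match the utterance against known FAQ topics;
    # a real agent would call an LLM with the transcript and context.
    for topic, reply in FAQ.items():
        if topic in utterance:
            return reply
    return "Let me take your contact details and have someone call you back."

def handle_turn(audio_chunk: str, call_log: list) -> str:
    text = transcribe(audio_chunk)
    reply = answer(text)
    # CRM logging stub: record both sides of the turn in real time.
    call_log.append({"caller_said": text, "agent_said": reply})
    return reply  # in production, this text would be synthesized by TTS

log = []
print(handle_turn("What are your HOURS?", log))
```

The one-turn structure keeps each integration point (STT, dialogue, logging, TTS) swappable, which mirrors how the components listed below compose into a voice agent.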
- Speech-to-Text: Speech-to-Text (STT), formally Automatic Speech Recognition (ASR), uses deep learning models to convert spoken audio into written text in real time; it is the core engine behind voice assistants like Alexa and live captioning in 125+ languages. The technology powers critical enterprise applications: transcribing contact-center calls, generating subtitles for live media, and enabling voice commands for smart devices. Major providers, including Google Cloud and Amazon Transcribe, offer APIs with high accuracy (often 95%+) and features such as speaker diarization and custom vocabularies, making voice data actionable across nearly every industry.
- Text-to-Speech: Text-to-Speech (TTS) is the AI-driven technology that converts written text into synthesized, human-like audio, giving digital content a voice. Modern neural TTS uses deep learning to turn raw text into natural-sounding speech through linguistic analysis (parsing grammar and context) and acoustic modeling (generating the audio waveform), delivering high-fidelity voices in 50+ languages that move far beyond the robotic sound of older systems. Key applications drive major efficiency gains: accessibility tools for users with reading disabilities, automated customer service via IVR systems, and high-volume content production for audiobooks and video narration.
- Generative AI: Generative AI employs foundation models, such as Large Language Models, to create novel, complex content (text, images, code, and audio) from simple user prompts. Unlike deep learning systems that merely classify data, generative models like OpenAI's GPT-4 and Stability AI's Stable Diffusion are trained on massive datasets and use billions of parameters to learn complex patterns, enabling them to draft software code, summarize 50-page reports, or produce photorealistic images in seconds. This fundamentally shifts human-computer interaction from command-based to prompt-based creation, driving immediate, high-impact productivity gains across industries.
- Vector store: a specialized index for high-dimensional vector embeddings, the numerical representations of unstructured data (text, images, audio), enabling millisecond-speed semantic similarity search. Unlike traditional databases, a vector store uses Approximate Nearest Neighbor (ANN) algorithms such as HNSW to quickly find data points that are conceptually similar, not just keyword-matched. This capability is indispensable for modern AI workflows, powering semantic search, recommendation systems, and Retrieval-Augmented Generation (RAG) with Large Language Models (LLMs). Milvus and Qdrant are open-source implementations; Pinecone is a fully managed service.
- RAG: Retrieval-Augmented Generation (RAG) is the GenAI framework that grounds LLMs (like GPT-4) on external, verified data, drastically reducing hallucinations and providing verifiable sources. It addresses the hallucination problem by inserting a retrieval step before generation: the user query is vectorized and used to search an external knowledge base (e.g., a Pinecone vector database) for relevant document chunks (typically 512-token segments). The retrieved facts augment the original prompt, giving the LLM (e.g., Gemini or Llama 3) the specific, current, or proprietary context it needs, so the final response stays accurate and grounded in domain-specific data without the cost and latency of full model retraining.
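Putting the vector-store and RAG bullets together, the retrieve-then-augment loop can be sketched minimally. The bag-of-words "embedding" and brute-force cosine search below are toy stand-ins (my assumptions for illustration) for a real embedding model and an ANN index such as HNSW in Qdrant or Milvus; the sample documents are invented.

```python
import math

# Toy RAG sketch: a real pipeline embeds text with a neural model and
# queries an ANN-indexed vector store; here word-count vectors and a
# brute-force cosine scan make the flow self-contained and runnable.

DOCS = [
    "VOZY handles inbound sales calls and logs leads in real time.",
    "The vector store indexes document embeddings for semantic search.",
    "TTS converts the agent's text reply into spoken audio.",
]

def embed(text: str) -> dict:
    # Stand-in embedding: sparse word-count vector (a real model
    # returns a dense float vector).
    vec = {}
    for word in text.lower().split():
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 1) -> list:
    # Rank every document by similarity to the query (brute force;
    # an ANN index does this approximately at scale).
    q = embed(query)
    ranked = sorted(DOCS, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    # Augment the user query with retrieved context before the LLM call.
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}"

print(build_prompt("How does the vector store support semantic search?"))
```

The design point is the separation of retrieval from generation: swapping in a real embedding model and vector store changes only `embed` and `retrieve`, while the prompt-augmentation step stays the same.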
Related projects
Vox Machina
San Francisco
Learn how to process multiple audio streams, build per‑user game state, and enable a voice bot to answer…
Voice AI Agent Architecture: Streaming Deepgram → OpenAI → ElevenLabs in Production
Bogotá
A live technical walkthrough of building a production voice AI agent, detailing orchestration of Deepgram, OpenAI, and ElevenLabs…
Your technical interviews suck, so let's fix it.
Medellín
Demonstrates building a real-time voice interview agent with Realtime API and Chainlit, showing tool integration, prompt design, and…
Voice agents that modify your interface in real time
Medellín
Build a real‑time, voice‑driven multi‑agent system using Google ADK and Gemini Live API, with live function calls that…
The Rise of Visual Agents: Speaking the Future of Business Intelligence
Medellín
Explore how voice and visual AI combine to create Visual Agents that generate dynamic visualizations, insights, and actions…
Automating What 400 People Do: Building an AI Agent for Trade Compliance
Medellín
Demonstrating an AI system that extracts, interprets, and validates import documents using legal logic and embeddings to automate…