

March 28, 2025 · Seattle

AI Gamemaster for Voice Games

This talk explores building an AI Gamemaster for a D&D-like voice-based game, focusing on combining game systems, human content, and gamemastery principles.

Tech stack
  • Claude
    Claude is Anthropic's flagship family of large language models (LLMs): a high-performance, Constitutional AI system built for safety, complex reasoning, and expert-level collaboration.
    Claude is a next-generation AI assistant developed by Anthropic, a research firm prioritizing AI safety. The models (including Opus, Sonnet, and Haiku) leverage Constitutional AI to ensure helpful, honest, and harmless outputs, a key differentiator from competitors. Claude excels at complex enterprise tasks: processing massive context windows for in-depth data analysis, generating and reviewing code, and providing expert-level summarization for documents up to 200,000 tokens. It is deployed as a conversational chatbot and via API, offering scalable AI solutions for developers and businesses.
  • Sonnet
    Sonnet is Anthropic's powerful, mid-tier AI model, balancing frontier intelligence with high-speed, cost-efficient performance for production-scale deployments.
    Sonnet (currently Claude Sonnet 4.5) is Anthropic's versatile model, optimized for complex agentic workflows and coding tasks. It delivers state-of-the-art performance, achieving 77.2% on the SWE-bench Verified coding benchmark. The model is engineered for high-volume, real-time applications like customer support automation and financial analysis, supporting a 200K token context window. Pricing is set for efficiency: $3 per million input tokens. This makes Sonnet the recommended choice for developers needing top-tier reasoning and coding capability at a practical, scalable cost.
  • Prompt Engineering
    Prompt Engineering is the discipline of structuring inputs (prompts) to Large Language Models (LLMs) to reliably and efficiently elicit a desired, high-quality output.
    This is the core skill for maximizing performance from models like GPT-4 and Claude 3: it's the art and science of guiding an AI. The process involves systematic iteration and applying specific techniques to control the model's behavior and reduce 'hallucination.' Key advanced methods include Chain-of-Thought (CoT) prompting, which forces the LLM to process complex problems step-by-step, and Few-Shot prompting (providing 2-3 examples) to establish a clear output format or style. Mastery of these methods directly translates to tangible gains: improved accuracy, reduced API costs from fewer retries, and production-ready outputs for applications like customer service bots or code generation.
  • Data tracking
    Data tracking is the systematic collection and analysis of user behavior across digital platforms (websites, apps) to inform business strategy and deliver personalized experiences.
    This technology monitors user interactions: it records clicks, page views, and purchases to build detailed behavioral profiles. Core mechanisms include first- and third-party cookies, invisible 1x1 pixel tags, and device fingerprinting (a unique profile of browser settings). Tools like Google Analytics and Adobe Analytics process this data, enabling businesses to optimize conversion rates and execute targeted advertising campaigns. However, this power demands compliance: regulations like GDPR and CCPA now mandate explicit user consent, shifting the operational focus from pure collection to responsible data governance and transparency.
  • State tracking
    State Tracking (ST) is the core computational process that maintains a real-time, structured representation of a user’s goals, constraints, and requests throughout an interaction.
    This technology, often called Dialogue State Tracking (DST) in AI, is mission-critical for task-oriented systems: it estimates the user’s belief state at every turn, typically as a set of slot-value pairs (e.g., 'destination: Paris', 'date: tomorrow'). The DST unit processes natural language understanding (NLU) outputs, updates the belief state, and feeds this structured data to the Dialogue Policy module for the next action. Modern approaches leverage deep learning models like BERT-DST and are validated on benchmarks such as the MultiWOZ dataset, with performance measured by Joint Goal Accuracy (JGA) — a metric that demands perfect state prediction per turn.
