Transcriber R&D project
A demo of a Next.js app that evaluates and improves transcription quality and timestamps using multiple speech-to-text models, alignment, and merging techniques.
I’ll present a custom-built tool for evaluating the quality of the text and timestamps produced by various speech-to-text engines. At Capsule, video editing depends on precise transcriptions and timestamps, so accuracy in both is critical.
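To make "timestamp quality" concrete, here is a minimal sketch of one way it could be scored. It is not the tool from the demo. The `(text, start, end)` word shape, the `timestamp_report` name, and the choice of metrics are all assumptions made for illustration. It aligns a hypothesis transcript to a trusted reference by word text, then measures how far the matched start times drift.

```python
from difflib import SequenceMatcher

# Hypothetical word shape: (text, start_seconds, end_seconds).
Word = tuple[str, float, float]


def timestamp_report(reference: list[Word], hypothesis: list[Word]) -> dict[str, float]:
    """Align two word lists by text, then measure start-time drift on matched words."""
    ref_tokens = [w[0].lower() for w in reference]
    hyp_tokens = [w[0].lower() for w in hypothesis]
    matcher = SequenceMatcher(a=ref_tokens, b=hyp_tokens, autojunk=False)

    errors = []
    # Each matching block is a run of identical words in both transcripts.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            ref_start = reference[block.a + k][1]
            hyp_start = hypothesis[block.b + k][1]
            errors.append(abs(ref_start - hyp_start))

    matched = len(errors)
    return {
        "matched_ratio": matched / len(reference) if reference else 0.0,
        "mean_start_error": sum(errors) / matched if matched else 0.0,
        "max_start_error": max(errors, default=0.0),
    }


if __name__ == "__main__":
    reference = [("the", 0.0, 0.2), ("quick", 0.2, 0.5), ("fox", 0.5, 0.9)]
    hypothesis = [("the", 0.1, 0.2), ("fox", 0.8, 1.0)]
    print(timestamp_report(reference, hypothesis))
```

In this example the engine dropped "quick", so only two of three reference words match, and the report shows the start-time drift for those two. A real evaluation would likely also normalize punctuation and score end times.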
In just 4 minutes, I’ll demo a Next.js app that integrates with the media library and visualizes outputs from different versions of our transcription pipeline. The pipeline combines several ML models: voice activity detection (VAD), language identification, speech-to-text engines (e.g., Whisper, AssemblyAI, Reverb), an alignment model, and LLMs.
You’ll get a glimpse into how I address complex challenges using techniques like chunking, LLMs, diff and merge algorithms, and logging. I’ll cover solutions for issues such as missed repetitions, punctuation improvements, timestamp alignment losses, hallucinations, and Japanese tokenization errors, and show how merging outputs from two transcription models improves the quality of both text and timestamps.
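As a rough illustration of the diff-and-merge idea, the sketch below keeps the wording from one engine and borrows timestamps from another wherever the two agree on a word. It is a sketch under stated assumptions, not Capsule's pipeline. The `Word` class, the `normalize` helper, and the `merge_transcripts` function are hypothetical names, and the diff comes from Python's standard `difflib` rather than whatever algorithm the talk uses.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Word:
    # Hypothetical word record; times are in seconds.
    text: str
    start: float
    end: float


def normalize(token: str) -> str:
    """Lowercase and drop punctuation so the diff compares spoken words only."""
    return "".join(ch for ch in token.lower() if ch.isalnum())


def merge_transcripts(text_source: list[Word], time_source: list[Word]) -> list[Word]:
    """Keep text_source wording; take timestamps from time_source where words match."""
    a = [normalize(w.text) for w in text_source]
    b = [normalize(w.text) for w in time_source]
    merged = list(text_source)  # unmatched words keep their original timings
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            continue  # skip insertions, deletions, and replacements
        for i, j in zip(range(i1, i2), range(j1, j2)):
            merged[i] = Word(text_source[i].text, time_source[j].start, time_source[j].end)
    return merged


if __name__ == "__main__":
    engine_a = [Word("Hello,", 0.00, 0.50), Word("world.", 0.50, 1.00)]
    engine_b = [Word("hello", 0.10, 0.42), Word("big", 0.42, 0.60), Word("world", 0.61, 0.95)]
    for word in merge_transcripts(engine_a, engine_b):
        print(word)
```

Here the first engine supplies punctuation and casing while the second supplies tighter word boundaries. The extra word "big" appears only in the second engine, so this policy drops it. A production merge would need an explicit rule for such disagreements, which is where issues like missed repetitions and hallucinations show up.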