.

Members-Only

Recent Talks & Demos are for members only

Exclusive feed

You must be an AI Tinkerers active member to view these talks and demos.

January 30, 2025 · San Francisco

Instantcasts: Fast Whisper Transcripts

Learn how to process an hour‑long podcast in under ten seconds using optimized Whisper inference, covering model tweaks, audio storage, chunking, prompting, and diarization.

Overview
Tech stack
  • Whisper
    Whisper: OpenAI's robust, open-source ASR model for multilingual speech recognition, translation, and language identification.
    Whisper is OpenAI's general-purpose Automatic Speech Recognition (ASR) model, trained on a massive, diverse dataset for high-accuracy performance. It functions as a powerful multitasking system: handling multilingual transcription, direct speech translation, and language identification. The architecture processes audio in a sliding 30-second window, performing autoregressive predictions. Developers can select from six distinct model sizes to optimize for specific speed versus accuracy tradeoffs: this is the go-to solution for reliable, large-scale audio processing.

Related projects