Technology

WhisperX

WhisperX is an enhanced, open-source automatic speech recognition (ASR) tool built on OpenAI's Whisper. It offers transcription at up to 70x real time, accurate word-level timestamps, and speaker diarization.

WhisperX extends OpenAI's Whisper model to make it better suited to production ASR. It reaches transcription speeds of up to 70x real time by running batched inference with the `large-v2` model. It also replaces Whisper's utterance-level timestamps, which are often imprecise, with word-level timestamps obtained through alignment against a phoneme-based ASR model (Wav2Vec2). For multi-speaker audio, it integrates `pyannote-audio` for speaker diarization, assigning a distinct speaker ID to each voice. Together, these features make WhisperX well suited to tasks that need precise timing, such as video subtitling and multi-speaker meeting transcription.
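To illustrate how word-level timestamps and speaker labels feed a subtitling workflow, here is a minimal, library-free Python sketch. It does not call WhisperX. The `Word` and `Turn` classes are a hypothetical, simplified data shape rather than WhisperX's actual output format. The rule of giving each word the speaker whose diarization turns overlap it most, and the `max_len`/`max_gap` thresholds for splitting subtitle cues, are illustrative choices, not WhisperX's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    """A transcribed word with start/end times in seconds (hypothetical shape)."""
    text: str
    start: float
    end: float
    speaker: Optional[str] = None


@dataclass
class Turn:
    """A diarization turn: `speaker` is talking from `start` to `end` seconds."""
    speaker: str
    start: float
    end: float


def assign_speakers(words: list, turns: list) -> None:
    """Label each word (in place) with the speaker whose turns overlap it most."""
    for w in words:
        overlap = {}
        for t in turns:
            shared = min(w.end, t.end) - max(w.start, t.start)
            if shared > 0:
                overlap[t.speaker] = overlap.get(t.speaker, 0.0) + shared
        # Words that no turn overlaps are left unlabeled (None).
        w.speaker = max(overlap, key=overlap.get) if overlap else None


def _make_cue(ws: list) -> tuple:
    """Collapse a run of words into a (start, end, speaker, text) cue."""
    return (ws[0].start, ws[-1].end, ws[0].speaker or "UNKNOWN",
            " ".join(w.text for w in ws))


def to_cues(words: list, max_len: float = 5.0, max_gap: float = 0.8) -> list:
    """Group consecutive words into subtitle cues.

    A new cue starts on a speaker change, a pause longer than `max_gap`
    seconds, or when the cue would run longer than `max_len` seconds.
    """
    cues, current = [], []
    for w in words:
        if current and (
            w.speaker != current[-1].speaker
            or w.start - current[-1].end > max_gap
            or w.end - current[0].start > max_len
        ):
            cues.append(_make_cue(current))
            current = []
        current.append(w)
    if current:
        cues.append(_make_cue(current))
    return cues


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(cues: list) -> str:
    """Render cues as numbered SRT blocks, prefixing each with its speaker."""
    blocks = []
    for i, (start, end, speaker, text) in enumerate(cues, start=1):
        blocks.append(f"{i}\n{srt_time(start)} --> {srt_time(end)}\n[{speaker}] {text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Toy input: two speakers, two words each (illustrative values only).
    words = [Word("Hello", 0.0, 0.4), Word("there", 0.45, 0.8),
             Word("Hi", 1.0, 1.2), Word("back", 1.25, 1.6)]
    turns = [Turn("SPEAKER_00", 0.0, 0.9), Turn("SPEAKER_01", 0.9, 2.0)]
    assign_speakers(words, turns)
    print(to_srt(to_cues(words)))
```

On the toy input, the sketch produces two cues: "Hello there" for `SPEAKER_00` (00:00:00,000 --> 00:00:00,800) and "Hi back" for `SPEAKER_01` (00:00:01,000 --> 00:00:01,600). Splitting cues at speaker changes and pauses only works because each word carries its own timestamps. With utterance-level timing alone, a cue boundary could not fall between two words.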

https://github.com/m-bain/whisperX

1 project · 1 city

Related technologies

ElevenLabs API (3) · GPT-4o (56) · Librosa (2) · Pydub (1)

Recent Talks & Demos


Members-Only

Podcast Localization using LLMs

Singapore · Apr 25

WhisperX · GPT-4o