Technology
Whisper
Whisper: OpenAI's robust, open-source ASR model for multilingual speech recognition, translation, and language identification.
Whisper is OpenAI's general-purpose Automatic Speech Recognition (ASR) model, trained on a massive, diverse dataset for high-accuracy performance. It functions as a powerful multitasking system: handling multilingual transcription, direct speech translation, and language identification. The architecture processes audio in a sliding 30-second window, performing autoregressive predictions. Developers can select from six distinct model sizes to optimize for specific speed versus accuracy tradeoffs: this is the go-to solution for reliable, large-scale audio processing.
Related technologies
Recent Talks & Demos
Showing 1-24 of 40