ASR

Automatic Speech Recognition (ASR) converts spoken audio into digital text using neural networks like OpenAI’s Whisper or Google’s USM.

ASR bridges the gap between human speech and machine processing by using deep learning architectures, most commonly Transformers, to transcribe audio, in some deployments in near real time. Whisper, for example, was trained on 680,000 hours of multilingual and multitask supervised data collected from the web, and it approaches human-level accuracy and robustness on English speech as measured by word error rate (WER), while holding up better than earlier systems across accents and background noise. ASR powers products ranging from Tesla’s voice commands to Zoom’s live captioning, turning recorded audio into structured text, often at low latency. Because models trained with large-scale weak supervision generalize well without task-specific training, they reduce (though do not always eliminate) the need for manual fine-tuning, making high-quality transcription more accessible for global enterprise applications.
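
Since WER is the metric cited above, a minimal sketch of how it is typically computed may help: WER = (substitutions + deletions + insertions) / number of reference words, found via word-level edit distance. The function name `word_error_rate` is illustrative, and the sketch assumes plain whitespace tokenization with no text normalization (real evaluations usually lowercase and strip punctuation first).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: tokens are split on whitespace and compared exactly;
    no casing or punctuation normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so it is an error rate rather than a bounded percentage.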

https://openai.com/research/whisper
