ASR

Automatic Speech Recognition (ASR) converts spoken audio into text using neural networks such as OpenAI’s Whisper or Google’s USM.

ASR bridges human speech and machine processing with deep learning architectures, most commonly Transformers, that map audio to text. Whisper, for example, was trained on 680,000 hours of multilingual and multitask supervised audio collected from the web, and OpenAI reports that it approaches human-level robustness and accuracy on English speech, with improved resilience to accents, background noise, and technical language. Accuracy is usually reported as word error rate (WER), where lower is better. ASR powers everything from Tesla’s voice commands to Zoom’s live captioning, where transcription has to keep pace with the speaker. Whisper does not feed raw waveforms straight to the model: it splits audio into 30-second chunks and converts each chunk to a log-Mel spectrogram first. Because large-scale weak supervision helps such models generalize zero-shot across datasets, they often work without dataset-specific fine-tuning, which makes high-quality transcription practical for enterprise applications.

https://openai.com/research/whisper
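
Word error rate, the accuracy metric mentioned above, is the word-level edit distance between a reference transcript and the system's output, divided by the number of reference words. The following is a minimal sketch in Python, not the scoring code of any particular ASR toolkit. The function name `word_error_rate` and the normalization (lowercasing and splitting on whitespace) are assumptions made for illustration; published benchmarks typically apply a more careful text normalizer before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "turn on the living room lights"
    hyp = "turn on living room light"
    # One deletion ("the") and one substitution ("lights" -> "light")
    # over six reference words.
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Because insertions count as errors but the denominator is the reference length, WER can exceed 1.0 when the output contains many extra words.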

2 projects · 2 cities

Related technologies

Android (11) · BERT (179) · BLOOM (115) · GPT-3 (191) · GPT-4 (528) · iOS (6) · Kuralit (1) · Llama-2 (227) · PaLM 2 (116) · RoBERTa (118) · Voice AI (4)

Recent Talks & Demos

Kuralit: Intent-Driven Mobile Interface

Tiruchirappalli Jan 31