Kaldi

A widely used open-source C++ toolkit for speech recognition, providing finite-state-transducer-based modeling and deep learning integration.

Kaldi is a widely used open-source toolkit for automatic speech recognition (ASR). Built on OpenFst, it offers a modular C++ codebase covering linear algebra, acoustic modeling, and feature extraction. Researchers build systems from its example recipes, such as those for LibriSpeech and Switchboard, and use its CUDA support for GPU-accelerated neural network training. It remains a common choice for speech scientists who need precise control over the decoding graph and lattice generation.

https://kaldi-asr.org/

2 projects · 2 cities

Related technologies

Amazon Transcribe (2) · BERT (179) · CMU Sphinx (2) · Google Cloud Speech-to-Text (2) · GPT-3 (191) · GPT-4 (528) · IBM Watson Speech to Text (2) · Amazon Polly (1) · Azure Speech to Text (1) · BLOOM (115) · Database (8) · DeepSpeech (1) · eSpeak (1) · Festival (1) · Google Cloud Text-to-Speech (1) · IBM Watson Text to Speech (1) · Keras (74) · Llama-2 (227)

Recent Talks & Demos

UzbekVoice

Tashkent Oct 31

Google Cloud Speech-to-Text · Google Cloud Text-to-Speech

YT shorts finder

Austin Sep 12

Vision models · Whisper