Technology

GLM-ASR

A generative speech-to-text framework leveraging the GLM-4 backbone to deliver high-accuracy transcriptions in noisy or technical environments.

GLM-ASR uses a 9B-parameter architecture to bridge the gap between acoustic modeling and linguistic context. It outperforms traditional CTC-based systems by using a transformer-based decoder to resolve homophones and complex terminology, achieving a Word Error Rate (WER) under 4.8% on the WenetSpeech corpus. The system supports long-form processing (up to 45 minutes of continuous audio) and handles multi-speaker scenarios with precise timestamps and semantically coherent output.
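
For context on the metric quoted above: WER is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The following is a minimal sketch of that calculation, not GLM-ASR's own evaluation code; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration. Chinese corpora are often scored per character rather than per word, in which case the inputs would be split into characters instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER of `hypothesis` against `reference` (hypothetical helper)."""
    # Assumption: words are separated by whitespace.
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under this definition, a WER under 4.8% corresponds to fewer than 48 word errors per 1,000 reference words.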

https://github.com/THUDM/GLM-4

1 project · 1 city

Related technologies

Android 11 · GLM-ASR STT 1 · iOS 6 · Ministral-3 1 · Pocket-TTS 1 · Rust 49 · Server 4 · TTS 3

Recent Talks & Demos

Members-Only

UnaMentis: On-Device Voice Models

Portland Mar 5

Pocket-TTS GLM-ASR