

GLM-ASR

A generative speech-to-text framework leveraging the GLM-4 backbone to deliver high-accuracy transcriptions in noisy or technical environments.

GLM-ASR uses a 9B-parameter architecture to bridge the gap between acoustic modeling and linguistic context. It outperforms traditional CTC-based systems by using a transformer-based decoder to resolve homophones and complex terminology, achieving a Word Error Rate (WER) under 4.8% on the WenetSpeech corpus. The system supports long-form processing (up to 45 minutes of continuous audio) and handles multi-speaker scenarios with precise timestamping and semantic coherence.
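For readers unfamiliar with the WER figure quoted above, here is a minimal sketch of how the metric is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the system output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions, not part of GLM-ASR or its evaluation scripts; for Mandarin corpora the same computation is commonly applied over characters rather than whitespace-separated words.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sat on a mat"
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
```

Under this definition, a WER below 4.8% means fewer than about 5 word-level errors per 100 reference words.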

https://github.com/THUDM/GLM-4
