Technology
GLM-ASR
A generative speech-to-text framework leveraging the GLM-4 backbone to deliver high-accuracy transcriptions in noisy or technical environments.
GLM-ASR uses a 9B-parameter architecture to bridge the gap between acoustic modeling and linguistic context. It outperforms traditional CTC-based systems by using a transformer-based decoder to resolve homophones and complex terminology, achieving a Word Error Rate (WER) under 4.8% on the WenetSpeech corpus. The system supports long-form processing (up to 45 minutes of continuous audio) and handles multi-speaker scenarios with precise timestamping and semantic coherence.
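For readers unfamiliar with the metric, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below is a generic illustration of that definition, not GLM-ASR's evaluation code: the function name `word_error_rate` and the whitespace tokenization are assumptions made for this example, and text normalization (casing, punctuation) is left out.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: tokenization by whitespace is an assumption,
    and no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a score below 4.8% means fewer than about 5 word-level errors per 100 reference words. For languages written without spaces between words, the same computation is typically run over characters instead of whitespace-separated tokens.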
1 project · 1 city