Multi-task Audio Transformer Model
The talk explains a unified autoregressive transformer that handles both audio and text, covering tokenization and multi-task training for TTS, ASR, and voice completion.
We have pretrained and fine-tuned a single model that accepts audio or text as input and produces audio or text as output. The same model serves multiple audio tasks, such as text-to-speech (TTS), automatic speech recognition (ASR), and text-to-voice completion. We will demo the TTS part and discuss the overall architecture of the model.
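To make the idea of one model covering several tasks more concrete, here is a minimal sketch of how text and audio could share a single token vocabulary, with a task tag selecting the direction. This is an illustration under assumptions, not the speakers' implementation: the vocabulary sizes, the special tokens (`<tts>`, `<asr>`, `<sep>`, `<eos>`), the byte-level text tokenizer, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# All sizes and special-token names are hypothetical, chosen for illustration.
TEXT_VOCAB = 256          # byte-level text tokens: ids 0..255
AUDIO_CODEBOOK = 1024     # discrete audio codec codes: ids 256..1279
AUDIO_OFFSET = TEXT_VOCAB
SPECIAL = {"<tts>": 1280, "<asr>": 1281, "<sep>": 1282, "<eos>": 1283}


def encode_text(text: str) -> List[int]:
    """Byte-level text tokenization (a stand-in for a real tokenizer)."""
    return list(text.encode("utf-8"))


def encode_audio(codes: List[int]) -> List[int]:
    """Shift audio codec codes into their own id range of the shared vocabulary."""
    if any(c < 0 or c >= AUDIO_CODEBOOK for c in codes):
        raise ValueError("audio code out of range")
    return [c + AUDIO_OFFSET for c in codes]


@dataclass
class Example:
    tokens: List[int]      # full sequence fed to the autoregressive model
    loss_mask: List[int]   # 1 where the next-token loss applies (target part)


def build_example(task: str, text: str, audio_codes: List[int]) -> Example:
    """Lay out one training sequence as: <task> prompt <sep> target <eos>."""
    text_ids, audio_ids = encode_text(text), encode_audio(audio_codes)
    if task == "tts":
        prompt, target = text_ids, audio_ids   # text in, audio out
    elif task == "asr":
        prompt, target = audio_ids, text_ids   # audio in, text out
    else:
        raise ValueError(f"unknown task: {task}")
    prefix = [SPECIAL[f"<{task}>"]] + prompt + [SPECIAL["<sep>"]]
    suffix = target + [SPECIAL["<eos>"]]
    return Example(prefix + suffix, [0] * len(prefix) + [1] * len(suffix))


tts = build_example("tts", "hi", [5, 7])
asr = build_example("asr", "hi", [5, 7])
print(tts.tokens)  # [1280, 104, 105, 1282, 261, 263, 1283]
print(asr.tokens)  # [1281, 261, 263, 1282, 104, 105, 1283]
```

Because both directions reduce to next-token prediction over the same vocabulary, the same paired data yields a TTS example and an ASR example simply by swapping prompt and target. Voice completion could plausibly be framed the same way, with an audio prompt followed by an audio target.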
We have hosted the model for fast, low-latency inference.