You must be an active AI Tinkerers member to view these talks and demos.
Multi-task Audio Transformer Model
The talk explains a unified autoregressive transformer that handles both audio and text, covering tokenization and multi-task training for text-to-speech (TTS), automatic speech recognition (ASR), and voice completion.
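The summary does not spell out the tokenization scheme. A common way to let one autoregressive transformer serve several audio and text tasks is to place text tokens and discrete audio-codec tokens in one shared vocabulary and prepend a task token to each training sequence. The sketch below assumes that layout; the vocabulary sizes, special-token names, and helper functions (`audio_ids`, `build_example`) are hypothetical and are not the speakers' design.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical shared-vocabulary layout (an assumption, not the speakers' design):
# text tokens use IDs [0, TEXT_VOCAB), audio codec tokens follow, then special tokens.
TEXT_VOCAB = 32_000
AUDIO_VOCAB = 1_024
AUDIO_OFFSET = TEXT_VOCAB
SPECIAL_BASE = TEXT_VOCAB + AUDIO_VOCAB
SPECIAL = {
    "<tts>": SPECIAL_BASE,
    "<asr>": SPECIAL_BASE + 1,
    "<continue>": SPECIAL_BASE + 2,
    "<sep>": SPECIAL_BASE + 3,
    "<eos>": SPECIAL_BASE + 4,
}
IGNORE = -100  # matches the default ignore_index of PyTorch's cross_entropy


def audio_ids(codes: List[int]) -> List[int]:
    """Shift raw codec indices into the shared vocabulary."""
    for c in codes:
        if not 0 <= c < AUDIO_VOCAB:
            raise ValueError(f"codec index out of range: {c}")
    return [AUDIO_OFFSET + c for c in codes]


@dataclass
class Example:
    input_ids: List[int]
    labels: List[int]  # aligned with input_ids; shift by one for next-token loss


def build_example(task: str, source: List[int], target: List[int]) -> Example:
    """Pack one task as <task> source <sep> target <eos>.

    Prompt positions are labeled IGNORE, so only the target and <eos>
    contribute to the training loss.
    """
    prefix = [SPECIAL[task]] + source + [SPECIAL["<sep>"]]
    suffix = target + [SPECIAL["<eos>"]]
    return Example(prefix + suffix, [IGNORE] * len(prefix) + suffix)


# One shared sequence format covers TTS, ASR, and voice completion.
text = [17, 942, 5]               # pretend text-tokenizer output
speech = audio_ids([3, 800, 12])  # pretend codec output
tts = build_example("<tts>", text, speech)
asr = build_example("<asr>", speech, text)
cont = build_example("<continue>", speech[:2], speech[2:])
```

Because every task reduces to "continue this token sequence", a single next-token objective can train all three tasks at once; only the task token and the source/target modalities change.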
We have pretrained and fine-tuned a single model that takes audio or text as input and produces audio or text as output. The same model can handle several audio-related tasks, such as TTS, ASR, and text-to-voice completion. We will demo the TTS capability and walk through the model's overall architecture.
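The talk description does not say how the model decides which modality to emit. One simple approach for a single decoder-only model is to restrict decoding to the target modality's token range. The sketch below assumes a toy vocabulary layout and a hypothetical `next_logits` callable standing in for the transformer; it uses greedy decoding for clarity, whereas a production TTS system would more likely sample.

```python
from typing import Callable, List, Sequence, Set

# Toy vocabulary for illustration only (assumed sizes, not from the talk):
# text IDs [0, 6), audio IDs [6, 10), and one end-of-sequence token.
TEXT_VOCAB, AUDIO_VOCAB = 6, 4
AUDIO_OFFSET = TEXT_VOCAB
EOS = TEXT_VOCAB + AUDIO_VOCAB
VOCAB_SIZE = EOS + 1


def allowed_ids(modality: str) -> Set[int]:
    """Token IDs the decoder may emit for the requested output modality."""
    if modality == "audio":
        ids = set(range(AUDIO_OFFSET, AUDIO_OFFSET + AUDIO_VOCAB))
    elif modality == "text":
        ids = set(range(TEXT_VOCAB))
    else:
        raise ValueError(f"unknown modality: {modality}")
    ids.add(EOS)
    return ids


def generate(next_logits: Callable[[Sequence[int]], List[float]],
             prompt: Sequence[int], modality: str,
             max_new_tokens: int = 50) -> List[int]:
    """Greedy autoregressive decoding restricted to one output modality."""
    allowed = sorted(allowed_ids(modality))
    seq, out = list(prompt), []
    for _ in range(max_new_tokens):
        logits = next_logits(seq)
        # Consider only allowed IDs, so a TTS request can never emit a text token.
        best = max(allowed, key=lambda i: logits[i])
        if best == EOS:
            break
        seq.append(best)
        out.append(best)
    return out


def toy_logits(seq: Sequence[int]) -> List[float]:
    """Hypothetical stand-in for the transformer's next-token logits."""
    logits = [0.0] * VOCAB_SIZE
    logits[2] = 10.0  # the toy model strongly prefers text token 2
    n_audio = sum(AUDIO_OFFSET <= t < EOS for t in seq)
    if n_audio >= 3:
        logits[EOS] = 5.0
    else:
        logits[AUDIO_OFFSET + n_audio] = 5.0
    return logits
```

With this toy model, `generate(toy_logits, [0, 1], "audio")` returns `[6, 7, 8]` even though the unconstrained argmax at every step is text token 2; requesting `"text"` instead yields repeated token 2 until `max_new_tokens` is reached.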
We have hosted the model for fast, low-latency inference.
Related projects
Demo - AI Powered, rich video composition in real-time
Bengaluru
See how AI enables real-time, rich video composition for recording and live streaming, including voice anonymity. A mainstage…
The CheerLabs
Bengaluru
Explore how the GenMaya engine uses AI to compose video and remix audio in real time, simplifying asset…
Cols AI
Bengaluru
The talk explains how automated voice agents handle calls, navigate web and software interfaces, and resolve customer issues…
RAGBuilder by Krux AI
Bengaluru
Learn how RAGBuilder automatically optimizes chunking, embedding models, and other RAG parameters, evaluates configurations on test data, and…
podscript - CLI tool to generate podcast transcripts using language and speech-to-text models
Bengaluru
Learn how podscript uses LLMs and speech‑to‑text APIs like ChatGPT, Anthropic, Deepgram, and Groq to generate accurate podcast…
MixedVoices: Tracking and Improving Voice Agents
Bengaluru
Learn how to use a Python API and Streamlit dashboard to track, visualize, and test voice agents through…