Nova Sonic: Real-time Voice Assistants

This talk covers building real-time, bidirectional voice applications using Amazon’s Nova Sonic foundation model, with code samples and implementation guidance.

Overview

Description:
Explore real-time, bidirectional speech interactions using cutting-edge foundation models. Learn how to create fluid, natural-sounding voice applications that respond intelligently in real time.

This talk is for developers and architects interested in building conversational AI applications with state-of-the-art voice technology. Code samples and implementation guidelines will be provided to help you get started with your own voice-enabled projects.

Technical Prerequisites:

Programming experience
Basic understanding of audio processing concepts
Basic understanding of foundation models

Links

https://github.com/aws-samples/generative-ai-ml-latam-samples/tree/...
Implements Amazon Nova Sonic bidirectional streaming for real-time speech processing via Python.
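
To make "bidirectional streaming" concrete, here is a minimal sketch of the general pattern: one task sends audio chunks while another receives responses concurrently. It does not call Nova Sonic or any AWS API. The names fake_model, send_audio, receive_audio, and run_session are hypothetical, and the "model" is a stand-in that simply reverses each chunk's bytes. The linked repository may structure its code differently.

```python
import asyncio


async def fake_model(inbound: asyncio.Queue, outbound: asyncio.Queue) -> None:
    """Hypothetical stand-in for a speech model: reverses each chunk's bytes."""
    while True:
        chunk = await inbound.get()
        if chunk is None:  # end-of-stream sentinel from the sender
            await outbound.put(None)
            return
        await outbound.put(chunk[::-1])


async def send_audio(chunks, inbound: asyncio.Queue) -> None:
    """Push audio chunks to the model, then signal end of stream."""
    for chunk in chunks:
        await inbound.put(chunk)
        await asyncio.sleep(0)  # yield so the receiver can run in between
    await inbound.put(None)


async def receive_audio(outbound: asyncio.Queue) -> list:
    """Collect response chunks until the model signals end of stream."""
    received = []
    while True:
        chunk = await outbound.get()
        if chunk is None:
            return received
        received.append(chunk)


async def run_session(chunks) -> list:
    """Run sending and receiving concurrently over a pair of queues."""
    inbound, outbound = asyncio.Queue(), asyncio.Queue()
    model_task = asyncio.create_task(fake_model(inbound, outbound))
    _, received = await asyncio.gather(
        send_audio(chunks, inbound),
        receive_audio(outbound),
    )
    await model_task
    return received


if __name__ == "__main__":
    print(asyncio.run(run_session([b"\x01\x02", b"\x03\x04"])))
```

In a real voice application the sender would read from a microphone and the receiver would play audio as it arrives. The point of the sketch is only that neither direction waits for the other to finish.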

Related projects

Talking in Real Time: Voice Agents for Live Conversations

Miami

A walkthrough of building a low‑latency, customizable voice agent for real‑time meetings and call‑center use, including integration demos…

The Rise of Visual Agents: Speaking the Future of Business Intelligence

Medellín

Explore how voice and visual AI combine to create Visual Agents that generate dynamic visualizations, insights, and actions…

A deep dive on voice AI and voice agents

Dublin

An in-depth exploration of ElevenLabs’ voice synthesis technology, covering its core features, integration methods, and practical implementation in…

Voicebots

Los Angeles

Learn how to create, customize, and share voice‑enabled GPTs, explore practical use cases, and get feedback on prompt…

Voice AI Agent Architecture: Streaming Deepgram → OpenAI → ElevenLabs in Production

Bogotá

A live technical walkthrough of building a production voice AI agent, detailing orchestration of Deepgram, OpenAI, and ElevenLabs…

Vocal Docs - A smart document editor you can talk to

Seattle

This talk explores a document editor that uses speech for both dictation and editing, demonstrating how language models…