Technology
Image
Automates data normalization by resizing images to 224x224 via Pillow and transcoding audio to 16 kHz mono WAV via FFmpeg.
This workflow automates data preparation for multimodal AI. We use Pillow to fit images into a 224x224 pixel square (the input size commonly used with ResNet and VGG architectures), preserving aspect ratio by scaling the image to fit and padding the remainder. On the audio side, we use FFmpeg to transcode diverse formats into 16 kHz mono WAV files, which gives a consistent sample rate for downstream spectrogram generation. It is a no-nonsense approach to cleaning up and unifying inputs before they hit the training loop.
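The two normalization steps described above could be sketched roughly as follows. This is a minimal illustration, not the implementation used by any listed project: the function names, the black padding color, the Lanczos filter, and the 16-bit PCM codec choice are all assumptions, and FFmpeg is assumed to be available on the PATH.

```python
import subprocess

TARGET = 224          # square side length in pixels (from the description above)
SAMPLE_RATE = 16_000  # audio sample rate in Hz (from the description above)


def fit_within(width, height, target=TARGET):
    """Scale (width, height) so the longer side equals target, keeping aspect ratio."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def letterbox(image, target=TARGET, fill=(0, 0, 0)):
    """Resize a PIL image to fit a target x target square, centered on a padded canvas.

    The fill color (black) and Lanczos resampling are assumptions.
    """
    # Imported here so the Pillow-free helpers in this sketch work without Pillow.
    from PIL import Image

    new_w, new_h = fit_within(*image.size, target)
    resized = image.convert("RGB").resize((new_w, new_h), Image.LANCZOS)
    canvas = Image.new("RGB", (target, target), fill)
    # Center the resized image; the uncovered border is the padding.
    canvas.paste(resized, ((target - new_w) // 2, (target - new_h) // 2))
    return canvas


def ffmpeg_command(src, dst, sample_rate=SAMPLE_RATE):
    """Build an FFmpeg argument list that writes mono WAV at the given sample rate.

    16-bit PCM (pcm_s16le) is an assumed codec choice.
    """
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(src),           # input file in any format FFmpeg can decode
        "-ac", "1",               # downmix to one channel (mono)
        "-ar", str(sample_rate),  # resample to the target rate
        "-c:a", "pcm_s16le",      # uncompressed 16-bit PCM
        str(dst),
    ]


def transcode(src, dst):
    """Run FFmpeg and raise CalledProcessError if it exits with a non-zero status."""
    subprocess.run(ffmpeg_command(src, dst), check=True)
```

With these helpers, a 640x480 image would be scaled to 224x168 and centered with 28 pixels of padding above and below, and `transcode("clip.mp3", "clip.wav")` (hypothetical file names) would write a 16 kHz mono WAV.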
72 projects · 49 cities
Recent Talks & Demos
MCP vs Message Bus AI Agents
Las Vegas
May 8
Redis Streams
MCP
PromptPilot
Rio De Janeiro
Apr 26
GPT-4o
OpenAI API
AI Images for Social Media
Mumbai
Apr 26
Ideogram
Bing Image Creator
Vinchy: AI Fit Matching
Seattle
Apr 24
Flutter
Cloud Run
No-Code Frontend, AI Backend
Seattle
Apr 24
GPT-4
DALL·E 3
Transformers for Seismic Damage
Quito
Apr 24
PyTorch
Python
FINDER AFRIC
Nairobi
Apr 9
Firebase
Node
Sketch control for image generation
Lausanne
Apr 1
Stable Diffusion
ControlNet
AI Game Master: Bootstrap Success
Seattle
Mar 27
ChatGPT
DALL-E
Automated Test Data Generation
San Francisco
Feb 27
Llama 3
Ollama
BAML: Zero-Shot Extraction
Seattle
Feb 21
Next
BAML
Iris Matching and Edge Mood
Manizales
Jan 22
MedSAM
Swin-UNETR