regolo.ai: Scalable GPU Inference
This talk covers building an open-source inference provider focused on GPU scalability, InferenceOPS automation, and Kubernetes integration for efficient model deployment.
We share our experience building an inference provider for open-source, freely accessible models. The provider we are building is centered on open-source models. We discuss GPU scalability, the automation needed for InferenceOPS, and Kubernetes, and we welcome feedback on potential uses and suggestions for further development.
A Kubernetes GPU platform serving optimized Llama, Qwen, and FLUX models via API.
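For a sense of what "serving models via API" looks like from the client side, here is a minimal sketch that queries a hosted open-source model through an OpenAI-compatible endpoint. The base URL, API key, and model identifier are illustrative assumptions, not confirmed details of the regolo.ai platform.

```python
# Minimal sketch: calling a hosted open-source model through an
# OpenAI-compatible chat completions API. The base_url, api_key,
# and model name below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.regolo.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",               # placeholder credential
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model identifier
    messages=[
        {"role": "user", "content": "Explain what InferenceOPS automation covers."}
    ],
)

print(response.choices[0].message.content)
```

Exposing an OpenAI-compatible surface is a common design choice for inference providers, since existing SDKs and tooling work unchanged against the new backend.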
Related projects
PaperBench: Evaluating AI’s Ability to Replicate AI Research
Rome
This talk presents PaperBench, a benchmark for evaluating AI agents’ ability to replicate state-of-the-art AI research through code…
Building a Sovereign Multi-GPU AI Infrastructure in a European Data Center (in Less Than One Year)
Cologne
How a startup built a sovereign multi‑GPU AI platform in under a year, using Kubernetes, Ray actors, MongoDB,…
Repeated inference in practice
Hamburg
This talk explores using multiple LLM inferences to improve stability and accuracy when identifying relevant website elements for…
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Milan
The talk presents MLE‑bench, a benchmark of 75 Kaggle ML‑engineering competitions, shows human baselines, evaluates frontier language models,…
AI Computer
Berlin
Learn how to build a desktop PC with an RTX 3090 for local AI workloads, covering hardware assembly, software…
Omni ingestion RAG
Medellín
This talk covers multimodal ingestion in Retrieval Augmented Generation applications, focusing on processing unstructured data including images, tables,…