regolo.ai: Scalable GPU Inference
This talk covers building an open-source inference provider focused on GPU scalability, InferenceOPS automation, and Kubernetes integration for efficient model deployment.
We share our experience building an inference provider for open-source, freely accessible models. The provider we are building is centered on open-source models. We discuss GPU scalability, the automation needed for InferenceOPS, and Kubernetes, and we welcome feedback on potential uses and suggestions for further development.
A Kubernetes GPU platform serving optimized Llama, Qwen, and FLUX models via API.
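For a sense of what "serving models via API" looks like from the client side, here is a minimal sketch that queries a hosted open-source model through an OpenAI-compatible endpoint. The base URL, API key, and model identifier are illustrative assumptions, not confirmed details of the regolo.ai platform.

```python
# Minimal sketch: calling a hosted open-source model through an
# OpenAI-compatible chat completions API. The base_url, api_key,
# and model name below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.regolo.ai/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",               # placeholder credential
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",  # hypothetical model identifier
    messages=[
        {"role": "user", "content": "Explain what InferenceOPS automation covers."}
    ],
)

print(response.choices[0].message.content)
```

Exposing an OpenAI-compatible surface is a common design choice for inference providers, since existing SDKs and tooling work unchanged against the new backend.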
Related projects
PaperBench: Evaluating AI’s Ability to Replicate AI Research
Rome
This talk presents PaperBench, a benchmark for evaluating AI agents’ ability to replicate state-of-the-art AI research through code…
Building a Sovereign Multi-GPU AI Infrastructure in a European Data Center (in Less Than One Year)
Cologne
How a startup built a sovereign multi‑GPU AI platform in under a year, using Kubernetes, Ray actors, MongoDB,…
Repeated inference in practice
Hamburg
This talk explores using multiple LLM inferences to improve stability and accuracy when identifying relevant website elements for…
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering
Milan
The talk presents MLE‑bench, a benchmark of 75 Kaggle ML‑engineering competitions, shows human baselines, evaluates frontier language models,…
AI Computer
Berlin
Learn how to build a desktop PC with an RTX 3090 for local AI workloads, covering hardware assembly, software…
Omni ingestion RAG
Medellín
This talk covers multimodal ingestion in Retrieval Augmented Generation applications, focusing on processing unstructured data including images, tables,…