Members-Only
Recent Talks & Demos are for members only
You must be an AI Tinkerers active member to view these talks and demos.
Nebius: Faster, Cheaper LLMs
Discover how to make inference cheaper and faster using third-party providers like Nebius AI Studio, and see LLM tracing in action. Free credits will be given.
- Shortly talk about advantages of 3rd party inference providers.
- I will show, how any private inference provider, can be substituted by 3rd party
- Show with Nebius AI Studio Inference
- Showcase LLM tracing
- give away free credits
Nebius offers NVIDIA H100/H200/GB200 GPU clusters via InfiniBand and Kubernetes orchestration.
Scalable LLM inference service offers ultra-low latency via OpenAI-compatible API.
- Nebius AI StudioNebius AI Studio is a high-performing Inference-as-a-Service platform for deploying, fine-tuning, and scaling leading open-source LLMs and text-to-image models.This is your end-to-end platform for AI inference: deploy models like Llama 3.1 and Mistral with zero MLOps overhead. Nebius AI Studio provides an OpenAI-compatible API and a user-friendly Playground for testing, comparing, and fine-tuning models against your domain-specific data. Leverage its proprietary infrastructure for ultra-low latency and cost-efficient, per-token pricing, a factor recognized by Artificial Analysis. The platform supports high-volume workloads, offering a standard capacity of 100M+ tokens per minute for text models. Beyond LLMs, it integrates text-to-image capabilities using models like Flux Schnell and SDXL, ensuring you can scale both language and visual generation at an enterprise level.
- LLM tracingLLM tracing captures the full execution path of AI requests (prompts, tool calls, and generations) for debugging, performance optimization, and cost analysis.LLM tracing is your essential observability layer for GenAI applications: it maps the entire request lifecycle, from initial prompt to final response, using structured spans (OpenTelemetry standard). This granular visibility is critical for debugging complex agent workflows (LangChain, LlamaIndex) and identifying bottlenecks. You get immediate, actionable metrics: track token-level usage for cost control, pinpoint latency spikes across retrieval-augmented generation (RAG) steps, and save production traces for robust evaluation and fine-tuning. Implement tracing now to move your LLM app from prototype to reliable, cost-efficient production.
- NebiusNebius delivers vertically integrated AI infrastructure: a full-stack cloud platform built on massive NVIDIA GPU clusters for high-performance AI training and inference.Nebius Group N.V. (NASDAQ: NBIS), headquartered in Amsterdam, is a specialized technology provider focused exclusively on AI infrastructure. We deploy and manage large-scale, cost-efficient GPU clusters—featuring thousands of NVIDIA H100, H200, and Blackwell GPUs—across Europe and the US, including a 300 MW region under construction in New Jersey. Our full-stack cloud platform provides AI practitioners with a supercomputer-level performance, validated by major multi-year AI infrastructure deals with hyperscalers like Microsoft and Meta Platforms. We offer a ready-to-go environment: high-performance InfiniBand networking, managed Kubernetes, and 24/7 expert support to accelerate development cycles in demanding sectors like healthcare and finance.
Related projects
A Hacker's Guide to Slashing LLM Bills
Orange County
Learn how to analyze production LLM traffic, identify cost‑driving patterns, and configure quantization, caching, batching, and latency settings…
Nandayo - AI Driven Support Agent
Montreal
Learn how Nandayo, an AI-driven support agent, integrates with any infrastructure to automatically monitor, triage, and resolve routine…
Building an Agentic Orchestrator for LLM Testing and Evaluation
Berlin
Explore Penelope, an agentic orchestrator coordinating multi-step LLM tests. Learn how it manages workflows, model calls via LiteLLM,…
LuminaLog - AI Journaling for self-development
Palo Alto
Explore how LuminaLog uses speech‑to‑text and an AI companion to analyze journal entries, offering concrete feedback, insights, and…
A Fireside chat with Guy Podjarny, founder of Tessl, on AI Native Development
London
Explore how AI shifts software development from code‑centric to spec‑centric, letting users define desired outcomes while AI handles…
The next evolution of AI - Neurosymbolic
London
Demonstration of a lightweight neurosymbolic platform that maps data, lets you define business logic in plain English, and…