Scaling LLM Inference for Reasoning
This talk covers turning open-source LLMs into reasoning models by scaling inference-time computation, implementing techniques such as Monte Carlo tree search, GRPO, and beam search.
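Of the techniques named above, beam search is the simplest form of inference-time scaling: instead of greedily taking the single most likely next token, the decoder keeps the top-k highest-scoring partial sequences at every step. The sketch below illustrates the mechanic with a dummy next-token scorer standing in for a real model's log-probabilities; the vocabulary, scores, and function names are illustrative assumptions, not code from the talk.

```python
# Toy next-token scorer standing in for an LLM forward pass.
# The vocabulary and log-probabilities are made up for illustration.
VOCAB_LOGPROBS = {
    "the": -0.5, "cat": -1.0, "sat": -1.5, "mat": -2.0, "<eos>": -0.7,
}

def next_token_logprobs(sequence):
    """Return (token, logprob) candidates for a prefix (dummy model)."""
    return list(VOCAB_LOGPROBS.items())

def beam_search(beam_width=2, max_len=4):
    # Each beam entry is (tokens, cumulative log-probability).
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == "<eos>":
                # Finished sequences carry over unchanged.
                candidates.append((tokens, score))
                continue
            for tok, lp in next_token_logprobs(tokens):
                candidates.append((tokens + [tok], score + lp))
        # Prune to the top-k highest-scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

beams = beam_search()
```

With a real model, `next_token_logprobs` would be a forward pass, and the same prune-to-top-k loop trades extra compute at inference time for better sequences, which is the core idea the talk scales up.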
Related projects
Doing Precise and *Reliable* Automated Reasoning like It is 1972
Medellín
This talk compares modern LLMs' struggles with high-level logic puzzles to the effectiveness of symbolic AI methods like…
SomosNPL
Quito
The talk covers efforts to advance Spanish natural language processing by creating open resources, highlighting SomosNPL's role in…
Reinforcement Learning in Action: Building a Q-Learning System for Real-World Inventory Optimization
Quito
Live code walkthrough of Q‑Learning applied to inventory optimization, covering reward design, Q‑table mechanics, state transitions, and practical…
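The ingredients this entry lists (reward design, Q-table mechanics, state transitions) can be sketched in a few lines of tabular Q-learning. The environment below is a minimal stand-in, not the talk's actual system: stock levels, order sizes, demand distribution, and cost coefficients are all illustrative assumptions.

```python
import random

random.seed(0)

# Toy inventory environment (assumed, not from the talk):
# state = stock on hand, action = units ordered.
MAX_STOCK, MAX_ORDER = 5, 3
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2  # learning rate, discount, exploration

def step(stock, order):
    """State transition: restock, face random demand, compute reward."""
    stock = min(stock + order, MAX_STOCK)
    demand = random.randint(0, 3)
    sold = min(stock, demand)
    # Reward design: revenue per sale, holding cost, stock-out penalty.
    reward = 2.0 * sold - 0.5 * (stock - sold) - 1.0 * (demand - sold)
    return stock - sold, reward

# Q-table: Q[state][action], initialised to zero.
Q = [[0.0] * (MAX_ORDER + 1) for _ in range(MAX_STOCK + 1)]

for episode in range(2000):
    stock = random.randint(0, MAX_STOCK)
    for _ in range(20):
        # Epsilon-greedy action selection.
        if random.random() < EPSILON:
            action = random.randint(0, MAX_ORDER)
        else:
            action = max(range(MAX_ORDER + 1), key=lambda a: Q[stock][a])
        next_stock, reward = step(stock, action)
        # Q-learning update: bootstrap on the best next-state value.
        Q[stock][action] += ALPHA * (
            reward + GAMMA * max(Q[next_stock]) - Q[stock][action]
        )
        stock = next_stock

# Greedy policy: how much to order at each stock level.
policy = [max(range(MAX_ORDER + 1), key=lambda a: Q[s][a])
          for s in range(MAX_STOCK + 1)]
```

After training, reading the greedy action out of the Q-table per state gives the ordering policy; the real talk presumably works the same loop through a production-scale state space.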
Building a Unified AI Interface: Live Demo of Dolphin MCP's Cross-Provider Tool Orchestration
Quito
This talk demonstrates building a unified AI interface using Dolphin MCP to orchestrate tools across multiple LLM providers…
Legal AI
Manizales
This talk covers developing a legal AI assistant using generative AI and retrieval-augmented generation to streamline legal research,…
Building High-Performance Search Agents: Local Inference with DuckDuckGo and Google Search Integration
Quito
This talk demonstrates building local high-performance search agents using llama-cpp-agents, integrating DuckDuckGo and Google, optimizing memory for large…