smolR1

Demonstrates a reproducible DeepSeek R1 implementation using Qwen2.5-0.5B on two RTX 4090 GPUs, providing a compact, stable GRPO baseline for rapid RL experimentation.

Overview

Reproducing DeepSeek's R1 at the smallest scale with Qwen2.5-0.5B on two 4090 GPUs.
A smol and stable baseline for rapid experimentation.
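GRPO (Group Relative Policy Optimization), the algorithm behind the baseline, removes PPO's learned value function. Instead, it scores each sampled completion against the other completions drawn for the same prompt. The snippet below is a minimal, framework-free sketch of that group-relative normalization. It is not taken from the smolR1 repository. It assumes rewards are stored contiguously per prompt, and the function names, the `eps` value, and the choice of population standard deviation are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Normalize each completion's reward against its own group (same prompt).

    Hypothetical helper: A_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    # Population std is an assumption; some implementations use the unbiased estimate.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def grouped_advantages(rewards: list[float], group_size: int, eps: float = 1e-4) -> list[float]:
    """Apply group normalization to a flat batch.

    Assumed layout: [p0_s0, ..., p0_s{G-1}, p1_s0, ...], where G = group_size.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("number of rewards must be a positive multiple of group_size")
    advantages: list[float] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        advantages.extend(group_relative_advantages(group, eps))
    return advantages


if __name__ == "__main__":
    # Two prompts, four sampled completions each (e.g. 1.0 = correct answer, 0.0 = wrong).
    batch = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
    print(grouped_advantages(batch, group_size=4))
```

In this formulation, a group in which every completion receives the same reward yields all-zero advantages and so contributes no policy-gradient signal. With a small model, this is one reason why prompt difficulty and group size matter.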

Links

https://github.com/rasdani/smolR1
Reproduces DeepSeek R1-Zero using Qwen2.5-0.5B on two 4090 GPUs.
