Technology

PPO

Proximal Policy Optimization (PPO) is a widely used, comparatively stable reinforcement learning (RL) algorithm: it uses a clipped objective function so that each policy update stays close to the previous policy instead of becoming catastrophically large.

PPO is a core policy gradient method in deep reinforcement learning, introduced by OpenAI in 2017 (John Schulman et al.). It addresses the implementation complexity of Trust Region Policy Optimization (TRPO) by replacing TRPO's explicit trust region constraint with a simpler clipped surrogate objective function (L_CLIP). The clipping removes the incentive to push the new policy far from the policy that collected the data, which makes it safe to run multiple epochs of minibatch updates on the same sampled batch; this improves sample efficiency over single-update policy gradient methods and keeps the implementation simple. PPO is widely used for complex tasks: it has been applied to simulated robotic locomotion, Atari game playing, and, critically, Reinforcement Learning from Human Feedback (RLHF) to align Large Language Models (LLMs), such as the models behind ChatGPT, with human preferences. Its balance of performance, stability, and ease of tuning has made it a common default choice in industry.
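As a rough illustration of the clipped surrogate objective described above, the sketch below computes L_CLIP for a small batch in plain Python. It forms the probability ratio between the new and old policies, clips it to the interval [1 - eps, 1 + eps], and takes the smaller of the clipped and unclipped terms for each sample. The function name `clipped_surrogate`, its argument names, and the default `clip_eps=0.2` are assumptions made for this example, not part of the original text; a real implementation would typically use a tensor library and combine this term with value and entropy losses.

```python
import math


def clipped_surrogate(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    new and old policies. advantages: advantage estimates per sample.
    clip_eps is an assumed default for illustration.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two candidate terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Hypothetical batch of three samples.
objective = clipped_surrogate(
    new_logp=[math.log(1.5), math.log(0.5), 0.0],
    old_logp=[0.0, 0.0, 0.0],
    advantages=[1.0, -1.0, 2.0],
)
```

With a positive advantage, a ratio above 1 + eps earns no extra objective, so the gradient stops rewarding further movement in that direction. With a negative advantage and a ratio above 1 + eps, the unclipped (more negative) term is kept, so the update is still penalized.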

https://doi.org/10.48550/arXiv.1707.06347
40 projects · 29 cities

Recent Talks & Demos


Resource Optimization for LLMs
Bogotá
Transformers PEFT
Hive: Local-First Agent Gateway
Bogotá Apr 23
Bun TypeScript
The LLM Harness Layer
Nashville Mar 25
Python Gradio
UofT: Intelligent Document Search
Toronto Mar 25
Python FastAPI
Holocron: Skilled Trades Operating System
Eastside Entrepreneurs Mar 5
v0 Figma
Local OCR for Administrative Workflows
Tokyo Feb 19
Tesseract Multimodal AI
FHE-Studio: Encrypted AI Inference
Toronto Jan 29
Intel SGX FHE Studio
Transformer Lab: Local to Distributed ML
Toronto Jan 29
Kubernetes SLURM
Forge: Multi-Agent Code Fixes
Seattle Jan 12
Claude Opus Python
Number Theory: AI, Crypto, Optimization
Boston Dec 2
Python Apache Kafka
Arbiter: Zero-Instrumentation LLM Costs
San Francisco Nov 20
OpenAI SDK Gemini
Redops: Agent Memory Engineering
Dhaka Nov 1
RedOps ACE framework
Worldlabs: Single Image Worldbuilding
Toronto Oct 30
Gaussian Splatting PlayCanvas
RapidFire AI: Parallel LLM Experimentation
San Diego Oct 29
PyTorch Transformers
Teaching AI Maya Glyphs
Montreal Oct 21
YOLOv8 ResNet
Vibe Coding: IT Manager to AI
Hong Kong Sep 29
Next Supabase
Rafiki AI Tutor
Nairobi Sep 25
GPT-4 Whisper
Gemini CLI: Terminal Jira Integration
Montreal Sep 24
Gemini CLI Gemini
AI for AR MMORPGs
Brisbane Sep 11
Unity React Native
fastWorkflow: Deterministic Conversational AI
Houston Sep 9
LiteLLM Transformers
Hashisstant: WhatsApp LLM Assistant
Pereira Aug 28
Python Flask
Homemade and Follow AI Platforms
Amsterdam Aug 27
TAO LLM
PROTOSTAR: Scaling Alert Processing
Boston Aug 25
Claude ChatGPT
YouTube Knowledge Graph Analysis
Dubai Aug 23
GPT-4 LangChain