Technology

PPO

Proximal Policy Optimization (PPO) is a widely used, comparatively stable reinforcement learning (RL) algorithm: it uses a clipped objective function to keep each policy update from moving too far from the previous policy.

PPO is a policy gradient method in deep reinforcement learning, introduced by OpenAI in 2017 (John Schulman et al.). It addresses the implementation complexity of Trust Region Policy Optimization (TRPO), and the instability of unconstrained policy gradient updates, by replacing the explicit trust region constraint with a simpler clipped surrogate objective (L_CLIP). The clipping removes the incentive to push the probability ratio between the new and old policy outside a small interval around 1. This design allows multiple epochs of minibatch updates on each batch of sampled data, which improves sample efficiency over methods that take a single gradient step per batch and keeps the implementation simple. PPO has been applied to simulated robotic locomotion, Atari game playing, and Reinforcement Learning from Human Feedback (RLHF), where it is used to align Large Language Models (LLMs) such as those behind ChatGPT with human preferences. Its balance of performance, stability, and ease of tuning has made it a common default choice in industry.
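The clipped surrogate objective described above can be illustrated with a minimal sketch. It assumes per-sample log-probabilities and advantage estimates are already available as plain lists; the function name `clipped_surrogate` and the default `clip_eps=0.2` are illustrative choices, not part of any particular library. For each sample, the objective takes the minimum of the unclipped term `ratio * advantage` and the clipped term `clip(ratio, 1 - eps, 1 + eps) * advantage`, then averages over the batch.

```python
import math


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective over a batch (a quantity to maximize).

    logp_new, logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy (hypothetical inputs).
    advantages: advantage estimates for the same actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s)
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps]
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two terms
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio 1.5 with a positive advantage: the gain is capped at 1.2 * advantage.
    print(clipped_surrogate([math.log(1.5)], [0.0], [1.0]))
    # Ratio 1.5 with a negative advantage: the penalty is not clipped.
    print(clipped_surrogate([math.log(1.5)], [0.0], [-1.0]))
```

In a training loop, the negative of this value would typically serve as the policy loss, usually combined with a value-function loss and an entropy bonus; those terms are omitted here to keep the sketch focused on the clipping.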

https://doi.org/10.48550/arXiv.1707.06347

40 projects · 29 cities

Related technologies

Python (739) · Docker (157) · LangChain (439) · OpenAI API (500) · Transformers (168) · Claude (384) · FastAPI (159) · Flask (32) · GPT-4 (678) · Neo4j (35) · PyTorch (264) · FAISS (18) · FastMCP (13) · Gemini (254) · Kubernetes (34) · Next (197) · OpenAI SDK (7) · PostgreSQL (144)

Recent Talks & Demos

Resource Optimization for LLMs

Transformers PEFT

Hive: Local-First Agent Gateway

The LLM Harness Layer

Nashville Mar 25

UofT: Intelligent Document Search

Holocron: Skilled Trades Operating System

Eastside Entrepreneurs Mar 5

Local OCR for Administrative Workflows

Tesseract Multimodal AI

FHE-Studio: Encrypted AI Inference

Intel SGX FHE Studio

Transformer Lab: Local to Distributed ML

Kubernetes SLURM

Forge: Multi-Agent Code Fixes

Claude Opus Python

Number Theory: AI, Crypto, Optimization

Python Apache Kafka

Arbiter: Zero-Instrumentation LLM Costs

San Francisco Nov 20

OpenAI SDK Gemini

Redops: Agent Memory Engineering

RedOps ACE framework

Worldlabs: Single Image Worldbuilding

Gaussian Splatting PlayCanvas

RapidFire AI: Parallel LLM Experimentation

San Diego Oct 29

PyTorch Transformers

Teaching AI Maya Glyphs

Montreal Oct 21

Vibe Coding: IT Manager to AI

Hong Kong Sep 29

Rafiki AI Tutor

Gemini CLI: Terminal Jira Integration

Montreal Sep 24

Gemini CLI Gemini

AI for AR MMORPGs

Brisbane Sep 11

Unity React Native

fastWorkflow: Deterministic Conversational AI

LiteLLM Transformers

Hashisstant: WhatsApp LLM Assistant

Homemade and Follow AI Platforms

Amsterdam Aug 27

PROTOSTAR: Scaling Alert Processing

YouTube Knowledge Graph Analysis

GPT-4 LangChain