Technology
PPO
Proximal Policy Optimization (PPO) is a widely used, highly stable reinforcement learning (RL) algorithm: it uses a clipped objective function to keep policy updates effective while preventing them from becoming catastrophically large.
PPO is a core policy gradient method in deep reinforcement learning, introduced by OpenAI in 2017 (Schulman et al.). It addresses the implementation complexity of earlier methods such as Trust Region Policy Optimization (TRPO) by approximating the trust-region constraint with a simpler clipped surrogate objective (L_CLIP). This design allows multiple epochs of minibatch updates on the same sampled data, significantly improving sample efficiency and implementation simplicity. PPO is a go-to algorithm for complex tasks: it has been successfully applied to simulated robotic locomotion, Atari game playing, and, critically, to Reinforcement Learning from Human Feedback (RLHF) for aligning Large Language Models (LLMs) such as ChatGPT with human preferences. Its balance of performance, stability, and ease of tuning has made it an industry standard.
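The clipped surrogate objective mentioned above can be sketched in a few lines. This is a minimal NumPy illustration, not a full PPO implementation: the function name `ppo_clip_objective` and the toy inputs are assumptions for demonstration, and a real training loop would also include a value-function loss, an entropy bonus, and gradient-based optimization.

```python
import numpy as np

def ppo_clip_objective(log_probs_new, log_probs_old, advantages, eps=0.2):
    """Clipped surrogate objective L_CLIP (Schulman et al., 2017).

    r_t = pi_new(a|s) / pi_old(a|s) is the probability ratio; taking the
    minimum of the unclipped and clipped terms removes the incentive to
    push the ratio outside [1 - eps, 1 + eps].
    """
    ratio = np.exp(log_probs_new - log_probs_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))

# Toy example: the new policy doubles an action's probability (ratio = 2.0)
# on a positive advantage; clipping caps its contribution at (1 + eps) * A.
obj = ppo_clip_objective(
    log_probs_new=np.log(np.array([0.8])),
    log_probs_old=np.log(np.array([0.4])),
    advantages=np.array([1.0]),
)
# obj == 1.2, i.e. min(2.0 * 1.0, 1.2 * 1.0)
```

Because the objective is a minimum, clipping only limits improvement in the direction the advantage favors; it never blocks updates that move the ratio back toward 1.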