Technology

SWE-agent benchmark

SWE-agent is an autonomous agent system that lets LLMs (such as GPT-4o) fix real-world software bugs using developer tools. Its performance is measured on the SWE-bench dataset.

SWE-agent is an autonomous software engineering system, developed by researchers at Princeton and Stanford, that lets LLMs (such as GPT-4o or Claude Sonnet) resolve real-world GitHub issues. It gives the model a set of developer tools (bash, file editing, search) and runs it in a containerized environment, where the model works toward a code patch for the issue. Performance is benchmarked on the widely used SWE-bench dataset, whose full test set contains 2,294 real GitHub issues. The system is scored by its 'resolved rate' (pass@1): the share of issues resolved with a single attempt per issue. Because the score reflects both the foundation model and the agentic harness, it gives a reproducible measure of AI capability in complex software development.
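The 'resolved rate' described above can be illustrated with a short calculation. This is a minimal sketch, not the SWE-bench evaluation harness: the `InstanceResult` record, the `resolved_rate` function, and the instance IDs are hypothetical names introduced for illustration, and the sketch assumes that instances with no submitted result count as unresolved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceResult:
    """Outcome of one attempt on one benchmark issue (hypothetical record type)."""
    instance_id: str
    resolved: bool  # True if the generated patch resolved the issue


def resolved_rate(results: list[InstanceResult], total_instances: int) -> float:
    """Return the percentage of benchmark issues resolved, one attempt per issue.

    Assumption: the denominator is the full benchmark size, so issues with
    no submitted result are counted as unresolved.
    """
    if total_instances <= 0:
        raise ValueError("total_instances must be positive")
    # Deduplicate by ID so a repeated record cannot inflate the score.
    resolved_ids = {r.instance_id for r in results if r.resolved}
    if len(resolved_ids) > total_instances:
        raise ValueError("more resolved instances than benchmark instances")
    return 100.0 * len(resolved_ids) / total_instances


# Hypothetical run: 4 issues in the benchmark, results submitted for 3 of them.
example_results = [
    InstanceResult("example-repo-101", resolved=True),
    InstanceResult("example-repo-102", resolved=False),
    InstanceResult("example-repo-103", resolved=False),
]
print(resolved_rate(example_results, total_instances=4))
```

Keeping the denominator fixed at the benchmark size means a system cannot raise its score by skipping hard issues, which is one reason a single-attempt resolved rate is comparable across systems.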

https://www.swe-agent.com/

1 project · 1 city

Related technologies

Anthropic (34) · AWS (29) · Sandboxing (2) · SWE-bench-sonnet (1)

Recent Talks & Demos


Members-Only

Self-modifying code

Seattle Feb 21

AWS Anthropic