SWE-agent benchmark
SWE-agent is an autonomous system that lets LLMs (such as GPT-4o) fix real-world software bugs using developer tools. Its performance is measured on the SWE-bench dataset.
SWE-agent is an autonomous software engineering system, built by researchers at Princeton and Stanford, that lets LLMs (such as GPT-4o or Claude Sonnet) resolve real-world GitHub issues. It gives the model a set of developer tools (bash, file editing, search) inside a containerized environment, where the model works toward a code patch for the issue. Performance is benchmarked on the widely used SWE-bench dataset, which contains 2,294 real GitHub issues drawn from 12 Python repositories. The system is evaluated on its 'resolved rate' (pass@1): the share of issues whose patch, generated in a single attempt, passes the tests associated with the issue. Because the score depends on both the foundation model and the agentic harness, it gives a reproducible measure of how well the combined system handles complex software development.
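To make the 'resolved rate' concrete, here is a minimal sketch of the arithmetic. It assumes each issue gets exactly one attempt and has already been marked resolved or unresolved. The function name `resolved_rate` and the instance IDs are hypothetical and are not part of the SWE-bench evaluation harness.

```python
def resolved_rate(results: dict[str, bool]) -> float:
    """Return the percentage of issues resolved on a single attempt (pass@1).

    `results` maps an issue identifier to True if the generated patch
    passed that issue's tests, and False otherwise.
    """
    if not results:
        raise ValueError("results must contain at least one issue")
    # True counts as 1 and False as 0, so sum() gives the resolved count.
    return 100.0 * sum(results.values()) / len(results)


# Hypothetical outcomes for four issues; the IDs are illustrative only.
example_results = {
    "example-repo__issue-1": True,
    "example-repo__issue-2": False,
    "example-repo__issue-3": False,
    "example-repo__issue-4": False,
}

print(f"Resolved rate: {resolved_rate(example_results):.1f}%")
```

On the full dataset the denominator would be 2,294, so each resolved issue moves the score by roughly 0.04 percentage points.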