SWE-bench

Organization for maintaining SWE-bench and related projects
SWE-bench    SWE-agent    SWE-smith    SWE-ReX    sb-cli

Software engineering agents, benchmarks, and models.
Built and maintained by researchers from Stanford University and Princeton University.

This organization contains the source code for several projects in the SWE-* open source ecosystem, including:

  • SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
  • SWE-agent, a system that automatically solves GitHub issues using an LM agent.
  • SWE-smith, a toolkit for generating SWE training data at scale.

Also check out the supporting infrastructure for working with SWE-* projects:

  • SWE-ReX, infrastructure supporting sandboxed code execution for AI agents.
  • sb-cli, a command-line interface for running evaluations in the cloud.
  • Mirror clones of the SWE-bench and SWE-smith repositories are available here and here.

Pinned

  1. SWE-bench Public

    SWE-bench [Multimodal]: Can Language Models Resolve Real-world GitHub Issues?

    Python, 3.1k stars, 537 forks

  2. SWE-smith Public

    Scaling Data for SWE-agents

    Python, 252 stars, 25 forks

  3. experiments Public

    Open-sourced predictions, execution logs, trajectories, and results from model inference and evaluation runs on the SWE-bench task.

    Shell, 182 stars, 202 forks

  4. sb-cli Public

    Run SWE-bench evaluations remotely

    Python, 20 stars

Repositories

Showing 8 of 8 repositories
  • SWE-smith Public

    Scaling Data for SWE-agents

    Python, 252 stars, MIT license, 25 forks, 15 issues (1 issue needs help), 4 pull requests. Updated Jun 18, 2025
  • experiments Public

    Open-sourced predictions, execution logs, trajectories, and results from model inference and evaluation runs on the SWE-bench task.

    Shell, 182 stars, 202 forks, 19 issues, 15 pull requests. Updated Jun 15, 2025
  • swe-bench.github.io Public

    Landing page and leaderboard for the SWE-bench benchmark

    HTML, 6 stars, 7 forks, 3 issues, 1 pull request. Updated Jun 11, 2025
  • SWE-bench Public

    SWE-bench [Multimodal]: Can Language Models Resolve Real-world GitHub Issues?

    Python, 3,083 stars, MIT license, 537 forks, 40 issues, 13 pull requests. Updated Jun 2, 2025
  • sb-cli Public

    Run SWE-bench evaluations remotely

    Python, 20 stars, MIT license, 0 forks, 1 issue, 1 pull request. Updated May 21, 2025
  • SkyRL Public (forked from NovaSky-AI/SkyRL)

    SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning

    Python, 0 stars, Apache-2.0 license, 42 forks, 0 issues, 0 pull requests. Updated May 15, 2025
  • .github Public

    0 stars, 0 forks, 0 issues, 0 pull requests. Updated May 1, 2025
  • humanevalfix-results Public archive

    Evaluation data and results for SWE-agent inference on the HumanEvalFix task

    Jupyter Notebook, 0 stars, 0 forks, 0 issues, 0 pull requests. Updated Jul 11, 2024
