
ITBench



📢 Announcements

Latest Updates

  • [June 13, 2025] Identified 25+ additional scenarios to be developed over the summer.
  • [May 2, 2025] 🚀 ITBench now provides fully managed scenario environments for everyone! Our platform handles the complete workflow: scenario deployment, agent evaluation, and leaderboard updates. Visit our GitHub repository for guidelines and get started today.
  • [February 28, 2025] 🏆 Limited Access Beta: Invite-only access to the ITBench-hosted scenario environments. ITBench handles scenario deployment, agent evaluation, and leaderboard updates. To request access, e-mail us here.
  • [February 7, 2025] 🎉 Initial release! Includes research paper, self-hosted environment setup tooling, sample scenarios, and baseline agents.

Overview

ITBench measures the performance of AI agents across a wide variety of complex, real-world-inspired IT automation tasks targeting three key use cases:

Use Case | Focus Area
--- | ---
SRE (Site Reliability Engineering) | Availability and resiliency
CISO (Compliance & Security Operations) | Compliance and security enforcement
FinOps (Financial Operations) | Cost efficiencies and ROI optimization


Key Features

  • Real-world representation of IT environments and incident scenarios
  • Open, extensible framework with comprehensive IT coverage
  • Push-button workflows and interpretable metrics
  • Kubernetes-based scenario environments

What's Included

ITBench enables researchers and developers to replicate real-world incidents in Kubernetes environments and develop AI agents to address them.

We provide:

  1. Push-button deployment tooling for environment setup (open-source)
  2. Framework for recreating realistic IT scenarios using the deployment tooling:
    • 6 SRE scenarios and 21 mechanisms (open-source)
    • 4 categories of CISO scenarios (open-source)
    • 1 FinOps scenario (open-source)
  3. Two reference AI agents:
    • SRE (Site Reliability Engineering) Agent (open-source)
    • CISO (Chief Information Security Officer) Agent (open-source)
  4. Fully managed leaderboard for agent evaluation and comparison

Roadmap

  • July 2025
    • Refactor toward a scenario specification generator and runner that allows most (if not all) mechanisms to be reused across diverse applications and microservices
    • Implementation of 10 of the additional scenarios identified
  • August 2025
    • SRE-Agent-Lite: Lightweight agent to assist non-systems personnel with environment debugging
    • Snapshot & Replay: Data capture and replay capabilities
    • Implementation of 15 more of the additional scenarios planned for the summer
  • Fall 2025
    • BYOA (Bring Your Own Application): Support for custom application integration

Leaderboard

The ITBench Leaderboard tracks agent performance across SRE, FinOps, and CISO scenarios. We provide fully managed scenario environments, while researchers and developers run their agents on their own systems and submit the agents' outputs for evaluation.

Domain | Leaderboard
--- | ---
SRE | View SRE Leaderboard
CISO | View CISO Leaderboard

Get Started: Visit docs/leaderboard.md for access and evaluation guidelines.
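Because agents run on your own systems and only their outputs are submitted, it can help to tally results locally before submitting. The minimal sketch below groups per-scenario outcomes by domain and reports a pass rate. The `RunResult` record, its fields, and the pass-rate metric are hypothetical illustrations, not ITBench's submission format or official scoring; those are described in docs/leaderboard.md.

```python
from dataclasses import dataclass


# Hypothetical record of one agent run on one scenario.
# This is NOT the ITBench submission format; see docs/leaderboard.md for that.
@dataclass
class RunResult:
    scenario_id: str
    domain: str      # "SRE", "CISO", or "FinOps"
    resolved: bool   # whether the run counted as a success (assumed binary outcome)


def pass_rate_by_domain(results: list[RunResult]) -> dict[str, float]:
    """Return the fraction of resolved runs, grouped by domain."""
    totals: dict[str, int] = {}
    passed: dict[str, int] = {}
    for r in results:
        totals[r.domain] = totals.get(r.domain, 0) + 1
        passed[r.domain] = passed.get(r.domain, 0) + int(r.resolved)
    return {domain: passed[domain] / totals[domain] for domain in totals}


if __name__ == "__main__":
    runs = [
        RunResult("sre-1", "SRE", True),
        RunResult("sre-2", "SRE", False),
        RunResult("ciso-1", "CISO", True),
    ]
    print(pass_rate_by_domain(runs))  # {'SRE': 0.5, 'CISO': 1.0}
```

A binary per-run outcome keeps the tally easy to interpret; the leaderboard's own metrics may be richer than this.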


Scenarios

ITBench incorporates a collection of problems that we call scenarios. Each scenario is deployed in an operational environment where specific problems occur.

Examples of Scenarios

  • SRE: Resolve "High error rate on service checkout" in a Kubernetes environment
  • CISO: Assess compliance posture for "new control rule detected for RHEL 9"
  • FinOps: Identify and resolve cost overruns and anomalies
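To make the SRE example concrete, here is a minimal sketch of the kind of signal a "high error rate" alert could be based on: the share of HTTP 5xx responses from the checkout service in a time window, compared against a threshold. The 5% threshold, the 5xx-only definition of an error, and the function names are assumptions for illustration; ITBench scenarios define their own alerts and faults.

```python
# Hypothetical sketch: the 5xx-only error definition and the 5% threshold are assumptions.
def error_rate(status_codes: list[int]) -> float:
    """Return the fraction of responses that are HTTP 5xx server errors."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)


def is_high_error_rate(status_codes: list[int], threshold: float = 0.05) -> bool:
    """True when the 5xx share in the window exceeds the (assumed) alert threshold."""
    return error_rate(status_codes) > threshold


if __name__ == "__main__":
    # 90 successful and 10 failed requests to the checkout service in one window.
    window = [200] * 90 + [503] * 10
    print(error_rate(window))          # 0.1
    print(is_high_error_rate(window))  # True
```

An agent tackling such a scenario would start from a signal like this and then work backward through the Kubernetes environment to the underlying cause.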

Find all scenarios: Scenarios repository


Agents

ITBench open-sources two baseline agents, both built with the CrewAI framework.

Agent Features

  • Configurable LLMs: support for watsonx, Azure, or vLLM
  • Natural-language tools: interact with the environment to gather information
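One way to picture the configurable-LLM feature is a small loader that reads the backend choice from the environment and rejects anything outside the three listed backends. This is a sketch only: the `LLM_PROVIDER`, `LLM_MODEL`, and `LLM_BASE_URL` variables and the `LLMConfig` type are hypothetical, and the agents' real configuration is documented in their own repositories.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional

# Backends named in this README; the LLM_* setting names below are hypothetical.
SUPPORTED_PROVIDERS = {"watsonx", "azure", "vllm"}


@dataclass(frozen=True)
class LLMConfig:
    provider: str
    model: str
    base_url: Optional[str] = None


def load_llm_config(env: Mapping[str, str]) -> LLMConfig:
    """Build an LLMConfig from hypothetical LLM_* settings, validating the provider."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported LLM provider: {provider!r}")
    model = env.get("LLM_MODEL", "").strip()
    if not model:
        raise ValueError("LLM_MODEL must be set")
    base_url = env.get("LLM_BASE_URL") or None
    # Assumption: a self-hosted vLLM server needs an explicit endpoint URL.
    if provider == "vllm" and base_url is None:
        raise ValueError("LLM_BASE_URL is required for vllm")
    return LLMConfig(provider=provider, model=model, base_url=base_url)
```

In use, you would call `load_llm_config(os.environ)` once at startup, so an unsupported or half-configured backend fails immediately instead of partway through a scenario.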

Available Agents

Agent | Repository
--- | ---
SRE Agent | itbench-sre-agent
CISO Agent | itbench-ciso-caa-agent

How to Cite

@misc{jha2025itbench,
      title={ITBench: Evaluating AI Agents across Diverse Real-World IT Automation Tasks},
      author={Jha, Saurabh and Arora, Rohan and Watanabe, Yuji and others},
      year={2025},
      url={https://github.com/IBM/itbench-sample-scenarios/blob/main/it_bench_arxiv.pdf}
}

Join the Discussion

Have questions or need help getting started with ITBench?


Contacts
