LiteLLM Infra Model Evaluation Project

Overview

This repository is an experiment in using state-of-the-art Large Language Models (LLMs) to set up a simple, production-ready infrastructure for an AI API gateway based on LiteLLM. The goal was to test how well different LLMs can interpret a requirements plan and implement a real-world infrastructure project, and then judge each other's work.

Project Workflow

  1. Initial Plan: The project began with a single file, litellm_infra_plan.md, outlining the requirements for a modular, secure, and extensible LiteLLM-based API gateway using Docker Compose and Caddy.

  2. Branching & Model Setup: For each LLM, a new branch was created. Each model was asked to set up the project from scratch, following the plan. The models tested were:

    • o3
    • gemini-2.5-pro
    • claude-4-sonnet
    • claude-4-opus
  3. Cross-Model Judging: After all branches were created, each model was asked (from the main branch) to review all four branches and deliver a verdict on which setup was best, with reasoning. The verdicts are saved in the *_verdict.md files on the main branch.
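The stack the plan describes (LiteLLM behind a Caddy reverse proxy, orchestrated with Docker Compose) can be sketched roughly as follows. This is an illustrative sketch only: the service names, image tags, ports, and file paths are assumptions, not taken from any of the branches.

```yaml
# Hypothetical docker-compose.yml sketch of the planned stack.
# Image tags and mounted paths are illustrative, not from the repo.
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-stable
    env_file: .env                            # provider API keys, master key
    volumes:
      - ./litellm_config.yaml:/app/config.yaml
    command: ["--config", "/app/config.yaml"]

  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile      # TLS and routing to litellm
    depends_on:
      - litellm
```

Keeping Caddy as the only service with published ports means TLS termination and access control sit in one place, which is in line with the plan's emphasis on a secure, modular gateway.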

Verdicts & Comparative Analysis

Summary Table

| Model           | Winner Chosen   | Runner-Up       | Notable Comments                   |
|-----------------|-----------------|-----------------|------------------------------------|
| o3              | claude-4-opus   | claude-4-sonnet | Most complete, best onboarding     |
| gemini-2.5-pro  | claude-4-sonnet | claude-4-opus   | Modular, robust, extensible        |
| claude-4-sonnet | claude-4-opus   | claude-4-sonnet | Makefile, scripts, prod/dev split  |
| claude-4-opus   | claude-4-sonnet | claude-4-opus   | Documentation, modularity, security|

Key Insights from the Verdicts

  • claude-4-opus and claude-4-sonnet were consistently rated as the top two setups by all models.
  • claude-4-opus was praised for its comprehensive documentation, Makefile automation, onboarding experience, and production/development separation.
  • claude-4-sonnet was highlighted for its modularity, extensibility (with docker-compose.extensions.yml), and future-proof architecture (easy addition of Postgres, Redis, Prometheus, Grafana).
  • gemini-2.5-pro was recognized for its clean file organization and version pinning, but lacked documentation and automation.
  • o3 was the simplest and easiest to understand, but too minimal for production use (no docs, no advanced config, no helper scripts).

Interesting Patterns

  • Documentation and onboarding were universally valued. Branches with a detailed README and setup scripts were always rated higher.
  • Automation (Makefile/scripts) and dev/prod separation were seen as major strengths for real-world use.
  • Modularity and extensibility (especially via separate extension files) were considered best practice for scalable infrastructure.
  • Security and health checks in the Caddy config and Docker Compose were important for production-readiness.
  • Version pinning (as in gemini-2.5-pro) was noted as a good practice, but not enough to outweigh missing docs or automation.
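The extension pattern praised in the claude-4-sonnet branch can be illustrated with a hypothetical docker-compose.extensions.yml. The service definitions below are a sketch combining several of the points above (version pinning, health checks, optional Postgres/Redis/Prometheus); they are assumptions for illustration, not the branch's actual file.

```yaml
# Hypothetical docker-compose.extensions.yml: optional services kept out
# of the core compose file and enabled on demand with an extra -f flag.
services:
  postgres:
    image: postgres:16-alpine          # pinned tag, as the verdicts recommend
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - pg_data:/var/lib/postgresql/data
    healthcheck:                       # health checks were flagged as a prod must-have
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      retries: 5

  redis:
    image: redis:7-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      retries: 5

  prometheus:
    image: prom/prometheus:v2.53.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

volumes:
  pg_data:
```

Docker Compose merges multiple files passed with -f, so the extras can be switched on without touching the core file, e.g. `docker compose -f docker-compose.yml -f docker-compose.extensions.yml up -d`.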

Model Disagreements

  • The top two branches (claude-4-opus and claude-4-sonnet) swapped places as winner/runner-up depending on the model, but all agreed these were the best.
  • All models agreed that o3 was only suitable for demos or as a learning scaffold.

How to Use This Repo

  • To see the original plan:
    • Read litellm_infra_plan.md on the main branch
  • To see each model's implementation:
    • Check out the corresponding branch: o3, gemini-2.5-pro, claude-4-sonnet, claude-4-opus
  • To see the verdicts:
    • Read the *_verdict.md files in the main branch

Conclusion

This project demonstrates that modern LLMs can not only follow infrastructure plans, but also critically evaluate and compare each other's work. The best results come from combining strong documentation, automation, modularity, and production best practices. If you want a robust starting point for a LiteLLM-based API gateway, use the claude-4-opus or claude-4-sonnet branches as your foundation.


This repository is a living experiment. Feel free to contribute, test new models, or suggest improvements to the evaluation process!


Note:

  • All tests and model evaluations were performed in the Cursor editor environment.
  • This README was generated by the GPT-4.1 model.
