
enable evaluation script to also evaluate remote models #294


Open: wants to merge 1 commit into main


@RobotSail (Member) commented Jul 31, 2025

Summary by Sourcery

Enable the evaluation script to accept either a local checkpoint directory or a remote Hugging Face model by introducing a --hf-model option, enforcing mutual exclusivity between the two inputs, and updating the usage instructions.

New Features:

  • Add a --hf-model option to evaluate remote Hugging Face model repositories

Enhancements:

  • Turn input_dir from a required argument into an optional --input-dir option, and require exactly one of --input-dir or --hf-model
  • Centralize model_path resolution and refactor validation checks in the evaluate command
  • Update usage examples to demonstrate evaluating both local checkpoints and remote models


sourcery-ai bot commented Jul 31, 2025

Reviewer's Guide

This PR refactors the evaluate_best_checkpoint script to support evaluating remote Hugging Face models by introducing an optional hf_model parameter, overhauling the argument parsing and validation logic, and updating usage examples accordingly.

Sequence diagram for evaluation logic with local and remote models

sequenceDiagram
    actor User
    participant Script as evaluate_best_checkpoint.py
    participant Typer
    participant Evaluator as LeaderboardV2Evaluator

    User->>Script: Run evaluate command with --input-dir or --hf-model
    Script->>Typer: Parse arguments
    alt Only --input-dir provided
        Script->>Script: Validate input_dir exists and is a directory
        Script->>Evaluator: Instantiate with model_path = input_dir
    else Only --hf-model provided
        Script->>Evaluator: Instantiate with model_path = hf_model
    else Both or neither provided
        Script->>Typer: Print error and exit
    end
    Evaluator-->>Script: Evaluation results
    Script-->>User: Output results
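The branching in the diagram can be sketched as a small, framework-agnostic helper. This is a minimal sketch, not the PR's code: the name resolve_model_path and the choice to signal errors with ValueError are assumptions. An empty hf_model string is treated as "not provided", matching the truthiness test in hf_model if hf_model else str(input_dir).

```python
from pathlib import Path
from typing import Optional


def resolve_model_path(input_dir: Optional[Path], hf_model: Optional[str]) -> str:
    """Return the model_path to pass to the evaluator (hypothetical helper).

    Exactly one of input_dir (local checkpoint) or hf_model (remote Hugging Face
    repo ID) must be given. An empty hf_model counts as "not given".
    """
    has_dir = input_dir is not None
    has_hf = bool(hf_model)

    # "Both or neither provided" branch of the diagram.
    if has_dir == has_hf:
        raise ValueError("Error: specify exactly one of --input-dir or --hf-model")

    if has_hf:
        # Remote model: no local checks; the repo ID is passed through unchanged.
        return hf_model

    # Local checkpoint: validate before handing the path to the evaluator.
    if not input_dir.exists():
        raise ValueError(f"Error: '{input_dir}' does not exist")
    if not input_dir.is_dir():
        raise ValueError(f"Error: '{input_dir}' is not a directory")
    return str(input_dir)
```

Inside the Typer command, one could wrap the call in try/except ValueError and, in the handler, call typer.echo(str(err)) followed by raise typer.Exit(1), mirroring the error path in the diff. Keeping the checks in one function also keeps the command body short.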

File-Level Changes

Extend the evaluate CLI to accept remote HF models (scripts/evaluate_best_checkpoint.py)
  • Made input_dir an optional typer.Option instead of a required argument
  • Added hf_model as an optional typer.Option for remote model repos
  • Required exactly one of input_dir or hf_model: the two are mutually exclusive, but one must be given
  • Replaced direct use of input_dir with a unified model_path variable
Update the script docstring to illustrate the new usage (scripts/evaluate_best_checkpoint.py)
  • Added example commands for evaluating a HF model via --hf-model
  • Added an example for evaluating a local model via --input-dir
  • Prefixed the remote and local examples with the new 'evaluate' subcommand



@sourcery-ai sourcery-ai bot left a comment


Hey @RobotSail - I've reviewed your changes - here's some feedback:

  • Consider refactoring the input validation logic into a helper function to reduce clutter inside the evaluate command.
  • Add a log message indicating whether you’re using a local directory or a remote HF model before creating the evaluator for better traceability.
  • You might leverage Typer’s mutually exclusive option patterns or callbacks to enforce --input-dir vs --hf-model exclusivity more declaratively rather than manual checks.
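As a sketch of the second suggestion, assuming the script can use Python's standard logging module (the logger name and the log_model_source helper are hypothetical, not part of the PR), the message could be emitted just before LeaderboardV2Evaluator is constructed:

```python
import logging
from pathlib import Path
from typing import Optional

# Hypothetical logger name; the script may configure logging differently.
logger = logging.getLogger("evaluate_best_checkpoint")


def log_model_source(input_dir: Optional[Path], hf_model: Optional[str]) -> str:
    """Log, and return, which kind of model is about to be evaluated."""
    if hf_model:
        message = f"Evaluating remote Hugging Face model: {hf_model}"
    else:
        message = f"Evaluating local checkpoint directory: {input_dir}"
    logger.info(message)
    return message
```

Returning the message as well as logging it keeps the helper easy to test; if the script prefers typer.echo for user-facing output, the same string could be echoed instead.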


typer.echo(f"Error: '{input_dir}' is not a directory")
raise typer.Exit(1)

model_path = hf_model if hf_model else str(input_dir)

suggestion (code-quality): Replace if-expression with or (or-if-exp-identity)

Suggested change
- model_path = hf_model if hf_model else str(input_dir)
+ model_path = hf_model or str(input_dir)


Explanation: Here we set a value if it evaluates to True and otherwise fall back to a default.

The 'After' version is a bit easier to read and avoids repeating hf_model.

It works because the left-hand side of an or expression is evaluated first. If hf_model is truthy, model_path is set to it and the right-hand side is not evaluated. If it is falsy, the right-hand side is evaluated and model_path is set to str(input_dir).
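A quick way to confirm the suggestion is behavior-preserving is to compare both forms side by side. The two function names below are hypothetical; the point is that both test the truthiness of hf_model, so None and an empty string both fall back to str(input_dir).

```python
from pathlib import Path
from typing import Optional


def with_if_expression(hf_model: Optional[str], input_dir: Path) -> str:
    # Original form: explicit truthiness test on hf_model.
    return hf_model if hf_model else str(input_dir)


def with_or(hf_model: Optional[str], input_dir: Path) -> str:
    # Suggested form: `or` returns hf_model if truthy; otherwise it
    # evaluates and returns the right-hand side.
    return hf_model or str(input_dir)


for candidate in (None, "", "org/model"):
    assert with_if_expression(candidate, Path("ckpt")) == with_or(candidate, Path("ckpt"))
```

Because both expressions treat falsy values the same way, the rewrite is purely stylistic; the mutual-exclusivity check earlier in the command is still what prevents input_dir from being None here.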

@mergify mergify bot added the ci-failure label Jul 31, 2025