enable evaluation script to also evaluate remote models #294

RobotSail · 2025-07-31T03:49:24Z

Summary by Sourcery

Enable the evaluation script to accept either a local checkpoint directory or a remote Hugging Face model by introducing a `--hf-model` option, enforcing mutual exclusivity between the two sources, and updating the usage instructions.

New Features:

- Add `--hf-model` option to evaluate remote Hugging Face model repositories

Enhancements:

- Make `input_dir` an optional argument and require exactly one of `--input-dir` or `--hf-model`
- Centralize `model_path` resolution and refactor validation checks in the `evaluate` command
- Update usage examples to demonstrate evaluating both local checkpoints and remote models

sourcery-ai · 2025-07-31T03:49:32Z

Reviewer's Guide

This PR refactors the evaluate_best_checkpoint script to support evaluating remote Hugging Face models by introducing an optional hf_model parameter, overhauling the argument parsing and validation logic, and updating usage examples accordingly.

Sequence diagram for evaluation logic with local and remote models

```mermaid
sequenceDiagram
    actor User
    participant Script as evaluate_best_checkpoint.py
    participant Typer
    participant Evaluator as LeaderboardV2Evaluator

    User->>Script: Run evaluate command with --input-dir or --hf-model
    Script->>Typer: Parse arguments
    alt Only --input-dir provided
        Script->>Script: Validate input_dir exists and is a directory
        Script->>Evaluator: Instantiate with model_path = input_dir
    else Only --hf-model provided
        Script->>Evaluator: Instantiate with model_path = hf_model
    else Both or neither provided
        Script->>Typer: Print error and exit
    end
    Evaluator-->>Script: Evaluation results
    Script-->>User: Output results
```

File-Level Changes

| Change | Details | Files |
| --- | --- | --- |
| Extend evaluate CLI to accept remote HF models | Made `input_dir` an optional `typer.Option` instead of a required argument. Added `hf_model` as an optional `typer.Option` for remote model repos. Enforced mutual exclusivity and required presence of either `input_dir` or `hf_model`. Replaced direct use of `input_dir` with a unified `model_path` variable. | `scripts/evaluate_best_checkpoint.py` |
| Update script docstring to illustrate new usage | Added example commands for evaluating a HF model via `--hf-model`. Added example for evaluating a local model via `--input-dir`. Prefixed remote/local examples with the new `evaluate` subcommand. | `scripts/evaluate_best_checkpoint.py` |
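
The changes listed above can be sketched as a small Typer app. This is only an illustration under stated assumptions, not the PR's actual code: the error message for the "both or neither" case, the help strings, the empty `main` callback, and the final `typer.echo` that stands in for the evaluator call are all hypothetical. Only the option names, the directory check, and the `model_path` line come from the PR.

```python
from pathlib import Path
from typing import Optional

import typer

app = typer.Typer()


@app.callback()
def main() -> None:
    """Hypothetical callback so that `evaluate` stays an explicit subcommand."""


@app.command()
def evaluate(
    input_dir: Optional[Path] = typer.Option(
        None, "--input-dir", help="Local checkpoint directory"
    ),
    hf_model: Optional[str] = typer.Option(
        None, "--hf-model", help="Remote Hugging Face model repository"
    ),
) -> None:
    # Exactly one of the two sources must be given (assumed error wording).
    if (input_dir is None) == (hf_model is None):
        typer.echo("Error: provide exactly one of --input-dir or --hf-model")
        raise typer.Exit(1)

    # Only a local directory can be validated on disk.
    if input_dir is not None:
        if not input_dir.exists():
            typer.echo(f"Error: '{input_dir}' does not exist")
            raise typer.Exit(1)
        if not input_dir.is_dir():
            typer.echo(f"Error: '{input_dir}' is not a directory")
            raise typer.Exit(1)

    model_path = hf_model if hf_model else str(input_dir)
    # Placeholder for constructing and running the real evaluator.
    typer.echo(f"Evaluating model at: {model_path}")
```

Calling `app()` under an `if __name__ == "__main__":` guard would turn this into a command-line script, invoked as `evaluate --hf-model <repo>` or `evaluate --input-dir <dir>`.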

sourcery-ai

Hey @RobotSail - I've reviewed your changes - here's some feedback:

- Consider refactoring the input validation logic into a helper function to reduce clutter inside the `evaluate` command.
- Add a log message indicating whether you’re using a local directory or a remote HF model before creating the evaluator for better traceability.
- You might leverage Typer’s mutually exclusive option patterns or callbacks to enforce `--input-dir` vs `--hf-model` exclusivity more declaratively rather than manual checks.
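
As a rough illustration of the first two suggestions, the validation could live in a framework-independent helper that also logs which source was chosen. The function name `resolve_model_path`, the use of `ValueError`, and the log wording are assumptions made for this sketch; the PR itself performs these checks inline in the `evaluate` command.

```python
import logging
from pathlib import Path
from typing import Optional

logger = logging.getLogger(__name__)


def resolve_model_path(input_dir: Optional[Path], hf_model: Optional[str]) -> str:
    """Return the model path to evaluate, or raise ValueError on invalid input.

    Hypothetical helper: exactly one of the two arguments must be provided.
    """
    if (input_dir is None) == (hf_model is None):
        raise ValueError("provide exactly one of --input-dir or --hf-model")

    if hf_model is not None:
        logger.info("Using remote Hugging Face model: %s", hf_model)
        return hf_model

    # At this point input_dir is the only source that was provided.
    assert input_dir is not None
    if not input_dir.exists():
        raise ValueError(f"'{input_dir}' does not exist")
    if not input_dir.is_dir():
        raise ValueError(f"'{input_dir}' is not a directory")

    logger.info("Using local checkpoint directory: %s", input_dir)
    return str(input_dir)
```

The command body would then catch `ValueError`, print the message, and exit with a non-zero code, which keeps the CLI wiring separate from the rules being enforced and makes the rules easy to unit test.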

sourcery-ai · 2025-07-31T03:50:15Z

scripts/evaluate_best_checkpoint.py

```diff
+            typer.echo(f"Error: '{input_dir}' is not a directory")
+            raise typer.Exit(1)
+
+    model_path = hf_model if hf_model else str(input_dir)
```


suggestion (code-quality): Replace if-expression with or (or-if-exp-identity)

Suggested change

```diff
-    model_path = hf_model if hf_model else str(input_dir)
+    model_path = hf_model or str(input_dir)
```

Explanation
Here we are setting a value if it evaluates to True, and otherwise
using a default.

The suggested `or` form is a bit easier to read and avoids repeating
`hf_model`.

It works because the left-hand side is evaluated first. If `hf_model` is
truthy, `model_path` is set to it and the right-hand side is not
evaluated. If `hf_model` is falsy (for example `None` or an empty string),
the right-hand side is evaluated and `model_path` is set to `str(input_dir)`.
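
A quick way to check that the two spellings agree is to run both over the same inputs. The function names and sample values below are made up for the comparison; only the two expressions come from the review.

```python
from pathlib import Path
from typing import Optional


def with_if_expression(hf_model: Optional[str], input_dir: Optional[Path]) -> str:
    # Original form from the PR.
    return hf_model if hf_model else str(input_dir)


def with_or(hf_model: Optional[str], input_dir: Optional[Path]) -> str:
    # Suggested form from the review.
    return hf_model or str(input_dir)


# Hypothetical inputs: remote only, local only, and an empty model string.
cases = [
    ("org/model", None),
    (None, Path("ckpts")),
    ("", Path("ckpts")),
]
for hf_model, input_dir in cases:
    assert with_if_expression(hf_model, input_dir) == with_or(hf_model, input_dir)
```

Both forms turn a missing `input_dir` into the literal string `"None"` when `hf_model` is also missing, so either one relies on the earlier "exactly one source" check having already run.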

enable evaluation script to also evaluate remote models

d3f0fd6

sourcery-ai bot reviewed Jul 31, 2025

mergify bot added the ci-failure label Jul 31, 2025
