-
Notifications
You must be signed in to change notification settings - Fork 26
enable evaluation script to also evaluate remote models #294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's GuideThis PR refactors the evaluate_best_checkpoint script to support evaluating remote Hugging Face models by introducing an optional hf_model parameter, overhauling the argument parsing and validation logic, and updating usage examples accordingly. Sequence diagram for evaluation logic with local and remote modelssequenceDiagram
actor User
participant Script as evaluate_best_checkpoint.py
participant Typer
participant Evaluator as LeaderboardV2Evaluator
User->>Script: Run evaluate command with --input-dir or --hf-model
Script->>Typer: Parse arguments
alt Only --input-dir provided
Script->>Script: Validate input_dir exists and is a directory
Script->>Evaluator: Instantiate with model_path = input_dir
else Only --hf-model provided
Script->>Evaluator: Instantiate with model_path = hf_model
else Both or neither provided
Script->>Typer: Print error and exit
end
Evaluator-->>Script: Evaluation results
Script-->>User: Output results
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @RobotSail - I've reviewed your changes - here's some feedback:
- Consider refactoring the input validation logic into a helper function to reduce clutter inside the
evaluate
command. - Add a log message indicating whether you’re using a local directory or a remote HF model before creating the evaluator for better traceability.
- You might leverage Typer’s mutually exclusive option patterns or callbacks to enforce
--input-dir
vs--hf-model
exclusivity more declaratively rather than manual checks.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Consider refactoring the input validation logic into a helper function to reduce clutter inside the `evaluate` command.
- Add a log message indicating whether you’re using a local directory or a remote HF model before creating the evaluator for better traceability.
- You might leverage Typer’s mutually exclusive option patterns or callbacks to enforce `--input-dir` vs `--hf-model` exclusivity more declaratively rather than manual checks.
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
typer.echo(f"Error: '{input_dir}' is not a directory") | ||
raise typer.Exit(1) | ||
|
||
model_path = hf_model if hf_model else str(input_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (code-quality): Replace if-expression with or
(or-if-exp-identity
)
model_path = hf_model if hf_model else str(input_dir) | |
model_path = hf_model or str(input_dir) |
Explanation
Here we find ourselves setting a value if it evaluates toTrue
, and otherwiseusing a default.
The 'After' case is a bit easier to read and avoids the duplication of
input_currency
.
It works because the left-hand side is evaluated first. If it evaluates to
true then currency
will be set to this and the right-hand side will not be
evaluated. If it evaluates to false the right-hand side will be evaluated and
currency
will be set to DEFAULT_CURRENCY
.
Summary by Sourcery
Enable the evaluation script to accept either a local checkpoint directory or a remote HuggingFace model by introducing a --hf-model option, enforcing mutual exclusivity, and updating usage instructions
New Features:
Enhancements: