
Evals does not evaluate all metrics if one fails #1126

Open
@vinod-seshadri-TA

Description


Title

AgentEvaluator only reports the first failing metric, subsequent metrics not evaluated

Describe the bug

When running tests that involve AgentEvaluator.evaluate_eval_set, if the first metric in the criteria fails, an AssertionError is raised immediately. This causes the test execution to halt, so any subsequent metrics are not evaluated. As a result, the test report does not provide a complete overview of all metric failures for the given agent evaluation.
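The fail-fast pattern described above can be illustrated with a minimal, self-contained sketch. This is not ADK's actual code. The function `evaluate_fail_fast` and the metric names `metric_a` and `metric_b` are hypothetical placeholders, and scores are assumed to be precomputed per metric. It only shows why an `assert` inside the per-metric loop hides every failure after the first one.

```python
# Hypothetical stand-in for a per-metric check loop; NOT AgentEvaluator's real implementation.
def evaluate_fail_fast(scores: dict[str, float], criteria: dict[str, float]) -> None:
    """Check each metric in order, asserting immediately on the first failure."""
    for metric, threshold in criteria.items():
        score = scores[metric]
        # The assertion aborts the loop, so any later metric is never checked.
        assert score >= threshold, (
            f"{metric} failed: score {score} is below threshold {threshold}"
        )


# Both placeholder metrics are below threshold, but only the first is reported.
criteria = {"metric_a": 1.0, "metric_b": 0.8}
scores = {"metric_a": 0.5, "metric_b": 0.3}
try:
    evaluate_fail_fast(scores, criteria)
except AssertionError as e:
    print(e)  # metric_a failed: score 0.5 is below threshold 1.0
```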

To Reproduce

Steps to reproduce the behavior:

  • Create a test case with multiple metrics in the criteria dict
  • Ensure that the first metric will fail and the second metric would also fail if evaluated.
  • Run the test with pytest.
  • Observe that only the first metric failure is reported; the second is not evaluated or reported.

Expected behavior

All metrics in the criteria should be evaluated, and the test should report failures for all that do not meet the threshold, not just the first one.

Actual behavior

Test execution stops at the first failing metric due to an assertion, and subsequent metrics are not checked.

Potential Fix

Accumulate failures for each metric in a list, and after all metrics are evaluated, raise an AssertionError with details of all failed metrics.
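A minimal sketch of the proposed fix, assuming the same simplified setup of precomputed per-metric scores compared against thresholds from the criteria dict. `evaluate_all`, `metric_a`, and `metric_b` are hypothetical names used only for illustration and are not part of ADK's API. Every metric is checked, failures are collected into a list, and a single `AssertionError` listing all of them is raised at the end.

```python
# Hypothetical sketch of the accumulate-then-raise fix; names are illustrative, not ADK's API.
def evaluate_all(scores: dict[str, float], criteria: dict[str, float]) -> None:
    """Check every metric, then raise one AssertionError listing all failures."""
    failures: list[str] = []
    for metric, threshold in criteria.items():
        score = scores[metric]
        if score < threshold:
            # Record the failure and keep going instead of asserting right away.
            failures.append(
                f"{metric} failed: score {score} is below threshold {threshold}"
            )
    if failures:
        # One combined error, so the test report shows every failing metric.
        raise AssertionError("\n".join(failures))


criteria = {"metric_a": 1.0, "metric_b": 0.8}
scores = {"metric_a": 0.5, "metric_b": 0.3}
try:
    evaluate_all(scores, criteria)
except AssertionError as e:
    print(e)  # one line per failing metric: metric_a, then metric_b
```

Raising an `AssertionError` (rather than returning a status) keeps pytest's pass/fail behavior unchanged while making the failure message complete.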

Version observed

v1.0.0
