Title
AgentEvaluator only reports the first failing metric, subsequent metrics not evaluated
Describe the bug
When running tests that call AgentEvaluator.evaluate_eval_set, an AssertionError is raised as soon as the first metric in the criteria dict falls below its threshold. This halts test execution, so any remaining metrics are never evaluated. As a result, the test report does not give a complete picture of all metric failures for the agent under evaluation.
To Reproduce
Steps to reproduce the behavior:
- Create a test case with multiple metrics in the criteria dict
- Ensure that the first metric fails and that the second metric would also fail if it were evaluated.
- Run the test with pytest.
- Observe that only the first metric failure is reported; the second is never evaluated or reported (see the reproduction sketch below).
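
A minimal sketch of such a test, assuming pytest with pytest-asyncio, an agent module named `my_agent`, and an eval set built elsewhere. The module name, metric names, and the exact `evaluate_eval_set` signature are assumptions and may differ between ADK versions:

```python
# Sketch only: module name, metric names, and the evaluate_eval_set
# signature are assumptions, not confirmed ADK API details.
import pytest

from google.adk.evaluation.agent_evaluator import AgentEvaluator


@pytest.mark.asyncio
async def test_agent_eval_reports_all_metric_failures():
    # Both thresholds are deliberately strict so that both metrics fail.
    criteria = {
        "tool_trajectory_avg_score": 1.0,
        "response_match_score": 1.0,
    }
    # Expected: the AssertionError should mention both metrics.
    # Observed: only the first failing metric appears; the second is never evaluated.
    await AgentEvaluator.evaluate_eval_set(
        agent_module="my_agent",
        eval_set=...,  # placeholder: an EvalSet constructed/loaded elsewhere
        criteria=criteria,
        num_runs=1,
    )
```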
Expected behavior
All metrics in the criteria should be evaluated, and the test should report failures for all that do not meet the threshold, not just the first one.
Actual behavior
Test execution stops at the first failing metric due to an assertion, and subsequent metrics are not checked.
Potential Fix
Accumulate failures for each metric in a list and, after all metrics have been evaluated, raise a single AssertionError that includes details of every failed metric (see the sketch below).
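
A minimal, self-contained sketch of that idea; the helper name and metric names are illustrative and this is not the actual ADK internals:

```python
from typing import Dict


def assert_all_metrics_pass(scores: Dict[str, float], criteria: Dict[str, float]) -> None:
    """Check every metric against its threshold and report all failures together.

    Hypothetical helper illustrating the proposed fix; not part of the ADK API.
    """
    failures = []
    for metric_name, threshold in criteria.items():
        score = scores.get(metric_name)
        if score is None or score < threshold:
            failures.append(
                f"{metric_name}: score {score} did not meet threshold {threshold}"
            )
    # Raise a single AssertionError only after every metric has been checked,
    # so the message lists all metrics that missed their thresholds.
    if failures:
        raise AssertionError("Metric failures:\n" + "\n".join(failures))


# Example: both metrics miss their thresholds and both appear in the error message.
assert_all_metrics_pass(
    scores={"tool_trajectory_avg_score": 0.4, "response_match_score": 0.2},
    criteria={"tool_trajectory_avg_score": 0.9, "response_match_score": 0.8},
)
```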
Version observed
v1.0.0