Releases: Ontotext-AD/graphrag-eval

5.2.0

21 Oct 14:16

Statnett-240: Compare SPARQL results with duplicated binding values
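
A minimal sketch of what duplicate-aware comparison means for SELECT results: bindings are compared as multisets, so a row that appears twice in one result must also appear twice in the other. The function name and binding structure below are illustrative assumptions, not the package's actual API.

    from collections import Counter

    def bindings_equal(actual, expected):
        """Compare two lists of SPARQL JSON bindings as multisets,
        so duplicated rows must also match in count."""
        def freeze(row):
            # Reduce each binding row to a hashable, order-independent form.
            return frozenset((var, (b.get("type"), b.get("value")))
                             for var, b in row.items())
        return Counter(map(freeze, actual)) == Counter(map(freeze, expected))

    actual = [{"x": {"type": "literal", "value": "1"}},
              {"x": {"type": "literal", "value": "1"}}]
    expected = [{"x": {"type": "literal", "value": "1"}}]
    print(bindings_equal(actual, expected))  # False: the duplicate row matters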

5.1.2

06 Oct 11:59

Bug fixes

Statnett-142: Cast micro and macro aggregation statistics to dict so that YAML serialization works as expected
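
A minimal sketch of the underlying issue: yaml.safe_dump cannot represent dict subclasses such as collections.Counter or defaultdict, so aggregation statistics are converted to plain dicts before serialization. The variable name below is an illustrative assumption, not the package's actual code.

    from collections import Counter
    import yaml

    micro_stats = Counter({"answer_correctness": 3, "answer_relevance": 2})

    # yaml.safe_dump(micro_stats)             # raises yaml.representer.RepresenterError
    print(yaml.safe_dump(dict(micro_stats)))  # serializes as expected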

5.1.1

06 Oct 06:58

Bug fixes

Statnett-142: Fix a bug in the calculation of aggregated results when the output of a DESCRIBE or CONSTRUCT query contains the string "results"

5.1.0

30 Sep 09:56

Bug fixes

TTYG-126: Fix ragas errors caused by incompatibilities between some libraries
TTYG-126: Rename the openai extra and dependency group to ragas

5.0.2

26 Sep 16:21
75cdbbe

Bug fixes

TTYG-130: Include the prompts in the package distribution

5.0.1

26 Sep 14:19

Bug fixes

The openai dependency is now an optional extra of the package
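
For example, the OpenAI-dependent features should then be installable with pip install "graphrag-eval[openai]" (assuming the distribution name graphrag-eval; the extra and dependency group were later renamed to ragas in 5.1.0, listed above).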

5.0.0

25 Sep 12:52

New features

  • TTYG-118: Retrieval correctness without reference
  • TTYG-119: Retrieval correctness using reference context texts
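
One generic way to read "retrieval correctness using reference context texts" is as overlap between retrieved and reference contexts, e.g. context recall. The sketch below illustrates that idea only; it is not necessarily the formulation implemented in graphrag-eval.

    def context_recall(retrieved_contexts, reference_contexts):
        # Fraction of reference context texts found among the retrieved ones.
        retrieved = set(retrieved_contexts)
        hits = sum(1 for ref in reference_contexts if ref in retrieved)
        return hits / len(reference_contexts) if reference_contexts else 0.0

    print(context_recall(
        ["Oslo is the capital of Norway.", "Norway borders Sweden."],
        ["Oslo is the capital of Norway."],
    ))  # 1.0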

Bug fixes

  • TTYG-127: Refactor the SPARQL results comparison
  • Statnett-217: Rename key 'steps' to 'actual_steps' in the source code

4.0.0

19 Sep 09:05

Version 4.0.0: First Release of graphrag-eval

This update includes major structural changes and feature additions in preparation for the first official release.

Project Updates

  • Version bumped to 4.0.0
  • Project structure updated with folder renaming and import fixes

Features

Answer Relevance Evaluation

  • Integrated LangEvals for improved relevance evaluation
  • Added exception handling and error reporting in evaluation results
  • Introduced system and unit tests to validate evaluations
  • Extended output with fields: answer_relevance, answer_relevance_cost, answer_relevance_error
  • Added aggregated metrics for relevance evaluation, including micro/macro statistics
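
The micro/macro distinction can be illustrated generically: micro statistics pool all per-question scores, while macro statistics first average within each group and then across groups. This is a sketch of the concept only, not the package's actual aggregation code, and grouping by question template is an assumed example.

    from statistics import mean

    scores_by_template = {
        "template_a": [1.0, 0.0, 1.0],
        "template_b": [1.0],
    }

    micro = mean(s for scores in scores_by_template.values() for s in scores)
    macro = mean(mean(scores) for scores in scores_by_template.values())
    print(micro, macro)  # 0.75 vs. ~0.83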

Answer Correctness Evaluation

  • Refined evaluation output fields (answer_eval_reason → answer_correctness_reason)
  • Added CLI support for evaluation with input/output file paths
  • Implemented flattened and aggregated evaluation metrics
  • Added support for claim-based metrics (answer_*_claims_count)
  • Improved YAML output examples and aggregate key documentation

SPARQL Result Comparison

  • Major refactoring and optimization of compare_sparql_results
  • Reworked SPARQL evaluation for efficiency and accuracy

Improvements

  • Extensive README enhancements with clarified input/output formats
  • Expanded documentation of aggregates, error handling, and evaluation examples
  • Added installation instructions with OpenAI extras
  • Improved explanations of relevance/correctness evaluation, costs, and result interpretation

Refactoring

  • Modularization of evaluation code (steps and answer evaluation separated)
  • Aggregation logic moved to dedicated modules
  • Import logic refactored to avoid unnecessary OpenAI dependencies
  • Standardized naming conventions for variables, keys, and test data

Testing & CI

  • Added system tests to run after PRs and before releases
  • Expanded unit tests for evaluation functions, error handling, and aggregation
  • Implemented mocking and dependency separation for tests involving OpenAI

Bug Fixes

  • Fixed aggregation key errors when no steps are available
  • Fixed relevance result property access in LangEvals
  • Corrected type hints, key naming mismatches, and parsed value checks
  • Multiple fixes in README formatting (indentation, anchors, examples)
  • Fixed duplicate or inconsistent test cases

3.0.0

16 Jul 12:39
4821d1f

2.2.0

01 Jul 06:36
76930f8

Relax the dependency version definitions