Releases · Ontotext-AD/graphrag-eval
5.2.0
5.1.2
Bug fixes
Statnett-142: Cast micro and macro aggregation statistics to dict so that YAML serialization works as expected
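For context, the pitfall here is that PyYAML's safe dumper only knows how to represent the built-in `dict` type, so aggregation statistics kept in a `dict` subclass (e.g. a `defaultdict`) fail to serialize. A minimal sketch, with illustrative stat names:

```python
import collections
import yaml

# Aggregation stats held in a dict subclass (key names are illustrative)
stats = collections.defaultdict(float, {"micro_avg": 0.82, "macro_avg": 0.79})

try:
    yaml.safe_dump(stats)  # SafeDumper rejects dict subclasses
except yaml.representer.RepresenterError as err:
    print("serialization failed:", err)

print(yaml.safe_dump(dict(stats)))  # casting to a plain dict yields clean YAML
```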
5.1.1
Bug fixes
Statnett-142: Fix a bug in the calculation of aggregated results when the output of a DESCRIBE or CONSTRUCT query contains the string "results"
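The bug class is easy to trip over: SELECT results arrive as a JSON object with a "results" key, while DESCRIBE/CONSTRUCT results are a serialized RDF string, and Python's `in` operator is a key test on dicts but a substring test on strings. A minimal sketch of the failure mode (illustrative, not the package's actual code):

```python
select_output = {"head": {"vars": ["s"]}, "results": {"bindings": []}}
construct_output = '@prefix ex: <http://example.org/> . ex:a ex:note "results" .'

for output in (select_output, construct_output):
    if "results" in output:  # BUG: substring match on the CONSTRUCT string
        print("misclassified as SELECT:", type(output).__name__)

for output in (select_output, construct_output):
    if isinstance(output, dict) and "results" in output:  # type-checked fix
        print("correctly classified:", type(output).__name__)
```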
5.1.0
Bug fixes
TTYG-126: Fix ragas errors caused by incompatible library versions
TTYG-126: Rename openai extra and dependency group to ragas
5.0.2
Bug fixes
TTYG-130: Include the prompts in the distributed package
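A standard way to do this in Python is to declare the prompt files as package data and read them via `importlib.resources` instead of relative filesystem paths, so they resolve correctly after installation. A sketch, with a hypothetical `graphrag_eval.prompts` module and prompt name:

```python
from importlib.resources import files

def load_prompt(name: str) -> str:
    # Resolves the file wherever the package is installed
    # (site-packages, zip, editable install).
    return files("graphrag_eval.prompts").joinpath(f"{name}.txt").read_text(encoding="utf-8")

# prompt = load_prompt("answer_correctness")  # hypothetical prompt name
```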
5.0.1
Bug fixes
`openai` is now an extra for the package
5.0.0
New features
- TTYG-118: Retrieval correctness without reference
- TTYG-119: Retrieval correctness using reference context texts
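As a rough intuition for the reference-based variant, retrieval correctness can be scored by how much of the reference context the retrieved contexts recover. A deliberately simplified sketch (exact matching and the function name are assumptions, not the package's implementation):

```python
def retrieval_recall(retrieved: list[str], reference: list[str]) -> float:
    # Fraction of reference context texts that appear among the retrieved ones
    if not reference:
        return 0.0
    hits = sum(1 for ref in reference if ref in retrieved)
    return hits / len(reference)
```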
Bug fixes
- TTYG-127: Refactor `compare_sparql_results`
- Statnett-217: Rename key 'steps' to 'actual_steps' in the source code
4.0.0
Version 4.0.0: First Release of graphrag-eval
This update includes major structural changes and feature additions in preparation for the first official release.
Project Updates
- Version bumped to 4.0.0
- Project structure updated with folder renaming and import fixes
Features
Answer Relevance Evaluation
- Integrated LangEvals for improved relevance evaluation
- Added exception handling and error reporting in evaluation results
- Introduced system and unit tests to validate evaluations
- Extended output with fields: `answer_relevance`, `answer_relevance_cost`, `answer_relevance_error`
- Added aggregated metrics for relevance evaluation, including micro/macro statistics
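The difference between the two aggregate flavors is worth spelling out: micro statistics pool every per-question score, while macro statistics first average within a group and then average the group means, so each group counts equally. A small illustration (grouping by question template is an assumption):

```python
from statistics import mean

scores_by_template = {
    "template_1": [1.0, 0.0, 1.0, 1.0],  # 4 questions
    "template_2": [0.0],                 # 1 question
}

all_scores = [s for group in scores_by_template.values() for s in group]
micro = mean(all_scores)                                    # 3/5 = 0.6
macro = mean(mean(g) for g in scores_by_template.values())  # (0.75 + 0.0)/2 = 0.375
```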
Answer Correctness Evaluation
- Refined evaluation output fields (`answer_eval_reason` → `answer_correctness_reason`)
- Added CLI support for evaluation with input/output file paths
- Implemented flattened and aggregated evaluation metrics
- Added support for claim-based metrics (`answer_*_claims_count`); see the sketch after this list
- Improved YAML output examples and aggregate key documentation
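Claim-based correctness generally decomposes the answer and the reference into atomic claims and scores their overlap with precision/recall-style counts. A sketch of the arithmetic (the count names are hypothetical, not the package's exact keys):

```python
answer_claims_count = 5          # atomic claims extracted from the answer
answer_correct_claims_count = 4  # answer claims supported by the reference
reference_claims_count = 6       # atomic claims in the reference answer
recovered_claims_count = 4       # reference claims recovered by the answer

precision = answer_correct_claims_count / answer_claims_count  # 0.80
recall = recovered_claims_count / reference_claims_count       # ~0.67
f1 = 2 * precision * recall / (precision + recall)             # ~0.73
```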
SPARQL Result Comparison
- Major refactoring and optimization of `compare_sparql_results`
- Reworked SPARQL evaluation for efficiency and accuracy
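One common approach to order-insensitive comparison of SELECT results, shown here as a sketch rather than the package's implementation, is to turn each solution row into a hashable set of variable bindings and compare the rows as a multiset:

```python
from collections import Counter

def bindings_multiset(bindings: list[dict]) -> Counter:
    # Each row becomes a frozenset of (variable, value) pairs; the Counter
    # then compares rows as a multiset, ignoring row order.
    return Counter(frozenset(row.items()) for row in bindings)

expected = [{"s": "ex:a", "o": "1"}, {"s": "ex:b", "o": "2"}]
actual = [{"s": "ex:b", "o": "2"}, {"s": "ex:a", "o": "1"}]
assert bindings_multiset(expected) == bindings_multiset(actual)
```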
Improvements
- Extensive README enhancements with clarified input/output formats
- Expanded documentation of aggregates, error handling, and evaluation examples
- Added installation instructions with OpenAI extras
- Improved explanations of relevance/correctness evaluation, costs, and result interpretation
Refactoring
- Modularization of evaluation code (steps and answer evaluation separated)
- Aggregation logic moved to dedicated modules
- Import logic refactored to avoid unnecessary OpenAI dependencies (see the sketch after this list)
- Standardized naming conventions for variables, keys, and test data
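A typical shape for that import refactoring (a sketch under assumed names, not the package's actual code) is to defer the `openai` import into the code path that needs it, so the base install works without the extra:

```python
def evaluate_with_openai(question: str, answer: str) -> float:
    try:
        import openai  # imported only when the LLM-backed path is used
    except ImportError as err:
        raise ImportError(
            "This evaluator needs the optional OpenAI dependency; "
            "install the package with its OpenAI/ragas extra."
        ) from err
    ...  # LLM-backed evaluation would go here
```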
Testing & CI
- Added system tests to run after PRs and before releases
- Expanded unit tests for evaluation functions, error handling, and aggregation
- Implemented mocking and dependency separation for tests involving OpenAI
Bug Fixes
- Fixed aggregation key errors when no steps are available
- Fixed relevance result property access in LangEvals
- Corrected type hints, key naming mismatches, and parsed value checks
- Multiple fixes in README formatting (indentation, anchors, examples)
- Fixed duplicate or inconsistent test cases
3.0.0
TTYG-106: I/O format v2.0
https://graphwise.atlassian.net/browse/TTYG-103
2.2.0
Relax version definitions