
Leomyh commented Aug 11, 2025

Description

This PR adds several evaluation metrics and a data loader to improve how the project assesses the quality and privacy of synthetic data:
• Added Distance Matrix Similarity metric to measure statistical similarity between datasets.
• Added Dendrogram Matrix Similarity metric for hierarchical clustering comparison.
• Added TGTG Similarity and TFTG Similarity metrics for advanced similarity evaluation.
• Implemented t-closeness privacy metric to quantify attribute distribution closeness.
• Added Adversarial Accuracy and Epsilon Identifiability privacy evaluation metrics for more comprehensive privacy risk analysis.
• Introduced a new Gene Expression Data Loader to facilitate handling of gene expression datasets within the framework.
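For reviewers unfamiliar with t-closeness, here is a minimal sketch of the idea, not the code in this PR. It assumes a categorical sensitive attribute and uses total variation distance between each equivalence class and the whole table (for categorical values with an equal ground distance, this matches the Earth Mover's Distance used in the original t-closeness definition). The function name `t_closeness`, the dict-based record layout, and the column names are hypothetical.

```python
from collections import Counter, defaultdict


def t_closeness(records, quasi_identifiers, sensitive):
    """Return the largest distance between the sensitive-value distribution
    of any equivalence class and the distribution over the whole table.

    A table satisfies t-closeness for any t >= the returned value.
    Hypothetical helper for illustration only.
    """
    if not records:
        raise ValueError("records must not be empty")

    n = len(records)
    overall = Counter(r[sensitive] for r in records)

    # Group sensitive values by their quasi-identifier combination.
    groups = defaultdict(list)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].append(r[sensitive])

    worst = 0.0
    for values in groups.values():
        local = Counter(values)
        m = len(values)
        # Total variation distance between class and global distributions.
        tvd = 0.5 * sum(abs(local[v] / m - overall[v] / n) for v in overall)
        worst = max(worst, tvd)
    return worst


# Toy table: two equivalence classes defined by (age, zip).
rows = [
    {"age": "30-39", "zip": "476**", "disease": "flu"},
    {"age": "30-39", "zip": "476**", "disease": "cancer"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
    {"age": "40-49", "zip": "479**", "disease": "flu"},
]
t = t_closeness(rows, ["age", "zip"], "disease")
```

A smaller value means every group looks more like the overall table, so group membership reveals less about the sensitive attribute.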

These additions provide more robust tools for both utility and privacy assessment of synthetic data.
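As a rough illustration of the adversarial accuracy idea, the sketch below follows the commonly used nearest-neighbor formulation: for each record, check whether its nearest neighbor in the other dataset is farther away than its nearest neighbor in its own dataset. This is an assumption about what the metric computes, not the implementation in this PR; the function names and the choice of Euclidean distance are hypothetical.

```python
import math


def _nearest(point, others, skip_index=None):
    """Distance from `point` to its nearest neighbor in `others`,
    optionally skipping one index (the point itself)."""
    return min(
        math.dist(point, other)
        for j, other in enumerate(others)
        if j != skip_index
    )


def nn_adversarial_accuracy(real, synth):
    """Nearest-neighbor adversarial accuracy between two numeric datasets.

    Values near 0.5 suggest the datasets are hard to tell apart, values
    near 1.0 suggest they are easy to distinguish, and values near 0.0
    suggest synthetic records sit very close to real ones.
    Hypothetical helper for illustration only.
    """
    if len(real) < 2 or len(synth) < 2:
        raise ValueError("each dataset needs at least two records")

    # Fraction of real records that are closer to another real record
    # than to any synthetic record.
    real_term = sum(
        _nearest(r, synth) > _nearest(r, real, skip_index=i)
        for i, r in enumerate(real)
    ) / len(real)

    # Same check from the synthetic side.
    synth_term = sum(
        _nearest(s, real) > _nearest(s, synth, skip_index=i)
        for i, s in enumerate(synth)
    ) / len(synth)

    return 0.5 * (real_term + synth_term)


real = [(0.0, 0.0), (0.0, 1.0)]
copied = [(0.0, 0.0), (0.0, 1.0)]        # synthetic data that duplicates real data
distant = [(10.0, 10.0), (10.0, 11.0)]   # synthetic data far from real data

aa_copied = nn_adversarial_accuracy(real, copied)
aa_distant = nn_adversarial_accuracy(real, distant)
```

This version is quadratic in the number of records, so a real implementation would likely use a spatial index or a vectorized distance computation.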

Affected Dependencies

List any dependencies that are required for this change.

How has this been tested?

• Unit tests added covering all new metrics and the data loader.
• Tested on benchmark datasets to verify correctness and stability.

Checklist

Leomyh commented Aug 11, 2025

Hi team,
Could someone please review this PR?
It introduces new metrics and a data loader to enhance our framework’s evaluation capabilities.
Thanks in advance!

