Recent progress in vision-language models and segmentation methods has significantly advanced grounded visual understanding. However, these models often exhibit hallucination by producing segmentation masks for objects not grounded in the image content or by incorrectly labeling irrelevant regions. Current evaluation paradigms primarily focus on label or textual hallucinations without manipulating the visual context, limiting their capacity to diagnose critical failures. In response, we introduce HalluSegBench, the first benchmark specifically designed to evaluate hallucination in visual grounding through the lens of counterfactual visual reasoning. Our benchmark consists of a novel dataset of 1.4K counterfactual image pairs spanning 287 unique object classes, and a set of newly introduced metrics to assess hallucination robustness in reasoning-based segmentation models. Experiments on HalluSegBench with state-of-the-art pixel-grounding models reveal that vision-driven hallucinations are significantly more prevalent than label-driven ones, and that models often persist in producing false segmentations, highlighting the need for counterfactual reasoning to diagnose true visual grounding. We open-source the benchmark at https://huggingface.co/datasets/PLAN-Lab/Hallu to encourage future research in this area.
```bash
pip install -r requirements.txt
```
- Run your model on all four settings defined in the dataset and save the predicted masks in the four directories listed below (a minimal saving sketch follows the directory layout). Mask file names must follow the format `{image_id}_{ann_id}_mask.png`, where `image_id` and `ann_id` come from RefCOCO and can also be found in `filter_anno.json` in the dataset, e.g. `COCO_train2014_000000533293_299985_mask.png`.
```text
<path_to_prediction>
├── orgl_orgi   # original label, original image
├── orgl_edti   # original label, edited image
├── edtl_edti   # edited label, edited image
└── edtl_orgi   # edited label, original image
```
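A minimal sketch of the saving step, assuming the model yields one boolean NumPy mask per (image, annotation) pair; `save_mask` and `SETTINGS` are hypothetical helpers, and only the four directory names and the file-name format above are prescribed by the benchmark:

```python
import os

import numpy as np
from PIL import Image

# The four evaluation settings expected by the benchmark.
SETTINGS = ["orgl_orgi", "orgl_edti", "edtl_edti", "edtl_orgi"]

def save_mask(pred_root, setting, image_id, ann_id, mask):
    """Save a boolean mask as {image_id}_{ann_id}_mask.png under the setting directory."""
    out_dir = os.path.join(pred_root, setting)
    os.makedirs(out_dir, exist_ok=True)
    # Store as an 8-bit PNG: 0 = background, 255 = predicted object.
    out = Image.fromarray(mask.astype(np.uint8) * 255)
    out.save(os.path.join(out_dir, f"{image_id}_{ann_id}_mask.png"))

# e.g. save_mask("<path_to_prediction>", "orgl_orgi",
#                "COCO_train2014_000000533293", 299985, mask)
```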
- Prepare a JSON file based on `filter_anno.json` in the dataset:
```bash
python generate_json.py \
    --data_ann_path <path_to_filter_anno.json> \
    --output_json_path <output_json_path> \
    --prediction_data_path <path_to_prediction>
```
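For example, with the annotation file and predictions in the working directory (all paths here are illustrative):

```bash
python generate_json.py \
    --data_ann_path ./filter_anno.json \
    --output_json_path ./eval.json \
    --prediction_data_path ./predictions
```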
- Get the consistency-based performance metrics:
```bash
python get_consistency.py \
    --json_path <path_to_generated_json.json> \
    --base_path <path_to_prediction>
```
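Conceptually, consistency compares the model's masks for the same instance across paired settings. Below is a minimal IoU-style sketch of that idea; `load_mask` and `mask_iou` are hypothetical helpers for illustration, not necessarily the exact metric `get_consistency.py` reports:

```python
import numpy as np
from PIL import Image

def load_mask(path):
    """Load a saved mask PNG as a boolean array."""
    return np.array(Image.open(path).convert("L")) > 0

def mask_iou(a, b):
    """Intersection-over-union between two boolean masks."""
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # two empty masks agree perfectly
    return np.logical_and(a, b).sum() / union

# e.g. compare the original-label prediction before and after the image edit:
# mask_iou(load_mask("preds/orgl_orgi/<name>_mask.png"),
#          load_mask("preds/orgl_edti/<name>_mask.png"))
```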
- Get the hallucination-based performance metrics:
```bash
python get_hallucination.py \
    --json_path <path_to_generated_json.json> \
    --base_path <path_to_prediction>
```
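Intuitively, a hallucination surfaces as a non-empty mask for an object that is absent from the image (e.g., the edited label queried on the original image). A hedged, self-contained sketch of that check; `is_hallucinated` is a hypothetical helper, not the official metric:

```python
import numpy as np
from PIL import Image

def is_hallucinated(mask_path, min_pixels=1):
    """Flag a prediction as hallucinated if it segments anything
    for an object that does not exist in the image."""
    mask = np.array(Image.open(mask_path).convert("L")) > 0
    return int(mask.sum()) >= min_pixels

# e.g. edtl_orgi queries the edited label on the original image, where the
# object is absent, so any foreground indicates a vision-driven hallucination:
# is_hallucinated("preds/edtl_orgi/<name>_mask.png")
```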