GitHub Repository | MediaEval 2025 | Registration Form | Leaderboard / Registered Submissions
The MediaEval Medico 2025 Challenge focuses on Visual Question Answering (VQA) for Gastrointestinal (GI) imaging, emphasizing explainability to foster trustworthy AI for clinical adoption.
This task continues the long-running Medico series at MediaEval, now leveraging the newly developed Kvasir-VQA-x1 dataset, designed to support multimodal reasoning and interpretable clinical decision support.
The annual MediaEval Workshop will be held on Saturday–Sunday, 25–26 October 2025, in Dublin, Ireland & online (between CBMI 2025 and ACM Multimedia 2025). Participants are invited to join the workshop and present their work submitted to the competition.
Goal: Develop AI models that can accurately answer clinical questions using GI endoscopic images.
The task uses Kvasir-VQA-x1, an advanced dataset comprising 159,549 QA pairs from 6,500 original GI images, featuring:
- Multi-step reasoning questions
- Naturalized medical language
- Complexity scores for curriculum training (a minimal curriculum-ordering sketch follows this list)
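The complexity scores make it straightforward to order training data from simple to hard. A minimal sketch, assuming the dataset exposes a train split alongside the test split used later on this page and that complexity takes values 1–3:

```python
from datasets import load_dataset, concatenate_datasets

# Minimal curriculum ordering: train on complexity-1 questions first, then 2, then 3.
# Assumes a "train" split and a "complexity" column with values 1-3.
train = load_dataset("SimulaMet/Kvasir-VQA-x1")["train"]
stages = [train.filter(lambda x: x["complexity"] == level) for level in (1, 2, 3)]
curriculum = concatenate_datasets(stages)  # feed to your trainer in this order

print({level: len(stage) for level, stage in zip((1, 2, 3), stages)})
```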
Question Types include (a quick way to inspect their distribution is sketched after this list):
- Yes/No
- Single-Choice
- Multiple-Choice
- Color-related
- Location-related
- Numerical Count
- Merged reasoning-based questions
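To see how these types are distributed, here is a small sketch using the question_class column (the same column referenced in the validation-subset snippet further down):

```python
from collections import Counter
from datasets import load_dataset

# Count question types in the test split via the question_class column.
ds = load_dataset("SimulaMet/Kvasir-VQA-x1")["test"]
print(Counter(ds["question_class"]))
```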
Example Training Notebook:
Not sure where to start? Check out: Training with ms-swift
It is acceptable to use the full test set for training in your final submission to obtain a competitive score. However, we strongly recommend using proper splits and clearly reporting in your paper which splits were used for training and validation.
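If you do hold out validation data, splitting at the image level avoids leaking QA pairs from the same image across splits. A minimal sketch, assuming a train split and the img_id column used elsewhere on this page; the 5% ratio and seed are arbitrary choices:

```python
import random
from datasets import load_dataset

# Image-level split: all QA pairs from a given image end up on the same side.
full = load_dataset("SimulaMet/Kvasir-VQA-x1")["train"]

img_ids = sorted(set(full["img_id"]))
random.Random(42).shuffle(img_ids)
val_ids = set(img_ids[: int(0.05 * len(img_ids))])

val_ds = full.filter(lambda x: x["img_id"] in val_ids)
train_ds = full.filter(lambda x: x["img_id"] not in val_ids)
print(len(train_ds), len(val_ds))
```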
Goal: Move beyond simply predicting an answer (Subtask 1) and generate rich, multimodal explanations that are transparent, understandable, and trustworthy for clinicians.
Your system should justify its predictions using multiple complementary reasoning forms, e.g., combining a detailed textual clinical explanation with visual localization and/or a confidence measure.
Requirements:
- Faithful to the model's reasoning.
- Clinically relevant and medically sound.
- Useful for real-world decision-making.
from datasets import load_dataset, Image as HfImage
ds = load_dataset("SimulaMet/Kvasir-VQA-x1")["test"]
val_set_task2 = (
    ds.filter(lambda x: x["complexity"] == 1)
    .shuffle(seed=42)
    .select(range(1500))
    .add_column("val_id", list(range(1500)))
    .remove_columns(["complexity", "answer", "original", "question_class"])
    .cast_column("image", HfImage())
)

val_set_task2 is a Hugging Face Dataset containing the columns val_id, img_id, image, and question, where image is a Pillow Image for easy access.
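For example, a single item can be inspected or its image saved for later overlays (field names as above):

```python
# Peek at one validation item; "image" is already a PIL image after cast_column.
sample = val_set_task2[0]
print(sample["val_id"], sample["img_id"], sample["question"])
sample["image"].save("example.png")  # handy when building heatmap or box overlays later
```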
A JSONL file where each entry corresponds to one test case:
{
"val_id": "index of validation subset for subtask 2, as in val_set_task2",
"img_id": "UNIQUE_IMAGE_IDENTIFIER",
"question": "Original question posed to the model.",
"answer": "Prediction from your model from Subtask 1.",
"textual_explanation": "Detailed narrative in clinical language justifying the answer.",
"visual_explanation": [{
"type": "heatmap | segmentation_mask | bounding_box | etc.",
"data": "path/to/visual.png | [[x1,y1,x2,y2]]",
"description": "(Optional) Highlights the region of interest that supports the answer (e.g., bounding box around the polyp, or heatmap showing focus on mucosal irregularity)."
}],
"confidence_score": 0.92
}

Field-by-Field Requirements:
- img_id / question / answer – Must match Subtask 1 data and predictions exactly.
- textual_explanation (Mandatory) – Clinician-oriented reasoning referencing visual cues (location, morphology, color, size, vascular pattern, etc.).
- visual_explanation (Optional but encouraged) – Heatmaps, segmentation masks, or bounding boxes linked to the textual explanation.
- confidence_score (Optional but encouraged) – Float in [0, 1], from model confidence or uncertainty estimation.
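A minimal sketch of assembling submission_task2.jsonl in this format; predict_with_explanation is a hypothetical placeholder for your own pipeline, and the returned values are dummies:

```python
import json

def predict_with_explanation(item):
    # Replace with your Subtask 1 model + explanation generation. Dummy values shown.
    return {
        "answer": "polyp",
        "textual_explanation": "A sessile polyp is visible in the lower-left quadrant ...",
        "visual_explanation": [{
            "type": "bounding_box",
            "data": [[120, 80, 340, 260]],
            "description": "Bounding box around the polyp.",
        }],
        "confidence_score": 0.92,
    }

with open("submission_task2.jsonl", "w") as f:
    for item in val_set_task2:  # built in the snippet above
        record = {
            "val_id": item["val_id"],
            "img_id": item["img_id"],
            "question": item["question"],
            **predict_with_explanation(item),
        }
        f.write(json.dumps(record) + "\n")
```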
- VLM Self-Probing for Explanations – Ask auxiliary questions (e.g., "What is the abnormality?", "Where is it located?", "Describe its morphology") and combine the answers into the textual_explanation (a small self-probing sketch follows this list).
- Visual Grounding – Generate heatmaps or attention maps showing influential regions and link them to textual descriptions.
- Segmentation / Detection – Produce masks or bounding boxes highlighting relevant pathology, reinforcing clinician trust.
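A small sketch of the self-probing idea; ask(image, question) is a hypothetical wrapper around whatever VLM you use for Subtask 1, not part of any provided library:

```python
# Ask the VLM auxiliary questions and stitch the answers into a textual_explanation.
PROBES = [
    "What is the abnormality?",
    "Where is it located?",
    "Describe its morphology and color.",
]

def build_textual_explanation(ask, image, question, answer):
    findings = " ".join(f"{probe} {ask(image, probe)}" for probe in PROBES)
    return f"The answer '{answer}' to '{question}' is supported by these findings: {findings}"
```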
Built on HyperKvasir and Kvasir-Instrument, the Kvasir-VQA-x1 dataset includes:
- 159,549 QA pairs
- 6,500 original GI images
- 10 weakly augmented images per original (augmentation script provided; a generic example of weak augmentation follows this list)
- Complexity levels 1–3
- Realistic medical question reformulations using LLMs
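This is not the official augmentation script; purely as an illustration, weak, label-preserving augmentations of this kind could be produced with torchvision:

```python
import torchvision.transforms as T

# Generic weak augmentations for endoscopic frames (illustrative only).
weak_augment = T.Compose([
    T.RandomRotation(degrees=10),
    T.ColorJitter(brightness=0.1, contrast=0.1),
    T.RandomHorizontalFlip(p=0.5),
])

# augmented = weak_augment(pil_image)  # pil_image: a PIL.Image from the dataset
```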
Dataset: Kvasir-VQA-x1 @ SimulaMet on Hugging Face
Subtask 1 (VQA Performance)
- Metrics: BLEU, ROUGE (1/2/L), METEOR
- Settings: Original & augmented images
- Criteria: Accuracy, relevance, medical correctness
The official challenge score will be computed on a separate, hidden challenge set with additional metrics. This ensures fairness and that the final results truly reflect model performance.
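For local development, the listed text metrics can be approximated with the Hugging Face evaluate library; this is not the official scorer:

```python
import evaluate

# Rough local scoring for Subtask 1 outputs; the official score is computed by the
# organizers on the hidden challenge set.
bleu = evaluate.load("bleu")
rouge = evaluate.load("rouge")
meteor = evaluate.load("meteor")

predictions = ["a small polyp in the sigmoid colon"]          # model outputs
references = ["there is a small polyp in the sigmoid colon"]  # ground-truth answers

print(bleu.compute(predictions=predictions, references=[[r] for r in references]))
print(rouge.compute(predictions=predictions, references=references))   # rouge1/rouge2/rougeL
print(meteor.compute(predictions=predictions, references=references))
```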
Subtask 2 (Explainability)
Rated by experts on:
- Answer correctness
- Clarity & clinical relevance
- Visual alignment
- Confidence calibration
- Methodology & novelty
Please do not hesitate to contact us if you encounter any issues.
View Registered Submissions
We use the medvqa Python package to validate and submit models to the official system.
pip install -U medvqa

Always use the latest version.
The model to be submitted must be hosted in a Hugging Face repository. Your Hugging Face repo must include a standalone script named:
- submission_task1.py for Task 1.
- submission_task2.py for Task 2.
Use the provided template script, and make sure to:
- Modify all TODO sections
- Add required information (e.g., model path, inference logic, preprocessing steps) directly in the script
- Keep the required input/output format unchanged
You have two template options for the Task 1 inference script:
- MS-Swift version: submission_task1_swift.py
- PyTorch version: submission_task1.py
Both scripts already include template example code for model loading and inference.
Whichever template you start from, the final script in your repo must be named submission_task1.py.
Host your submission in a Hugging Face model repository containing:
- submission_task2.jsonl – one object per val_id
- visuals/ – optional folder with any referenced visual artifacts (heatmaps, masks, boxes as JSON, etc.)
- submission_task2.py – file with your team details
- A short README.md explaining how you created the explanations and any post-processing you want to share
Demo submission repo:
https://huggingface.co/SushantGautam/Medico2025_subtask2_demo_submission/tree/main
Naming tips
- Keep data paths in visual_explanation relative to the repo root (e.g., visuals/1234_heatmap.png).
- Ensure every val_id in the file corresponds to an item in val_set_task2 (a sanity-check sketch follows this list).
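A minimal sanity-check sketch before submitting, assuming the files sit in the repo root and val_set_task2 comes from the earlier snippet:

```python
import json, os

expected = set(val_set_task2["val_id"])  # from the validation-subset snippet above
seen = set()

with open("submission_task2.jsonl") as f:
    for line in f:
        entry = json.loads(line)
        seen.add(entry["val_id"])
        for vis in entry.get("visual_explanation", []) or []:
            data = vis.get("data")
            if isinstance(data, str):  # a file path rather than inline coordinates
                assert os.path.exists(data), f"missing visual artifact: {data}"

missing, extra = expected - seen, seen - expected
assert not missing and not extra, f"{len(missing)} val_ids missing, {len(extra)} unexpected"
print("submission_task2.jsonl covers all val_ids and referenced visuals exist")
```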
First, make sure your submission script works in your environment: it should load the model correctly from your submission repo and generate outputs in the required format.

python submission_task1.py

Next, validate that the script works independently. The .py script should now be in the root of the same Hugging Face repo as your model. You can try this in a fresh venv:
medvqa validate --competition=medico-2025 --task=1/2 --repo_id=<your_repo_id>

- --competition: Set to medico-2025
- --task: Use 1 for Task 1 or 2 for Task 2
- --repo_id: Your Hugging Face model repo ID (e.g., SushantGautam/Florence-2-vqa-demo)
If your code requires extra packages, you must include a requirements.txt in the root of the repo. The system will install these automatically during validation/submission; otherwise you will get missing-package errors.
If validation succeeds, you can simply run:
medvqa validate_and_submit --competition=medico-2025 --task=1/2 --repo_id=<your_repo_id>

This will make a submission; your username, along with the task and time, should appear on the leaderboard for the entry to be considered officially submitted. The submission library will make your Hugging Face repository public but gated, granting the organizers access to your repo. The repo must remain unchanged at least until the competition results are announced. However, you are free to make your model fully public (non-gated). If you encounter any issues with submission, don't hesitate to contact us.
- Scripts for augmentation, splits, and baselines
- Submission templates
- Fine-tuned model configs
- Attention & saliency visualization methods
- April 2025 – Registration for task participation opens ✅
- May 2025 – Development data release ✅
- June 2025 – Test data release ✅
- 24 September 2025 (Wed.) – Runs due
- 8 October 2025 (Wed.) – Working Notes deadline
- 25–26 October 2025 (Sat.–Sun.) – MediaEval Workshop (Dublin + Online)
- Steven A. Hicks – [email protected]
- Michael A. Riegler – [email protected]
- Vajira Thambawita – [email protected]
- Pål Halvorsen – [email protected]
- Sushant Gautam – [email protected]
Let's build the future of trustworthy, explainable medical AI.
GI diagnostics needs interpretable answers. Your model can help save lives.
Register: MediaEval 2025
Repo: GitHub
If you are inspired by the MediaEval Medico 2025 Challenge or the Kvasir-VQA-x1 dataset in your research, please cite the following papers:
@article{Gautam2025Aug,
author = {Gautam, Sushant and Thambawita, Vajira and Riegler, Michael and others},
title = {{Medico 2025: Visual Question Answering for Gastrointestinal Imaging}},
journal = {arXiv},
year = {2025},
month = aug,
eprint = {2508.10869},
doi = {10.48550/arXiv.2508.10869}
}
@article{Gautam2025Jun,
author = {Gautam, Sushant and Riegler, Michael A. and Halvorsen, P{\aa}l},
title = {{Kvasir-VQA-x1: A Multimodal Dataset for Medical Reasoning and Robust MedVQA in Gastrointestinal Endoscopy}},
journal = {arXiv},
year = {2025},
month = jun,
eprint = {2506.09958},
doi = {10.48550/arXiv.2506.09958}
}