This repository supports full reproducibility of our paper on predicting multiplex immunofluorescence (mIF) from standard H&E-stained histology images.
It includes all code, pretrained models, and preprocessing steps to replicate our results or apply the approach to new datasets.
We introduce MIPHEI-vit, a U-Net-style model using H-Optimus-0, a ViT foundation model, as its encoder to predict multi-channel mIF images from H&E slides.
Inspired by ViTMatte, the architecture combines transformer-based encoding with a convolutional decoder.
Paired with CellPose for nuclei segmentation, MIPHEI-vit enables single-cell level cell type prediction directly from H&E.
We cover key markers from the ORION dataset, including:
Hoechst, CD31, CD45, CD68, CD4, FOXP3, CD8a, CD45RO, CD20, PD-L1, CD3e, CD163, E-cadherin, Ki67, Pan-CK, SMA.
📊 Performance for each marker is detailed in the paper.
To get started, you can install the environment with:

```bash
conda env create -f environment.yaml --name miphei
conda activate miphei
pip install -r requirements_torch.txt
pip install -r requirements.txt
pip install -e slidevips-python
pip install -r requirements_preprocessings.txt  # only if you want to run the preprocessing pipeline
```
We recommend using Conda, as it simplifies the installation of certain dependencies like `pyvips`, which are not always easy to install via pip.
We provide processed and cleaned data derived from the Orion CRC and HEMIT datasets. IMMUCAN data is not yet released due to ongoing privacy restrictions.
You can download the full dataset from Zenodo:
🔗 https://doi.org/10.5281/zenodo.15340874
- Orion: Fully included in the Zenodo archive.
- HEMIT: Supplementary data only (e.g., cell segmentations, inferred cell types).
- You must download the original HEMIT dataset separately from here.
- Then, run `preprocessings/hemit_preprocessing.ipynb` to merge it with our annotations and generate the required dataframes. You can also regenerate the additional data from this notebook.
After downloading the data, update paths in the following config files:
`configs/orion.yaml`, `configs/hemit.yaml`
- Make sure to set: `slide_dataframe_path`, `train_dataframe_path`, `val_dataframe_path`, `test_dataframe_path`, `augmentation_dir` (optional; CycleGAN-augmented tiles), `channel_stats_path`, and `targ_channel_names`.
- Also update the paths inside the dataframes, if needed.
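For illustration, the fields in `configs/orion.yaml` might look like this (all paths are placeholders, not actual repository values; the marker list matches the ORION channels above):

```yaml
slide_dataframe_path: /data/orion/slide_dataframe.csv
train_dataframe_path: /data/orion/train_tiles.csv
val_dataframe_path: /data/orion/val_tiles.csv
test_dataframe_path: /data/orion/test_tiles.csv
augmentation_dir: /data/orion/cyclegan_tiles   # optional
channel_stats_path: /data/orion/channel_stats.csv
targ_channel_names: [Hoechst, CD31, CD45, CD68, CD4, FOXP3, CD8a, CD45RO,
                     CD20, PD-L1, CD3e, CD163, E-cadherin, Ki67, Pan-CK, SMA]
```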
Figure: MIPHEI-vit Architecture

- The MIPHEI-vit model weights can be downloaded from Hugging Face.
- The other models used for comparison in the paper (HEMIT*, HEMIT-ORION, UNETR H-Optimus-0, and U-NET ConvNeXtv2) are accessible on Weights & Biases.
- You can download the original HEMIT checkpoint here.
Each model is organized in a folder containing:
- the model checkpoint (`.ckpt` or `.safetensors`)
- a `config.yaml` file with training and architecture parameters
- `.csv` files with evaluation results for the 3 datasets
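As a small illustration of this folder layout, a helper like the following can locate the checkpoint file in a downloaded model folder (this helper and the file names are illustrative, not part of the repository's API):

```python
from pathlib import Path


def find_checkpoint(model_dir: Path) -> Path:
    """Return the first .ckpt or .safetensors file found in a model folder."""
    for pattern in ("*.ckpt", "*.safetensors"):
        matches = sorted(model_dir.glob(pattern))
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no checkpoint found in {model_dir}")


# Example with a temporary folder standing in for a downloaded model
import tempfile

tmp = Path(tempfile.mkdtemp())
(tmp / "model.safetensors").touch()
print(find_checkpoint(tmp).name)  # model.safetensors
```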
You can use the pretrained models to run inference on ORION, HEMIT, or your own custom H&E images.
To visualize predictions on the ORION or HEMIT datasets, use the following notebook: `notebooks/inference_orion_hemit.ipynb`.
You can also run this Python script:

```bash
python run_inference.py \
    --checkpoint_dir path/to/model_folder \
    --dataset_config_path path/to/config.yaml \
    --batch_size 16
```
This will generate a new folder inside `checkpoint_dir` containing predicted TIFF images for the entire dataset.
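The exact encoding of the predicted TIFFs is not specified here; as a rough sketch, assuming the model outputs float values in [0, 1] per channel, a conversion to 16-bit image arrays might look like:

```python
import numpy as np


def to_uint16(pred: np.ndarray) -> np.ndarray:
    """Convert float model output in [0, 1] to a 16-bit image array.

    `pred` has shape (channels, height, width); values outside [0, 1]
    are clipped before scaling to the full uint16 range.
    """
    pred = np.clip(pred, 0.0, 1.0)
    return (pred * 65535.0).round().astype(np.uint16)


# Example: a fake 16-channel, 32x32 prediction
fake_pred = np.random.rand(16, 32, 32).astype(np.float32)
img = to_uint16(fake_pred)
print(img.dtype, img.shape)  # uint16 (16, 32, 32)
```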
If you want to try the model on your own H&E images:
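A typical first step is splitting your image into fixed-size tiles before running inference. Here is a minimal NumPy sketch, assuming non-overlapping 512x512 tiles (the tile size is an assumption for illustration, not a value taken from the repository):

```python
import numpy as np


def tile_image(img: np.ndarray, tile_size: int = 512) -> list:
    """Split an H&E image of shape (H, W, 3) into non-overlapping tiles.

    Edge regions smaller than `tile_size` are dropped; real pipelines
    typically pad the image or use overlapping tiles instead.
    """
    h, w = img.shape[:2]
    tiles = []
    for y in range(0, h - tile_size + 1, tile_size):
        for x in range(0, w - tile_size + 1, tile_size):
            tiles.append(img[y:y + tile_size, x:x + tile_size])
    return tiles


# Example on a synthetic 1024x1536 RGB image -> 2 rows x 3 cols = 6 tiles
dummy = np.zeros((1024, 1536, 3), dtype=np.uint8)
print(len(tile_image(dummy)))  # 6
```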
You can reproduce the evaluation results reported in the paper on the ORION and HEMIT datasets using the following scripts inside the `evaluations` folder:
- ORION: `python eval_orion.py --checkpoint_dir path/to/model`
- HEMIT: `python eval_hemit.py --checkpoint_dir path/to/model`
- IMMUCAN (not publicly available): the script `evaluations/eval_immucan.py` was used to evaluate on the IMMUCAN dataset, but the data is not included due to access restrictions.
We also provide evaluation scripts, `evaluations/eval_orion_hemit.py` and `evaluations/eval_hemit_hemit.py`, to evaluate models trained with the original HEMIT codebase, including:
- the official HEMIT checkpoint
- HEMIT-ORION: a model trained on the ORION dataset using the HEMIT codebase

The `hemit/` folder contains the modified training and preprocessing scripts used to train HEMIT-ORION on the original HEMIT codebase.
All figures from the paper can be reproduced using the notebooks in the `figure/` directory.
Figure: Example of mIF prediction from H&E on 3 datasets
To train MIPHEI-vit from scratch on the ORION dataset, run:

```bash
python run.py +default_configs=miphei-vit
```

If you don't want to use Weights & Biases, run:

```bash
WANDB_MODE=offline python run.py +default_configs=miphei-vit
```

You can find the list of available default configurations in `configs/default_configs/`.
To apply the MIPHEI-vit model to your own dataset, create a config file like `own_data.yaml` in `configs/data/` and run:

```bash
python run.py +default_configs=miphei-vit data=own_data
```
You can override any parameter directly via the command line. For example, to set the number of training epochs to 100:

```bash
python run.py +default_configs=miphei-vit ++train.epochs=100
```
All experiments from the paper are located in `configs/experiments/`. You can run one of them like this:

```bash
python run.py -m +experiments/foundation_models='glob(*)'
```
Alongside this code, we developed a high-performance `pyvips`-based tile reader and processing engine for efficient WSI operations, supporting both H&E and high-dimensional mIF images. It provides an alternative to tools like OpenSlide, with full support for multi-channel fluorescence. Refer to `slidevips-python/README.md` for usage details.
To reproduce the preprocessing steps for the ORION dataset, or to apply them to your own data, please refer to `preprocessings/README.md`. It contains detailed instructions on running the full pipeline, including tile extraction, autofluorescence subtraction, artifact removal, cell segmentation, etc.
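As a rough illustration of the autofluorescence-subtraction idea mentioned above (the scaling factor and channel handling here are simplified assumptions, not the pipeline's actual parameters):

```python
import numpy as np


def subtract_autofluorescence(channel: np.ndarray, af: np.ndarray,
                              scale: float = 1.0) -> np.ndarray:
    """Subtract a scaled autofluorescence channel and clip negatives to zero."""
    corrected = channel.astype(np.float32) - scale * af.astype(np.float32)
    return np.clip(corrected, 0.0, None)


# Example: marker intensities corrected by a flat autofluorescence level of 30
marker = np.array([[100.0, 50.0], [10.0, 0.0]])
af = np.array([[30.0, 30.0], [30.0, 30.0]])
print(subtract_autofluorescence(marker, af))
# pixels below the autofluorescence level are clipped to 0
```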
If you use this work, please cite:
G. Balezo, R. Trullo, A. Pla Planas, E. Decenciere, and T. Walter, “MIPHEI-vit: Multiplex Immunofluorescence Prediction from H&E Images using ViT Foundation Models,” arXiv preprint arXiv:…, 2025.