Merlin: Vision Language Foundation Model for 3D Computed Tomography

Merlin is a 3D vision-language model (VLM) for computed tomography (CT) that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining.

⚡️ Installation

To install Merlin from PyPI, run:

pip install merlin-vlm

For an editable installation, create an environment, then clone this repository and install it in editable mode:

conda create --name merlin python==3.10
conda activate merlin

git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .

# Alternatively, to install exact package versions as tested:
# uv sync
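After either installation route, it can be useful to confirm that the package is visible to the active environment. The following is a minimal sketch using only the Python standard library. The helper name `installed_version` is hypothetical and not part of Merlin; it assumes the distribution is registered as `merlin-vlm` (the name used in the `pip install` command above) and is imported as `merlin` (the name used in the inference examples).

```python
from importlib import metadata, util


def installed_version(dist_name="merlin-vlm", module_name="merlin"):
    """Return the installed version string, or None if the package is missing.

    Hypothetical helper: the default names are assumptions taken from the
    install command and import statements in this README.
    """
    # find_spec returns None when no top-level module with this name exists.
    if util.find_spec(module_name) is None:
        return None
    try:
        # Look up the version recorded in the installed distribution metadata.
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_version()` in the `merlin` environment should return a version string if the install succeeded, and `None` otherwise.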

🚀 Inference with Merlin

To create a Merlin model with both image and text embeddings enabled, use the following:

from merlin import Merlin

model = Merlin()

To initialize the model with only image embeddings active, use:

from merlin import Merlin

model = Merlin(ImageEmbedding=True)

To initialize the model for phenotype classification, use:

from merlin import Merlin

model = Merlin(PhenotypeCls=True)

To initialize the model for radiology report generation, use:

from merlin import Merlin

model = Merlin(RadiologyReport=True)
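The four initialization modes above differ only in the keyword argument passed to `Merlin`. As a sketch of how a script might select one by name, the hypothetical helpers below (`merlin_kwargs` and `build_merlin`, neither of which is part of the package) map a short mode string to the constructor flags shown in this README. The mode names are invented for illustration, and the sketch assumes that one flag is set at a time, as in the examples above.

```python
# Hypothetical mapping, not part of the merlin package. The flag names are
# copied from the README examples; the mode names are illustrative only.
MODE_KWARGS = {
    "default": {},                        # image and text embeddings
    "image": {"ImageEmbedding": True},    # image embeddings only
    "phenotype": {"PhenotypeCls": True},  # phenotype classification
    "report": {"RadiologyReport": True},  # radiology report generation
}


def merlin_kwargs(mode="default"):
    """Return a copy of the constructor keyword arguments for a mode name."""
    if mode not in MODE_KWARGS:
        raise ValueError(
            f"unknown mode {mode!r}; expected one of {sorted(MODE_KWARGS)}"
        )
    return dict(MODE_KWARGS[mode])


def build_merlin(mode="default"):
    """Construct a Merlin model for the given mode.

    The import is deferred so the mapping above can be inspected
    without merlin installed.
    """
    from merlin import Merlin

    return Merlin(**merlin_kwargs(mode))
```

For example, `build_merlin("report")` is equivalent to `Merlin(RadiologyReport=True)`.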

For inference on a demo CT scan, please check out the general demo and report generation demo.

For additional information, please read the inference documentation and report generation documentation.

📂 Merlin Abdominal CT Dataset

We are excited to release the Merlin Abdominal CT Dataset to the community!

For details on accessing and using the dataset, please see the download documentation!

📎 Citation

If you find this repository useful for your work, please cite the original paper:

@article{blankemeier2024merlin,
  title={Merlin: A vision language foundation model for 3d computed tomography},
  author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
  journal={Research Square},
  pages={rs--3},
  year={2024}
}
