| Name |
|---|
| Paria Mehrbod |
| Phuong Thao Quach |
In computational pathology, distribution shifts in imaging protocols and staining techniques pose significant challenges for analyzing digital histopathology images, impacting the accuracy of disease diagnosis and treatment planning. This project employs Test-Time Adaptation (TTA) techniques to enhance the generalization of deep learning models across different datasets, focusing on prostate cancer diagnosis in whole slide images (WSIs).
The methodology involves two main phases:
- Model Training: Utilizing the ResNet50 architecture, models are trained on a primary dataset to classify prostate cancer tissues, incorporating modifications for domain-specific challenges.
- Test-Time Adaptation: Several TTA techniques are applied to adapt the pre-trained models to a secondary dataset from a different distribution, aiming to reduce prediction uncertainty and improve model reliability.
The primary challenge is the high variability in medical images, which requires specialized TTA strategies. Techniques like TENT, SAR, LAME, and DELTA are utilized to address these variations during model adaptation.
The performance of TTA techniques is continuously evaluated against baseline metrics to quantify improvements. This iterative process helps refine the techniques to ensure optimal adaptation to new data distributions.
Download links for the datasets required for this project are provided below. We also use the techniques from the paper "Quality control stress test for deep learning-based diagnostic model in digital pathology" (see the Artifact folder) to apply nine different kinds of artifacts to dataset 1 for the phase 2 experiments.
To successfully run the Python code in this repository, several libraries and dependencies need to be installed. The code primarily relies on popular Python libraries such as NumPy, Matplotlib, Pandas, Seaborn, and Scikit-Learn for data manipulation, statistical analysis, and machine learning tasks.
For deep learning models, the code uses PyTorch, along with the `torchvision` package and submodules such as `torch.nn`. Ensure that you have a recent version of PyTorch installed.

Additionally, the project uses the `Orion` library, an asynchronous hyperparameter optimization framework. It can be installed directly from its GitHub repository using the command `pip install git+https://github.com/epistimio/orion.git@develop`, and its related `profet` package with `pip install orion[profet]`.
Here is a comprehensive list of all the required libraries:
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-Learn
- PyTorch (along with `torch.nn`, `torch.optim`, `torch.utils.data`, etc.)
- Torchvision (including `datasets`, `models`, `transforms`)
- Argparse (for parsing command-line options)
- `TSNE` (from Scikit-Learn, for dimensionality reduction)
- `KNeighborsClassifier`, `GridSearchCV` (from Scikit-Learn, for machine learning models and hyperparameter search)
- `RandomForestClassifier` (from Scikit-Learn, for machine learning models)
- Classification metrics from Scikit-Learn (`confusion_matrix`, `classification_report`, etc.)
For visualization and data analysis, Matplotlib and Seaborn are extensively used. Ensure all these libraries are installed in your environment to avoid any runtime errors.
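As a quick sanity check that the dependencies listed above are available, a small stdlib-only helper (an illustrative convenience, not part of the repository) can report any missing packages before you run the code:

```python
from importlib.util import find_spec

# Import names of the main dependencies. Note that Scikit-Learn installs
# as "sklearn" and PyTorch as "torch".
REQUIRED = ["numpy", "pandas", "matplotlib", "seaborn", "sklearn", "torch", "torchvision"]

def missing_packages(names):
    """Return the packages from `names` that cannot be imported."""
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Running this before training surfaces environment problems immediately instead of partway through a long job.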
To install these libraries, you can use pip (Python's package installer). For most libraries, the installation can be as simple as running `pip install library-name`. For specific versions or sources, refer to the respective library documentation.
For cluster environments like Compute Canada, utilize the provided shell scripts to train and validate the models. Ensure you clone the project repository and have all the required files before proceeding.
To begin training the model using your cluster, follow these steps:
- Navigate to the cloned project directory.
- Ensure that the `job_train.sh` script has execution permissions, setting them with `chmod +x job_train.sh` if needed.
- Submit the training job to the cluster using the command `sbatch job_train.sh`.
- Monitor the job's progress through your cluster's job management tools.
After training, you can validate the model using the following steps:
- Make sure `job_<>.sh` is executable, modifying permissions similarly if required. Each job script corresponds to a different experiment.
- Launch the validation process by submitting `sbatch job_<>.sh` to the cluster's scheduler.
- Check the output and error files generated by the scheduler for logs and results.
Additional scripts:
- `job_corrupt_dataset.sh`: Use this script if you need to simulate data corruption as part of your validation process.
- `download.sh`: This script helps with downloading the necessary datasets for training and validation if they are not present in the local environment.
Note: Ensure that all the datasets and models are in the correct directories as expected by the scripts. Refer to the scripts' internal documentation for detailed information on their expected environments and parameters.
For detailed results analysis, use the `analyze_results.ipynb` notebook in a Jupyter environment to visualize and interpret your model's performance.