-
Notifications
You must be signed in to change notification settings - Fork 33
Implementation of SingleR module #187
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
…for the celldex reference. There are downstream pipeline errors however.
…x column explicitly
celldex reference stored at the relative path: `celldex_references/celldex_hpca__2024-02-26_h5_se`` The reference was copied from the results of a run with profile test_full and should give identical results. This works on Dardel for me, e.g. with this command: ``` module load PDC java singularity export NXF_SINGULARITY_CACHEDIR="~/singularity_cache" ./nextflow run main.nf -profile pdc_kth,test_offline --project=naiss2024-22-1150 --outdir ./results ```
test_full profile. Added: - celldex_reference_label parameter with default value 'label.main' - support for multiple references in CELLTYPES_SINGLER - proper downloading of multiple references using CELLTYPES_CELLDEXDOWNLOAD
…oved to template; updated containers
…bled singleR in case unsupported env methods like Conda/mamba is used; removed unwanted lines and print statements from singleR
…moved them in logical group
…ort for previously missing SingleCellExperiment; removed some unwanted/ commented lines
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good start, but there will be a couple of adjustments necessary
conf/test_offline.config
Outdated
integration_methods = 'scvi,harmony,bbknn,combat' | ||
doublet_detection = 'solo,scrublet,doubletdetection,scds' | ||
celltypist_model = 'Adult_Human_Skin' | ||
celldex_reference = 'celldex_references/celldex_hpca__2024-02-26_h5_se' // using a locally stored celldex reference for this one. The output should be identical to test_full profile |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This relies on local files that are not part of the repository, so the user needs to save the file in the appropriate location before being able to run the test. This is not desirable, as it adds an extra bit of friction.
I think there are two ways of resolving this:
- Add the celldex reference to the test_datasets repo and add an URL to that place
- Point to a web URL that can be used for fetching the celldex reference
You might now think that using URLs to test datasets/web locations contradicts the idea of the "offline" testing.
In fact, Nextflow handles URLs just the same way as local files. If it detects a file to be a URL instead of a local file, it will download it to a location in the pipeline working directory and use it just like an offline one.
I would prefer a solution along this lines over the current implementation.
Also the pipeline uses nf-test for testing, it might make more sense to implement this as a subworkflow-level nf-test instead of running the entire pipeline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the other modules of the pipeline test offline? Asking because if they don't, it won't make much sense to invest time in implementing this to run offline. I've merged the dev
branch into singleR
now. I will go ahead and work on this last element if it makes sense. Or we can keep it for later when all other modules test offline. Let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented the offline test. I'm currently hosting the references in my own branch of test_datasets
. I've opened another pull request for this purpose. The paths can be changed in the test_offline
once that pull request for the test_dataset is approved.
Commit: e071b37
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ? | ||
'docker://saditya88/singler:0.0.1': | ||
'docker.io/saditya88/singler:0.0.1' }" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The if/else is not necessary if both branches evaluate to the same value. Also please add the Dockerfile to the module directory instead of the containers/
directory. Furthermore I think it's more elegant to host the docker image using wave in this case. Using the CLI you can use it for hosting any docker images, not just ones built on conda files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relating to the issue below this container is not related to the accidentally included containers/singler.def file, I think the dockerfile is hosted on github and Aditya can point you to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I am wrong but I thought that if I used a Docker hub image for Singularity, I should prepend it with docker://
while in case I'm using a Docker image repo different from the one defined in the config
I must prepend it with docker.io
. That's why I retained the if/else
statement. I've currently hosted the container on Dockerhub and the Dockerfile is in a Github repo: https://github.com/addityea/scdocks
I'll checkout the Wave method..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nextflow automatically prepends the docker://
when using a docker image with singularity
For wave you find the docs here. Once you have it installed, all you need to do is wave -f Dockerfile
and it will give you a hosted image URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wave fails to build this container.. something to do with APT
sources forbidden errors... I guess we can keep this container to the Dockerhub for now.. I've also removed the switch logic (singularity/ docker) and kept only 1 container name...
Commit: 5150fa2
…ted meta.yml, removed container switch logic
@nf-core-bot fix linting |
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.2.1. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
… for cellDex reference.
Merging as tests have succeeded.
PR checklist
nf-core pipelines lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).Key points
singleR
cell type annotation module in addition to the currentcelltypist
singleR
supports both fetching ofcelldex
models as well as running offline utilizing the user supplied reference file(s)singleR
supports multiple references at onceschema
in proper grouptest_full
profile without errorLint status
Results
The
nf-core pipelines lint
only fails two tests that pertain to the image generation, which is because of different platforms the image is generated and hence can be ignored.Good to know
arm
profile in addition to the rest.Seqera
due to limitations of having an R package,anndataR
. This package is maintained by scverse hence we can be almost certain that it'll show up onConda
in near future, then, the image can be moved toSeqera
easily as this customDocker
has all packages throughConda
except this one.