
Implementation of SingleR module #187



Status: Open (wants to merge 28 commits into base: dev)


@addityea commented May 28, 2025

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/scdownstream branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Key points

  • Adds a SingleR cell type annotation module alongside the existing CellTypist one
  • SingleR can either fetch celldex references or run offline using user-supplied reference file(s)
  • SingleR can use multiple references in a single run
  • Added the new parameters to the schema in the appropriate group
  • Updated citations and changelog
  • Updated the test_full profile and ran it without errors

Lint status

| Tool | Version / Description |
| --- | --- |
| nf-core tools | v3.2.1 |
| Nextflow | v25.04.2 build 5947 |
| Lint test system | MacBook Pro (M4, arm) |

Results

| Result | Count |
| --- | --- |
| Tests passed | 530 |
| Tests ignored | 0 |
| Test warnings | 135 |
| Tests failed | 2 |

nf-core pipelines lint fails only two tests, both related to image generation. They fail because the image was generated on a different platform, so they can be ignored.

Good to know

  • Tests were also run with the arm profile, in addition to the other profiles.
  • The module currently uses a custom Docker image that cannot be built on Seqera, because one R package, anndataR, is not yet available on Conda. The package is maintained by scverse, so it will very likely appear on Conda in the near future. At that point the image can easily be moved to Seqera, since every other package in this custom image is already installed through Conda.

addityea and others added 18 commits May 21, 2025 16:27
…for the celldex reference. There are, however, downstream pipeline errors.
celldex reference stored at the relative path:

`celldex_references/celldex_hpca__2024-02-26_h5_se`

The reference was copied from the results of a run with
profile test_full and should give identical results.

This works on Dardel for me, e.g. with this command:

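```
# Load Java and Singularity on Dardel (PDC)
module load PDC java singularity
# Cache pulled Singularity images in the home directory ($HOME expands inside double quotes; "~" does not)
export NXF_SINGULARITY_CACHEDIR="$HOME/singularity_cache"

./nextflow run main.nf -profile pdc_kth,test_offline --project=naiss2024-22-1150 --outdir ./results
```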
test_full profile.

Added:
- celldex_reference_label parameter with default value 'label.main'
- support for multiple references in CELLTYPES_SINGLER
- proper downloading of multiple references using CELLTYPES_CELLDEXDOWNLOAD
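As an illustration of the new label parameter, a user-side config override might look like the fragment below. It is a sketch: the parameter names are the ones introduced in this PR, the file name `custom.config` is hypothetical, and it assumes the module passes `celldex_reference_label` straight through as the celldex label column (celldex references such as HPCA also ship a finer-grained `label.fine` column).

```nextflow
// custom.config (hypothetical file name), passed with: nextflow run ... -c custom.config
params {
    // Locally stored celldex reference, as used by the offline test profile in this PR
    celldex_reference       = 'celldex_references/celldex_hpca__2024-02-26_h5_se'
    // Default added in this PR is 'label.main'; 'label.fine' requests the finer-grained labels
    celldex_reference_label = 'label.fine'
}
```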
…bled singleR when unsupported environment methods such as Conda/mamba are used;

removed unwanted lines and print statements from singleR
…ort for previously missing SingleCellExperiment; removed some unwanted/commented-out lines
@nictru (Collaborator) left a comment

Good start, but a couple of adjustments will be necessary.

integration_methods = 'scvi,harmony,bbknn,combat'
doublet_detection = 'solo,scrublet,doubletdetection,scds'
celltypist_model = 'Adult_Human_Skin'
celldex_reference = 'celldex_references/celldex_hpca__2024-02-26_h5_se' // using a locally stored celldex reference for this one. The output should be identical to test_full profile
Collaborator:

This relies on local files that are not part of the repository, so the user needs to save the file in the appropriate location before being able to run the test. This is not desirable, as it adds an extra bit of friction.

I think there are two ways of resolving this:

  1. Add the celldex reference to the test-datasets repo and point to a URL at that location
  2. Point to a web URL that can be used for fetching the celldex reference

You might think that using URLs to test datasets or web locations contradicts the idea of "offline" testing.
In fact, Nextflow handles URLs the same way as local files: if it detects that a file path is a URL rather than a local path, it downloads the file into the pipeline work directory and then uses it just like a local one.

I would prefer a solution along these lines over the current implementation.

Also, since the pipeline uses nf-test for testing, it might make more sense to implement this as a subworkflow-level nf-test instead of running the entire pipeline.
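A minimal sketch of the first option, assuming the reference ends up in nf-core/test-datasets as a single downloadable file or archive. The URL is a placeholder, not a real path, and the profile keys other than `celldex_reference` are just the usual nf-core test-profile fields.

```nextflow
// conf/test_offline.config (sketch). The URL below is a placeholder for wherever the
// reference is eventually stored in nf-core/test-datasets; it is not a real path.
params {
    config_profile_name        = 'Offline test profile'
    config_profile_description = 'Runs SingleR against a pre-fetched celldex reference'

    // Nextflow stages remote paths into the work directory like any other input file
    celldex_reference = 'https://raw.githubusercontent.com/nf-core/test-datasets/scdownstream/<path-to-reference>'
}
```

If the reference is stored as a directory rather than a single file, it cannot be fetched through one plain HTTP URL, so it would presumably need to be packed into an archive first.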

Author:

Do the other modules of the pipeline have offline tests? I'm asking because, if they don't, it won't make much sense to invest time in making this one run offline. I've now merged the dev branch into singleR. I can go ahead and work on this last item if it makes sense, or we can postpone it until all other modules are tested offline. Let me know.

Author:

Implemented the offline test. The references are currently hosted in my own branch of test-datasets, and I've opened a separate pull request to add them there. The paths in test_offline can be updated once that test-datasets pull request is approved.

Commit: e071b37

Comment on lines 5 to 7
container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container ?
'docker://saditya88/singler:0.0.1':
'docker.io/saditya88/singler:0.0.1' }"
Collaborator:

The if/else is not necessary, since both branches point to the same image. Also, please add the Dockerfile to the module directory instead of the containers/ directory. Furthermore, I think it would be more elegant to host the Docker image using Wave in this case. With the Wave CLI you can host any Docker image, not just ones built from Conda environment files.


Regarding the issue below: this container is not related to the accidentally included containers/singler.def file. I think the Dockerfile is hosted on GitHub, and Aditya can point you to it.

Author:

Correct me if I'm wrong, but I thought that when using a Docker Hub image with Singularity I should prefix it with docker://, whereas when using a Docker registry other than the one defined in the config I must prefix it with docker.io. That's why I kept the if/else statement. The container is currently hosted on Docker Hub, and the Dockerfile is in a GitHub repo: https://github.com/addityea/scdocks
I'll check out the Wave approach.

Collaborator:

Nextflow automatically prepends docker:// when a Docker image is used with Singularity.

For Wave, you can find the docs here. Once it's installed, all you need to run is `wave -f Dockerfile`, and it will give you a hosted image URL.
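Given that behaviour, the container line can collapse to a single image reference. The excerpt below is a sketch, assuming the image stays on Docker Hub under the tag quoted above; the module's inputs and outputs are omitted, and the script body is a placeholder rather than the module's actual SingleR script.

```nextflow
// Module main.nf (excerpt). One registry-qualified image for every container engine:
// with Singularity/Apptainer, Nextflow treats a scheme-less name as a Docker image
// and adds docker:// itself, so no workflow.containerEngine branch is needed.
process CELLTYPES_SINGLER {
    container 'docker.io/saditya88/singler:0.0.1'

    script:
    """
    echo "placeholder: the real module runs its SingleR R script here"
    """
}
```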

Author:

Wave fails to build this container; it seems to be caused by "forbidden" errors from the APT sources. I guess we can keep the container on Docker Hub for now. I've also removed the switch logic (Singularity/Docker) and kept a single container name.
Commit: 5150fa2

@nictru linked an issue on May 29, 2025 that may be closed by this pull request
@mashehu (Contributor) commented Jun 2, 2025

@nf-core-bot fix linting

@nf-core-bot (Member)

Warning: A newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.2.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

@addityea addityea requested a review from nictru June 16, 2025 08:34
Successfully merging this pull request may close these issues:
Implement SingleR cell type assignment
6 participants