Skip to content

Update blastn module #8765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Conversation

AnnaNoren
Copy link

@AnnaNoren AnnaNoren commented Jul 10, 2025

This pr adds taxid input (to decrease run time) and updates the test+snapshot

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Emit the versions.yml file.
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

Copy link
Contributor

@SPPearce SPPearce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test for the new input?

Copy link
Member

@muffato muffato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple taxon filters:

  • -taxids and -taxidlist
  • -negative_taxids and -negative_taxidlist

Can we find a way of supporting all four cases ?


"""
if [ "${is_compressed}" == "true" ]; then
gzip -c -d ${fasta} > ${fasta_name}
fi

export BLASTDB=${db}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this addition for ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now how it is unnecessary in this context.

@AnnaNoren
Copy link
Author

@muffato @SPPearce
I've updated the pr to include -taxidlist -negative_taxidlist -taxids -negative_taxids. To not overdo the inputs, it relies on ext.taxid_mode in nextflow.config.

A problem I've had is that the nf-test relies on the makeblastdb module which does not create the database with a taxid map (vital for taxid filtering to work) and if I were to update that module as well one problem still stands, which is the addition of another two vital files from ncbi (taxdb.bti (18MB) and taxdb.btd (170MB)) which I don't know how to implement into the database through nf-core makeblastdb module. I've tested this blastn module pr with nf-test and a local database (including ncbi files) and succeeded however.

I'm not sure how to proceed. Do you have any inputs of the possibility of this?

@SPPearce
Copy link
Contributor

SPPearce commented Aug 4, 2025

Please don't add extra ext.foo bits, we'd rather just add another input to swap the mode.
Is that the minimum size of the database? Doesn't have to contain everything, just to check the functionality works.
If you can't test it in the CI then it isn't the end of the world, many modules don't have complete tests.

@AnnaNoren
Copy link
Author

@SPPearce Okay, I see. That would mean four inputs instead of two. So as for the minimum size of the database, created with only one reference genome and taxid mapping and including those vital ncbi files it constitutes 189MB (locally from what I can see). In order to actually run blastn with taxid flags the database needs to be created through taxid mapping and include those files so I don't think there is a way around that. I will update the pr with these latest changes including nf-test with empty taxid inputs.

@LilyAnderssonLee
Copy link
Contributor

I’m not sure about including an input channel for each of the arguments: --taxidlist, --negative_taxidlist, --taxids, and --negative_taxids, since they are not compatible with each other. In other words, only one of these four arguments can be used at a time.

@muffato
Copy link
Member

muffato commented Aug 14, 2025

In other words, only one of these four arguments can be used at a time.

That can be handled with if statements that raise errors if incompatible options are passed together, like in the hifiasm module.

To limit the number of input channels of the pipeline, an idea is to have 3 arguments:

  • taxids as a list of values
  • taxidlist as a path
  • negative_tax as a boolean

The only thing to check is that taxid and taxidlist are not passed simultaneously.
Then, the name of the command-line argument is built from the presence/absence of the input channels.

Another idea would be to not have an input channel named negative_tax but instead expect the user to add --negative_tax to ext.args and have the code manipulate ext.args to turn --negative_tax into --negative_taxids 1234 or --negative_taxidlist /path/to/....

@muffato
Copy link
Member

muffato commented Aug 15, 2025

Another idea would be to not have an input channel named negative_tax but instead expect the user to add --negative_tax to ext.args and have the code manipulate ext.args to turn --negative_tax into --negative_taxids 1234 or --negative_taxidlist /path/to/....

There's an example of that in the vcftools module: https://github.com/nf-core/modules/blob/master/modules/nf-core/vcftools/main.nf#L103-L108

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants