edmundmiller
Contributor

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool, have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/rnaseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core pipelines lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

edmundmiller and others added 2 commits August 20, 2025 10:46
Adds optional fast mode for dupRadar analysis to address performance
issues with Fusion S3 filesystem. New implementation bypasses R overhead
by using native featureCounts binaries while maintaining full MultiQC
compatibility and identical output formats.

Key improvements:
- 10-100x performance improvement for high-latency storage systems
- Maintains backward compatibility (traditional R mode remains default)
- Full MultiQC integration with identical file formats
- Modular bin script architecture for maintainability
- Comprehensive documentation and configuration examples

Enable via: ext.use_fast_dupradar = true

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Add pull_request_review trigger for approved PRs
- Add revision variable step for better tracking
- Add parameter validation step for debugging
- Add labels for workflow organization
- Update artifact collection to include JSON logs
- Improve workdir/outdir naming with revision tracking

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
edmundmiller self-assigned this on Aug 20, 2025

github-actions bot commented Aug 20, 2025

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 6169482

  • ✅ 292 tests passed
  • ❔ 7 tests were ignored
  • ❗ 9 tests had warnings

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_if_empty_null - ifEmpty(null) found in main.nf: versions = ch_versions.ifEmpty(null) // channel: [ versions.yml ]

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.3.2
  • Run at 2025-08-22 15:11:46

edmundmiller and others added 3 commits August 20, 2025 13:13
Moves embedded Python script from dupRadar main.nf to separate bin script
for better maintainability and cleaner code organization. The Python analysis
logic is now contained in dupradar_fast_analysis.py with proper argument parsing.

- Extract 150+ line Python script to bin/dupradar_fast_analysis.py
- Replace embedded script with simple bin script call
- Maintain identical functionality and output formats
- Improve code readability and maintainability
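
For readers unfamiliar with what such a script has to compute, here is a minimal sketch of a per-gene duplication-rate calculation from two featureCounts tables. It is not the contents of bin/dupradar_fast_analysis.py. The function names `read_featurecounts` and `duplication_table` are hypothetical, the sample values are made up, and it assumes one featureCounts run counted all reads while a second run skipped reads marked as duplicates (for example via `--ignoreDup`). The column names follow dupRadar's output table as I understand it.

```python
import csv
import io
import math


def read_featurecounts(handle):
    """Return {gene_id: (gene_length, count)} from a featureCounts table.

    Assumes the usual featureCounts layout: '#' comment lines, a header row,
    then tab-separated Geneid, Chr, Start, End, Strand, Length, <count>.
    """
    data_lines = (line for line in handle if not line.startswith("#"))
    rows = csv.reader(data_lines, delimiter="\t")
    next(rows)  # skip the header row
    return {row[0]: (int(row[5]), int(row[6])) for row in rows if row}


def duplication_table(all_counts, dedup_counts):
    """Per-gene duplication rate and reads per kilobase (RPK)."""
    table = []
    for gene_id, (length, n_all) in all_counts.items():
        n_dedup = dedup_counts[gene_id][1]
        # Genes with no reads have an undefined duplication rate.
        dup_rate = (n_all - n_dedup) / n_all if n_all else math.nan
        table.append({
            "ID": gene_id,
            "geneLength": length,
            "allCounts": n_all,
            "filteredCounts": n_dedup,
            "dupRate": dup_rate,
            "RPK": n_all * 1000 / length,
        })
    return table


# Tiny in-memory example; the values are made up for illustration.
HEADER = "# Program:featureCounts\nGeneid\tChr\tStart\tEnd\tStrand\tLength\tsample.bam\n"
with_dups = io.StringIO(
    HEADER + "geneA\tchr1\t1\t1000\t+\t1000\t100\ngeneB\tchr1\t2000\t2499\t+\t500\t0\n"
)
without_dups = io.StringIO(
    HEADER + "geneA\tchr1\t1\t1000\t+\t1000\t60\ngeneB\tchr1\t2000\t2499\t+\t500\t0\n"
)

table = duplication_table(read_featurecounts(with_dups), read_featurecounts(without_dups))
for row in table:
    print(row)
```

Keeping the parsing and the arithmetic in separate functions makes each piece easy to unit-test without running featureCounts. A real script would additionally need argument parsing and the MultiQC-compatible output files mentioned in this PR.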

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Change default value from false to true for task.ext.use_fast_dupradar
- Fast mode uses featureCounts + Python instead of R for better performance
- Users can still explicitly disable with ext.use_fast_dupradar = false

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

- Add scratch directive to RSEM processes for local NVMe temp storage
- Change temporary folder from ./tmp/ to configurable /tmp/ path
- Configure optimal memory (64GB) and disk (200GB) allocations
- Apply optimizations to both standard and Sentieon RSEM modules

This reduces RSEM runtime by up to 2.2x by using local storage for
intensive temporary file operations instead of network-based Fusion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
edmundmiller force-pushed the dupradar-fusion branch 2 times, most recently from 317d030 to 6169482, on August 22, 2025 15:09
Member

pinin4fjords left a comment

Not sure how serious you are with this, or if you're just mucking about, but in general having actual 'software' in the modules repo, or the pipeline repos, gives me the heebie-jeebies. It's likely to be poorly maintained. I'd rather you took this up with the dupRadar authors.

I also worked quite hard to move scripts out of the workflow and into modules, so I'll fight attempts to increase the bin dir again.
