-
Notifications
You must be signed in to change notification settings - Fork 795
Dupradar fusion #1599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Dupradar fusion #1599
Conversation
Adds optional fast mode for dupRadar analysis to address performance issues with Fusion S3 filesystem. New implementation bypasses R overhead by using native featureCounts binaries while maintaining full MultiQC compatibility and identical output formats. Key improvements: - 10-100x performance improvement for high-latency storage systems - Maintains backward compatibility (traditional R mode remains default) - Full MultiQC integration with identical file formats - Modular bin script architecture for maintainability - Comprehensive documentation and configuration examples Enable via: ext.use_fast_dupradar = true 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Add pull_request_review trigger for approved PRs - Add revision variable step for better tracking - Add parameter validation step for debugging - Add labels for workflow organization - Update artifact collection to include JSON logs - Improve workdir/outdir naming with revision tracking 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
|
Moves embedded Python script from dupRadar main.nf to separate bin script for better maintainability and cleaner code organization. The Python analysis logic is now contained in dupradar_fast_analysis.py with proper argument parsing. - Extract 150+ line Python script to bin/dupradar_fast_analysis.py - Replace embedded script with simple bin script call - Maintain identical functionality and output formats - Improve code readability and maintainability 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Change default value from false to true for task.ext.use_fast_dupradar - Fast mode uses featureCounts + Python instead of R for better performance - Users can still explicitly disable with ext.use_fast_dupradar = false 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
- Add scratch directive to RSEM processes for local NVMe temp storage - Change temporary folder from ./tmp/ to configurable /tmp/ path - Configure optimal memory (64GB) and disk (200GB) allocations - Apply optimizations to both standard and Sentieon RSEM modules This reduces RSEM runtime by up to 2.2x by using local storage for intensive temporary file operations instead of network-based Fusion. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>
317d030
to
6169482
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how serious you are with this, or if you're just mucking about, but in general having actual 'software' in the modules repo, or the pipeline repos, gives me the heebie-jeebies. It's likely to be poorly maintained. I'd rather you took this up with the Dupradar authors.
I also worked quite hard to move scripts out of the workflow and into modules, so I'll fight attempts to increase the bin dir again.
PR checklist
nf-core pipelines lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).