
dianichj
Contributor

FOR CONTRIBUTOR:

  • I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • License permits unrestricted use (educational + commercial)
  • This PR adds a new tool or tool collection
  • This PR updates an existing tool or tool collection
  • This PR does something else (explain below)

This PR adds a new Galaxy tool for the LEMUR R package, designed for fitting a latent embedding multivariate regression model to multi-condition single-cell data.

🧬 Tool purpose

LEMUR provides a parametric framework to:

  • Align multi-condition single-cell transcriptomics data
  • Predict log fold changes between conditions at single-cell resolution
  • Identify spatially coherent neighborhoods of cells with consistent transcriptional shifts
  • Perform pseudobulk differential expression testing on these neighborhoods

It is especially suited for complex experimental designs such as treatment vs. control, time-course, or disease progression studies.

📂 Tool contents

  • lemur.xml: Galaxy wrapper with required parameters and outputs
  • lemur.R: R script that runs the LEMUR pipeline (fit, align, test)
  • test-data/: Includes example RDS input and expected PDF/TSV outputs:
    • UMAPs and volcano plots
    • DE results and neighborhood summaries

🧪 Test data

The test data included are derived from a publicly available glioblastoma single-cell dataset. They demonstrate:

  • Successful model fitting
  • Harmony-based alignment
  • Visualization of DE results per condition

🛠️ Requirements

Conda packages required:

  • bioconductor-lemur
  • bioconductor-singlecellexperiment
  • r-optparse
  • r-tidyverse
  • r-uwot

These are defined in the wrapper's <requirements> section.

📌 Notes

Please help me improve this tool wrapper; it still needs work. Thanks so much for reviewing and for your help!

cc: @nilchia 🚀🖖

@nilchia
Contributor

nilchia commented Jul 25, 2025

Cool 🎉

Contributor

@nilchia nilchia left a comment

Super nice!!
Thanks @dianichj

@nilchia
Contributor

nilchia commented Jul 25, 2025

Please add a .shed.yml file.
https://github.com/dianichj/tools-iuc/blob/main/tools/seurat/.shed.yml

@dianichj dianichj marked this pull request as ready for review August 1, 2025 14:58
@@ -9,10 +9,19 @@ suppressPackageStartupMessages({
library(ggplot2)
})

#----- Function to save plots in different formats ----
save_plot <- function(filename, plot, format = "pdf", width = 6, height = 5, dpi = 300) {
Contributor

Please make the dpi, width, and height configurable.
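A minimal sketch of the wrapper side, assuming the R script is extended with hypothetical `--plot_width`, `--plot_height` and `--plot_dpi` options (for example via `optparse::make_option()`) whose values are passed on to `ggplot2::ggsave()`. The param names, ranges and flags are illustrative; the defaults mirror the current `save_plot()` signature (width = 6, height = 5, dpi = 300).

```xml
<!-- in <inputs>: hypothetical plot size and resolution params (sizes in inches, ggsave's default unit) -->
<param name="plot_width" type="float" value="6" min="1" max="50" label="Plot width (inches)"/>
<param name="plot_height" type="float" value="5" min="1" max="50" label="Plot height (inches)"/>
<param name="plot_dpi" type="integer" value="300" min="72" max="1200" label="Plot resolution (DPI)" help="Only relevant for raster formats (PNG, JPG)."/>

<!-- in <command>: forward the values to the R script (hypothetical flags) -->
--plot_width $plot_width
--plot_height $plot_height
--plot_dpi $plot_dpi
```

On the R side, `save_plot()` would then receive these values instead of relying on its hardcoded defaults.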

<param name="meta_table" type="data" format="tabular" label="Sample metadata table (TSV)" help="TSV file with one row per sample/cell. Must contain columns for condition and optionally for batch." />
<param name="condition_column" type="data_column" data_ref="meta_table" label="Condition column" help="Select the condition column (e.g., treatment vs control). Only appears after loading metadata." />
<param name="batch_column" type="data_column" data_ref="meta_table" optional="true" label="Batch column (optional)" help="Optional batch variable (e.g., patient ID). Only appears after loading metadata." />
<param name="contrast_condition" type="text" value="panobinostat" label="Condition for contrast (e.g. treatment)" />
Contributor

I'm not sure it is OK to have a default value here.
If you keep one, please at least make it "treatment".

Contributor

And please add validators for both text params.

Member

ping for the validator
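One way to add the requested validators to both text params, as a sketch: an `empty_field` validator rejects blank values, and a `regex` validator restricts condition names to a simple character set. That character set is an assumption; loosen it if condition labels in the metadata can contain spaces or other characters.

```xml
<param name="contrast_condition" type="text" value="treatment" label="Condition for contrast (e.g. treatment)">
    <validator type="empty_field" message="Please enter the contrast condition."/>
    <!-- assumed allowed characters: letters, digits, '.', '_' and '-' -->
    <validator type="regex" message="Only letters, digits, '.', '_' and '-' are allowed.">^[A-Za-z0-9._-]+$</validator>
</param>
<param name="reference_condition" type="text" value="ctrl" label="Reference condition (e.g. control)">
    <validator type="empty_field" message="Please enter the reference condition."/>
    <validator type="regex" message="Only letters, digits, '.', '_' and '-' are allowed.">^[A-Za-z0-9._-]+$</validator>
</param>
```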

@@ -9,10 +9,19 @@ suppressPackageStartupMessages({
library(ggplot2)
})

#----- Function to save plots in different formats ----
save_plot <- function(filename, plot, format = "pdf", width = 6, height = 5, dpi = 300) {
ext <- tolower(format)
Contributor

Your formats are already lowercase, no?
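For reference: if `plot_format` is a `select` param whose option values are lowercase (as the `change_format` values `png` and `jpg` suggest), Galaxy can only ever pass one of those exact strings, so `tolower()` in `save_plot()` adds nothing. A sketch of such a param, with assumed labels:

```xml
<param name="plot_format" type="select" label="Plot output format">
    <!-- option values are what the R script receives; they are already lowercase -->
    <option value="pdf" selected="true">PDF</option>
    <option value="png">PNG</option>
    <option value="jpg">JPG</option>
</param>
```

With this, `save_plot()` can use `format` as-is.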

<data name="gene_hist_pdf" format="pdf" from_work_dir="gene_hist.pdf" label="Gene histogram plot" />
<data name="chr_scatter_pdf" format="pdf" from_work_dir="chr_scatter.pdf" label="Chromosome scatter plot" />
<data name="tumor_umap_pdf" format="pdf" from_work_dir="tumor_umap.pdf" label="Tumor UMAP plot" />
<data name="gene_umap" format="pdf" from_work_dir="gene_umap.pdf" label="Gene UMAP plot">
Contributor

Will this work? You are hardcoding it to look for the gene_umap.pdf file.
Please add a test that uses another output format.
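A sketch of one common fix, assuming the R script is given the output dataset path directly through a hypothetical `--output_gene_umap` option instead of writing a fixed `gene_umap.pdf` into the working directory. Because Galaxy dataset paths end in `.dat`, the script would then have to pass the format explicitly (e.g. `ggsave(..., device = format)`) rather than relying on the file extension. The test only switches the format to PNG; the other input params are assumed to be copied from the existing test.

```xml
<!-- in <command> (hypothetical flag): write straight to the output dataset -->
--output_gene_umap '$gene_umap'

<!-- in <outputs>: no from_work_dir; the datatype follows plot_format -->
<data name="gene_umap" format="pdf" label="${tool.name} on ${on_string}: Gene UMAP plot">
    <change_format>
        <when input="plot_format" value="png" format="png"/>
        <when input="plot_format" value="jpg" format="jpg"/>
    </change_format>
</data>

<!-- in <tests>: exercise a non-PDF format -->
<test>
    <!-- same input params as the existing test go here -->
    <param name="plot_format" value="png"/>
    <output name="gene_umap" ftype="png">
        <assert_contents>
            <!-- only checks that a non-trivial file was produced -->
            <has_size min="1000"/>
        </assert_contents>
    </output>
</test>
```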

<when input="plot_format" value="png" format="png"/>
<when input="plot_format" value="jpg" format="jpg"/>
</change_format>
</data>
<data name="de_results" format="tabular" from_work_dir="de_results.tsv" label="LEMUR DE results" />
Contributor

Please use this label for all outputs:
${tool.name} on ${on_string}:
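Applied to some of the existing outputs, that label pattern could look like this; only the `label` attributes change, and the descriptive suffix after the colon is taken from the current labels:

```xml
<data name="gene_hist_pdf" format="pdf" from_work_dir="gene_hist.pdf" label="${tool.name} on ${on_string}: Gene histogram plot"/>
<data name="chr_scatter_pdf" format="pdf" from_work_dir="chr_scatter.pdf" label="${tool.name} on ${on_string}: Chromosome scatter plot"/>
<data name="tumor_umap_pdf" format="pdf" from_work_dir="tumor_umap.pdf" label="${tool.name} on ${on_string}: Tumor UMAP plot"/>
<data name="de_results" format="tabular" from_work_dir="de_results.tsv" label="${tool.name} on ${on_string}: LEMUR DE results"/>
```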

<assert_contents>
<has_line_matching expression="^name\tn_cells\t.*did_lfc$"/>
<has_text text="ENSG00000210082"/>
</assert_contents>
Contributor

Please also add has_n_lines so we know the file has the correct number of genes.
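A sketch of the extended assertion block. The `n` value below is a placeholder, not the real count: it should be the number of lines in the expected DE table (one header line plus one line per gene), for example as counted from the expected file in `test-data/`.

```xml
<output name="de_results" ftype="tabular">
    <assert_contents>
        <has_line_matching expression="^name\tn_cells\t.*did_lfc$"/>
        <has_text text="ENSG00000210082"/>
        <!-- PLACEHOLDER: replace 101 with the real line count (header + genes) -->
        <has_n_lines n="101"/>
    </assert_contents>
</output>
```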

LEMUR uses the `SingleCellExperiment` (SCE) format because it is the standard Bioconductor structure for single-cell data in R. Unlike Seurat or AnnData, SCE is lightweight, interoperable, and not tied to a specific framework. It cleanly separates assays (e.g., expression values), cell metadata (`colData`), feature metadata (`rowData`), and reduced dimensions (`reducedDims`).

For LEMUR to work correctly, the SCE object **must include**:
- **`logcounts()`**: matrix of log-normalized gene expression.
Contributor

It also needs `counts()`.
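A possible wording for the help section listing both required assays, as a sketch. It assumes `counts()` holds the raw (unnormalized) counts, as in standard SingleCellExperiment usage; these are likely needed because the pseudobulk differential expression test aggregates raw counts.

```xml
<help><![CDATA[
.. other help sections unchanged

For LEMUR to work correctly, the SCE object **must include**:

- ``counts()``: matrix of raw (unnormalized) counts, used for pseudobulk testing.
- ``logcounts()``: matrix of log-normalized gene expression.
]]></help>
```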

@nilchia
Contributor

nilchia commented Aug 2, 2025

Categories [['Statistical Analysis']] unknown.

dianichj and others added 5 commits August 3, 2025 02:55
Co-authored-by: Amirhossein Nilchi <[email protected]>
@dianichj
Contributor Author

dianichj commented Aug 6, 2025

Ping @nilchia :)!

@@ -0,0 +1,271 @@
<tool id="lemur" name="LEMUR" version="1.0.1+galaxy0" profile="25.0" license="MIT">
Member

I think the version should be 1.4.0+galaxy0, to indicate the version of the main dependency.

For that, you can use a macro.
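A sketch of the usual token-based pattern. The value 1.4.0 follows the reviewer's suggestion and should be checked against the bioconductor-lemur version actually available; the other requirements are left as they are.

```xml
<tool id="lemur" name="LEMUR" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="25.0" license="MIT">
    <macros>
        <!-- version of the wrapped package -->
        <token name="@TOOL_VERSION@">1.4.0</token>
        <!-- bump this for wrapper-only changes; reset to 0 on a package update -->
        <token name="@VERSION_SUFFIX@">0</token>
    </macros>
    <requirements>
        <!-- pin the main dependency to the same version the tool advertises -->
        <requirement type="package" version="@TOOL_VERSION@">bioconductor-lemur</requirement>
        <!-- remaining requirements as before -->
    </requirements>
    <!-- rest of the tool unchanged -->
</tool>
```

This keeps the tool version and the pinned package version in one place, so they cannot drift apart.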

<param name="batch_column" type="data_column" data_ref="meta_table" optional="true" label="Batch column (optional)" help="Optional batch variable (e.g., patient ID). Only appears after loading metadata." />

<param name="contrast_condition" type="text" value="panobinostat" label="Condition for contrast (e.g. treatment)" />
<param name="reference_condition" type="text" value="ctrl" label="Reference condition (e.g. control)" />
Member

A validator would be nice here as well.

<param name="tumor_annotation_column" type="text" value="chromosome"
label="Tumor annotation column in rowData"
help="Used for tumor classification. Default is 'chromosome'. Override if your SCE uses a different annotation column." />
</inputs>
Member

Indentation is a bit off here; please use the galaxy-language-server to reformat this tool.

help="Used for tumor classification. Default is 'chromosome'. Override if your SCE uses a different annotation column." />
</inputs>

<outputs>
Member

These are a lot of outputs. Are all of them always needed? Should the user be able to select only some of them?
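One way to make the plots optional, as a sketch: a multi-select checkbox param, a guard in the command so only requested plots are produced, and a `<filter>` on each output so unselected datasets are not created. The param name `selected_outputs`, the output name `volcano_plot`, and the option set are hypothetical. Writing directly to `'$volcano_plot'` assumes the R script skips plots whose flag is absent and picks the ggsave device from the chosen format rather than from the `.dat` file extension.

```xml
<!-- in <inputs> -->
<param name="selected_outputs" type="select" multiple="true" display="checkboxes" label="Plots to generate">
    <option value="umap" selected="true">UMAP plot</option>
    <option value="volcano" selected="true">Volcano plot</option>
    <option value="gene_umap">Gene UMAP plot</option>
</param>

<!-- in <command>: only pass flags for the selected plots -->
#set $selected = str($selected_outputs).split(',')
#if 'volcano' in $selected:
    --output_volcano '$volcano_plot'
#end if

<!-- in <outputs>: only create the dataset when it was requested -->
<data name="volcano_plot" format="pdf" label="${tool.name} on ${on_string}: Volcano plot">
    <filter>selected_outputs and 'volcano' in selected_outputs</filter>
    <change_format>
        <when input="plot_format" value="png" format="png"/>
        <when input="plot_format" value="jpg" format="jpg"/>
    </change_format>
</data>
```

The DE results table would presumably stay unconditional, since it is the main result of the tool.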

--output_umap 'umap.$plot_format'
--output_volcano 'volcano.$plot_format'
--output_de 'de_results.tsv'
#if str($sel_gene):
Member

"#end if" is missing for some reason.

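The conditional just needs closing. A sketch of the corrected command section; the body of the `#if` is not visible in this excerpt, so the `--sel_gene` flag below is a hypothetical stand-in for whatever option the script actually takes.

```xml
<command detect_errors="exit_code"><![CDATA[
Rscript '$__tool_directory__/lemur.R'
    ## other arguments unchanged
    --output_umap 'umap.$plot_format'
    --output_volcano 'volcano.$plot_format'
    --output_de 'de_results.tsv'
    #if str($sel_gene):
        --sel_gene '$sel_gene'
    #end if
]]></command>
```

Cheetah requires every `#if` directive to be closed with `#end if`; without it, the command template fails to compile when the tool runs.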