This repository supplements the main COMPASS repository. It documents the data pre-processing used to prepare COMPASS input, provides a reproducible pipeline that takes users from raw FASTQ files to TPM (Transcripts Per Million) expression values, and includes an example workflow for handling data from The Cancer Genome Atlas (TCGA).

The following sections outline the repository structure, usage instructions, and important considerations.

RNA-seq data often require a series of preprocessing and normalization steps before downstream analysis. These steps typically include:
- Quality control and filtering of raw reads from FASTQ files.
- Alignment to a reference genome or transcriptome.
- Quantification of gene or transcript expression (commonly reported in units such as TPM, RPKM, or FPKM).
- Processing of data from public databases such as TCGA, which can involve downloading, reformatting, and integrating with other data for comprehensive analysis.
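
To make the quantification step concrete, here is a minimal sketch of the standard TPM definition: divide each gene's read count by its length in kilobases, then rescale so the values for one sample sum to one million. The function name `counts_to_tpm` and its inputs are hypothetical and used only for illustration. This is not the repository's pipeline code, which may obtain TPM values directly from a dedicated quantification tool.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to TPM for a single sample.

    counts     -- read counts per gene (hypothetical input)
    lengths_bp -- gene or transcript lengths in base pairs, same order
    """
    if len(counts) != len(lengths_bp):
        raise ValueError("counts and lengths_bp must have the same length")
    if any(length <= 0 for length in lengths_bp):
        raise ValueError("all lengths must be positive")

    # Step 1: reads per kilobase (RPK) removes the gene-length bias.
    rpk = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]

    # Step 2: rescale so that the values for the sample sum to one million.
    total_rpk = sum(rpk)
    if total_rpk == 0:
        return [0.0] * len(rpk)
    return [r / total_rpk * 1e6 for r in rpk]


# Three genes whose counts are proportional to their lengths
# end up with equal TPM values.
example_tpm = counts_to_tpm([100, 200, 300], [1000, 2000, 3000])
print(example_tpm)
```

Because TPM values sum to the same total in every sample, they are easier to compare across samples than RPKM or FPKM, which apply the two normalization steps in the opposite order.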
If you use our resources, please cite our work as follows:
Wanxiang Shen, Thinh H. Nguyen, Michelle M. Li, Yepeng Huang, Intae Moon, Nitya Nair, Daniel Marbach‡, and Marinka Zitnik‡. Generalizable AI predicts immunotherapy outcomes across cancers and treatments [J]. medRxiv.