Analysis and comparison scripts for various peak callers, supported by the ChIP-seq analysis pipeline.
- 1_benchmark.ipynb - Jupyter notebook with an initial running-time benchmark of peak callers.
- 2_peaks.ipynb - Notebook comparing the number of peaks, peak lengths, Jaccard index between replicates, etc.
- 3_rnaseq.ipynb - Notebook comparing peak calling results with RNA-seq data.
- 4_chips.ipynb - Benchmark of peak callers against simulated ChIP-seq data.
- 5_control.ipynb - Assessment of the effect of control on peak calling.
- 6_immgen.ipynb - Benchmark of peak callers on ATAC-seq data.
- 7_omnipeak.ipynb - In-depth analysis of Omnipeak models.
- 8_hyperparameters.ipynb - Analysis of hyperparameters of Omnipeak.
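For reference, the Jaccard index between two replicate peak sets $A$ and $B$ is conventionally defined as below. Tools such as `bedtools jaccard` measure $|\cdot|$ in covered base pairs; whether `2_peaks.ipynb` uses base pairs or peak counts is not stated here.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}
```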
Prepare datasets by downloading the files listed in `Datasets.xlsx`:
- Download `fastq` files from the `GSE26320` tab into the `~/data/2023_GSE26320` folder.
- Download `bam` files from the `RoadmapEpigenomics` tab into the `~/data/2023_Immune` folder.
- Download `bed.gz` files from the `ABF` tab into the `~/data/2018_chipseq_y20o20` folder. Convert them to `bam` format using `samtools`.
- Download `bam` files from the `CTCF` tab into the `~/data/2025_TFs` folder.
- Download `fastq` files from the `Immgen` tab into the `~/data/2025_Immgen` folder.
- Download `tsv` files with transcription counts from the `RNAseq` tab into the `~/data/2025_transcription` folder.
- Download `bam` files from the `Chips` tab into the `~/data/2025_chips` folder.
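The ABF conversion step can be sketched as below. `samtools` does not turn BED intervals into BAM records by itself, so this sketch assumes `bedtools bedtobam` for the conversion and uses `samtools` to sort and index the result. The chromosome sizes file path (`CHROM_SIZES`) is hypothetical, and each `bed.gz` is assumed to hold one aligned read per line in hg19 coordinates.

```shell
#!/usr/bin/env bash
# Sketch: convert ABF bed.gz read files into sorted, indexed BAM files.
# Assumptions (hypothetical): CHROM_SIZES path; one read per BED line (hg19).
set -euo pipefail

ABF_DIR="${ABF_DIR:-$HOME/data/2018_chipseq_y20o20}"
CHROM_SIZES="${CHROM_SIZES:-$HOME/data/hg19.chrom.sizes}"  # hypothetical path

bed_to_bam() {
  local bed_gz="$1" out_dir="$2" name
  name="$(basename "$bed_gz" .bed.gz)"
  mkdir -p "$out_dir"
  # bedtools bedtobam writes unsorted BAM to stdout; samtools sorts it from stdin.
  gzip -dc "$bed_gz" \
    | bedtools bedtobam -i stdin -g "$CHROM_SIZES" \
    | samtools sort -o "$out_dir/$name.bam" -
  samtools index "$out_dir/$name.bam"
}

if command -v bedtools >/dev/null && command -v samtools >/dev/null; then
  for f in "$ABF_DIR"/*.bed.gz; do
    [[ -e "$f" ]] || continue  # no matching files: the glob stays literal
    bed_to_bam "$f" "$ABF_DIR/bam"
  done
else
  echo "bedtools and samtools are required for the conversion" >&2
fi
```

The output goes into the `bam` subfolder, matching the file layout described below.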
File layout: place `fastq` datasets into a `fastq` subfolder and `bam` datasets into a `bam` subfolder of each dataset folder.
Prepare datasets without control by copying all raw data except the control files into a corresponding folder with the `_no_control` suffix.
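Creating a `_no_control` copy can be sketched as follows. The pattern used to recognize control files (`CONTROL_PATTERN`, matching `input` or `control` in the file name, case-insensitively) is a hypothetical naming convention; adjust it to the actual file names of each dataset.

```shell
#!/usr/bin/env bash
# Sketch: build <dataset>_no_control with the same fastq/bam layout, minus controls.
# Assumption (hypothetical): control files contain "input" or "control" in their name.
set -euo pipefail

CONTROL_PATTERN="${CONTROL_PATTERN:-input|control}"

make_no_control() {
  local src="${1%/}"
  local dst="${src}_no_control"
  local sub f
  for sub in fastq bam; do
    [[ -d "$src/$sub" ]] || continue
    mkdir -p "$dst/$sub"
    for f in "$src/$sub"/*; do
      [[ -f "$f" ]] || continue
      # Copy only files whose name does not match the control pattern.
      if ! basename "$f" | grep -qiE "$CONTROL_PATTERN"; then
        cp "$f" "$dst/$sub/"
      fi
    done
  done
}

# Usage: make_no_control ~/data/2023_GSE26320
```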
Make sure to use the correct genome version for each dataset: `mm10` for Immgen, `hg19` for ABF and `hg38` for the rest.
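The genome rule above can be encoded in a small helper so the right value is passed as `genome=<genome>`. This is a sketch assuming the dataset folder names from the download list, optionally carrying the `_no_control` suffix; `genome_for_dataset` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: map a dataset folder to its reference genome
# (mm10 for Immgen, hg19 for ABF, hg38 for everything else).
set -euo pipefail

genome_for_dataset() {
  local name
  name="$(basename "${1%/}")"
  name="${name%_no_control}"  # _no_control copies use the same genome
  case "$name" in
    2025_Immgen)         echo mm10 ;;
    2018_chipseq_y20o20) echo hg19 ;;  # ABF
    *)                   echo hg38 ;;
  esac
}

# Usage, from inside a dataset folder:
#   genome="$(genome_for_dataset "$(pwd)")"
```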
- Clone the `chipseq-smk-pipeline` GitHub repository into `~/work/chipseq-smk-pipeline`.
- Navigate to the dataset folder.
- Launch alignment of the datasets to the reference genome (optional).

```shell
echo "Alignment"
# Replace <genome> with mm10, hg19 or hg38 depending on the dataset.
snakemake --printshellcmds -s ~/work/chipseq-smk-pipeline/Snakefile \
  all --cores all --use-conda --directory "$(pwd)" --config genome=<genome> \
  fastq_dir="$(pwd)/fastq" fastq_ext=fastq \
  --rerun-incomplete --rerun-trigger mtime
```

Use the additional `bowtie2_params="-X 2000 --dovetail"` parameter for ATAC-seq alignment.
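A sketch of the ATAC-seq alignment call, assuming `bowtie2_params` is passed alongside the other `--config` values and the command is run from inside `~/data/2025_Immgen` (mm10). Keeping the arguments in a bash array makes the quoted `bowtie2_params` value reach `snakemake` as a single argument. `RUN` and `ATAC_ALIGN_CMD` are hypothetical names; by default the command is only printed.

```shell
#!/usr/bin/env bash
# Sketch: ATAC-seq alignment with the extra bowtie2 parameters.
# Assumptions: run from the Immgen dataset folder; set RUN=1 to execute.
set -euo pipefail

GENOME="${GENOME:-mm10}"
RUN="${RUN:-0}"

ATAC_ALIGN_CMD=(
  snakemake --printshellcmds -s "$HOME/work/chipseq-smk-pipeline/Snakefile"
  all --cores all --use-conda --directory "$(pwd)"
  --config genome="$GENOME"
  fastq_dir="$(pwd)/fastq" fastq_ext=fastq
  bowtie2_params="-X 2000 --dovetail"
  --rerun-incomplete --rerun-trigger mtime
)

if [[ "$RUN" == 1 ]]; then
  "${ATAC_ALIGN_CMD[@]}"
else
  # Print a shell-quoted version of the command instead of running it.
  printf '%q ' "${ATAC_ALIGN_CMD[@]}"
  echo
fi
```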
- Peak calling of ChIP-seq / ATAC-seq datasets.

```shell
echo "Peak calling with default settings (MACS2 narrow, HOMER factor)"
snakemake --printshellcmds -s ~/work/chipseq-smk-pipeline/Snakefile \
  all --cores all --use-conda --directory "$(pwd)" --config genome=<genome> \
  start_with_bams=true \
  macs2=True sicer=True homer=True hotspot=True peakseq=True lanceotron=True omnipeak=True \
  --rerun-incomplete --rerun-trigger mtime

echo "Peak calling with other settings (MACS2 broad, HOMER histone)"
snakemake --printshellcmds -s ~/work/chipseq-smk-pipeline/Snakefile \
  all --cores all --use-conda --directory "$(pwd)" --config genome=<genome> \
  start_with_bams=true \
  macs2=True macs2_mode=broad macs2_params="--broad --broad-cutoff 0.1" macs2_suffix=broad0.1 \
  homer=True homer_style=histone homer_suffix=regions.bed \
  --rerun-incomplete --rerun-trigger mtime
```

See Simulation instructions for details.
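The peak-calling commands above are run once per dataset folder, including the `_no_control` copies. The default-settings run can be looped over several folders as sketched below. The `DATASETS` selection of bam-based folders, `DATA_ROOT`, `DRY_RUN` and `call_peaks` are hypothetical; with the default `DRY_RUN=1` the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch: default peak calling over several bam-based dataset folders.
# Assumptions (hypothetical): DATA_ROOT, the DATASETS list, DRY_RUN=1 prints only.
set -euo pipefail

DATA_ROOT="${DATA_ROOT:-$HOME/data}"
SNAKEFILE="$HOME/work/chipseq-smk-pipeline/Snakefile"
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical "folder:genome" pairs for datasets that already have bam files.
DATASETS=(
  "2023_Immune:hg38"
  "2018_chipseq_y20o20:hg19"
  "2025_TFs:hg38"
  "2025_chips:hg38"
)

call_peaks() {
  local dir="$1" genome="$2"
  local -a cmd
  cmd=(
    snakemake --printshellcmds -s "$SNAKEFILE"
    all --cores all --use-conda --directory "$dir" --config genome="$genome"
    start_with_bams=true
    macs2=True sicer=True homer=True hotspot=True peakseq=True
    lanceotron=True omnipeak=True
    --rerun-incomplete --rerun-trigger mtime
  )
  if [[ "$DRY_RUN" == 1 ]]; then
    printf '%q ' "${cmd[@]}"
    echo
  else
    (cd "$dir" && "${cmd[@]}")  # run from inside the dataset folder
  fi
}

for entry in "${DATASETS[@]}"; do
  # Split "folder:genome" into its two parts.
  call_peaks "$DATA_ROOT/${entry%%:*}" "${entry##*:}"
done
```

The same loop applies to the broad/histone settings by swapping in that command's configuration values.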
- `benchmark.sh` - preliminary benchmark that launches peak calling on a limited set of input data to estimate running time.
- `hyperparameters.sh` - hyperparameter selection procedure for Omnipeak.
- `peps.sh` - launches Omnipeak on a limited set of input data to demonstrate the effect of the PEP threshold.
Please ensure that you have the following Python packages installed:
- Jupyter
- Pandas
- PyRanges
- pyBigWig
- Seaborn
- Statannotations
- SciPy
Please ensure that the following tools are available:
- bedtools
- samtools
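The requirements above can be checked with a short script like the one below. It is a sketch assuming `python3` is the interpreter used for the notebooks and that Jupyter is detected through its `jupyter` command; the module names are import names (for example `pyBigWig`), and the helper names are hypothetical. It only reports what is missing and does not install anything.

```shell
#!/usr/bin/env bash
# Sketch: count and report missing command-line tools and Python modules.
# Assumption: python3 runs the notebooks; jupyter is checked as a command.
set -euo pipefail

count_missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing"
}

count_missing_modules() {
  local missing=0 mod
  for mod in "$@"; do
    if ! python3 -c "import $mod" >/dev/null 2>&1; then
      echo "missing Python module: $mod" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing"
}

count_missing_tools jupyter bedtools samtools
count_missing_modules pandas pyranges pyBigWig seaborn statannotations scipy
```

Missing Python packages can then be installed, for example, with `pip install jupyter pandas pyranges pyBigWig seaborn statannotations scipy`.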