This is a simple Snakemake wrapper around the Arima Genomics Capture Hi-C (CHiC) workflow. It allows the Arima workflow to be executed in parallel and within a reproducible Conda environment. The software versions used by this wrapper differ from those listed in the Arima workflow, because the Arima versions are quite old and require manual installation. The versions used here were chosen to be compatible with the ones validated by Arima.
- Install the Miniconda software distribution at a convenient location.
- Add the Bioconda software channel, as described.
- Install Snakemake, e.g. in a new conda environment: `conda create -n snakemake snakemake`
- Check out the Arima workflow repository: `git clone https://github.com/ArimaGenomics/CHiC.git arima-chic`
- In the `arima-chic` directory above, uncompress the `chicagoTools.tar.gz` file: `tar xvf chicagoTools.tar.gz`, resulting in an `arima-chic/chicagoTools` directory.
- Check out the `snake-chic` repository: `git clone https://github.com/insilicoconsulting/snake-chic snake-chic`
The workflow expects paired-end FASTQ files in the directory `fastq`, relative to the `snake-chic` directory containing the workflow.
The files must be named in the format `samplename_[R1|R2].fastq.gz`, e.g. `sample1_R1.fastq.gz` and `sample1_R2.fastq.gz`.
It's probably easiest to create this naming format using symbolic links, e.g. `ln -s /datadir/sample1_S1_L001_R1_001.fastq.gz fastq/sample1_R1.fastq.gz`.
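For more than a handful of samples, the symbolic links can be created with a small script. The following is a minimal sketch, not part of the workflow itself. It assumes the source files follow the Illumina-style pattern `<sample>_S<n>_L<lane>_<R1|R2>_001.fastq.gz` seen in the example above; the function name `link_fastqs` and the regular expression are illustrative and should be adapted to your own file names.

```python
import re
from pathlib import Path

# Assumed Illumina-style name: <sample>_S<n>_L<lane>_<R1|R2>_001.fastq.gz
ILLUMINA_NAME = re.compile(
    r"^(?P<sample>.+)_S\d+_L\d{3}_(?P<read>R[12])_001\.fastq\.gz$"
)


def link_fastqs(source_dir, fastq_dir="fastq"):
    """Symlink Illumina-named FASTQ files as <sample>_<R1|R2>.fastq.gz.

    Returns the list of created links. Raises FileExistsError if a target
    name is already taken, e.g. when one sample was sequenced on several
    lanes and the files would have to be merged first.
    """
    source_dir = Path(source_dir)
    fastq_dir = Path(fastq_dir)
    fastq_dir.mkdir(parents=True, exist_ok=True)

    created = []
    for src in sorted(source_dir.glob("*.fastq.gz")):
        match = ILLUMINA_NAME.match(src.name)
        if match is None:
            continue  # name does not fit the assumed pattern; skip it
        link = fastq_dir / f"{match['sample']}_{match['read']}.fastq.gz"
        if link.exists() or link.is_symlink():
            raise FileExistsError(f"{link} already exists")
        # Link to the absolute path so the link stays valid from any cwd.
        link.symlink_to(src.resolve())
        created.append(link)
    return created
```

Called from the `snake-chic` directory as `link_fastqs("/datadir")`, this would create one link per matching file in `fastq`. The sketch deliberately stops on name collisions instead of overwriting existing links.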
- Adapt the parameters in `config/config.yaml` to your requirements. Set the `arima_dir` parameter to the location of the `arima-chic` workflow directory above, and the `chicago_dir` parameter to the location of the `chicagoTools` directory above. The other parameters are explained in the Arima repository README file.
- Adapt the sample metadata file `config/samples.tsv` to use the sample names corresponding to the FASTQ files, and the suitable capture BED file for each sample.
- Activate the `snakemake` Conda environment: `conda activate snakemake`
- Execute the workflow: `snakemake --use-conda -p --cores 16`
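
Before starting a long run, it can help to confirm that every sample named in the metadata file has both FASTQ files in place. The sketch below is an optional check, not part of the workflow. The helper name `missing_fastqs` is hypothetical, and the sample names are passed in as a plain list because the column layout of `config/samples.tsv` is not described here.

```python
from pathlib import Path


def missing_fastqs(sample_names, fastq_dir="fastq"):
    """Return the expected FASTQ paths that are absent for the given samples.

    For each sample, both <sample>_R1.fastq.gz and <sample>_R2.fastq.gz are
    expected in fastq_dir. Broken symbolic links count as missing.
    """
    fastq_dir = Path(fastq_dir)
    missing = []
    for sample in sample_names:
        for read in ("R1", "R2"):
            path = fastq_dir / f"{sample}_{read}.fastq.gz"
            if not path.is_file():  # follows symlinks
                missing.append(path)
    return missing
```

An empty list from, say, `missing_fastqs(["sample1", "sample2"])` suggests the inputs match the naming format the workflow expects.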