Post-processing & Lightweight Updates to Pipeline Output
CWL files and workflows to accompany the helix_filters_01 repo. Supported by infrastructure in the pluto submodule.
Clone this repo with
git clone --recursive https://github.com/mskcc/pluto-cwl.git
cd pluto-cwl
Install dependencies for the repo with the command:
make install 
This will check out the included git submodules and install a local conda with extra dependencies.
Use this command to activate the installed environment for running workflows:
. env.juno.sh toil
This will:
- update your environment to use the cwltool and toil installed in the local conda
- (if running on Juno HPC) update your environment with Toil variables needed to run on Juno
- (if running on Juno HPC) update your environment to use pre-cached Singularity containers located on Juno
The primary entry point for the workflow is cwl/workflow_with_facets.cwl.
You can run a CWL included in this repo by using the wrapper scripts bundled in the pluto submodule:
- pluto/run-cwltool.sh for simple use cases
- pluto/run-toil.sh if parallel processing and HPC (LSF) usage is required
Development and testing take place via the test suite.
The included test suite can be run with:
make test
It typically takes about 45 minutes to run all included tests.
- NOTE: tests require data sets that are pre-saved on the juno server
Some very large integration tests are skipped by default. To include all tests, export the environment variable LARGE_TESTS=True or set it on the command line. Other settings, such as switching the CWL engine from cwltool to Toil, can be changed the same way. For example:
LARGE_TESTS=True CWL_ENGINE=Toil PRINT_COMMAND=True TMP_DIR=/scratch USE_LSF=True make test
Available environment variable settings are derived from the pluto.settings submodule.
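As a rough illustration of how settings like these can gate tests, here is a minimal sketch of reading environment variables in a Python test module. It is not the actual pluto.settings code: the helper env_flag, the accepted truthy strings, and the TestExample class are all hypothetical, and the default engine value is an assumption based on the description above.

```python
import os
import unittest


def env_flag(name, default=False):
    """Hypothetical helper: interpret an environment variable as a boolean."""
    value = os.environ.get(name)
    if value is None:
        return default
    # Assumption: "True", "1" and "yes" (any letter case) count as enabled.
    return value.strip().lower() in ("true", "1", "yes")


# Read once at import time, the way a settings module typically would.
LARGE_TESTS = env_flag("LARGE_TESTS")
# Assumption: the engine name is compared case-insensitively, default cwltool.
CWL_ENGINE = os.environ.get("CWL_ENGINE", "cwltool").lower()


class TestExample(unittest.TestCase):
    """Hypothetical test case showing a test gated on LARGE_TESTS."""

    def test_small(self):
        self.assertIn(CWL_ENGINE, os.environ.get("CWL_ENGINE", "cwltool").lower())

    @unittest.skipUnless(LARGE_TESTS, "set LARGE_TESTS=True to include this test")
    def test_large(self):
        self.assertTrue(LARGE_TESTS)
```

With a layout like this, running the script with LARGE_TESTS=True in the environment would include test_large, while a plain run would report it as skipped.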
An extra recipe is included that runs the tests in parallel. For example, to run 8 tests at once, use this command:
make parallel-test
For development purposes, it is helpful to be able to run only a specific test case, or subset of tests.
You can run just the script with the tests you are interested in, such as:
python tests/test_workflow_cwl.py
You can further select which test case(s) from the script you wish to run by adding their labels as args:
python tests/test_workflow_cwl.py TestClassName
python tests/test_workflow_cwl.py TestClassName.test_function
This can be combined with the environment variables described above (such as LARGE_TESTS, PRINT_COMMAND, KEEP_TMP, etc.).
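The label syntax above matches how Python's unittest resolves names, assuming the test scripts are unittest-based and end with a call to unittest.main(). The sketch below mimics that resolution by hand to show what each label selects; TestClassName, its methods, and the helper suite_for_label are placeholders for illustration, not code from this repo.

```python
import unittest


class TestClassName(unittest.TestCase):
    """Placeholder test case with two test methods."""

    def test_function(self):
        self.assertEqual(sum([1, 2, 3]), 6)

    def test_other(self):
        self.assertTrue("cwl".islower())


def suite_for_label(label, namespace):
    """Build a suite from 'ClassName' or 'ClassName.method' (hypothetical helper)."""
    class_name, _, method_name = label.partition(".")
    case_class = namespace[class_name]
    if method_name:
        # A single test is an instance of the case bound to one method name.
        return unittest.TestSuite([case_class(method_name)])
    # A bare class label selects every test method defined on the class.
    return unittest.defaultTestLoader.loadTestsFromTestCase(case_class)


available = {"TestClassName": TestClassName}
whole_class = suite_for_label("TestClassName", available)
single_test = suite_for_label("TestClassName.test_function", available)
print(whole_class.countTestCases(), single_test.countTestCases())
```

A class label pulls in every test method on that class, while a dotted label narrows the run to one method, which is usually the fastest loop when debugging a single failing workflow test.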