Genome-wide quantification of differential transcription factor activity
Allocate an interactive session and run the program. For this small test data set we use the local profile, in which all Snakemake rules are executed within the same job. For real data, use either the small or the large profile.
[user@biowulf]$ sinteractive --cpus-per-task=8 --mem=16g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load difftf
[+] Loading singularity  4.0.1  on cn3144
[+] Loading snakemake  7.32.3
[+] Loading difftf  1.9
[user@cn3144 ~]$ cp -r ${DIFFTF_TEST_DATA:-none}/input .
[user@cn3144 ~]$ cd input
[user@cn3144 ~]$ cat config.json
{
  "par_general": {
    "outdir": "../output",
    "maxCoresPerRule": 2,
    "dir_TFBS_sorted": false,
    "regionExtension": 100,
    "comparisonType": "GMPvsMPP.all",
    "conditionComparison": "GMP,MPP",
    "designContrast": "~ conditionSummary",
    "designVariableTypes": "conditionSummary:factor",
    "nPermutations": 100,
    "nBootstraps": 0,
    "nCGBins": 10,
    "TFs": "CTCF,CEBPB,SNAI2,CEBPA,UBIP1,CEBPG,CEBPD,ZFX,AP2D,PAX5.S,SNAI1,ZEB1,SP4,MBD2,IRF1,MECP2,PAX5.D,SP3,NFIA.C,SP1.A,IRF7,MYF6,NRF1,DBP,MAZ,NKX28,DLX2,GATA1,P53,ZN143,AIRE,NR2C2,HMGA1,FUBP1,TEAD3,OVOL1,HXD4,KLF1,RXRG,HNF1B,ZIC3,HNF1A,NANOG.S,GFI1,PO3F1,NR2C1,ELF5,TF65.C,NFAC3,TEAD1",
    "dir_scripts": "/usr/local/apps/difftf/1.9/src/R",
    "RNASeqIntegration": true,
    "debugMode": false
  },
  "samples": {
    "summaryFile": "sampleData.tsv",
    "pairedEnd": true
  },
  "peaks": {
    "consensusPeaks": "",
    "peakType": "narrow",
    "minOverlap": 2
  },
  "additionalInputFiles": {
    "refGenome_fasta": "referenceGenome/mm10.fa",
    "dir_TFBS": "/fdb/difftf/1.9/mm10/PWMScan_HOCOMOCOv10",
    "RNASeqCounts": "data/RNA-Seq/RNA.counts.tsv",
    "HOCOMOCO_mapping": "/usr/local/apps/difftf/1.9/src/TF_Gene_TranslationTables/HOCOMOCO_v10/translationTable_mm10.csv"
  }
}

# - you can use --dry-run to see what the execution would look like
[user@cn3144 ~]$ snakemake --profile $DIFFTF_PROFILE/local --cores $SLURM_CPUS_PER_TASK \
    --configfile config.json -s $DIFFTF_SNAKEFILE --dry-run
...snip...
job                                 count
--------------------------------  -------
DiffPeaks                               1
all                                     1
analyzeTF                              50
binningTF                              50
calcNucleotideContent                   1
checkParameterValidity                  1
cleanUpLogFiles                         1
concatenateMotifsPerm                  71
filterSexChromosomesAndSortPeaks        1
intersectPeaksAndBAM                    1
intersectPeaksAndTFBS                   1
intersectTFBSAndBAM                    50
produceConsensusPeaks                   1
resortBAM                               8
sortTFBSParallel                        1
summary1                                1
summaryFinal                            1
total                                 241
...
[user@cn3144 ~]$ snakemake --profile $DIFFTF_PROFILE/local --cores $SLURM_CPUS_PER_TASK \
    --configfile config.json -s $DIFFTF_SNAKEFILE
...much output...
#################################
# Workflow finished, no error   #
# Check the FINAL_OUTPUT folder #
#################################
[user@cn3144 ~]$ tree ../output/FINAL_OUTPUT
../output/FINAL_OUTPUT/
└── [user   4.0K]  extension100
    ├── [user   754K]  GMPvsMPP.all.allMotifs.tsv.gz
    ├── [user    19K]  GMPvsMPP.all.diagnosticPlotsClassification1.pdf
    ├── [user   642K]  GMPvsMPP.all.diagnosticPlotsClassification2.pdf
    ├── [user   316K]  GMPvsMPP.all.diagnosticPlots.pdf
    ├── [user   2.5M]  GMPvsMPP.all.summary.plots.rds
    ├── [user   2.0K]  GMPvsMPP.all.summary.tsv.gz
    ├── [user   4.2K]  GMPvsMPP.all.summary.volcano.pdf
    ├── [user   247K]  GMPvsMPP.all.summary.volcano.q0.001.pdf
    ├── [user   247K]  GMPvsMPP.all.summary.volcano.q0.01.pdf
    ├── [user   247K]  GMPvsMPP.all.summary.volcano.q0.05.pdf
    ├── [user   247K]  GMPvsMPP.all.summary.volcano.q0.1.pdf
    └── [user   169K]  GMPvsMPP.all.TF_vs_peak_distribution.tsv.gz

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
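The tables in FINAL_OUTPUT are gzip-compressed, so they cannot be read directly with cat or head. The following is a minimal sketch for taking a quick look at one of them before leaving the session. The helper name peek_table is hypothetical and not part of diffTF; the path is the one shown in the tree listing of the test run, and nothing is assumed about the columns of the file.

```shell
#!/bin/bash
# peek_table: hypothetical helper (not part of diffTF) that prints the
# first N rows (default 5) of a gzip-compressed text table.
peek_table() {
    local file=$1
    local rows=${2:-5}
    if [ ! -r "$file" ]; then
        echo "cannot read: $file" >&2
        return 1
    fi
    # gzip -dc decompresses to stdout and leaves the file untouched
    gzip -dc "$file" | head -n "$rows"
}

# Path taken from the tree listing of the test run; the call is skipped
# if the file does not exist (for example, before the workflow has run).
summary=../output/FINAL_OUTPUT/extension100/GMPvsMPP.all.summary.tsv.gz
if [ -r "$summary" ]; then
    peek_table "$summary"
fi
```

Run from the input directory, this prints the first five rows of the summary table; a second argument changes the number of rows, for example `peek_table "$summary" 20`.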
For more realistically sized data, run the workflow in cluster mode; in that case the main interactive session needs only minimal resources.
[user@biowulf]$ sinteractive --cpus-per-task=8 --mem=16g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load difftf
[+] Loading singularity  4.0.1  on cn3144
[+] Loading snakemake  7.32.3
[+] Loading difftf  1.9
[user@cn3144 ~]$ cp -r ${DIFFTF_TEST_DATA:-none}/input .
[user@cn3144 ~]$ cd input
[user@cn3144 ~]$ snakemake --profile $DIFFTF_PROFILE/small --configfile config.json -s $DIFFTF_SNAKEFILE
...much output...
#################################
# Workflow finished, no error   #
# Check the FINAL_OUTPUT folder #
#################################
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
Note that with the test data set this produces short jobs, which is not an efficient use of the cluster.
Create a batch input file (e.g. difftf.sh). For example:
#!/bin/bash
set -e
module load difftf/1.9
cp -r ${DIFFTF_TEST_DATA:-none}/input .
cd input
snakemake --profile $DIFFTF_PROFILE/small --configfile config.json -s $DIFFTF_SNAKEFILE
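The `${DIFFTF_TEST_DATA:-none}` expansion in the script falls back to the literal string `none` when the variable is unset or empty, so a job in which the module did not load fails at the cp step with a message about `none/input`. The following is a minimal sketch of an explicit check that could be placed after the module load line to report the actual cause. The helper name require_var is hypothetical; the variable names are the ones used in the script.

```shell
#!/bin/bash
# require_var: hypothetical helper (not provided by the difftf module).
# Returns non-zero, with a message on stderr, if the named environment
# variable is unset or empty.
require_var() {
    local name=$1
    # ${!name} is bash indirect expansion: the value of the variable
    # whose name is stored in $name. The :- default keeps the test
    # safe under 'set -u'.
    if [ -z "${!name:-}" ]; then
        echo "error: $name is not set (did 'module load difftf' succeed?)" >&2
        return 1
    fi
}
```

In the batch script this could be used as `require_var DIFFTF_TEST_DATA`, `require_var DIFFTF_PROFILE` and `require_var DIFFTF_SNAKEFILE` on separate lines; with `set -e` in effect, the first failing check stops the job.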
Submit this job using the Slurm sbatch command.
sbatch --cpus-per-task=2 --mem=4g --time=2:00:00 difftf.sh