diffTF on Biowulf

Genome-wide quantification of differential transcription factor activity

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. For this small test data set we use the local profile, in which all Snakemake rules run as part of the same job. For real data, use either the small or the large profile instead.

[user@biowulf]$ sinteractive --cpus-per-task=8 --mem=16g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load difftf
[+] Loading singularity  4.0.1  on cn3144
[+] Loading snakemake  7.32.3
[+] Loading difftf  1.9

[user@cn3144 ~]$ cp -r ${DIFFTF_TEST_DATA:-none}/input .
[user@cn3144 ~]$ cd input
[user@cn3144 ~]$ cat config.json
{
  "par_general": {
    "outdir": "../output",
    "maxCoresPerRule": 2,
    "dir_TFBS_sorted": false,
    "regionExtension": 100,
    "comparisonType": "GMPvsMPP.all",
    "conditionComparison": "GMP,MPP",
    "designContrast": "~ conditionSummary",
    "designVariableTypes": "conditionSummary:factor",
    "nPermutations": 100,
    "nBootstraps": 0,
    "nCGBins": 10,
    "TFs": "CTCF,CEBPB,SNAI2,CEBPA,UBIP1,CEBPG,CEBPD,ZFX,AP2D,PAX5.S,SNAI1,ZEB1,SP4,MBD2,IRF1,MECP2,PAX5.D,SP3,NFIA.C,SP1.A,IRF7,MYF6,NRF1,DBP,MAZ,NKX28,DLX2,GATA1,P53,ZN143,AIRE,NR2C2,HMGA1,FUBP1,TEAD3,OVOL1,HXD4,KLF1,RXRG,HNF1B,ZIC3,HNF1A,NANOG.S,GFI1,PO3F1,NR2C1,ELF5,TF65.C,NFAC3,TEAD1",
    "dir_scripts": "/usr/local/apps/difftf/1.9/src/R",
    "RNASeqIntegration": true,
    "debugMode": false
  },
  "samples": {
    "summaryFile": "sampleData.tsv",
    "pairedEnd": true
  },
  "peaks": {
    "consensusPeaks": "",
    "peakType": "narrow",
    "minOverlap": 2
  },
  "additionalInputFiles": {
    "refGenome_fasta": "referenceGenome/mm10.fa",
    "dir_TFBS": "/fdb/difftf/1.9/mm10/PWMScan_HOCOMOCOv10",
    "RNASeqCounts": "data/RNA-Seq/RNA.counts.tsv",
    "HOCOMOCO_mapping": "/usr/local/apps/difftf/1.9/src/TF_Gene_TranslationTables/HOCOMOCO_v10/translationTable_mm10.csv"
  }
}

# - you can use --dry-run to see what the execution would look like
[user@cn3144 ~]$ snakemake --profile $DIFFTF_PROFILE/local --cores $SLURM_CPUS_PER_TASK \
    --configfile config.json -s $DIFFTF_SNAKEFILE --dry-run
...snip...
job                                 count
--------------------------------  -------
DiffPeaks                               1
all                                     1
analyzeTF                              50
binningTF                              50
calcNucleotideContent                   1
checkParameterValidity                  1
cleanUpLogFiles                         1
concatenateMotifsPerm                  71
filterSexChromosomesAndSortPeaks        1
intersectPeaksAndBAM                    1
intersectPeaksAndTFBS                   1
intersectTFBSAndBAM                    50
produceConsensusPeaks                   1
resortBAM                               8
sortTFBSParallel                        1
summary1                                1
summaryFinal                            1
total                                 241
...

[user@cn3144 ~]$ snakemake --profile $DIFFTF_PROFILE/local --cores $SLURM_CPUS_PER_TASK \
    --configfile config.json -s $DIFFTF_SNAKEFILE
...much output...
#################################
#  Workflow finished, no error  #
# Check the FINAL_OUTPUT folder #
#################################
[user@cn3144 ~]$ tree ../output/FINAL_OUTPUT
../output/FINAL_OUTPUT/
└── [user    4.0K]  extension100
    ├── [user    754K]  GMPvsMPP.all.allMotifs.tsv.gz
    ├── [user     19K]  GMPvsMPP.all.diagnosticPlotsClassification1.pdf
    ├── [user    642K]  GMPvsMPP.all.diagnosticPlotsClassification2.pdf
    ├── [user    316K]  GMPvsMPP.all.diagnosticPlots.pdf
    ├── [user    2.5M]  GMPvsMPP.all.summary.plots.rds
    ├── [user    2.0K]  GMPvsMPP.all.summary.tsv.gz
    ├── [user    4.2K]  GMPvsMPP.all.summary.volcano.pdf
    ├── [user    247K]  GMPvsMPP.all.summary.volcano.q0.001.pdf
    ├── [user    247K]  GMPvsMPP.all.summary.volcano.q0.01.pdf
    ├── [user    247K]  GMPvsMPP.all.summary.volcano.q0.05.pdf
    ├── [user    247K]  GMPvsMPP.all.summary.volcano.q0.1.pdf
    └── [user    169K]  GMPvsMPP.all.TF_vs_peak_distribution.tsv.gz
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
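
The dry-run table above lists 50 analyzeTF, binningTF, and intersectTFBSAndBAM jobs, which matches the 50 entries in the TFs field of config.json; it appears that these rules run once per transcription factor. The sketch below is one way to count the entries before starting a run. The function name count_tfs is hypothetical and not part of diffTF, and the parsing assumes the TFs field sits on a single line, as it does in the test config.

```shell
# Hypothetical helper (not part of diffTF): count the comma-separated
# entries in the "TFs" field of a diffTF config file.
# Assumption: the field is on a single line, as in the test config.json.
count_tfs() {
    grep '"TFs"' "$1" \
        | sed -e 's/.*"TFs"[[:space:]]*:[[:space:]]*"//' -e 's/".*//' \
        | tr ',' '\n' \
        | grep -c .
}

# Example usage from inside the input directory:
# count_tfs config.json
```

Comparing this number with the per-TF rule counts in the dry-run output is a cheap sanity check that the config file is the one Snakemake is actually reading.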

For more realistic data sizes, run the workflow in cluster mode. In that case the main interactive session needs only minimal resources.

[user@biowulf]$ sinteractive --cpus-per-task=8 --mem=16g --gres=lscratch:20
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load difftf
[+] Loading singularity  4.0.1  on cn3144
[+] Loading snakemake  7.32.3
[+] Loading difftf  1.9

[user@cn3144 ~]$ cp -r ${DIFFTF_TEST_DATA:-none}/input .
[user@cn3144 ~]$ cd input
[user@cn3144 ~]$ snakemake --profile $DIFFTF_PROFILE/small --configfile config.json -s $DIFFTF_SNAKEFILE
...much output...
#################################
#  Workflow finished, no error  #
# Check the FINAL_OUTPUT folder #
#################################

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$

Note that for the test data set this will produce many short jobs, which is not an efficient use of the cluster.
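
Both sessions above rely on the environment variables DIFFTF_TEST_DATA, DIFFTF_PROFILE, and DIFFTF_SNAKEFILE, which are presumably set by module load difftf. A minimal pre-flight check, assuming nothing beyond a POSIX shell, could look like the sketch below. The function name require_vars is hypothetical.

```shell
# Hypothetical pre-flight check (not part of diffTF): verify that every
# named environment variable is set and non-empty.
# Returns 0 if all are present, 1 otherwise.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example usage after loading the module:
# require_vars DIFFTF_TEST_DATA DIFFTF_PROFILE DIFFTF_SNAKEFILE || exit 1
```

Failing early this way avoids copying from a nonexistent path or starting Snakemake with an empty profile argument.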

Batch job
Most jobs should be run as batch jobs.

Create a batch input file (e.g. difftf.sh). For example:

#!/bin/bash
set -e
module load difftf/1.9
cp -r ${DIFFTF_TEST_DATA:-none}/input .
cd input
snakemake --profile $DIFFTF_PROFILE/small --configfile config.json -s $DIFFTF_SNAKEFILE

Submit this job using the Slurm sbatch command.

sbatch --cpus-per-task=2 --mem=4g --time=2:00:00 difftf.sh
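
Once the batch job has ended, one way to confirm success is to look for the banner that the workflow prints in the interactive example ("Workflow finished, no error"). The sketch below assumes that the banner also ends up in the Slurm output file, which by default is named slurm-<jobid>.out in the submission directory. The function name workflow_succeeded and the variable jobid are hypothetical.

```shell
# Hypothetical helper (not part of diffTF): succeed only if the given log
# file contains the success banner shown in the interactive example.
workflow_succeeded() {
    grep -q 'Workflow finished, no error' "$1"
}

# Example usage, with jobid set to the ID reported by sbatch:
# workflow_succeeded "slurm-${jobid}.out" && echo "diffTF finished cleanly"
```

If the banner is missing, the Snakemake messages near the end of the same file are the natural place to start looking for the failing rule.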