High-Performance Computing at the NIH
CREST

CREST (Clipping Reveals Structure) is an algorithm for detecting genomic structural variations at base-pair resolution using next-generation sequencing data.

There are multiple versions of CREST available. An easy way of selecting the version is to use modules. To see the modules available, type

module avail CREST

To select a module, type

module load CREST/[ver]

where [ver] is the version of choice. This will set the $PATH and $PERLLIB variables, as well as $CRESTHOME.
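
Before submitting a job, it can help to confirm that the module actually set up the environment. The following is a minimal sketch, not part of CREST: the function name check_crest_env is hypothetical, and it assumes the hg18 reference files that the batch script on this page reads from $CRESTHOME.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with CREST): verify that $CRESTHOME is set
# and that the hg18 reference files used by the batch script are readable.
function check_crest_env {
    if [[ -z "${CRESTHOME:-}" ]]; then
        echo "CRESTHOME is not set; was the CREST module loaded?" >&2
        return 1
    fi
    local f
    for f in hg18.2bit hg18.fa; do
        if [[ ! -r "$CRESTHOME/$f" ]]; then
            echo "missing reference file: $CRESTHOME/$f" >&2
            return 1
        fi
    done
    echo "CREST environment looks usable: $CRESTHOME"
}
```

After loading the module, calling check_crest_env || exit 1 near the top of a job script would stop the job early if anything is missing.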

Submitting a single batch job

CREST requires a BLAT server, so one must be started within the job. The following sbatch script starts a BLAT server (gfServer) on the node where the job runs, picks a port offset by the number of gfServers already running there, and waits until the server is ready for queries.

#!/bin/bash

BLAT_PORT=50000

function safe_start_blat {
    local genome_2bit=$1
    # sleep a random amount (0-29 s) to reduce the risk of a race condition
    sleep $((RANDOM % 30))

    # count how many gfServers there are already running
    local other_gfservers=$(ps -eo comm | grep -c gfServer)
    echo "there are $other_gfservers other gfServers running"
    BLAT_PORT=$((BLAT_PORT + other_gfservers))

    # start gfServer in the background
    echo "starting gfServer on port $BLAT_PORT"
    gfServer start localhost $BLAT_PORT "$genome_2bit" -canStop \
        -log=blatServer_${BLAT_PORT}.log &> /dev/null &

    # wait until gfServer is running
    while [[ $(gfServer files localhost $BLAT_PORT 2>&1) =~ "Error in TCP" ]]; do 
        echo "Waiting for BLAT server to start..."
        sleep 10
    done
    echo "BLAT server is running!"
}


module load CREST || exit 1

tmp=$(mktemp -d ./XXXX)
export TMPDIR=${tmp}

# Start up BLAT server:
safe_start_blat $CRESTHOME/hg18.2bit
trap "gfServer stop localhost $BLAT_PORT; rm -rf ${tmp}" EXIT

# Run
extractSClip.pl -i example/tumor.bam --ref_genome $CRESTHOME/hg18.fa
#extractSClip.pl -i example/germline.bam --ref_genome $CRESTHOME/hg18.fa
CREST.pl -f tumor.bam.cover -d example/tumor.bam -g example/germline.bam \
  --ref_genome $CRESTHOME/hg18.fa \
  -t $CRESTHOME/hg18.2bit \
  --blatserver localhost \
  --blatport $BLAT_PORT
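
The script above chooses its port by counting the gfServer processes already running, which assumes they occupy consecutive ports starting at 50000. As an alternative sketch, a port could be probed directly. The function find_free_port below is hypothetical (it is not part of BLAT or CREST), and it assumes a bash build with /dev/tcp support. It does not remove the race entirely, since another job could claim the port between the probe and the gfServer start.

```shell
#!/bin/bash
# Hypothetical helper: print the first port in [start, max] on localhost that
# refuses a TCP connection, i.e. appears to have nothing listening on it.
#   $1 = first port to try (default 50000)
#   $2 = last port to try  (default start + 100)
function find_free_port {
    local port=${1:-50000}
    local max_port=${2:-$((port + 100))}
    while (( port <= max_port )); do
        # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connection.
        # The subshell keeps the descriptor from leaking into this shell;
        # a failed connection means the port looks unused.
        if ! (exec 3<>"/dev/tcp/localhost/$port") 2>/dev/null; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
    done
    echo "no free port found up to $max_port" >&2
    return 1
}
```

A job script could then set the port with BLAT_PORT=$(find_free_port 50000) || exit 1 before starting gfServer.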

Submit the script using the 'sbatch' command on Biowulf:

$ sbatch --mem=4GB /data/$USER/theScriptFileAbove
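
The memory request can also be recorded in the script itself with #SBATCH directives placed directly after the #!/bin/bash line, so that it does not have to be repeated on the command line. This is only a sketch: the job name and log file name are illustrative choices, not requirements of CREST.

```shell
#!/bin/bash
#SBATCH --mem=4g                # same memory request as the command line above
#SBATCH --job-name=crest        # illustrative job name
#SBATCH --output=crest_%j.log   # %j expands to the Slurm job ID
```

With these lines in place, the script can be submitted with a plain sbatch /data/$USER/theScriptFileAbove.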

Documentation

CREST home at St. Jude Children's Research Hospital