Biowulf High Performance Computing at the NIH
crossmap on Biowulf

CrossMap is a program for convenient conversion of genome coordinates between different assemblies (e.g. mm9->mm10). It can convert SAM, BAM, BED, GTF, GFF, wig/bigWig, and VCF files.


Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session (user input follows the $ prompt):

[user@biowulf ~]$ sinteractive -c2 --mem=4g --gres=lscratch:10
salloc.exe: Pending job allocation 11342506
salloc.exe: job 11342506 queued and waiting for resources
salloc.exe: job 11342506 has been allocated resources
salloc.exe: Granted job allocation 11342506
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0865 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.11342506.0
slurmstepd: error: x11: unable to read DISPLAY value

[user@cn0865 ~]$ cd /lscratch/$SLURM_JOB_ID

[user@cn0865 11342506]$ module load crossmap
[+] Loading crossmap  0.5.2  on cn0865
[+] Loading singularity  3.7.2  on cn0865

[user@cn0865 11342506]$ cp $CMAP_DATA/* .

[user@cn0865 11342506]$ crossmap -h
Program: CrossMap (v0.2.8)

  CrossMap is a program for convenient conversion of genome coordinates and genome
  annotation files between assemblies (eg. lift from human hg18 to hg19 or vice
  versa). It supports file in BAM, SAM, BED, Wiggle, BigWig, GFF, GTF and VCF

Usage:  [options]

  bam   convert alignment file in BAM or SAM format.
  bed   convert genome cooridnate or annotation file in BED or BED-like format.
  bigwig        convert genome coordinate file in BigWig format.
  gff   convert genome cooridnate or annotation file in GFF or GTF format.
  vcf   convert genome coordinate file in VCF format.
  wig   convert genome coordinate file in Wiggle, or bedGraph format.

[user@cn0865 11342506]$ crossmap bed hg18ToHg19.over.chain test_input > test_output
@ 2021-03-25 12:17:39: Read chain_file:  hg18ToHg19.over.chain

[user@cn0865 11342506]$ head -n2 test_output
chr1    142614848       142617697       ->      chr1    143903503       143906352
chr1    142617697       142623312       ->      chr1    143906355       143911970

[user@cn0865 11342506]$ diff --ignore-all-space expected_output test_output

[user@cn0865 11342506]$ exit
salloc.exe: Relinquishing job allocation 11342506

[user@biowulf ~]$ 
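
When no output file is given, the bed subcommand prints each converted region as "original -> lifted", as in the head output above. The following is a minimal sketch for keeping only the lifted coordinates. The helper name extract_lifted is hypothetical, and the sketch assumes that every successfully converted line contains a standalone "->" field and that all other lines can be dropped.

```shell
#!/bin/bash
# Hypothetical helper: print only the fields that follow "->" on each line,
# joined with tabs. Lines without a "->" field are skipped.
extract_lifted() {
    awk '{
        # Find the position of the "->" separator, if any.
        for (i = 1; i <= NF; i++) if ($i == "->") break
        if (i < NF) {
            out = $(i + 1)
            for (j = i + 2; j <= NF; j++) out = out "\t" $j
            print out
        }
    }' "$@"
}

# Example using the first line of the sample output above.
printf 'chr1 142614848 142617697 -> chr1 143903503 143906352\n' | extract_lifted
```

In the session above this could be run as extract_lifted test_output > lifted.bed, where lifted.bed is an arbitrary output name.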

Batch job
Most jobs should be run as batch jobs.

Create a batch input file. For example:

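#!/bin/bash

# Print an error message to stderr and abort the job.
function fail() {
    echo "$@" >&2
    exit 1
}

module load crossmap || fail "could not load crossmap module"
# Assumption: the chain file is expected in the working directory; stop early if it is missing.
if [[ ! -f hg19ToHg38.over.chain.gz ]]; then
    fail "chain file hg19ToHg38.over.chain.gz not found"
fi
crossmap bam hg19ToHg38.over.chain.gz hg19_example.bam out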

Submit this job using the Slurm sbatch command.
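
The submission might look like the sketch below. The script name crossmap.sh and the resource values (2 CPUs and 4 GB of memory, mirroring the interactive example) are assumptions rather than values taken from this page, so adjust them to the job. The hypothetical submit wrapper only prints the command on machines where sbatch is not available.

```shell
#!/bin/bash
# Hypothetical wrapper: call sbatch if it is installed, otherwise show the command.
submit() {
    if command -v sbatch >/dev/null 2>&1; then
        sbatch "$@"
    else
        echo "would run: sbatch $*"
    fi
}

# Assumed script name and resources; --cpus-per-task and --mem are standard sbatch options.
submit --cpus-per-task=2 --mem=4g crossmap.sh
```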

Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.

Create a swarmfile (e.g. crossmap.swarm). For example:

crossmap bam hg19ToHg38.over.chain.gz sample1.bam sample1_hg38.bam
crossmap bam hg19ToHg38.over.chain.gz sample2.bam sample2_hg38.bam
crossmap bam hg19ToHg38.over.chain.gz sample3.bam sample3_hg38.bam

Submit this job using the swarm command.

swarm -f crossmap.swarm --module crossmap
Here, --module crossmap loads the crossmap module for each subjob in the swarm.
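
For more than a handful of samples, the swarmfile can be generated rather than typed. The sketch below assumes the same naming pattern as the example above (NAME.bam becomes NAME_hg38.bam). The function name make_swarmfile is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: write one crossmap command per input BAM to a swarmfile.
# Usage: make_swarmfile CHAIN_FILE SWARMFILE BAM [BAM...]
make_swarmfile() {
    local chain=$1 swarmfile=$2
    shift 2
    : > "$swarmfile"    # create or truncate the swarmfile
    local bam
    for bam in "$@"; do
        # Strip the .bam suffix and append _hg38.bam for the output name.
        echo "crossmap bam $chain $bam ${bam%.bam}_hg38.bam" >> "$swarmfile"
    done
}

# Recreate the three-line swarmfile shown above.
make_swarmfile hg19ToHg38.over.chain.gz crossmap.swarm sample1.bam sample2.bam sample3.bam
cat crossmap.swarm
```

The resulting crossmap.swarm can then be submitted with the swarm command shown above.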