crossmap on Biowulf
CrossMap is a program for converting genome coordinates between different assemblies (e.g. mm9 to mm10). It can convert SAM, BAM, BED, GTF, GFF, wig/bigWig, and VCF files.
References:
- Hao Zhao et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 2014(30): 1006-1007.
Documentation
Important Notes
- Module Name: crossmap (see the modules page for more information)
- Single-threaded application
- Example files in /usr/local/apps/crossmap/TEST_DATA
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session (user input follows each prompt):
[user@biowulf ~]$ sinteractive -c2 --mem=4g --gres=lscratch:10
salloc.exe: Pending job allocation 11342506
salloc.exe: job 11342506 queued and waiting for resources
salloc.exe: job 11342506 has been allocated resources
salloc.exe: Granted job allocation 11342506
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn0865 are ready for job
srun: error: x11: no local DISPLAY defined, skipping
error: unable to open file /tmp/slurm-spank-x11.11342506.0
slurmstepd: error: x11: unable to read DISPLAY value
[user@cn0865 ~]$ cd /lscratch/$SLURM_JOB_ID
[user@cn0865 11342506]$ module load crossmap
[+] Loading crossmap 0.5.2 on cn0865
[+] Loading singularity 3.7.2 on cn0865
[user@cn0865 11342506]$ cp $CMAP_DATA/* .
[user@cn0865 11342506]$ crossmap -h
Program: CrossMap (v0.2.8)
Description:
  CrossMap is a program for convenient conversion of genome coordinates and
  genome annotation files between assemblies (eg. lift from human hg18 to hg19
  or vice versa). It supports file in BAM, SAM, BED, Wiggle, BigWig, GFF, GTF
  and VCF format.
Usage: CrossMap.py[options]
  bam     convert alignment file in BAM or SAM format.
  bed     convert genome cooridnate or annotation file in BED or BED-like format.
  bigwig  convert genome coordinate file in BigWig format.
  gff     convert genome cooridnate or annotation file in GFF or GTF format.
  vcf     convert genome coordinate file in VCF format.
  wig     convert genome coordinate file in Wiggle, or bedGraph format.
[user@cn0865 11342506]$ crossmap bed hg18ToHg19.over.chain test_input > test_output
@ 2021-03-25 12:17:39: Read chain_file: hg18ToHg19.over.chain
[user@cn0865 11342506]$ head -n2 test_output
chr1 142614848 142617697 -> chr1 143903503 143906352
chr1 142617697 142623312 -> chr1 143906355 143911970
[user@cn0865 11342506]$ diff --ignore-all-space expected_output test_output
[user@cn0865 11342506]$ exit
exit
salloc.exe: Relinquishing job allocation 11342506
[user@biowulf ~]$
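As a quick sanity check on converted coordinates, the following minimal sketch prints how far each interval's start position moved between assemblies. The helper name coord_shift and the file name demo_output are hypothetical; the sketch assumes whitespace-separated lines of the form "chrom start end -> chrom start end", as in the head output of the sample session, and skips lines of any other shape.

```shell
# Hypothetical helper (not part of CrossMap): report the start-coordinate
# shift for each converted interval.
# Assumes lines shaped like:  chrom start end -> chrom start end
coord_shift() {
    awk '$4 == "->" && NF == 7 { print $1, $2, $6 - $2 }' "$@"
}

# Demo input: the two lines shown by "head -n2 test_output" in the session.
printf '%s\n' \
    'chr1 142614848 142617697 -> chr1 143903503 143906352' \
    'chr1 142617697 142623312 -> chr1 143906355 143911970' > demo_output

# Prints: original chromosome, original start, and new start minus old start.
coord_shift demo_output
```

On real data the same function could be applied to test_output; differing shifts between neighbouring intervals indicate that the chain file places them in different alignment blocks.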
Batch job
Most jobs should be run as batch jobs.
Create a batch input file (e.g. crossmap.sh). For example:
#!/bin/bash

# Print an error message to stderr and abort the job.
function fail() {
    echo "$@" >&2
    exit 1
}

module load crossmap || fail "could not load crossmap module"

# Download the hg19-to-hg38 chain file only if it is not already present.
if [[ ! -f hg19ToHg38.over.chain.gz ]]; then
    wget http://hgdownload.soe.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz
fi

crossmap bam hg19ToHg38.over.chain.gz hg19_example.bam out
Submit this job using the Slurm sbatch command.
sbatch crossmap.sh
Swarm of Jobs
A swarm of jobs is an easy way to submit a set of independent commands requiring identical resources.
Create a swarmfile (e.g. crossmap.swarm). For example:
crossmap bam hg19ToHg38.over.chain.gz sample1.bam sample1_hg38.bam
crossmap bam hg19ToHg38.over.chain.gz sample2.bam sample2_hg38.bam
crossmap bam hg19ToHg38.over.chain.gz sample3.bam sample3_hg38.bam
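For more than a handful of samples, a swarmfile like the one above can be generated rather than typed by hand. The following is a minimal sketch; the function name make_swarm is hypothetical, the _hg38 output suffix is hard-coded to mirror the naming used above, and file names are assumed to contain no whitespace.

```shell
# Hypothetical helper: print one "crossmap bam" command per input BAM file.
# Usage: make_swarm CHAIN_FILE BAM...
make_swarm() {
    local chain=$1
    shift
    local bam
    for bam in "$@"; do
        # sample1.bam -> sample1_hg38.bam, matching the naming used above
        printf 'crossmap bam %s %s %s\n' "$chain" "$bam" "${bam%.bam}_hg38.bam"
    done
}

# Write the swarmfile for three samples and show the result.
make_swarm hg19ToHg38.over.chain.gz sample1.bam sample2.bam sample3.bam > crossmap.swarm
cat crossmap.swarm
```

In a directory of real inputs, the explicit sample list could be replaced by a glob such as *.bam.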
Submit this job using the swarm command.
swarm -f crossmap.swarm --module crossmap
where
--module crossmap | Loads the crossmap module for each subjob in the swarm |