xTea: comprehensive transposable element analyzer

xTea (x-Transposable element analyzer) is a tool for identifying transposable element (TE) insertions in whole-genome sequencing data. Whereas most existing methods are designed for short-read data, xTea can be applied to both short-read and long-read data. According to the authors, xTea outperforms other short-read-based methods for both germline and somatic TE insertion discovery.

References:

Chong Chu, Rebeca Borges-Monroy, Vinayak V. Viswanadham, Soohyun Lee, Heng Li, Eunjung Alice Lee and Peter J. Park.
Comprehensive identification of transposable element insertions using multiple sequencing technologies.
Nature Communications 12, Article number: 3836 (2021).

Important Notes

Module Name: xTea (see the modules page for more information)
Unusual environment variables set
- XTEA_HOME installation directory
- XTEA_BIN executable directory
- XTEA_DATA sample data directory
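
To confirm that these variables are set after running `module load xTea`, a short bash check such as the following can be used. It is a minimal sketch: the function name `show_xtea_env` is our own, and it only reads the three variables listed above.

```shell
#!/usr/bin/env bash
# Print the xTea-related variables that `module load xTea` is expected to set.
# Uses bash indirect expansion (${!name}); prints <unset> if a variable is empty or missing.
show_xtea_env() {
  local name
  for name in XTEA_HOME XTEA_BIN XTEA_DATA; do
    printf '%s=%s\n' "$name" "${!name:-<unset>}"
  done
}

show_xtea_env
```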

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive --mem=8g -c8 --gres=lscratch:10
[user@cn3335 ~]$ module load xTea
[+] Loading singularity  3.10.5  on cn3335
[+] Loading xTea  1.0.0 
[user@cn3335 ~]$ xtea -h
Usage: xtea [options]

Options:
  -h, --help            show this help message and exit
  -D, --decompress      Decompress the rep lib and reference file
  -M, --mosaic          Calling mosaic events from high coverage data
  -C, --case_control    Run in case control mode
  --denovo              Run in de novo mode
  -U, --user            Use user specific parameters instead of automatically
                        calculated ones
  --force               Force to start from the very beginning
  --hard                This is hard-cut for fitering out coverage abnormal
                        candidates
  --tumor               Working on tumor samples
  --purity=PURITY       Tumor purity
  --lsf                 Indiates submit to LSF system
  --slurm               Indiates submit to slurm system
  --resume              Resume the running, which will skip the step if output
                        file already exists!
  -V, --version         Print xTea version
  -i FILE, --id=FILE    sample id list file
  -a FILE, --par=FILE   parameter file
  -l FILE, --lib=FILE   TE lib config file
  -b FILE, --bam=FILE   Input bam file
  -x FILE, --x10=FILE   Input 10X bam file
  -n CORES, --cores=CORES
                        number of cores
  -m MEMORY, --memory=MEMORY
                        Memory limit in GB
  -q PARTITION, --partition=PARTITION
                        Which queue to run the job
  -t TIME, --time=TIME  Time limit
  -p WFOLDER, --path=WFOLDER
                        Working folder
  -r REF, --ref=REF     reference genome
  -g GENE, --gene=GENE  Gene annotation file
  --xtea=XTEA           xTEA folder
  -f FLAG, --flag=FLAG  Flag indicates which step to run (1-clip, 2-disc,
                        4-barcode, 8-xfilter, 16-filter, 32-asm)
  -y REP_TYPE, --reptype=REP_TYPE
                        Type of repeats working on: 1-L1, 2-Alu, 4-SVA,
                        8-HERV, 16-Mitochondrial
  --flklen=FLKLEN       flank region file
  --nclip=NCLIP         cutoff of minimum # of clipped reads
  --cr=CLIPREP          cutoff of minimum # of clipped reads whose mates map
                        in repetitive regions
  --nd=NDISC            cutoff of minimum # of discordant pair
  --nfclip=NFILTERCLIP  cutoff of minimum # of clipped reads in filtering step
  --nfdisc=NFILTERDISC  cutoff of minimum # of discordant pair of each sample
                        in filtering step
  --teilen=TEILEN       minimum length of the insertion for future analysis
  -o FILE, --output=FILE
                        The output file
  --blacklist=FILE      Reference panel database for filtering, or a blacklist
                        region
[user@cn3335 ~]$ ls $XTEA_BIN
python  shell  xtea  xtea_hg19  xtea_long
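
Two options in the help text, `-f/--flag` and `-y/--reptype`, take numeric codes that are powers of two. Assuming these codes are combined by adding them (equivalently, a bitwise OR), so that L1 + Alu + SVA = 1 + 2 + 4 gives `-y 7`, the hypothetical bash helpers `reptype_value` and `flag_steps` sketch how to build and decode such values. The code tables are copied from the help output above; `flag_steps` ignores any bits not listed there, so check the meaning of a given value against the xTea documentation before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the bit-flag style options shown in `xtea -h`.
# ASSUMPTION: codes are powers of two and are combined by adding them (bitwise OR).
# Code tables are copied from the help text.

# Combine repeat types into a single value for -y/--reptype.
reptype_value() {
  local total=0 t
  for t in "$@"; do
    case "$t" in
      L1)   total=$(( total | 1 )) ;;
      Alu)  total=$(( total | 2 )) ;;
      SVA)  total=$(( total | 4 )) ;;
      HERV) total=$(( total | 8 )) ;;
      Mito) total=$(( total | 16 )) ;;  # "Mitochondrial" in the help text
      *) echo "unknown repeat type: $t" >&2; return 1 ;;
    esac
  done
  echo "$total"
}

# List which of the steps named in the -f/--flag help line a value selects.
# Bits other than those listed in the help text are ignored.
flag_steps() {
  local value=$1 entry bit name
  local -a steps=("1:clip" "2:disc" "4:barcode" "8:xfilter" "16:filter" "32:asm")
  for entry in "${steps[@]}"; do
    bit=${entry%%:*}
    name=${entry#*:}
    if (( value & bit )); then
      echo "$name"
    fi
  done
}
```

For example, `reptype_value L1 Alu SVA` prints `7`, and `flag_steps 3` prints `clip` and `disc`.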

Prepare the sample data:

[user@cn3335 ~]$ ln -s $XTEA_DATA/NA12878_S1.bam
[user@cn3335 ~]$ samtools index NA12878_S1.bam

[user@cn3335 ~]$ ln -s $XTEA_DATA/gencode.v33.chr_patch_hapl_scaff.basic.annotation.gff3
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa 
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.index 
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.fai 
[user@cn3335 ~]$ cp $XTEA_DEMO/* .
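
Before generating the job scripts, it can help to confirm that every linked input resolves to a real file, since `ln -s` creates a link even when its target does not exist. A minimal bash sketch (the function name `check_inputs` is hypothetical):

```shell
#!/usr/bin/env bash
# Report any input file that does not exist. [ -e ] follows symlinks,
# so a link pointing at a missing target is reported as missing too.
# Returns 0 if all files exist, 1 otherwise.
check_inputs() {
  local f missing=0
  for f in "$@"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example: `check_inputs NA12878_S1.bam NA12878_S1.bam.bai genome.fa genome.fa.fai gencode.v33.chr_patch_hapl_scaff.basic.annotation.gff3 && echo "inputs OK"`, where `NA12878_S1.bam.bai` is the index written by `samtools index`.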

Generate the scripts to be submitted to the cluster:

[user@cn3335 ~]$ sh run_gnrt_pipeline.sh
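
The contents of `run_gnrt_pipeline.sh` are not reproduced here. When adapting it to your own data, note that xtea's `-i` option takes a sample ID list file and `-b` the input BAM. The hypothetical helper `write_xtea_lists` writes both lists, assuming the layout described in the upstream xTea README (one sample ID per line in the ID file; sample ID and BAM path separated by a tab in the BAM list). This is an assumption, not the Biowulf script's actual behavior; compare it with the files referenced in the copied script before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the list files read by xtea's -i and -b options.
# ASSUMED layout (per the upstream xTea README):
#   sample ID file: one sample ID per line
#   BAM list file:  "<sample_id><TAB><bam_path>" per line
# Usage: write_xtea_lists ID_FILE BAM_FILE sample_id=bam_path [...]
write_xtea_lists() {
  local id_file=$1 bam_file=$2
  shift 2
  : > "$id_file"    # truncate (or create) both output files
  : > "$bam_file"
  local pair id bam
  for pair in "$@"; do
    id=${pair%%=*}  # text before the first "="
    bam=${pair#*=}  # text after the first "="
    printf '%s\n' "$id" >> "$id_file"
    printf '%s\t%s\n' "$id" "$bam" >> "$bam_file"
  done
}
```

For example: `write_xtea_lists sample_id.txt bam_list.txt NA12878=$PWD/NA12878_S1.bam`.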

Submit the jobs (each number printed is the ID of a submitted Slurm job):

[user@cn3335 ~]$ source submit_jobs.sh
59538871
59538874
59538877
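
To follow these jobs, the printed IDs can be passed to `squeue -j` or `sacct -j`, both of which accept a comma-separated list. A minimal sketch, using a hypothetical helper `join_job_ids` that turns whitespace- or newline-separated IDs into that form:

```shell
#!/usr/bin/env bash
# Join whitespace/newline-separated Slurm job IDs (read from stdin)
# into one comma-separated list.
join_job_ids() {
  tr -s '[:space:]' '\n' | sed '/^$/d' | paste -sd, -
}

# Example on the cluster, using the IDs printed by submit_jobs.sh:
#   printf '%s\n' 59538871 59538874 59538877 > job_ids.txt
#   squeue -j "$(join_job_ids < job_ids.txt)"
#   sacct -j "$(join_job_ids < job_ids.txt)" --format=JobID,JobName,State,Elapsed
```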