xTea: comprehensive transposable element analyzer
xTea (x-Transposable element analyzer) is a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. xTea outperforms other short-read-based methods for both germline and somatic TE insertion discovery.
References:
- Chong Chu, Rebeca Borges-Monroy, Vinayak V. Viswanadham, Soohyun Lee, Heng Li, Eunjung Alice Lee and Peter J. Park. Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nature Communications 12, Article number: 3836 (2021).
Documentation
Important Notes
- Module Name: xTea (see the modules page for more information)
- Unusual environment variables set
  - XTEA_HOME: installation directory
  - XTEA_BIN: executable directory
  - XTEA_DATA: sample data directory
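To confirm that the module set the variables listed above, a small check like the following can help. This is a minimal sketch: `check_xtea_env` is a hypothetical helper written for illustration, not part of xTea or the module system, and it only reports whether the three documented variables are non-empty.

```shell
# Hypothetical helper: report each documented XTEA_* variable,
# and return non-zero if any of them is unset or empty.
check_xtea_env() {
    status=0
    for var in XTEA_HOME XTEA_BIN XTEA_DATA; do
        # Look up the value of the variable whose name is stored in $var.
        eval "value=\${$var:-}"
        if [ -n "$value" ]; then
            printf '%s=%s\n' "$var" "$value"
        else
            printf '%s is not set\n' "$var" >&2
            status=1
        fi
    done
    return "$status"
}

# Typical use, after loading the module in the current shell.
check_xtea_env || echo "one or more variables missing; load the xTea module first" >&2
```

The variable names are fixed in the loop, so the `eval` lookup never sees user-supplied input.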
Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=8g -c8 --gres=lscratch:10
[user@cn3335 ~]$ module load xTea
[+] Loading singularity  3.10.5  on cn3335
[+] Loading xTea  1.0.0
[user@cn3335 ~]$ xtea -h
Usage: xtea [options]

Options:
  -h, --help            show this help message and exit
  -D, --decompress      Decompress the rep lib and reference file
  -M, --mosaic          Calling mosaic events from high coverage data
  -C, --case_control    Run in case control mode
  --denovo              Run in de novo mode
  -U, --user            Use user specific parameters instead of automatically
                        calculated ones
  --force               Force to start from the very beginning
  --hard                This is hard-cut for fitering out coverage abnormal
                        candidates
  --tumor               Working on tumor samples
  --purity=PURITY       Tumor purity
  --lsf                 Indiates submit to LSF system
  --slurm               Indiates submit to slurm system
  --resume              Resume the running, which will skip the step if output
                        file already exists!
  -V, --version         Print xTea version
  -i FILE, --id=FILE    sample id list file
  -a FILE, --par=FILE   parameter file
  -l FILE, --lib=FILE   TE lib config file
  -b FILE, --bam=FILE   Input bam file
  -x FILE, --x10=FILE   Input 10X bam file12878
  -n CORES, --cores=CORES
                        number of cores
  -m MEMORY, --memory=MEMORY
                        Memory limit in GB
  -q PARTITION, --partition=PARTITION
                        Which queue to run the job
  -t TIME, --time=TIME  Time limit
  -p WFOLDER, --path=WFOLDER
                        Working folder
  -r REF, --ref=REF     reference genome
  -g GENE, --gene=GENE  Gene annotation file
  --xtea=XTEA           xTEA folder
  -f FLAG, --flag=FLAG  Flag indicates which step to run (1-clip, 2-disc,
                        4-barcode, 8-xfilter, 16-filter, 32-asm)
  -y REP_TYPE, --reptype=REP_TYPE
                        Type of repeats working on: 1-L1, 2-Alu, 4-SVA,
                        8-HERV, 16-Mitochondrial
  --flklen=FLKLEN       flank region file
  --nclip=NCLIP         cutoff of minimum # of clipped reads
  --cr=CLIPREP          cutoff of minimum # of clipped reads whose mates map
                        in repetitive regions
  --nd=NDISC            cutoff of minimum # of discordant pair
  --nfclip=NFILTERCLIP  cutoff of minimum # of clipped reads in filtering step
  --nfdisc=NFILTERDISC  cutoff of minimum # of discordant pair of each sample
                        in filtering step
  --teilen=TEILEN       minimum length of the insertion for future analysis
  -o FILE, --output=FILE
                        The output file
  --blacklist=FILE      Reference panel database for filtering, or a blacklist
                        region
[user@cn3335 ~]$ ls $XTEA_BIN
python  shell  xtea  xtea_hg19  xtea_long

Prepare the sample data to be used:
[user@cn3335 ~]$ ln -s $XTEA_DATA/NA12878_S1.bam
[user@cn3335 ~]$ samtools index NA12878_S1.bam
[user@cn3335 ~]$ ln -s $XTEA_DATA/gencode.v33.chr_patch_hapl_scaff.basic.annotation.gff3
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.index
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.fai
[user@cn3335 ~]$ cp $XTEA_DEMO/* .

Create a script to be submitted to the cluster:
[user@cn3335 ~]$ sh run_gnrt_pipeline.sh

Submit the script:
[user@cn3335 ~]$ source submit_jobs.sh
59538871
59538874
59538877
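
The repeat-type codes listed for `-y` in the help output above (1-L1, 2-Alu, 4-SVA, 8-HERV, 16-Mitochondrial) are powers of two, which suggests that several repeat types are selected by adding their codes together. The sketch below computes such a combined value. It assumes that summation is how xTea interprets `-y`, which should be confirmed against the xTea documentation. `rep_type_mask` is a hypothetical helper written for illustration and is not part of xTea.

```shell
# Hypothetical helper: combine repeat-type names into a single -y value.
# The codes are copied from the xtea -h output; combining them by
# addition (bitwise OR) is an assumption based on their power-of-two layout.
rep_type_mask() {
    mask=0
    for name in "$@"; do
        case "$name" in
            L1)            code=1 ;;
            Alu)           code=2 ;;
            SVA)           code=4 ;;
            HERV)          code=8 ;;
            Mitochondrial) code=16 ;;
            *) echo "unknown repeat type: $name" >&2; return 1 ;;
        esac
        # OR rather than add, so naming a type twice does not change the result.
        mask=$(( mask | code ))
    done
    echo "$mask"
}

rep_type_mask L1 Alu SVA    # prints 7
rep_type_mask L1 SVA        # prints 5
```

Under that assumption, the printed value would be passed as the argument to `-y` when generating the pipeline script.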