xTea (x-Transposable element analyzer) is a tool for identifying transposable element (TE) insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. xTea outperforms other short-read-based methods for both germline and somatic TE insertion discovery.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=8g -c8 --gres=lscratch:10
[user@cn3335 ~]$ module load xTea
[+] Loading singularity 3.10.5 on cn3335
[+] Loading xTea 1.0.0
[user@cn3335 ~]$ xtea -h
Usage: xtea [options]
Options:
-h, --help show this help message and exit
-D, --decompress Decompress the rep lib and reference file
-M, --mosaic Calling mosaic events from high coverage data
-C, --case_control Run in case control mode
--denovo Run in de novo mode
-U, --user Use user specific parameters instead of automatically
calculated ones
--force Force to start from the very beginning
--hard This is hard-cut for fitering out coverage abnormal
candidates
--tumor Working on tumor samples
--purity=PURITY Tumor purity
--lsf Indiates submit to LSF system
--slurm Indiates submit to slurm system
--resume Resume the running, which will skip the step if output
file already exists!
-V, --version Print xTea version
-i FILE, --id=FILE sample id list file
-a FILE, --par=FILE parameter file
-l FILE, --lib=FILE TE lib config file
-b FILE, --bam=FILE Input bam file
-x FILE, --x10=FILE Input 10X bam file
-n CORES, --cores=CORES
number of cores
-m MEMORY, --memory=MEMORY
Memory limit in GB
-q PARTITION, --partition=PARTITION
Which queue to run the job
-t TIME, --time=TIME Time limit
-p WFOLDER, --path=WFOLDER
Working folder
-r REF, --ref=REF reference genome
-g GENE, --gene=GENE Gene annotation file
--xtea=XTEA xTEA folder
-f FLAG, --flag=FLAG Flag indicates which step to run (1-clip, 2-disc,
4-barcode, 8-xfilter, 16-filter, 32-asm)
-y REP_TYPE, --reptype=REP_TYPE
Type of repeats working on: 1-L1, 2-Alu, 4-SVA,
8-HERV, 16-Mitochondrial
--flklen=FLKLEN flank region file
--nclip=NCLIP cutoff of minimum # of clipped reads
--cr=CLIPREP cutoff of minimum # of clipped reads whose mates map
in repetitive regions
--nd=NDISC cutoff of minimum # of discordant pair
--nfclip=NFILTERCLIP cutoff of minimum # of clipped reads in filtering step
--nfdisc=NFILTERDISC cutoff of minimum # of discordant pair of each sample
in filtering step
--teilen=TEILEN minimum length of the insertion for future analysis
-o FILE, --output=FILE
The output file
--blacklist=FILE Reference panel database for filtering, or a blacklist
region
[user@cn3335 ~]$ ls $XTEA_BIN
python shell xtea xtea_hg19 xtea_long
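The values listed for -f/--flag and -y/--reptype in the help output are powers of two, which suggests they are meant to be combined into a single number. The sketch below assumes that several steps or repeat types are selected by OR-ing (equivalently, summing) their values; that combining rule is an assumption based on the numbering, not something the help text states. The variable names and the has_bit helper are hypothetical and only for illustration.

```shell
# Step values as listed for -f/--flag in the xtea -h output.
CLIP=1; DISC=2; BARCODE=4; XFILTER=8; FILTER=16; ASM=32

# Assumption: steps are combined by bitwise OR (same as summing distinct values).
flag=$(( CLIP | DISC | FILTER ))
echo "flag=$flag"

# Repeat-type values as listed for -y/--reptype in the xtea -h output.
REP_L1=1; REP_ALU=2; REP_SVA=4; REP_HERV=8; REP_MITO=16

# Assumption: repeat types are combined the same way.
reptype=$(( REP_L1 | REP_ALU | REP_SVA ))
echo "reptype=$reptype"

# Hypothetical helper: succeeds if the given bit is set in the combined value.
has_bit() { [ $(( $1 & $2 )) -ne 0 ]; }
```

Under that assumption, the resulting numbers would be passed as the arguments of -f and -y. The helper can be used to check a combined value, for example: has_bit "$flag" "$DISC" && echo "disc step selected".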
Prepare the sample data to be used:
[user@cn3335 ~]$ ln -s $XTEA_DATA/NA12878_S1.bam
[user@cn3335 ~]$ samtools index NA12878_S1.bam
[user@cn3335 ~]$ ln -s $XTEA_DATA/gencode.v33.chr_patch_hapl_scaff.basic.annotation.gff3
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.index
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.fai
[user@cn3335 ~]$ cp $XTEA_DEMO/* .
Create a script to be submitted to the cluster:
[user@cn3335 ~]$ sh run_gnrt_pipeline.sh
Submit the script:
[user@cn3335 ~]$ source submit_jobs.sh
59538871
59538874
59538877
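Because the inputs in this session are symbolic links, a link whose target does not exist only shows up once the jobs start running. A small pre-flight check, run before run_gnrt_pipeline.sh, can catch this earlier. The following is a minimal sketch; check_inputs is a hypothetical helper, not part of xTea.

```shell
# Hypothetical helper: report every argument that does not resolve to an
# existing file. The -e test follows symlinks, so a dangling link counts
# as missing.
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -e "$f" ]; then
            echo "missing: $f" >&2
            missing=$((missing + 1))
        fi
    done
    # Succeed only if nothing was missing.
    [ "$missing" -eq 0 ]
}
```

With the files prepared in this session, it could be called as: check_inputs NA12878_S1.bam genome.fa genome.fa.fai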