xTea (x-Transposable element analyzer) is a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. xTea outperforms other short-read-based methods for both germline and somatic TE insertion discovery.
Allocate an interactive session and run the program. Sample session:
[user@biowulf]$ sinteractive --mem=8g -c8 --gres=lscratch:10
[user@cn3335 ~]$ module load xTea
[+] Loading singularity  3.10.5  on cn3335
[+] Loading xTea  1.0.0
[user@cn3335 ~]$ xtea -h
Usage: xtea [options]

Options:
  -h, --help            show this help message and exit
  -D, --decompress      Decompress the rep lib and reference file
  -M, --mosaic          Calling mosaic events from high coverage data
  -C, --case_control    Run in case control mode
  --denovo              Run in de novo mode
  -U, --user            Use user specific parameters instead of automatically
                        calculated ones
  --force               Force to start from the very beginning
  --hard                This is hard-cut for fitering out coverage abnormal
                        candidates
  --tumor               Working on tumor samples
  --purity=PURITY       Tumor purity
  --lsf                 Indiates submit to LSF system
  --slurm               Indiates submit to slurm system
  --resume              Resume the running, which will skip the step if output
                        file already exists!
  -V, --version         Print xTea version
  -i FILE, --id=FILE    sample id list file
  -a FILE, --par=FILE   parameter file
  -l FILE, --lib=FILE   TE lib config file
  -b FILE, --bam=FILE   Input bam file
  -x FILE, --x10=FILE   Input 10X bam file12878
  -n CORES, --cores=CORES
                        number of cores
  -m MEMORY, --memory=MEMORY
                        Memory limit in GB
  -q PARTITION, --partition=PARTITION
                        Which queue to run the job
  -t TIME, --time=TIME  Time limit
  -p WFOLDER, --path=WFOLDER
                        Working folder
  -r REF, --ref=REF     reference genome
  -g GENE, --gene=GENE  Gene annotation file
  --xtea=XTEA           xTEA folder
  -f FLAG, --flag=FLAG  Flag indicates which step to run (1-clip, 2-disc,
                        4-barcode, 8-xfilter, 16-filter, 32-asm)
  -y REP_TYPE, --reptype=REP_TYPE
                        Type of repeats working on: 1-L1, 2-Alu, 4-SVA,
                        8-HERV, 16-Mitochondrial
  --flklen=FLKLEN       flank region file
  --nclip=NCLIP         cutoff of minimum # of clipped reads
  --cr=CLIPREP          cutoff of minimum # of clipped reads whose mates map
                        in repetitive regions
  --nd=NDISC            cutoff of minimum # of discordant pair
  --nfclip=NFILTERCLIP  cutoff of minimum # of clipped reads in filtering step
  --nfdisc=NFILTERDISC  cutoff of minimum # of discordant pair of each sample
                        in filtering step
  --teilen=TEILEN       minimum length of the insertion for future analysis
  -o FILE, --output=FILE
                        The output file
  --blacklist=FILE      Reference panel database for filtering, or a blacklist
                        region
[user@cn3335 ~]$ ls $XTEA_BIN
python  shell  xtea  xtea_hg19  xtea_long

Prepare the sample data to be used:
[user@cn3335 ~]$ ln -s $XTEA_DATA/NA12878_S1.bam
[user@cn3335 ~]$ samtools index NA12878_S1.bam
[user@cn3335 ~]$ ln -s $XTEA_DATA/gencode.v33.chr_patch_hapl_scaff.basic.annotation.gff3
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.index
[user@cn3335 ~]$ ln -s /fdb/igenomes/Homo_sapiens/Ensembl/GRCh37/Sequence/WholeGenomeFasta/genome.fa.fai
[user@cn3335 ~]$ cp $XTEA_DEMO/* .

Create a script to be submitted to the cluster:
[user@cn3335 ~]$ sh run_gnrt_pipeline.sh

Submit the script:
[user@cn3335 ~]$ source submit_jobs.sh
59538871
59538874
59538877
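
The repeat-type codes listed for -y/--reptype in the help text are powers of two, which suggests they are meant to be combined into a single number. The sketch below assumes that several repeat types are selected by summing (bitwise OR-ing) their codes. The help output does not state this explicitly, so treat it as an assumption and check the xTea documentation. The variable names and the decode_rep_type helper are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Repeat-type codes copied from the `xtea -h` output for -y/--reptype.
L1=1
ALU=2
SVA=4
HERV=8
MITO=16

# Assumption: several repeat types are requested by OR-ing (summing) their codes.
rep_type=$(( L1 | ALU | SVA ))
echo "rep_type=${rep_type}"

# Hypothetical helper: decode a combined value back into repeat-type names.
decode_rep_type() {
  value=$1
  names=""
  for pair in 1:L1 2:Alu 4:SVA 8:HERV 16:Mitochondrial; do
    code=${pair%%:*}   # numeric code before the colon
    name=${pair#*:}    # repeat-type name after the colon
    if [ $(( value & code )) -ne 0 ]; then
      names="$names $name"
    fi
  done
  echo "${names# }"    # strip the leading space
}

decode_rep_type "$rep_type"
```

Under that assumption, L1, Alu and SVA together would be requested with -y 7. The step codes listed for -f/--flag are also powers of two and could be handled the same way.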