Chromeister: an ultra-fast, heuristic approach to detect conserved signals in extremely large pairwise genome comparisons.
Allocate an interactive session and run the program.
Sample session (user commands follow the $ prompt):
[user@biowulf]$ sinteractive
[user@cn4274 ~]$ module load chromeister
[+] Loading chromeister 1.5.a on cn4274
[+] Loading singularity 3.10.5 on cn4274
[+] Loading gcc 11.3.0 ...
[+] Loading HDF5 1.12.2
[+] Loading netcdf 4.9.0
[-] Unloading gcc 11.3.0 ...
[+] Loading gcc 11.3.0 ...
[+] Loading openmpi/4.1.3/gcc-11.3.0 ...
[+] Loading pandoc 2.18 on cn4274
[+] Loading pcre2 10.40
[+] Loading R 4.3.0
Execute the CHROMEISTER binary:
# copy over the test data
[user@cn4274 ~]$ cp -a /usr/local/apps/chromeister/test-data .
[user@cn4274 ~]$ cd test-data
# run chromeister with inputs
[user@cn4274 test-data]$ CHROMEISTER -query mycoplasma-232.fasta \
    -db mycoplasma-232.fasta \
    -out mycoplasma-232-7422.mat \
    -dimension 500 && Rscript ${SCRIPTS}/compute_score.R mycoplasma-232-7422.mat 500
[INFO] Generating a 500x500 matrix
[INFO] Loading database
99%...[INFO] Database loaded and of length 892758.
[INFO] Ratios: Q [1.785516e+03] D [1.785516e+03]. Lenghts: Q [892758] D [892758]
[INFO] Pixel size: Q [5.600622e-04] D [5.600622e-04].
[INFO] Computing absolute hit numbers.
99%...Scanning hits table.
99%...
[INFO] Query length 892758.
[INFO] Writing matrix.
[INFO] Found 25819 unique hits for z = 4.
0
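The [INFO] lines report a "ratio" and a "pixel size" for the query (Q) and database (D). The logged numbers are consistent with ratio = sequence length / matrix dimension (892758 / 500 = 1785.516) and pixel size being its reciprocal (500 / 892758 ≈ 5.600622e-04), so under that reading each matrix cell summarizes roughly 1786 bp of each sequence. The minimal sketch below reproduces those values with awk; the helper name chromeister_scale is hypothetical and not part of CHROMEISTER, and the formulas are inferred from this one log rather than taken from the program's source.

```shell
#!/bin/sh
# Hypothetical helper (not part of CHROMEISTER): reproduce the "Ratios" and
# "Pixel size" values from the log, ASSUMING
#   ratio      = sequence length / matrix dimension
#   pixel size = matrix dimension / sequence length
# $1 = sequence length in bp, $2 = value passed to -dimension
chromeister_scale() {
  awk -v L="$1" -v D="$2" \
    'BEGIN { printf "ratio=%e pixel=%e\n", L / D, D / L }'
}

# Values from the mycoplasma self-comparison above.
chromeister_scale 892758 500
```

Changing the second argument shows how a larger -dimension shrinks the number of base pairs per cell, assuming the inferred formulas hold.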
Annotate an SV with duphold:
[user@cn4338 ~]$ cp -a /usr/local/apps/duphold/0.2.3/test_data .
[user@cn4338 ~]$ cd test_data
[user@cn4338 test_data]$ duphold \
    --threads 4 \
    --vcf sparse_in.vcf \
    --bam sparse.cram \
    --fasta sparse.fa \
    --output output.bcf
# To view the output, load samtools and view it with bcftools
[user@cn4338 test_data]$ module load samtools
[user@cn4338 test_data]$ bcftools view output.bcf
##fileformat=VCFv4.2
...
##bcftools_viewVersion=1.4-19-g1802ff3+htslib-1.4-29-g42bfe70
##bcftools_viewCommand=view CHM1_CHM13/full.37d5.vcf.gz; Date=Mon Sep 24 13:48:04 2018
...
##bcftools_viewVersion=1.17+htslib-1.17
##bcftools_viewCommand=view test-out.bcf; Date=Thu May 25 12:49:34 2023
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Eluc-CR2.F
NW_017858824.1 135118 72454 N DEL 5875.46 . SVTYPE=DEL;END=135332;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;GCF=0.306977 GT:DP:DHFC:DHFFC:DHBFC:DHSP 0/1:200:1.91667:0.597403:1.76923:0
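duphold stores its depth annotations as FORMAT fields: the FORMAT column lists the keys (GT:DP:DHFC:DHFFC:DHBFC:DHSP) and the sample column holds the matching colon-separated values. Per the duphold README, DHFFC is the depth fold-change of the variant relative to its flanking regions, so the 0.597403 above means the deleted region has roughly 60% of the flanking coverage. The sketch below, built on a hypothetical helper named format_value (not part of duphold or bcftools), pairs keys with values for the first sample and applies a 0.7 DHFFC cutoff; that cutoff is an assumption based on the duphold README's deletion-filtering suggestion and should be checked against the current documentation.

```shell
#!/bin/sh
# Hypothetical helper (not part of duphold or bcftools): print the value of one
# FORMAT key for the first sample of a tab-separated VCF data line.
# $1 = VCF data line, $2 = FORMAT key to look up (e.g. DHFFC)
format_value() {
  printf '%s\n' "$1" | awk -F'\t' -v key="$2" '{
    n = split($9, keys, ":")   # column 9: FORMAT keys, e.g. GT:DP:DHFC:...
    split($10, vals, ":")      # column 10: values for the first sample
    for (i = 1; i <= n; i++)
      if (keys[i] == key) { print vals[i]; exit }
  }'
}

# The data line from the sample output above, rebuilt with tab separators.
record=$(printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s' \
  NW_017858824.1 135118 72454 N DEL 5875.46 . \
  'SVTYPE=DEL;END=135332;CIPOS=0,0;CIEND=0,0;CIPOS95=0,0;CIEND95=0,0;GCF=0.306977' \
  'GT:DP:DHFC:DHFFC:DHBFC:DHSP' \
  '0/1:200:1.91667:0.597403:1.76923:0')

dhffc=$(format_value "$record" DHFFC)
echo "DHFFC=$dhffc"

# ASSUMPTION: 0.7 is the deletion cutoff suggested in the duphold README.
# "x + 0" forces a numeric comparison; awk exits 0 (true) when x < 0.7.
if awk -v x="$dhffc" 'BEGIN { exit !(x + 0 < 0.7) }'; then
  echo "depth drop supports the deletion"
else
  echo "depth drop does not support the deletion"
fi
```

With real output you would stream records from bcftools view -H output.bcf instead of hard-coding one line; bcftools can also apply such a filter itself through an include expression, for example bcftools view -i 'FMT/DHFFC<0.7' output.bcf.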
For more information on pre- and post-processing, please visit the duphold GitHub page.