GenomeSyn: constructing and visualizing genome synteny

GenomeSyn, implemented in Perl, is a bioinformatics tool for visualizing genome synteny and structural variations (SVs). It draws synteny plots for two or three genome sequences. Genome annotation files can also be supplied to display SVs and other features across the genomes.

References:

Documentation
Important Notes

Interactive job
Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
[user@cn3144 ~]$ module load 
[+] Loading singularity  4.0.3  on cn3144
[+] Loading GenomeSyn  20240614
[user@cn3144 ~]$ GenomeSyn -h
Usage:
    GenomeSyn [options]

    example:

        a) GenomeSyn -g1 ../data/rice_MH63.fa -g2 ../data/rice_ZS97.fa

        b) GenomeSyn -t 3 -g1 ../data/rice_MH63.fa -g2 ../data/rice_ZS97.fa -cf1 ../data/rice_MH63vsZS97.delta.filter.coords

        c) GenomeSyn -t 3 -g1 ../data/rice_MH63.fa -g2 ../data/rice_ZS97.fa -cf1 ../data/rice_MH63vsZS97.delta.filter.coords -cen1 ../data/rice_MH63_centromere.bed -cen2 ../data/rice_ZS97_centromere.bed -tel1 ../data/rice_MH63_telomere.bed -tel2 ../data/rice_ZS97_telomere.bed -TE1 ../data/rice_MH63_repeat.bed -TE2 ../data/rice_ZS97_repeat.bed -PAV1 ../data/rice_MH63_PAV.bed -PAV2 ../data/rice_ZS97_PAV.bed -NLR1 ../data/rice_MH63_NLR.bed -NLR2 ../data/rice_ZS97_NLR.bed -r MH63 -q ZS97 -GD1 ../data/rice_MH63_nonTEgene.gff3 -GD2 ../data/rice_ZS97_nonTEgene.gff3 -GC1 ../data/rice_MH63_GC_10000.bed -GC2 ../data/rice_ZS97_GC_10000.bed -GC_win 100000 -TE_min 40

        d) GenomeSyn -t 3 -n3 12 -g1 ../data/rice_MH63.fa -g2 ../data/rice_ZS97.fa -g3 ../data/rice_R498.fasta -cf1 ../data/rice_MH63vsZS97.delta.filter.coords -cf2 ../data/rice_MH63vsR498.delta.filter.coords -cen1 ../data/rice_MH63_centromere.bed -cen2 ../data/rice_ZS97_centromere.bed -cen3 ../data/rice_R498_centromere.bed -tel1 ../data/rice_MH63_telomere.bed -tel2 ../data/rice_ZS97_telomere.bed -tel3 ../data/rice_R498_telomere.bed -TE2 ../data/rice_ZS97_repeat.bed -PAV1 ../data/rice_MH63_PAV.bed -PAV2 ../data/rice_ZS97_PAV.bed -NLR1 ../data/rice_MH63_NLR.bed -NLR2 ../data/rice_ZS97_NLR.bed -r MH63 -q1 ZS97 -q2 R498 -GD1 ../data/rice_MH63_nonTEgene.gff3 -GD2 ../data/rice_ZS97_nonTEgene.gff3 -GD3 ../data/rice_R498_IGDBv3_coreset.gff -GC2 ../data/rice_ZS97_GC_10000.bed -GC_win 100000 -TE_min 40

Options:
    -aligntype/-at/-t
            Output mode, 1/2/3/4 (default 1). With 1, only the one-to-one
            double/triple sequence comparison chart is output; with 2,
            only the multiple-to-multiple double/triple sequence
            comparison chart is output; with 3, both comparison charts
            are output; with 4, in addition to the two comparison charts,
            some statistical sub-graphs are generated, such as the
            consistency heat map, consistency histogram, and coverage
            histogram.

    -genomeSeq1/-g1
            Input the genome1 fasta file to obtain the length of each
            chromosome in the genome1(ie reference genome).

    -genomeSeq2/-g2
            Input the genome2 fasta file to obtain the length of each
            chromosome in the genome2(ie query genome).

    -genomeSeq3/-g3
            Input the genome3 fasta file to obtain the length of each
            chromosome in the genome3(ie query genome2).

    -comparison_file/-comparison_file1/-cf/-cf1
            Input the coordinate file for comparing genome1 and genome2.
            If no coordinate file is given, mummer is called to compare
            genome1 and genome2 and generate this coordinate file, e.g.
            ReferencevsQuery1.delta.filter.coords.

    -comparison_file2/-cf2
            Input the coordinate file for comparing genome1 and genome3.
            If no coordinate file is given, mummer is called to compare
            genome1 and genome3 and generate this coordinate file, e.g.
            ReferencevsQuery2.delta.filter.coords.

    -comparison_file3/-cf3
            Input the coordinate file for comparing genome2 and genome3.
            If no coordinate file is given, mummer is called to compare
            genome2 and genome3 and generate this coordinate file, e.g.
            Query1vsQuery2.delta.filter.coords.

    -SVG_PDF/-pdf
            Format conversion: generate a PDF file from each SVG output
            file. Value is 1/0, default true (1), i.e. SVG and PDF files
            are both output by default.

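    -sort   Chromosome sorting mode, "match" or "reference_length_match";
            the default is "match". In "match" mode, query chromosome
            numbers are matched to the reference chromosome numbers. In
            "reference_length_match" mode, the reference chromosomes are
            first sorted by length from longest to shortest, and the
            query chromosomes are then matched to them.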

    -chromosomename/-cn
            Chromosome numeration setting, the value is 1/0, and the default
            value is false (0); when the value is 0, the unified chromosome
            numeration(Chromosome numeration for reference genome) will be
            displayed on the output map, and when the value is 1, the actual
            chromosome numeration in the comparison file will be displayed
            on the output map.

    -referencename/-reference/-ref/-r
            Set the name of genome1, default output is "reference", e.g.
            MH63.

    -queryname/-queryname1/-query/-query1/-q/-q1
            Set the name of genome2, default output is
            "query"/"query1", e.g. ZS97.

    -queryname2/-query2/-q2
            Set the name of genome3, default output is "query2", e.g. R498.

    -centromere_genome1/-centromere1/-cen1
            Input the centromere position file of genome1, the file uses the
            bed (Browser Extensible Data) format, and draw centromeres on
            each chromosome of genome1.

    -centromere_genome2/-centromere2/-cen2
            Input the centromere position file of genome2, the file uses the
            bed (Browser Extensible Data) format, and draw centromeres on
            each chromosome of genome2.

    -centromere_genome3/-centromere3/-cen3
            Input the centromere position file of genome3, the file uses the
            bed (Browser Extensible Data) format, and draw centromeres on
            each chromosome of genome3.

    -telomere_genome1/-telomere1/-tel1
            Input the telomere position file of genome1, the file uses the
            bed format, and draw telomere on each chromosome of genome1.

    -telomere_genome2/-telomere2/-tel2
            Input the telomere position file of genome2, the file uses the
            bed format, and draw telomere on each chromosome of genome2.

    -telomere_genome3/-telomere3/-tel3
            Input the telomere position file of genome3, the file uses the
            bed format, and draw telomere on each chromosome of genome3.

    -snp_genome1/-snp1
            Input the SNP file of genome1, which uses the bed format to map
            the SNP distribution of genome1.

    -snp_genome2/-snp2
            Input the SNP file of genome2, which uses the bed format to map
            the SNP distribution of genome2.

    -snp_genome3/-snp3
            Input the SNP file of genome3, which uses the bed format to map
            the SNP distribution of genome3.

    -snp_thresholds/-snp_max
            SNP threshold setting, that is, setting the upper limit of SNP
            statistics, the default value is 2000.

    -TE_genome1/-TE1
            Input the TE file of genome1, which uses the bed format to map
            the TE distribution of genome1.

    -TE_genome2/-TE2
            Input the TE file of genome2, which uses the bed format to map
            the TE distribution of genome2.

    -TE_genome3/-TE3
            Input the TE file of genome3, which uses the bed format to map
            the TE distribution of genome3.

    -TE_thresholds/-TE_min
            Set the lower limit of the TE statistics (0-100). By default,
            the lower limit is derived from the smallest TE proportion
            in the TE file used: if the minimum TE is 11%, the icon in
            the lower right corner shows a scale of 10%-100%; if the
            minimum TE is 28%, it shows a scale of 20%-100%. If the user
            sets a lower limit, that value is used instead; for example,
            "-TE_min 50" draws a 50%-100% TE statistical graph. TE has
            two display forms, and the lower limit can be adjusted only
            when TE is displayed as a histogram.

    -GC_genome1/-GC_content1/-GC1
            Input the bed format file of the GC content of genome1 to plot
            the distribution of the GC content of genome1.

    -GC_genome2/-GC_content2/-GC2
            Input the bed format file of the GC content of genome2 to plot
            the distribution of the GC content of genome2.

    -GC_genome3/-GC_content3/-GC3
            Input the bed format file of the GC content of genome3 to plot
            the distribution of the GC content of genome3.

    -PAV_genome1/-PAV1
            Input the PAV file of genome1, which uses the bed format to map
            the PAV distribution of genome1.

    -PAV_genome2/-PAV2
            Input the PAV file of genome2, which uses the bed format to map
            the PAV distribution of genome2.

    -PAV_genome3/-PAV3
            Input the PAV file of genome3, which uses the bed format to map
            the PAV distribution of genome3.

    -NLR_genome1/-NLR1
            Input the NLR file of genome1, which uses the bed format to map
            the NLR distribution of genome1.

    -NLR_genome2/-NLR2
            Input the NLR file of genome2, which uses the bed format to map
            the NLR distribution of genome2.

    -NLR_genome3/-NLR3
            Input the NLR file of genome3, which uses the bed format to map
            the NLR distribution of genome3.

    -gene_density_genome1/-GD1
            Input the annotation file of genome1, which uses the gff3 format
            to map the gene density distribution of genome1.

    -gene_density_genome2/-GD2
            Input the annotation file of genome2, which uses the gff3 format
            to map the gene density distribution of genome2.

    -gene_density_genome3/-GD3
            Input the annotation file of genome3, which uses the gff3 format
            to map the gene density distribution of genome3.

    -GeneDensity_Window/-GD_win
            Set the window size for gene density statistics. This
            parameter is required when gene density is computed from an
            input gene annotation file; the value can be set to 100000.

    -SNP_Window/-SNP_win
            Set the window size for SNP statistics. This parameter is
            optional; by default its value is taken from the window size
            in the input SNP bed file of genome1.

    -TE_Window/-TE_win
            Set the window size for TE statistics. This parameter is
            optional; by default its value is taken from the window size
            in the input TE bed file of genome1.

    -GC_Content_Window/-GC_win
            Set the window size for GC content statistics. This parameter
            is optional; by default its value is taken from the window
            size in the input GC content bed file of genome1.

    -synteny_length_min/-synteny_min/-syn_min
            Set the minimum length for drawing synteny fragments, the
            default value is 10000.

    -inversion_length_min/-inversion_min/-inv_min
            Set the minimum length for drawing inversion fragments, the
            default value is 10000.

    -PAV_length_min/-PAV_min
            Set the minimum length for drawing PAV, the default value is
            10000.

    -NLR_length_min/-NLR_min
            Set the minimum length for drawing NLR, the default value is
            10000.

    -coverage_rate_min/-coverage_min/-cov_min
            Set the minimum coverage (%) for drawing synteny fragments, the
            default value is 90.

    -icon   Whether to output the main image icon, value is 1/0, default
            true(1).

    -proportion1/-p1
            Set the chromosome window size of the one-to-one double/triple
            sequence alignment chart, the default value is 25000.

    -proportion2/-p2
            Set the chromosome window size of the multiple-to-multiple
            double/triple sequence alignment chart, the default value is
            four times the value of -proportion1/-p1, that is, the default
            is 100000.

    -targetgene_genome1/-targetgene1/-gene1
            Input the target gene file of genome 1, the file uses the bed
            format, the target gene can be any gene that the user studies.

    -targetgene_genome2/-targetgene2/-gene2
            Input the target gene file of genome 2, the file uses the bed
            format, the target gene can be any gene that the user studies.

    -targetgene_genome3/-targetgene3/-gene3
            Input the target gene file of genome 3, the file uses the bed
            format, the target gene can be any gene that the user studies.

    -targetgene_name/-targetgene
            Set the name of the target gene, default output as "Target
            Gene".

    -genomenumber/-gn/-n
            Comparison mode (double/triple sequence comparison). This
            parameter is optional and can be set to 2 or 3. By default
            the value is determined by the number of input genomes: 2
            when two genomes are input, 3 when three genomes are input.

    -chromosomenumber1/-n1
            Set the number of chromosomes in genome1. This parameter is
            optional; by default its value is determined by the number
            of chromosomes in the input genome1 fasta file, or it can be
            set by the user, e.g. 12.

    -chromosomenumber2/-n2
            Set the number of chromosomes in genome2. This parameter is
            optional; by default its value is determined by the number
            of chromosomes in the input genome2 fasta file, or it can be
            set by the user, e.g. 12.

    -chromosomenumber3/-n3
            Set the number of chromosomes in genome3. This parameter is
            optional; by default its value is determined by the number
            of chromosomes in the input genome3 fasta file, or it can be
            set by the user, e.g. 12.

    -output1/-o1
            Set the name of output SVG format file1, default
            "GenomeSyn-main-1.svg".

    -output2/-o2
            Set the name of output SVG format file2, default
            "GenomeSyn-main-2.svg".

    -output3/-o3
            Set the name of output SVG format file3, default "GenomeSyn
            heatmap.svg".

    -output4/-o4
            Set the name of output SVG format file4, default "GenomeSyn
            identity.svg".

    -output5/-o5
            Set the name of output SVG format file5, default "GenomeSyn
            coverage.svg".

    -output6/-o6
            Set the name of output SVG format file6, default "GenomeSyn
            heatmap2.svg".

    -headline_identity/-headline1
            Set the title of illustration1, default output is "GenomeSyn
            identity".

    -headline_coverage/-headline2
            Set the title of illustration2, default output is "GenomeSyn
            coverage".

    -headline_heatmap/-headline3
            Set the title of illustration3, default output is "GenomeSyn
            heatmap".

    -genome1_color/-color1/-c1
            Set the drawing color of the chromosome in genome1, default color
            is LightBlue (#3979BC), recommended to input in hexadecimal
            color code or RGB code, eg. "#3979BC"/"rgb(57,121,188)".

    -genome2_color/-color2/-c2
            Set the drawing color of the chromosome in genome2, default color
            is Green(#499272), recommended to input in hexadecimal color
            code or RGB code, eg. "#499272"/"rgb(73,146,114)".

    -genome3_color/-color3/-c3
            Set the drawing color of the chromosome in genome3, default
            color is DarkBlue(#447784), recommended to input in hexadecimal
            color code or RGB code, eg. "#447784"/"rgb(68,119,132)".

    -synteny_color/-color4/-c4
            Set the drawing color of the synteny blocks, default color is
            LightGray(#DFDFE1), recommended to input in hexadecimal color
            code or RGB code, eg. "#DFDFE1"/"rgb(223,223,225)".

    -inversion_color/-color5/-c5
            Set the drawing color of the inversion blocks, default color is
            DarkOrange(#E56C1A), recommended to input in hexadecimal color
            code or RGB code, eg. "#E56C1A"/"rgb(229,108,26)".

    -translocation_color/-color6/-c6
            Set the drawing color of the translocation blocks, default color
            is Saffron(#EFCF48), recommended to input in hexadecimal color
            code or RGB code, eg. "#EFCF48"/"rgb(239,207,72)".

    -centromere_color/-color7/-c7
            Set the drawing color of the centromere blocks, default color is
            Orange(#E4993F), recommended to input in hexadecimal color code
            or RGB code, eg. "#E4993F"/"rgb(228,153,63)".

    -telomere_color/-color8/-c8
            Set the drawing color of the telomere blocks, default color is
            Purple(#441680), recommended to input in hexadecimal color code
            or RGB code, eg. "#441680"/"rgb(68,22,128)".

    -PAV_color/-color9/-c9
            Set the drawing color of PAVs, default color is
            LightYellow(#F9F067), recommended to input in hexadecimal color
            code or RGB code, eg. "#F9F067"/"rgb(249,240,103)".

    -NLR_color/-color10/-c10
            Set the drawing color of the NLRs, default color is
            Cyan(#00FFFF), recommended to input in hexadecimal color code or
            RGB code, eg. "#00FFFF"/"rgb(0,255,255)".

    -SNP_color/-color11/-c11
            Set the drawing color of the SNPs, default color is
            DodgerBlue(#1E90FF), recommended to input in hexadecimal color
            code or RGB code, eg. "#1E90FF"/"rgb(30,144,255)".

    -TE_color/-color12/-c12
            Set the drawing color of the TEs, default color is
            DodgerBlue(#1E90FF), recommended to input in hexadecimal color
            code or RGB code, eg. "#1E90FF"/"rgb(30,144,255)". TE has two
            forms of display, when it is displayed as a histogram only, the
            drawing color of TE can be adjusted.

    -genedensity_color/-color13/-c13
            Set the drawing color of the gene density, default color is
            DarkGreen(#368F5C), recommended to input in hexadecimal color
            code or RGB code, eg. "#368F5C"/"rgb(54,143,92)".

    -targetgene_color/-color14/-c14
            Set the drawing color of the target gene, default color is
            Crimson(#DC143C), recommended to input in hexadecimal color code
            or RGB code, eg. "#DC143C"/"rgb(220,20,60)".

    -curveto/-curve
            Draw synteny blocks with curve or straight line, value is 1/0,
            default true(1), that is default output as a curve.

    -highlightinversion/-highlight
            Highlight inversion, value is 1/0, default true(1), that is
            default the inverted information is highlighted.

    -help/-h/?
            Prints a brief help message and exits.

    -man    Prints the manual page and exits.
[user@cn3144 ~]$ Transform -h
Usage:
    Transform [options]

    example:

        a) Transform --PAF example.PAF

        b) Transform --GFF3 example.gff3

        c) Transform -1 rice_MH63.fa -2 rice_R498.fasta --SNP MH63vsR498.delta.filter.snps

        d) Transform -1 rice_MH63.fa -2 rice_R498.fasta --SNP MH63vsR498.delta.filter.snps -r

        e) Transform -1 rice_MH63.fa -2 rice_R498.fasta --SNP MH63vsR498.delta.filter.snps -rq

        f) Transform -1 rice_MH63.fa -2 rice_R498.fasta --SNP MH63vsR498.delta.filter.snps --noquery -r

        g) Transform --PAV MH63vsR498.delta.filter.qdiff -o MH63vsR498

Options:
    --PAFtoCOORDS/--PAF
            Enter a .PAF format file to generate a .coords format file.

    --PAV   Enter a .qdiff format file to generate a .bed format file.

    --GFF3toBED/--GFFtoBED/--GFF3/--GFF
            Enter a .gff3 format file to generate a .bed format file.
...

    --SNP   Enter a .snps format file to generate a .bed format file.

    --genomeSeq1/-1
            Input the genome1 fasta file to obtain the length of each
            chromosome in the genome1(ie reference genome).

    --genomeSeq2/-2
            Input the genome2 fasta file to obtain the length of each
            chromosome in the genome2(ie query genome).

    --query/-q/--noquery
...

    --help/-h
            Prints a brief help message and exits.

    --man   Prints the manual page and exits.
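
Two defaults described in the GenomeSyn help text, the -TE_min lower bound and the -p2 window size, can be illustrated with a little shell arithmetic. This is a minimal sketch, not GenomeSyn's own code: both function names are hypothetical, and the round-down-to-a-multiple-of-10 rule is an assumption inferred from the two examples in the help text (a minimum TE of 11% gives a 10%-100% scale, 28% gives 20%-100%).

```shell
#!/bin/sh
# Hypothetical helper (not part of GenomeSyn): assumed default lower bound
# of the TE scale, i.e. the smallest TE percentage rounded down to a
# multiple of 10. $1 must be a whole number between 0 and 100.
te_default_lower_bound() {
    echo $(( $1 / 10 * 10 ))
}

# Hypothetical helper: default -p2 window, which the help text states is
# four times the -p1 value.
p2_default() {
    echo $(( $1 * 4 ))
}

te_default_lower_bound 11    # smallest TE proportion of 11%
te_default_lower_bound 28    # smallest TE proportion of 28%
p2_default 25000             # default -p1 value from the help text
```

Passing -TE_min or -p2 explicitly overrides these defaults, as described in the option text.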


End the interactive session:
[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
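
The help text notes that -cf1 is optional and that the comparison is generated when no coordinate file is supplied. The sketch below assembles a two-genome command line on that basis and prints it rather than running it. The function name build_genomesyn_cmd is hypothetical, the file names are placeholders borrowed from example (b), and only flags listed in the help output (-t, -g1, -g2, -cf1) are used.

```shell
#!/bin/sh
# Hypothetical helper: print a GenomeSyn command for two genomes.
#   $1 reference fasta, $2 query fasta, $3 optional coordinate file
build_genomesyn_cmd() {
    ref_fa=$1
    qry_fa=$2
    coords=$3
    set -- GenomeSyn -t 3 -g1 "$ref_fa" -g2 "$qry_fa"
    # Add -cf1 only when the coordinate file already exists; otherwise the
    # flag is left off so that the comparison is generated by the tool.
    if [ -n "$coords" ] && [ -f "$coords" ]; then
        set -- "$@" -cf1 "$coords"
    fi
    echo "$*"
}

# Dry run: prints the command; -cf1 appears only if the file is present.
build_genomesyn_cmd rice_MH63.fa rice_ZS97.fa rice_MH63vsZS97.delta.filter.coords
```

Once the printed command looks right, it can be pasted into an interactive session or a job script.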