GenomeSyn: constructing and visualizing genome synteny

GenomeSyn, implemented in Perl, is a bioinformatics tool for visualizing genome synteny and structural variations (SVs). It provides genome synteny visualization for two or three genomic sequences. Genome annotation files can also be supplied to visualize various SVs among the genomes.

References:

Zu-Wen Zhou, Zhi-Guang Yu, Xiao-Ming Huang, Jin-Shen Liu, Yi-Xiong Guo, Ling-Ling Chen, Jia-Ming Song
GenomeSyn: a bioinformatics tool for visualizing genome synteny and structural variations.
Journal of Genetics and Genomics, Volume 49, Issue 12, December 2022, Pages 1174-1176

Documentation

GenomeSyn Github page

Important Notes

Module Name: GenomeSyn (see the modules page for more information)
Unusual environment variables set
- GENOMESYN_HOME installation directory
- GENOMESYN_BIN executable directory
- GENOMESYN_DATA sample data directory
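
Before running anything, it can help to confirm that the three variables listed above are actually defined in the current shell. The sketch below is a hypothetical helper (check_genomesyn_env is not part of GenomeSyn or of the module); it only prints each variable's value or reports it as unset.

```shell
# Hypothetical helper, not part of GenomeSyn: report which of the
# module's environment variables are missing from the current shell.
check_genomesyn_env() {
    missing=0
    for name in GENOMESYN_HOME GENOMESYN_BIN GENOMESYN_DATA; do
        # Look up the variable whose name is stored in $name
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name"
            missing=1
        else
            printf '%s=%s\n' "$name" "$value"
        fi
    done
    # Exit status 0 only when all three variables are set
    return "$missing"
}

# Typical use after loading the module:
#   check_genomesyn_env && ls "$GENOMESYN_DATA"
```

A non-zero return status usually means the module has not been loaded in this shell.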

Interactive job

Interactive jobs should be used for debugging, graphics, or applications that cannot be run as batch jobs.

Allocate an interactive session and run the program. Sample session:

[user@biowulf]$ sinteractive
[user@cn3144 ~]$ module load GenomeSyn
[+] Loading singularity 4.0.3 on cn3144
[+] Loading GenomeSyn 20240614
[user@cn3144 ~]$ GenomeSyn -h
Usage:
GenomeSyn [options]

example:

a) GenomeSyn -g1 ../data/rice_MH63.fa -g2 ../data/rice_ZS97.fa

b) GenomeSyn -t 3 -g1 ../data/rice_MH63.fa -g2 ../data/rice_ZS97.fa -cf1 ../data/rice_MH63vsZS97.delta.filter.coords

c) GenomeSyn -t 3 -g1 ../data/rice_MH63.fa -g2 ../data/rice_ZS97.fa -cf1 ../data/rice_MH63vsZS97.delta.filter.coords -cen1 ../data/rice_MH63_centromere.bed -cen2 ../data/rice_ZS97_centromere.bed -tel1 ../data/rice_MH63_telomere.bed -tel2 ../data/rice_ZS97_telomere.bed -TE1 ../data/rice_MH63_repeat.bed -TE2 ../data/rice_ZS97_repeat.bed -PAV1 ../data/rice_MH63_PAV.bed -PAV2 ../data/rice_ZS97_PAV.bed -NLR1 ../data/rice_MH63_NLR.bed -NLR2 ../data/rice_ZS97_NLR.bed -r MH63 -q ZS97 -GD1 ../data/rice_MH63_nonTEgene.gff3 -GD2 ../data/rice_ZS97_nonTEgene.gff3 -GC1 ../data/rice_MH63_GC_10000.bed -GC2 ../data/rice_ZS97_GC_10000.bed -GC_win 100000 -TE_min 40

d) GenomeSyn -t 3 -n3 12 -g1 ../data/rice_MH63.fa -g2 ../data/rice_ZS97.fa -g3 ../data/rice_R498.fasta -cf1 ../data/rice_MH63vsZS97.delta.filter.coords -cf2 ../data/rice_MH63vsR498.delta.filter.coords -cen1 ../data/rice_MH63_centromere.bed -cen2 ../data/rice_ZS97_centromere.bed -cen3 ../data/rice_R498_centromere.bed -tel1 ../data/rice_MH63_telomere.bed -tel2 ../data/rice_ZS97_telomere.bed -tel3 ../data/rice_R498_telomere.bed -TE2 ../data/rice_ZS97_repeat.bed -PAV1 ../data/rice_MH63_PAV.bed -PAV2 ../data/rice_ZS97_PAV.bed -NLR1 ../data/rice_MH63_NLR.bed -NLR2 ../data/rice_ZS97_NLR.bed -r MH63 -q1 ZS97 -q2 R498 -GD1 ../data/rice_MH63_nonTEgene.gff3 -GD2 ../data/rice_ZS97_nonTEgene.gff3 -GD3 ../data/rice_R498_IGDBv3_coreset.gff -GC2 ../data/rice_ZS97_GC_10000.bed -GC_win 100000 -TE_min 40

Options:
-aligntype/-at/-t
Output mode; the value is 1, 2, 3 or 4, and the default is 1.
With 1, only the one-to-one double/triple sequence comparison
chart is output; with 2, only the multiple-to-multiple
double/triple sequence comparison chart is output; with 3, both
comparison charts from modes 1 and 2 are output; with 4, in
addition to those two comparison charts, some statistical
sub-graphs are generated, such as a consistency heat map, a
consistency histogram, and a coverage histogram.

-genomeSeq1/-g1
Input the genome1 fasta file to obtain the length of each
chromosome in genome1 (i.e. the reference genome).

-genomeSeq2/-g2
Input the genome2 fasta file to obtain the length of each
chromosome in genome2 (i.e. the query genome).

-genomeSeq3/-g3
Input the genome3 fasta file to obtain the length of each
chromosome in genome3 (i.e. query genome2).

-comparison_file/-comparison_file1/-cf/-cf1
Input the coordinate file comparing genome1 and genome2, such as
ReferencevsQuery1.delta.filter.coords. If there is no coordinate
file, mummer is called to compare genome1 and genome2 and
generate this coordinate file.

-comparison_file2/-cf2
Input the coordinate file comparing genome1 and genome3, such as
ReferencevsQuery2.delta.filter.coords. If there is no coordinate
file, mummer is called to compare genome1 and genome3 and
generate this coordinate file.

-comparison_file3/-cf3
Input the coordinate file comparing genome2 and genome3, such as
Query1vsQuery2.delta.filter.coords. If there is no coordinate
file, mummer is called to compare genome2 and genome3 and
generate this coordinate file.

-SVG_PDF/-pdf
Format conversion: generate a PDF file from each SVG file. The
value is 1/0, default true (1), that is, by default the SVG and
PDF format files are output at the same time.

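-sort Sort mode, either "match" or "reference_length_match"; the
default is "match". In "match" mode, query chromosomes are
matched to the reference chromosome numbering. In
"reference_length_match" mode, reference chromosomes are first
ordered by length from longest to shortest, and the query
chromosomes are then matched to them.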

-chromosomename/-cn
Chromosome numeration setting, the value is 1/0, and the default
value is false (0); when the value is 0, the unified chromosome
numeration (chromosome numeration of the reference genome) will
be displayed on the output map, and when the value is 1, the
actual chromosome numeration in the comparison file will be
displayed on the output map.

-referencename/-reference/-ref/-r
Set the name of genome1, default output is "reference", e.g.
MH63.

-queryname/-queryname1/-query/-query1/-q/-q1
Set the name of genome2, default output is "query"/"query1",
e.g. ZS97.

-queryname2/-query2/-q2
Set the name of genome3, default output is "query2", e.g. R498.

-centromere_genome1/-centromere1/-cen1
Input the centromere position file of genome1, the file uses the
bed (Browser Extensible Data) format, and draw centromeres on
each chromosome of genome1.

-centromere_genome2/-centromere2/-cen2
Input the centromere position file of genome2, the file uses the
bed (Browser Extensible Data) format, and draw centromeres on
each chromosome of genome2.

-centromere_genome3/-centromere3/-cen3
Input the centromere position file of genome3, the file uses the
bed (Browser Extensible Data) format, and draw centromeres on
each chromosome of genome3.

-telomere_genome1/-telomere1/-tel1
Input the telomere position file of genome1, the file uses the
bed format, and draw telomere on each chromosome of genome1.

-telomere_genome2/-telomere2/-tel2
Input the telomere position file of genome2, the file uses the
bed format, and draw telomere on each chromosome of genome2.

-telomere_genome3/-telomere3/-tel3
Input the telomere position file of genome3, the file uses the
bed format, and draw telomere on each chromosome of genome3.

-snp_genome1/-snp1
Input the SNP file of genome1, which uses the bed format to map
the SNP distribution of genome1.

-snp_genome2/-snp2
Input the SNP file of genome2, which uses the bed format to map
the SNP distribution of genome2.

-snp_genome3/-snp3
Input the SNP file of genome3, which uses the bed format to map
the SNP distribution of genome3.

-snp_thresholds/-snp_max
SNP threshold setting, that is, setting the upper limit of SNP
statistics, the default value is 2000.

-TE_genome1/-TE1
Input the TE file of genome1, which uses the bed format to map
the TE distribution of genome1.

-TE_genome2/-TE2
Input the TE file of genome2, which uses the bed format to map
the TE distribution of genome2.

-TE_genome3/-TE3
Input the TE file of genome3, which uses the bed format to map
the TE distribution of genome3.

-TE_thresholds/-TE_min
Set the lower limit of the TE statistics. By default, the lower
limit is taken from the smallest TE proportion in the TE file:
for example, if the minimum TE is 11%, the icon in the lower
right corner shows a scale of 10%-100%, and if the minimum TE is
28%, it shows a scale of 20%-100%. If the user supplies a lower
limit (a value from 0 to 100), that value is used instead; for
example, "-TE_min 50" draws a 50%-100% TE statistical graph. TE
has two display forms, and the lower limit can be adjusted only
when TE is displayed as a histogram.

-GC_genome1/-GC_content1/-GC1
Input the bed format file of the GC content of genome1 to plot
the distribution of the GC content of genome1.

-GC_genome2/-GC_content2/-GC2
Input the bed format file of the GC content of genome2 to plot
the distribution of the GC content of genome2.

-GC_genome3/-GC_content3/-GC3
Input the bed format file of the GC content of genome3 to plot
the distribution of the GC content of genome3.

-PAV_genome1/-PAV1
Input the PAV file of genome1, which uses the bed format to map
the PAV distribution of genome1.

-PAV_genome2/-PAV2
Input the PAV file of genome2, which uses the bed format to map
the PAV distribution of genome2.

-PAV_genome3/-PAV3
Input the PAV file of genome3, which uses the bed format to map
the PAV distribution of genome3.

-NLR_genome1/-NLR1
Input the NLR file of genome1, which uses the bed format to map
the NLR distribution of genome1.

-NLR_genome2/-NLR2
Input the NLR file of genome2, which uses the bed format to map
the NLR distribution of genome2.

-NLR_genome3/-NLR3
Input the NLR file of genome3, which uses the bed format to map
the NLR distribution of genome3.

-gene_density_genome1/-GD1
Input the annotation file of genome1, which uses the gff3 format
to map the gene density distribution of genome1.

-gene_density_genome2/-GD2
Input the annotation file of genome2, which uses the gff3 format
to map the gene density distribution of genome2.

-gene_density_genome3/-GD3
Input the annotation file of genome3, which uses the gff3 format
to map the gene density distribution of genome3.

-GeneDensity_Window/-GD_win
Set the window size used to compute gene density. This parameter
is required when gene density is computed from an input
annotation file; the value can be set to 100000.

-SNP_Window/-SNP_win
Set the window size used to compute SNP statistics. This
parameter is optional; by default its value is taken from the
window size in the input SNP bed file of genome1.

-TE_Window/-TE_win
Set the window size used to compute TE statistics. This
parameter is optional; by default its value is taken from the
window size in the input TE bed file of genome1.

-GC_Content_Window/-GC_win
Set the window size used to compute GC content. This parameter
is optional; by default its value is taken from the window size
in the input GC content bed file of genome1.

-synteny_length_min/-synteny_min/-syn_min
Set the minimum length for drawing synteny fragments, the
default value is 10000.

-inversion_length_min/-inversion_min/-inv_min
Set the minimum length for drawing inversion fragments, the
default value is 10000.

-PAV_length_min/-PAV_min
Set the minimum length for drawing PAV, the default value is
10000.

-NLR_length_min/-NLR_min
Set the minimum length for drawing NLR, the default value is
10000.

-coverage_rate_min/-coverage_min/-cov_min
Set the minimum coverage (%) for drawing synteny fragments, the
default value is 90.

-icon Whether to output the main image icon, value is 1/0, default
true(1).

-proportion1/-p1
Set the chromosome window size of the one-to-one double/triple
sequence alignment chart, the default value is 25000.

-proportion2/-p2
Set the chromosome window size of the multiple-to-multiple
double/triple sequence alignment chart, the default value is
four times the value of -proportion1/-p1, that is, the default
is 100000.

-targetgene_genome1/-targetgene1/-gene1
Input the target gene file of genome 1, the file uses the bed
format, the target gene can be any gene that the user studies.

-targetgene_genome2/-targetgene2/-gene2
Input the target gene file of genome 2, the file uses the bed
format, the target gene can be any gene that the user studies.

-targetgene_genome3/-targetgene3/-gene3
Input the target gene file of genome 3, the file uses the bed
format, the target gene can be any gene that the user studies.

-targetgene_name/-targetgene
Set the name of the target gene, default output as "Target
Gene".

-genomenumber/-gn/-n
Comparison mode (double or triple sequence comparison). This
parameter is optional and can be set to 2 or 3; by default the
value is determined by the number of input genomes, that is, 2
when two genomes are input and 3 when three genomes are input.

-chromosomenumber1/-n1
Set the number of chromosomes in genome1. This parameter is
optional; by default its value is determined by the number of
chromosomes in the input genome1 fasta file, or it can be set by
the user, e.g. 12.

-chromosomenumber2/-n2
Set the number of chromosomes in genome2. This parameter is
optional; by default its value is determined by the number of
chromosomes in the input genome2 fasta file, or it can be set by
the user, e.g. 12.

-chromosomenumber3/-n3
Set the number of chromosomes in genome3. This parameter is
optional; by default its value is determined by the number of
chromosomes in the input genome3 fasta file, or it can be set by
the user, e.g. 12.

-output1/-o1
Set the name of output SVG format file1, default
"GenomeSyn-main-1.svg".

-output2/-o2
Set the name of output SVG format file2, default
"GenomeSyn-main-2.svg".

-output3/-o3
Set the name of output SVG format file3, default "GenomeSyn
heatmap.svg".

-output4/-o4
Set the name of output SVG format file4, default "GenomeSyn
identity.svg".

-output5/-o5
Set the name of output SVG format file5, default "GenomeSyn
coverage.svg".

-output6/-o6
Set the name of output SVG format file6, default "GenomeSyn
heatmap2.svg".

-headline_identity/-headline1
Set the title of illustration1, default output is "GenomeSyn
identity".

-headline_coverage/-headline2
Set the title of illustration2, default output is "GenomeSyn
coverage".

-headline_heatmap/-headline3
Set the title of illustration3, default output is "GenomeSyn
heatmap".

-genome1_color/-color1/-c1
Set the drawing color of the chromosome in genome1, default color
is LightBlue (#3979BC), recommended to input in hexadecimal
color code or RGB code, eg. "#3979BC"/"rgb(57,121,188)".

-genome2_color/-color2/-c2
Set the drawing color of the chromosome in genome2, default color
is Green(#499272), recommended to input in hexadecimal color
code or RGB code, eg. "#499272"/"rgb(73,146,114)".

-genome3_color/-color3/-c3
Set the drawing color of the chromosome in genome3, default
color is DarkBlue(#447784), recommended to input in hexadecimal
color code or RGB code, eg. "#447784"/"rgb(68,119,132)".

-synteny_color/-color4/-c4
Set the drawing color of the synteny blocks, default color is
LightGray(#DFDFE1), recommended to input in hexadecimal color
code or RGB code, eg. "#DFDFE1"/"rgb(223,223,225)".

-inversion_color/-color5/-c5
Set the drawing color of the inversion blocks, default color is
DarkOrange(#E56C1A), recommended to input in hexadecimal color
code or RGB code, eg. "#E56C1A"/"rgb(229,108,26)".

-translocation_color/-color6/-c6
Set the drawing color of the translocation blocks, default color
is Saffron(#EFCF48), recommended to input in hexadecimal color
code or RGB code, eg. "#EFCF48"/"rgb(239,207,72)".

-centromere_color/-color7/-c7
Set the drawing color of the centromere blocks, default color is
Orange(#E4993F), recommended to input in hexadecimal color code
or RGB code, eg. "#E4993F"/"rgb(228,153,63)".

-telomere_color/-color8/-c8
Set the drawing color of the telomere blocks, default color is
Purple(#441680), recommended to input in hexadecimal color code
or RGB code, eg. "#441680"/"rgb(68,22,128)".

-PAV_color/-color9/-c9
Set the drawing color of PAVs, default color is
LightYellow(#F9F067), recommended to input in hexadecimal color
code or RGB code, eg. "#F9F067"/"rgb(249,240,103)".

-NLR_color/-color10/-c10
Set the drawing color of the NLRs, default color is
Cyan(#00FFFF), recommended to input in hexadecimal color code or
RGB code, eg. "#00FFFF"/"rgb(0,255,255)".

-SNP_color/-color11/-c11
Set the drawing color of the SNPs, default color is
DodgerBlue(#1E90FF), recommended to input in hexadecimal color
code or RGB code, eg. "#1E90FF"/"rgb(30,144,255)".

-TE_color/-color12/-c12
Set the drawing color of the TEs, default color is
DodgerBlue(#1E90FF), recommended to input in hexadecimal color
code or RGB code, eg. "#1E90FF"/"rgb(30,144,255)". TE has two
display forms, and the drawing color can be adjusted only when
TE is displayed as a histogram.

-genedensity_color/-color13/-c13
Set the drawing color of the gene density, default color is
DarkGreen(#368F5C), recommended to input in hexadecimal color
code or RGB code, eg. "#368F5C"/"rgb(54,143,92)".

-targetgene_color/-color14/-c14
Set the drawing color of the target gene, default color is
Crimson(#DC143C), recommended to input in hexadecimal color code
or RGB code, eg. "#DC143C"/"rgb(220,20,60)".

-curveto/-curve
Draw synteny blocks with curve or straight line, value is 1/0,
default true(1), that is default output as a curve.

-highlightinversion/-highlight
Highlight inversion, value is 1/0, default true(1), that is
default the inverted information is highlighted.

-help/-h/?
Prints a brief help message and exits.

-man Prints the manual page and exits.
[user@cn3144 ~]$ Transform -h
Usage:
Transform [options]

example:

a) Transform --PAF example.PAF

b) Transform --GFF3 example.gff3

c) Transform -1 rice_MH63.fa -2 rice_R498.fasta --SNP MH63vsR498.delta.filter.snps

d) Transform -1 rice_MH63.fa -2 rice_R498.fasta --SNP MH63vsR498.delta.filter.snps -r

e) Transform -1 rice_MH63.fa -2 rice_R498.fasta --SNP MH63vsR498.delta.filter.snps -rq

f) Transform -1 rice_MH63.fa -2 rice_R498.fasta --SNP MH63vsR498.delta.filter.snps --noquery -r

g) Transform --PAV MH63vsR498.delta.filter.qdiff -o MH63vsR498

Options:
--PAFtoCOORDS/--PAF
Enter a .PAF format file to generate a .coords format file.

--PAV Enter a .qdiff format file to generate a .bed format file.

--GFF3toBED/--GFFtoBED/--GFF3/--GFF
Enter a .gff3 format file to generate a .bed format file.
...

--SNP Enter a .snps format file to generate a .bed format file.

--genomeSeq1/-1
Input the genome1 fasta file to obtain the length of each
chromosome in genome1 (i.e. the reference genome).

--genomeSeq2/-2
Input the genome2 fasta file to obtain the length of each
chromosome in genome2 (i.e. the query genome).

--query/-q/--noquery
...

--help/-h
Prints a brief help message and exits.

--man Prints the manual page and exits.
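
The examples in the GenomeSyn help text refer to inputs under ../data/, whereas on this system the sample data location is given by GENOMESYN_DATA. The sketch below assembles the two-genome command from help example (b) against that directory and prints it for inspection rather than running it. It assumes, without confirmation from this page, that $GENOMESYN_DATA contains files with the same names as in the help examples; the function name genomesyn_two_genome_cmd is hypothetical.

```shell
# Hypothetical helper: print (do not run) a two-genome GenomeSyn command.
# Assumption: the data directory holds the files named in the help examples.
genomesyn_two_genome_cmd() {
    data_dir=${1:-$GENOMESYN_DATA}   # directory with the sample inputs
    mode=${2:-3}                     # -t 3: output both comparison charts
    printf 'GenomeSyn -t %s -g1 %s -g2 %s -cf1 %s -r MH63 -q ZS97\n' \
        "$mode" \
        "$data_dir/rice_MH63.fa" \
        "$data_dir/rice_ZS97.fa" \
        "$data_dir/rice_MH63vsZS97.delta.filter.coords"
}

# Typical use after loading the module:
#   genomesyn_two_genome_cmd "$GENOMESYN_DATA"
```

Printing first makes it easy to check the paths with ls before launching the real command from a writable working directory.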

End the interactive session:

[user@cn3144 ~]$ exit
salloc.exe: Relinquishing job allocation 46116226
[user@biowulf ~]$
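
The help text for -TE_thresholds/-TE_min gives two examples of the default lower limit of the TE scale: a minimum TE of 11% yields a 10%-100% scale, and 28% yields 20%-100%. One reading of those examples is that the smallest TE percentage is rounded down to a multiple of 10. The sketch below works through that arithmetic; it is an interpretation of the two examples, not GenomeSyn's own code, and the function name te_default_lower_limit is hypothetical.

```shell
# Interpretation of the help examples (11% -> 10%, 28% -> 20%):
# round the smallest TE percentage down to a multiple of 10.
te_default_lower_limit() {
    min_pct=$1                       # smallest TE percentage, integer 0-100
    echo $(( min_pct / 10 * 10 ))    # integer division drops the remainder
}

te_default_lower_limit 11   # prints 10
te_default_lower_limit 28   # prints 20
```

Passing -TE_min explicitly, as in the help examples that use "-TE_min 40", replaces this default with the given value.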