NetOglyc produces neural network predictions of mucin type GalNAc O-glycosylation sites in mammalian proteins.
GOOD:
>NM_001008540.2 Homo sapiens C-X-C motif
BAD:
> NM_001008540.2 Homo sapiens C-X-C motif
GOOD:
>NM_000758.4 Homo sapiens colony stimulating factor 2 (CSF2), mRNA
BAD:
>NM_000758.4 Homo sapiens colony stimulating factor 2 | other stuff
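The two BAD headers above differ from their counterparts by a space directly after `>` and by a `|` character. The following is a minimal sketch of a pre-flight check for those two patterns only; the function name `check_fasta_headers` is hypothetical, and it is an assumption that these are the only header forms netOglyc rejects.

```shell
#!/bin/bash
# check_fasta_headers FILE   (hypothetical helper, not part of netOglyc)
# Prints, with line numbers, every header line that either
#   - has whitespace immediately after '>', or
#   - contains a '|' character.
# Returns 0 if no such header is found, 1 if at least one is found,
# and 2 if the file cannot be read.
check_fasta_headers() {
    local fasta="$1" hits
    if [ ! -r "$fasta" ]; then
        echo "cannot read $fasta" >&2
        return 2
    fi
    # -n adds line numbers; -E enables the alternation in the pattern
    hits=$(grep -nE '^>([[:space:]]|.*\|)' "$fasta")
    if [ -n "$hits" ]; then
        printf '%s\n' "$hits"
        return 1
    fi
    return 0
}
```

After sourcing the function, `check_fasta_headers my_fasta_file.fasta` lists any header lines worth fixing before the file is passed to netOglyc.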
Allocate an interactive session and run the program.
Sample session (user input follows the shell prompt):
[user@biowulf]$ sinteractive --gres=lscratch:10
salloc.exe: Pending job allocation 46116226
salloc.exe: job 46116226 queued and waiting for resources
salloc.exe: job 46116226 has been allocated resources
salloc.exe: Granted job allocation 46116226
salloc.exe: Waiting for resource configuration
salloc.exe: Nodes cn3144 are ready for job

[user@cn3144 ~]$ module load netOglyc
[user@cn3144 ~]$ netOglyc $NETOGLYC_EXAMPLES/GLP_MACFU.fsa
##gff-version 2
##source-version NetOGlyc 4.0.0.11
##date 23-9-6
##Type Protein
#seqname source feature start end score strand frame comment
GLP_MACFU netOGlyc-4.0.0.11 CARBOHYD 1 1 0.680709 . . #POSITIVE
GLP_MACFU netOGlyc-4.0.0.11 CARBOHYD 2 2 0.790723 . . #POSITIVE
GLP_MACFU netOGlyc-4.0.0.11 CARBOHYD 3 3 0.848504 . . #POSITIVE
GLP_MACFU netOGlyc-4.0.0.11 CARBOHYD 4 4 0.707939 . . #POSITIVE
...
...
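The GFF output shown above puts the residue position in the `start` column, the score in the `score` column, and marks predicted sites with a trailing `#POSITIVE` comment. As a rough sketch, the predicted sites could be pulled out of a saved output file as follows; the function name `positive_sites` and the optional score cutoff are assumptions for illustration, not netOglyc features.

```shell
#!/bin/bash
# positive_sites FILE [MIN_SCORE]   (hypothetical helper)
# Prints "seqname position score" for every CARBOHYD line whose last
# field is #POSITIVE. MIN_SCORE defaults to 0, i.e. no extra filtering.
positive_sites() {
    local gff="$1" min="${2:-0}"
    awk -v min="$min" '
        /^#/ { next }    # skip the ## metadata and # column-header lines
        $3 == "CARBOHYD" && $NF == "#POSITIVE" && ($6 + 0) >= (min + 0) {
            print $1, $4, $6
        }
    ' "$gff"
}
```

For example, `netOglyc my_fasta_file.fasta > my_fasta_file.out` followed by `positive_sites my_fasta_file.out 0.7` would keep only the positive sites scoring at least 0.7.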
If something goes wrong, temporary files (including logs) are written to and compressed in /lscratch/$SLURM_JOB_ID:
[user@cn3144 ~]$ ls /lscratch/$SLURM_JOB_ID
netOGlyc-1474660.tar.gz  netOGlyc-1474970.tar.gz
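To see what a failed run left behind, the archives can be listed without unpacking them. This is a small sketch using standard `tar` options; the function name `list_netoglyc_logs` is hypothetical, and the file names inside each archive are not documented here, so inspect the listing before extracting anything.

```shell
#!/bin/bash
# list_netoglyc_logs DIR   (hypothetical helper)
# Lists the contents of every netOGlyc-*.tar.gz archive found in DIR.
list_netoglyc_logs() {
    local dir="$1" archive
    for archive in "$dir"/netOGlyc-*.tar.gz; do
        [ -e "$archive" ] || continue    # the glob matched nothing
        echo "== $archive"
        tar -tzf "$archive"              # -t lists, -z gunzips, -f names the file
    done
}
```

Inside the job this would be called as `list_netoglyc_logs /lscratch/$SLURM_JOB_ID`. Since local scratch space is normally cleared when the job ends, copy any archive you want to keep to permanent storage first.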
Create a batch input file (e.g. netOglyc.sh). For example:
#!/bin/bash
set -e
module load netOglyc
netOglyc my_fasta_file.fasta > my_fasta_file.out
Submit this job using the Slurm sbatch command.
sbatch --gres=lscratch:10 netOglyc.sh
Create a swarmfile (e.g. netOglyc.swarm). For example:
netOglyc < 1.fasta > 1.out
netOglyc < 2.fasta > 2.out
netOglyc < 3.fasta > 3.out
netOglyc < 4.fasta > 4.out
Submit this job using the swarm command.
swarm -f netOglyc.swarm --module netOglyc --gres lscratch:10
where
--module netOglyc | Loads the netOglyc module for each subjob in the swarm |
--gres lscratch:10 | Allocates 10 GB of /lscratch |
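With many input files, writing the swarmfile by hand gets tedious. The sketch below generates one line per `*.fasta` file in the same `netOglyc < in > out` form used in the example swarmfile; the function name `make_swarmfile` is hypothetical, and it assumes file names without spaces or shell metacharacters.

```shell
#!/bin/bash
# make_swarmfile [DIR]   (hypothetical helper)
# Prints one swarm command per *.fasta file in DIR (default: current
# directory), writing each result next to its input as NAME.out.
make_swarmfile() {
    local dir="${1:-.}" f
    for f in "$dir"/*.fasta; do
        [ -e "$f" ] || continue          # the glob matched nothing
        printf 'netOglyc < %s > %s.out\n' "$f" "${f%.fasta}"
    done
}
```

Running `make_swarmfile . > netOglyc.swarm` would then produce a file that can be submitted with the swarm command shown above.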
netOglyc does not handle very large, multi-sequence FASTA files well, so it is safer to keep each FASTA input small.
Here is a perl script that will take a multi-sequence fasta file, break it up into single fasta files, and run netOglyc on each. The output is appended to a single output file.
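#!/usr/bin/perl
use File::Temp qw/ tempfile /;

# Usage: perl multi.pl <multi-sequence fasta> <output file>
my $INPUT  = $ARGV[0];
my $OUTPUT = $ARGV[1];

# Base name for the temporary per-sequence input (.fsa) and output (.out) files
my (undef, $infile) = tempfile();
my $com = "netOglyc $infile.fsa > $infile.out 2>&1";

my $i  = 0;   # number of sequences read
my $ii = 0;   # index of the sequence being processed
my @seq;      # full FASTA record (header plus sequence lines)
my @nam;      # header line only

# Read the multi-sequence FASTA file and split it into records
open (CFGFILE, $INPUT) or die "Cannot open $INPUT: $!";
while (<CFGFILE>) {
    chomp;
    my $line = $_;
    if ($line =~ />/) {
        $i++;
        $seq[$i] = $line."\n";
        $nam[$i] = $line;
    }
    else {
        $seq[$i] = $seq[$i].$line."\n";
    }
}
close CFGFILE;

unlink $OUTPUT;

# Run netOglyc on each record and append the result to the output file
while ($ii < $i) {
    $ii++;

    open OUTFILE, "> $infile.fsa";
    print "Sent ".$seq[$ii]." ".$ii." of ".$i." to NetOglyc"."\n"."\n";
    print OUTFILE $seq[$ii];
    close OUTFILE;

    # Write the header line ahead of the results for this sequence
    open OUTFILE, ">> $OUTPUT";
    print OUTFILE $nam[$ii]."\n";
    close OUTFILE;

    unlink "$infile.out";
    system ($com);

    open INFILE, "$infile.out";
    open OUTFILE, ">> $OUTPUT";
    while (<INFILE>) {
        print OUTFILE $_;
    }
    print OUTFILE "\n";
    close INFILE;
    close OUTFILE;
}

# Clean up the temporary files
unlink "$infile.out";
unlink "$infile.fsa";
unlink $infile;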
The script can be run like so:
perl multi.pl multi-fasta.fsa multi-fasta.out
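If the sequences should run in parallel as a swarm rather than one after another, the multi-sequence file can first be split into single-sequence files. The following is a hedged sketch using standard `awk`; the function name `split_fasta` and the `seq_NNNNNN.fasta` naming scheme are assumptions chosen for illustration. Unlike the Perl script, it treats only lines that start with `>` as headers.

```shell
#!/bin/bash
# split_fasta INPUT OUTDIR   (hypothetical helper)
# Writes each record of INPUT to OUTDIR/seq_000001.fasta,
# OUTDIR/seq_000002.fasta, and so on, in input order.
split_fasta() {
    local input="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v dir="$outdir" '
        /^>/ {
            if (out) close(out)    # keep only one output file open at a time
            n++
            out = sprintf("%s/seq_%06d.fasta", dir, n)
        }
        out { print > out }        # lines before the first header are dropped
    ' "$input"
}
```

For example, `split_fasta multi-fasta.fsa split` creates one file per sequence under `split/`, and each of those files can then become one line of a swarmfile.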