Saturated BLAST User's Manual
Detect remote homologues, maintain protein families, and hunt new genes
- Introduction
  - What is Saturated BLAST
  - What could be done with Saturated BLAST
  - Reference
- Installation
  - Obtain Saturated BLAST package
  - System requirement
  - Install required Perl packages
  - Install local BLAST programs (optional)
  - Configure Saturated BLAST
  - Run Saturated BLAST
- Getting started
  - Input query sequences
  - Set parameters
  - Set BLAST search options
  - Send first search
  - Repeat search loop
  - View results
  - Save results
  - Restart
- Algorithm
  - Multiple intermediate sequence search
  - Saturated BLAST search
  - Some terms
  - Structure of Saturated BLAST
  - Filter
  - Seed selector
  - Saturated BLAST alignment
  - Cluster analysis
- Usage
  - Interface
  - Mouse operation
  - File
  - Edit
  - Set parameters
  - Set BLAST search
  - Set seed
  - Set break point
  - Set smart filter
  - Selection
  - Advanced selection
  - Display
  - BLAST alignment
  - Multiple alignment
  - Program LOG
  - Pair alignment
  - Cluster
  - Help
  - Action buttons
- FAQ
As the word "Saturated" suggests,
this program aims to find as many related sequences as possible in a
database. For this purpose, the program adopts the
Intermediate Sequence Search (ISS) method.
ISS is a strategy for recognizing
distant homologues using transitive sequences. The
idea is that when the similarity between two remotely homologous
sequences cannot be detected by direct sequence
comparison, it can still be established if there is
an intermediate sequence with
significant alignment scores to both of them.
ISS and its extension, multiple ISS, which applies more than
one intermediate step, have proved to be sensitive
and practical. However, a brute-force search,
which runs database searches repeatedly, is time-consuming
and hard to automate.
Saturated BLAST is a program with a graphical user interface
that performs the iterated multiple
intermediate sequence search more efficiently and automatically.
The Saturated BLAST package was developed under
Linux using Perl as the programming language.
Starting with a single query or a set of related sequences,
Saturated BLAST runs a BLAST search, organizes the output,
identifies representative sequences as search seeds,
and then repeatedly takes these new seeds as queries for
the next generation of BLAST searches.
The friendly graphical user interface and the built-in BLAST
result parser, multiple alignment tools,
and clustering algorithms provide an easy way to edit,
visualize, analyze, monitor, and control the search.
Finding distant homologues is the primary use of Saturated BLAST.
It is also a very good tool for maintaining large protein families
and for hunting new genes in genomic databases.
It can save about 90% of the time the same work would take by hand.
Please cite:
Weizhong Li, Frederic Pio, Krzysztof Pawlowski & Adam Godzik.
Saturated BLAST: An automated multiple intermediate sequence search
used to detect distant homology. Bioinformatics (2000) in press
- Obtain Saturated BLAST package
- System requirement
- Install required Perl packages
- Install local BLAST programs (optional)
- Configure Saturated BLAST
- Run Saturated BLAST
The Saturated BLAST package is distributed from
http://bioinformatics.burnham-inst.edu/xblast,
the official site of this package. Users can download it as a gzipped
UNIX tar archive with a name of the form
xblast_1.00.tar.gz.
Saturated BLAST was developed under the Linux operating system.
The authors have installed and used it on RedHat Linux distributions
6.0, 6.1 and 6.2. Since Saturated BLAST is implemented in Perl,
it should be possible to install this package on other UNIX systems
that support Perl.
UNIX and Perl are therefore the basic requirements for installing
Saturated BLAST. Because Saturated BLAST uses some third-party Perl packages
(see below), you should also check which version of Perl those packages
require.
If you look at the Saturated BLAST scripts,
you will notice lines like:
use Tk;
use LWP::UserAgent;
use LWP::Simple;
use HTTP::Request::Common;
Saturated BLAST uses two sets of Perl packages:
Perl/Tk provides the graphical user interface,
and libwww-perl handles the internet connection.
Both must be installed on your computer
before you run Saturated BLAST.
Perl/Tk is a toolkit for developing graphical user interfaces,
written by Nick Ing-Simmons. The package is available from either
Perl CPAN or the
author's site.
I have installed Perl/Tk on several computers with different
operating systems, and the installation has been very smooth. As described
in the 'INSTALL' file of the package, after you unpack the distribution
you will have a new directory named Tk800.XXX.
Enter this directory and do the following:
perl Makefile.PL
make
make test
make install
The whole process takes about 15 minutes. The distribution includes a demo
application, demos/widget; if this program runs well, Saturated BLAST should
run well too.
Libwww-perl is a general internet tool package for Perl.
Since it is also a
bundled package, you may already have it. To check, type the following
command in your UNIX shell:
perl -e 'use LWP::UserAgent'
If nothing appears (no error messages),
you already have it and can skip the installation.
Otherwise, do the following.
Download libwww-perl from either
CPAN or
its author's
site. Please read the instructions
carefully: libwww-perl needs some other packages (
URI,
MIME-Base64,
HTML-Parser,
libnet, and
Digest::MD5 ), so you may have to install them as well.
Don't worry, installing them is very simple: you only need to
repeatedly download a package and type
perl Makefile.PL
make
make test
make install
several times.
I expect installing Perl/Tk and libwww-perl to take less than 1.5 hours in total.
Saturated BLAST can run BLAST searches on the remote NCBI BLAST server
or on your local computer. A local copy of the BLAST programs is more flexible,
because you can build your own databases of interest,
so we recommend installing the BLAST programs on your computer.
It is very simple. Open the
FTP site of NCBI BLAST
and download the pre-compiled Linux BLAST programs,
executables/blast.linux.tar.Z.
Then unpack the compressed file into a directory such as "/data/ncbi"
(this directory must match the Saturated BLAST setting, see the next section),
make the directories "/data/ncbi/bin" and "/data/ncbi/db",
and move the executable files (blastpgp, blastall and so on) into
"/data/ncbi/bin".
The FASTA-format databases needed by BLAST are available at
ftp://ftp.ncbi.nlm.nih.gov/blast/db.
You can also generate your own databases. Please refer to the
documents within the BLAST distribution on how to format BLAST databases.
Now everything is ready for the last step of the installation.
Unpack xblast_1.00.tar.gz with
gunzip < xblast_version_number.tar.gz | tar xvf -
and enter the xblast subdirectory.
All that remains is to adjust some local definitions in the Perl scripts.
In the file
xblast.pl
find the line like:
my $XBLAST_ROOT = "/usr/local/bin/xblast";
and change this directory to your installation site.
In the file
blastruntool.pl
change the directory
of your local BLAST installation. You will find lines like these:
my $blast_root = "/data/ncbi";
$ENV{"NCBI"} = $blast_root;
$ENV{"BLASTDB"} = "$blast_root/db";
$ENV{"BLASTMAT"} = "$blast_root/data";
Put the directory where you installed Saturated BLAST into your search path,
then just type
xblast.pl; a window should appear on your
screen.
- Input query sequences
- Set parameters
- Set BLAST search options
- Send first search
- Repeat search loop
- View results
- Save results
- Restart
This section gives a quick tour of working with Saturated BLAST.
Since the program has many adjustable parameters, which
are discussed in detail later,
we will skip over most of them here.
Saturated BLAST is a window-based application, so let's first look at
how the program appears on screen. Through this graphical user interface,
it is easy to perform complicated jobs with simple mouse clicks.
To start a new Saturated BLAST search, go to the menubar
File -> new;
a small window will pop up. Simply supply a job name,
paste your query sequence in FASTA format
or give the name of a sequence file, and click the
[Ok] button.
The main display window will refresh itself, and a new line
(your query sequence) will be added there.
The most important parameters are set through the menubar
Set -> parameter.
You may open this window to have a look. For now, I suggest you just
use the default settings, since each item is explained in the following
sections; just press the
[Ok] button.
Follow the menubar
Set -> BLAST search
to set the program and options for BLAST.
The left panel of the window lists a pre-defined BLAST job
corresponding to the default BLAST options, which can be set on
the right panel.
To specify your own BLAST job,
change the parameters, select the current BLAST job in the left panel with
a mouse click, and press the
[Replace] button
so that the old job is replaced with yours.
Then press the
[Ok] button.
You have now defined the query sequence, the parameters, and the
details of the BLAST search. At the top of the main window
there are several buttons used for sending BLAST searches. Press the
[next] button.
Wait for a while; the window will not respond
until the job is finished.
Then all the qualified hits of the BLAST search will be tabulated in the window.
You can practice some basic mouse operations in the main window.
(a) When the mouse pointer moves over the description on a line,
the text is redisplayed in a red font.
Double-clicking it opens a message window.
(b) To select (or unselect)
a sequence, click on the non-text area of its line; selected
sequences are marked with a light-yellow background.
(c) If a Netscape window is already open on the display,
double-clicking on the gi number opens the corresponding entry of the
NCBI Entrez database in a Netscape window.
Some of the sequences are marked with a
red <x>,
indicating that they have been selected as new queries
for the following searches. A red triangle on
the left points to the first of the new queries. There are now several ways
to continue the Saturated BLAST search.
(a) Press the
[next] button to run the next query.
(b) Set a break point at a query, and press the
[cont] button to run searches from the
current query to the query at the break point. To set a break point,
first select that query, then pull down the menubar to
Set -> break point on.
(c) Press the
[cont] button to run all the queries one
by one until the search is saturated.
Any combination of the above methods can be used to send BLAST jobs,
and you may want to check the new results before sending new jobs.
There are several ways to view or analyze the results. These tools are
discussed in the following chapters, but you may try some simple functions from
the menubar:
edit, set, select, display, and tools.
Most of the tools are self-explanatory, so you can practice without
reading further in the manual.
Before sending each job, Saturated BLAST saves all the results and
all the parameters into a restart file: Your_Job_Id.Restart.
This file was originally meant for recovering everything
if the program crashes,
but it also serves as the output file of Saturated BLAST.
You can save it from the menubar
save.
Saturated BLAST can also save results in other formats:
plain FASTA, HTML, a tab-delimited table, and plain text.
This makes it easy to view, edit, and analyze the results in third-party
software such as ClustalX.
Since the Restart file stores all the results and parameter settings
of a Saturated BLAST search, the search can be "restarted" by importing
this file using
file -> open.
- Multiple intermediate sequence search
- Saturated BLAST search
- Some terms
- Structure of Saturated BLAST
- Filter
- Seed selector
- Saturated BLAST alignment
- Cluster analysis
The basis of Saturated BLAST is the so-called intermediate sequence search
(ISS). Two proteins, let's call them query (Q) and target (T),
can have similar structures and biological functions
but sequences so different that traditional protein sequence
comparison algorithms do not identify their relationship. But if there is
an intermediate sequence (I) that is similar to both Q and T in sequence,
the homology between Q and T can be established. The connection can be
written as "Q-I-T".
ISS is a very sensitive method for recognizing remote homology.
Its natural extension, the multiple intermediate sequence search
(MISS), which makes connections of the form
"Q-I1-I2---T", is more powerful than
simple ISS. Saturated BLAST uses the MISS strategy to search a database for
remotely homologous sequences.
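The chain notation can be illustrated with a small sketch. The Python code below is an illustration only and not part of Saturated BLAST: it assumes a hypothetical table `similar` listing which sequences have significant alignment scores to each other, and uses a breadth-first search to recover a chain such as Q-I1-I2-T.

```python
from collections import deque

def find_chain(similar, query, target):
    """Breadth-first search for the shortest chain query - I1 - ... - target."""
    parent = {query: None}          # remembers how each sequence was reached
    queue = deque([query])
    while queue:
        current = queue.popleft()
        if current == target:
            chain = []
            while current is not None:
                chain.append(current)
                current = parent[current]
            return chain[::-1]
        for neighbour in sorted(similar.get(current, ())):
            if neighbour not in parent:
                parent[neighbour] = current
                queue.append(neighbour)
    return None                     # no chain of significant hits exists

# Hypothetical significant pairwise hits; Q and T are not directly similar.
similar = {
    "Q": {"I1"},
    "I1": {"Q", "I2"},
    "I2": {"I1", "T"},
    "T": {"I2"},
}
chain = find_chain(similar, "Q", "T")
print("-".join(chain))  # prints Q-I1-I2-T
```

In Saturated BLAST itself the similarity table is not known in advance; it is discovered one BLAST search at a time.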
The starting point of Saturated BLAST is a single sequence
(although you can input multiple sequences).
With this query, Saturated BLAST runs a BLAST search and parses the output.
Every hit sequence that meets the user-defined
criteria is added to the result database. At the same time,
some representatives of the hit sequences are marked as
new BLAST queries.
Then the program takes the next query, runs a BLAST search, parses the output,
filters the results, and selects new queries as before. The program repeats this
search loop until no new sequences are found or pre-defined criteria are met.
In this manner, the program can dig out as many related
homologous sequences as possible from the database, which is why the program
is called Saturated BLAST.
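The search loop described above can be sketched as follows. This is a minimal Python illustration, not the program's Perl implementation: the callables `search`, `is_qualified`, and `pick_seeds` and the limit `max_searches` are hypothetical stand-ins for the BLAST run, the filters, and the seed selector.

```python
from collections import deque

def saturated_search(queries, search, is_qualified, pick_seeds, max_searches=100):
    """Repeat search / filter / seed selection until no seeds are left."""
    results = list(queries)      # the result database, user queries first
    seen = set(queries)
    seeds = deque(queries)       # queries waiting to be searched
    searches = 0
    while seeds and searches < max_searches:
        query = seeds.popleft()
        searches += 1
        new_hits = []
        for hit in search(query):
            if hit not in seen and is_qualified(query, hit):
                seen.add(hit)
                results.append(hit)
                new_hits.append(hit)
        seeds.extend(pick_seeds(new_hits))   # representatives become new queries
    return results, searches

# Toy "database": which sequences each query would find (hypothetical data).
toy_hits = {"Q": ["A", "B"], "A": ["Q", "C"], "B": ["A"], "C": []}
found, n_searches = saturated_search(
    ["Q"],
    search=lambda q: toy_hits.get(q, []),
    is_qualified=lambda q, hit: True,        # accept every hit in this toy run
    pick_seeds=lambda hits: hits,            # every new hit becomes a seed
)
print(found, n_searches)
```

The loop ends when the seed queue is empty, which is the "saturated" state, or when the search limit is reached.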
Here, we define some terms used in Saturated BLAST.
- Seed: a sequence found in a database search
and used as a query in later searches.
- Parent: a seed is the parent of all the
sequences found with it.
- Level: the sequences input by the user
have a level of 0, their children have a level of 1,
and their children's children have a level
of 2. In other words, the level of a protein is the number of intermediate
steps from the original query to itself.
- Break point: a seed can be marked
as a break point, where the Saturated BLAST search will stop and wait
for the user's input.
- Hit no: the number of parents
of a sequence. Different seeds often
find the same sequence, so a sequence may have more than one parent,
although Saturated BLAST only displays the first one.
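To make the terms concrete, here is a small Python sketch. The `Record` class and the sample data are hypothetical and only illustrate how level and hit no follow from the parent links; they do not describe the program's internal data structures.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    name: str
    # Seeds that found this sequence, in the order they found it.
    # User-supplied queries have no parents.
    parents: list = field(default_factory=list)

    @property
    def hit_no(self):
        """Number of parents, i.e. how many seeds found this sequence."""
        return len(self.parents)

def level(records, name):
    """Count the steps from a user query (level 0), following first parents."""
    steps = 0
    while records[name].parents:
        name = records[name].parents[0]
        steps += 1
    return steps

records = {
    "Q": Record("Q"),                  # input by the user
    "A": Record("A", ["Q"]),
    "B": Record("B", ["Q"]),
    "C": Record("C", ["A", "B"]),      # found by two different seeds
}
print(level(records, "C"), records["C"].hit_no)  # prints 2 2
```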
All the sequences found in a Saturated BLAST search and the sequences input
by the user are stored and maintained in a main database.
The BLAST search and the sequences are governed by several tools,
including the seed selector, the BLAST result parser, and the filters. Some
other tools, such as the alignment builder and the cluster function,
can be used to visualize and analyze the results.
Four kinds of filters are designed to confine a MISS to a
desired direction and to prevent it from diverging:
the redundancy filter, the low significance filter,
the keywords filter, and the smart filter.
- Redundancy filter. It prevents the addition of
redundant sequences.
Different queries often find the same sequence fragment.
For every new sequence, if there is another sequence with the same ID
and the sequence overlap between them is bigger than a threshold,
the new sequence is removed.
- Low significance filter. It removes sequences
with bad expect values, low sequence identities, or short lengths.
- Keywords filter. It removes or keeps sequences according
to keyword patterns. The pattern is compared to the annotation
from the BLAST search.
- Smart filter. The program can remember the sequences deleted by the user
and put their IDs in a list. When this function is enabled,
sequences whose IDs are found in the list are deleted.
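The low significance, keywords, and smart filters can be sketched together in a few lines of Python. This is only an illustration: the `Hit` fields, the threshold values, and the function name are assumptions, not the program's actual defaults or code. The redundancy filter is left out here because it needs the alignment ranges.

```python
import re
from dataclasses import dataclass

@dataclass
class Hit:
    gi: str
    description: str
    expect: float
    identity: float   # percent identity to the query
    length: int

def passes_filters(hit, max_expect=0.001, min_identity=30.0, min_length=50,
                   exclude_pattern=None, smart_list=frozenset()):
    """Return True if the hit survives all three filters (assumed thresholds)."""
    # Low significance filter: bad expect value, low identity, or short length.
    if hit.expect > max_expect or hit.identity < min_identity or hit.length < min_length:
        return False
    # Keywords filter (exclude form), matched against the annotation.
    if exclude_pattern and re.search(exclude_pattern, hit.description, re.IGNORECASE):
        return False
    # Smart filter: IDs of sequences the user deleted earlier.
    if hit.gi in smart_list:
        return False
    return True

hits = [
    Hit("111", "protein kinase [Homo sapiens]", 1e-20, 45.0, 300),
    Hit("222", "hypothetical protein", 5.0, 25.0, 40),
    Hit("333", "kinase fragment", 1e-10, 60.0, 120),
    Hit("444", "kinase [Mus musculus]", 1e-15, 50.0, 250),
]
kept = [h.gi for h in hits
        if passes_filters(h, exclude_pattern="fragment", smart_list={"444"})]
print(kept)  # prints ['111']
```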
The output of BLAST can be very redundant.
Searches with redundant queries do not gain any new information
and waste computer time and resources. After each new BLAST search,
all the filtered sequences are checked for significance
in terms of expect value, sequence identity, and sequence length.
The significant sequences are then
clustered into sub-groups according to sequence identity,
and Saturated BLAST selects the longest sequence from each cluster as a seed.
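The manual does not spell out how the sub-groups are formed, so the Python sketch below substitutes a simple greedy scheme as an assumption: sequences are visited from longest to shortest, and each one joins the first cluster whose representative it matches at or above the identity threshold. The `identity` callable and the sample values are hypothetical.

```python
def select_seeds(sequences, identity, threshold=0.9):
    """Greedy clustering; the longest member of each cluster becomes the seed.

    sequences: dict mapping name -> sequence string
    identity:  callable (name_a, name_b) -> fractional identity (assumed given)
    """
    clusters = []   # each cluster is a list whose first member is the longest
    for name in sorted(sequences, key=lambda n: (-len(sequences[n]), n)):
        for cluster in clusters:
            if identity(cluster[0], name) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])   # no similar cluster: start a new one
    return [cluster[0] for cluster in clusters]

sequences = {"a": "M" * 120, "b": "M" * 100, "c": "K" * 80}
pair_identity = {frozenset(("a", "b")): 0.95}      # hypothetical identities
identity = lambda x, y: pair_identity.get(frozenset((x, y)), 0.2)
print(select_seeds(sequences, identity))  # prints ['a', 'c']
```

Because the sequences are visited in order of decreasing length, the first member of every cluster is automatically its longest sequence.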
Saturated BLAST provides four kinds of alignments,
which are derived directly or indirectly from the BLAST search output.
- Multiple alignment of a single BLAST search.
Since the alignments between every hit sequence and the query are
printed in a single BLAST output, their multiple alignment can
be derived directly.
- Multiple alignment of a MISS search chain. The alignment of an
individual sequence, the original query, and all the intermediate sequences
between them can be built by retracing the intermediate steps.
- Alignment of any two sequences. If two sequences have the same ancestor,
the alignment between them can be calculated by aligning them to the ancestor.
Otherwise, they can be aligned by the built-in dynamic programming tool.
- Multiple alignment of any group of sequences.
As before, this alignment can be derived if the sequences share an ancestor.
But if the protein family is very large, the alignment can be poor.
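The idea of aligning two sequences through a shared ancestor can be sketched in Python as below. The sketch assumes each sequence is already stored as a row aligned to the same gap-free parent, so both rows have the parent's length; the function name and the sample rows are hypothetical.

```python
def pair_from_parent_alignment(row_a, row_b):
    """Derive a pairwise alignment from two rows aligned to the same parent.

    Columns where both rows have a gap carry no information and are dropped.
    """
    if len(row_a) != len(row_b):
        raise ValueError("rows must be aligned to the same parent")
    columns = [(x, y) for x, y in zip(row_a, row_b)
               if not (x == "-" and y == "-")]
    aligned_a = "".join(x for x, _ in columns)
    aligned_b = "".join(y for _, y in columns)
    return aligned_a, aligned_b

# Two children aligned to a hypothetical 8-residue parent "MKTAYIAK".
row_a = "MK-AYI--"
row_b = "-KTAY-A-"
print(pair_from_parent_alignment(row_a, row_b))  # prints ('MK-AYI-', '-KTAY-A')
```

Residues that a child inserts relative to the parent are not represented in such rows, which is one reason a derived alignment can be worse than a direct one.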
The seed selector can cluster sequences, but it only considers
sequences with the same parent. In most cases this is good enough for seed
selection. However, when you have hundreds or more sequences
in your result database, you may need a tool to cluster all the
sequences, or a selection of them.
Saturated BLAST uses a standard average linkage clustering method, so
an all-against-all similarity matrix is calculated from the
pairwise alignments available within Saturated BLAST. There are two
options for the similarity matrix: the normalized alignment score S
and the sequence identity.
Given a pairwise alignment, S is calculated as
S = s - [ln(Kmn) / lambda]
where s is the raw score, and m and n are the lengths of the two sequences.
For the default BLAST matrix BLOSUM62, lambda is 0.216 and K is 0.014.
When sequence identity is used, it is the percentage of identical
residues relative to the shorter sequence of the alignment.
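As a worked example of the formula, the Python sketch below computes S with the lambda and K values quoted above. The function name and the sample raw score and lengths are made up for illustration.

```python
import math

def normalized_score(raw_score, m, n, lam=0.216, k=0.014):
    """S = s - ln(K*m*n) / lambda, with the BLOSUM62 values from the text."""
    return raw_score - math.log(k * m * n) / lam

# Hypothetical alignment: raw score 100 between two sequences of length 200.
s_norm = normalized_score(100, 200, 200)
print(round(s_norm, 1))
```

For these sample values K*m*n is 560, so roughly 29 score units are subtracted from the raw score; longer sequences are penalized more.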
- Interface
- Mouse operation
- File
- Edit
- Set parameter
- Set BLAST search
- Set seed
- Set break point
- Set smart filter
- Selection
- Advanced selection
- Display
- BLAST alignment
- Multiple alignment
- Program LOG
- Pair alignment
- Cluster
- Help
- Action button
If you have already run Saturated BLAST, you know the graphical user interface.
The results are tabulated in the main display window. The
columns shown in the table header are, from left to right:
- no, a serial number assigned to each
sequence.
- gi, the GenBank gi identifier.
- description, the annotation of the sequence.
- database, the database
(hidden by default).
- temporary, a temporary field
used by Saturated BLAST (hidden by default).
- parent, the serial number of its parent.
- seed,
a red 'x' marks an active seed,
a normal 'x' marks a used seed.
- break,
a red 'x' marks a break point,
where Saturated BLAST will stop.
- level, the level of this sequence.
- hit no, the hit no of this sequence.
- score, the alignment score of this sequence.
- expect, the expect value of this sequence.
- identity, the sequence identity
of this sequence.
- query range, the range of the query sequence
in the BLAST alignment.
- hit range, the range of the sequence itself
in the BLAST alignment.
- query sequence, the segment of the query
sequence in the BLAST alignment.
- hit sequence, the segment of this
sequence in the BLAST alignment.
- Aligned sequence, the
sequence aligned to its full-length parent, without gaps in the parent.
At the bottom of the window, a status bar displays
the number of searches, the number of seeds left, and so on.
- Double-clicking on a serial no or description
opens a message window for this sequence.
- Double-clicking on a gi number refreshes
a Netscape window to connect to the NCBI server and display the annotation of
this sequence. This requires an existing Netscape window on your desktop.
If nothing happens after clicking, check whether you have a Netscape window open
and whether the action was performed on another desktop.
- Double-clicking on a parent opens a
message window for this sequence's parent.
- Clicking on the non-text part of a sequence
selects or unselects it. Selected sequences have a light yellow
background.
- Clicking on the non-text part of a sequence
while pressing the shift key selects a range of
sequences.
Menu File -> open opens a Saturated BLAST
Restart file, which was saved automatically by the program before each BLAST
search, or by the user. When a new Restart file is read in, it replaces all
the data in the current window.
Menu File -> new starts a new Saturated BLAST
job. A dialog window appears, and the user needs to supply the job name
and query sequence. The job name should consist of ordinary word characters,
because it forms the first part of the names of some temporary and output
files. The user can input one or more query sequences in FASTA format or as
simple one-letter code.
After the OK button is pressed,
the input sequences replace the current ones, if any.
Menu File -> save saves the Restart file.
The Restart file is the basic output file of Saturated BLAST and can
be opened later by the program.
Menu File -> save as saves the results in
different formats, including HTML, table, and plain text. The HTML output files
are frame-based and support dynamic display of messages through
JavaScript. The table file is a TAB-delimited text table, so it can
be read by software such as Microsoft Excel.
Menu File -> export exports all sequences or the
selected sequences in different ways: gi numbers, FASTA format,
and BLAST alignments. The gi numbers are sorted.
Menu File -> quit quits the Saturated BLAST
program.
Menu File -> delete selected deletes
the selected sequences. Actually, these sequences are only marked as
deleted; they are still stored in the program, so deleted sequences
can be recovered. Sequences that have children cannot be deleted this way,
because some connections would otherwise be broken.
Menu Edit -> delete with children deletes
the selected sequences along with their children, grandchildren, and so on.
Menu Edit -> delete with parent deletes
the selected sequences along with their parents and the children of their parents.
Menu Edit -> undelete recovers all the
deleted sequences.
Menu Edit -> clear deletion removes the deleted
sequences from memory and assigns new serial numbers to all the
remaining sequences. After this operation, the deleted sequences can no
longer be recovered.
Menu Edit -> insert sequences allows the user to
insert one or more query sequences. All the inserted sequences are
marked as seeds, with a parent of -1 and a level of 0. A file in FASTA format
or one-letter code is permitted.
Menu Edit -> search searches the annotations
of all the sequences for a simple word. The match is case insensitive. A
message window for the next matching sequence is opened.
Menu Set -> parameter sets various parameters
and thresholds.
General parameters:
- Max iteration number is the maximum allowed
level. The program stops and prints a message in the status bar
if this value is reached.
- Max BLAST search number is the maximum allowed
number of BLAST searches. The program stops and prints a message
in the status bar if this value is reached.
- Batch work size is the maximum allowed
number of BLAST searches in one background batch.
Parameters for filter:
- Max expect of hits: sequences with expect
values higher than this value are deleted from the BLAST results.
- Min length of hits: sequences
shorter than this value are deleted.
- Min sequence identity of hits:
sequences are
deleted if their sequence identity with their parent is lower than this
value.
- Threshold overlap percent:
for a new sequence
B from the BLAST results, if its ID (gi number) is the same as that of another
sequence A already in Saturated BLAST, an overlap percent is calculated as the
overlapped length between A and B divided by the length of B. If this value is
larger than the "Threshold overlap percent", sequence B is considered the same
as A and is deleted.
- Keywords to include: only sequences from the BLAST results that match
these keywords are added by the program.
- Keywords include pattern:
the logical pattern for "Keywords to include".
- Keywords to exclude: sequences matching
these keywords are deleted from the BLAST results.
- Keywords exclude pattern:
the logical pattern for "Keywords to exclude".
- Filter mode: this applies to the
"Threshold overlap percent". The default value is strict, with the
behavior described above. When it is set to greedy, if the overlap
between B and A is greater than the threshold but A is not significant enough
to be a seed, B is not deleted.
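The overlap test and the two filter modes can be sketched in Python as follows. This is a reading of the description above, not the program's code: ranges are assumed to be inclusive (start, end) positions on the same database sequence, and the names `overlap_percent`, `is_redundant`, and `a_is_seed_quality` as well as the default threshold are hypothetical.

```python
def overlap_percent(range_a, range_b):
    """Overlapped length between A and B divided by the length of B, in percent."""
    start = max(range_a[0], range_b[0])
    end = min(range_a[1], range_b[1])
    overlap = max(0, end - start + 1)
    return 100.0 * overlap / (range_b[1] - range_b[0] + 1)

def is_redundant(range_a, range_b, threshold=80.0, mode="strict",
                 a_is_seed_quality=True):
    """Decide whether new fragment B duplicates existing fragment A (same gi)."""
    if overlap_percent(range_a, range_b) <= threshold:
        return False
    # Greedy mode keeps B when A is not significant enough to be a seed.
    if mode == "greedy" and not a_is_seed_quality:
        return False
    return True

print(overlap_percent((1, 100), (51, 150)))    # prints 50.0
print(is_redundant((1, 200), (51, 150)))       # prints True
print(is_redundant((1, 200), (51, 150), mode="greedy",
                   a_is_seed_quality=False))   # prints False
```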
Parameters for seed selection:
- Max expect for seed: sequences with
expect values higher than this value are not selected as seeds.
- Min length of seed: sequences
shorter than this value are not selected as seeds.
- Min sequence identity of seed:
sequences are
not selected as seeds if their sequence identity with their parent
is lower than this value.
- Cluster threshold:
the sequence identity threshold
used to cluster the sequences from the BLAST output; the longest sequence
from each cluster is selected as a seed.
Parameters to stop Saturated BLAST:
- Stop when match words: the program
stops if these words are found in the annotations of sequences.
- Logical pattern: the logical pattern for
the above function.
Menu Set -> BLAST search defines the
BLAST search program, site, and options.
The upper-left panel of the window shows a defined job or a job list. These jobs
are assigned to each seed after
Ok is pressed. Use the
Add, Replace, and Delete buttons to
add a new job, or to replace or delete the selected job.
Right panel has the BLAST options:
Menu Set -> seed sets the selected sequences
as seeds.
Menu Set -> unset seed clears the seed mark
of the selected sequences.
Menu Set -> clears seed clears all
the seed selections.
Menu Set -> default seed first clears all the seed
selections, and then sets seeds according to the parameters defined by
menu Set -> parameters.
Menu Set -> reuse seed reactivates the
selected seeds if they have been used before.
Menu Set -> break point on turns on break
points on selected seeds.
Menu Set -> break point off turns off break
points on selected seeds.
Menu Set -> clear break point turns off all
the break points.
Menu Set -> smart filter enables the smart
filter; the program then remembers all the sequences the user deletes.
Menu Set -> show filter list lists the gi
numbers of all the sequences in the smart filter list.
Menu Set -> export/import filter list
exports or imports the smart filter list.
Menu Set -> clear filter list empties
the smart filter list.
Menu Select -> all selects all the sequences
and refreshes the main display window.
Menu Select -> clear clears the whole selection.
Menu Select -> inverse inverts the selection.
Menu Select -> logic sets the logical pattern
for advanced selection and for Menu Select -> seed.
Menu Select -> customize can make very
complicated selections. A list of restrictions can be used to
make a selection; the restrictions are combined with a logical "and".
For most of them, the user needs to give a comparison operator such as
">", "==", "and" and so on.
For example, if you specify "expect < 0.0001" and "parent == 0",
the program selects all the children of sequence 0 with an expect value
lower than 0.0001.
Keyword matching and regular expressions are case insensitive.
Here, regular expression refers to the regular expressions
of the Perl language.
For example, "human|mouse" matches sequences containing
"human", "Human", "HUMAN", or "mouse". This is very useful; you can also use
it to search for sequence patterns.
If you are not familiar with Perl, consult the
Perl manual.
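The two examples above, the combined restrictions and the case-insensitive pattern, can be tried out with a short Python sketch. Saturated BLAST itself uses Perl regular expressions; Python's `re` module handles a simple alternation like "human|mouse" the same way. The records and the `select` helper are hypothetical.

```python
import re

# Hypothetical rows of the main display table.
records = [
    {"no": 1, "parent": 0, "expect": 1e-30, "description": "protein kinase, Human"},
    {"no": 2, "parent": 0, "expect": 0.01, "description": "hypothetical protein, mouse"},
    {"no": 3, "parent": 1, "expect": 1e-10, "description": "MOUSE homolog"},
    {"no": 4, "parent": 0, "expect": 1e-8, "description": "yeast protein"},
]

def select(records, conditions, pattern=None):
    """Return serial numbers of records meeting all conditions (logical "and")."""
    regex = re.compile(pattern, re.IGNORECASE) if pattern else None
    return [r["no"] for r in records
            if all(cond(r) for cond in conditions)
            and (regex is None or regex.search(r["description"]))]

# "expect < 0.0001" and "parent == 0"
conditions = [lambda r: r["expect"] < 0.0001, lambda r: r["parent"] == 0]
print(select(records, conditions))                  # prints [1, 4]
print(select(records, conditions, "human|mouse"))   # prints [1]
```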
Menu Display -> refresh redraws the main
display window.
Menu Display -> all displays all the sequences.
Menu Display -> seed displays all seed
sequences.
Menu Display -> selected displays all selected
sequences.
Menu Display -> sort by sorts the displayed
sequences by serial no, gi no, expect, and so on.
Menu Display -> sort mode defines whether
the sequences are sorted in ascending or descending order.
Menu Display -> show (gi, hit no, and so on)
turns the display of each field on or off.
Menu Tool -> BLAST alignment builds a
multiple alignment of one seed and its children. This alignment is
derived from the BLAST alignments, and the gaps in the seed sequence are deleted.
The user needs to supply the serial no of the seed sequence. The
Menu Display -> all, seed or selected can
also control the display of this window.
Menu Tool -> linked alignment calculates
the multiple alignment of any selection of sequences, derived directly
or indirectly from BLAST results. In the alignment, the first sequence should
be the ancestor of all the other sequences; this is the master sequence.
This window has a simple pull-down menubar.
- Menu File -> refresh
refreshes the alignment.
- Menu File -> save saves the
alignment to a text file in FASTA format.
- Menu File -> close closes this window.
- Menu Configuration -> show gap in master
switches the display of gaps in the master sequence on or off.
- Menu Configuration -> show what?
allows the user to define which sequences are displayed. The selection tool
and display tool of the main window can be used while the window is open.
Menu Tool -> view log displays the LOG file,
which records important activities of Saturated BLAST. For example,
if a BLAST search fails, the program records it in the LOG file.
Menu Tool -> align 2 sequences aligns
any pair of sequences. The alignment is either derived from BLAST or
calculated by the dynamic programming function. The user needs to supply the
serial numbers of the two sequences.
Menu Tool -> cluster selected clusters the
selected sequences into subgroups by the average linkage clustering method.
Either sequence identity
or normalized alignment score can be used to cluster the sequences,
and the user needs to give the threshold. Because the clustering program
requires pairwise alignments, the dynamic programming function is available
as an option when they cannot be derived from
BLAST output.
The pairwise scores are saved after calculation, so
a repeated cluster computation is faster than the first one.
This window has a simple pull-down menubar.
- Menu File -> refresh refreshes the results.
- Menu File -> save saves the cluster
results to a text file.
- Menu File -> close closes this window.
- Menu Set -> select representatives
selects a representative from each cluster; this selection replaces
the current selection in the main window.
- Menu Set -> mark clstr no assigns the
cluster number to the temporary field of the clustered sequences. The user can
then use the display tool to view this number in the main display window or to
sort the results.
- Menu clear pairwise score deletes
all saved pairwise alignment scores or identities.
- Menu use sequence identity uses
sequence identity rather than alignment score for clustering.
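For readers unfamiliar with average linkage clustering, here is a minimal Python sketch of the general technique. It is a textbook version written for illustration, not the program's implementation; the `similarity` callable, the sample values, and the threshold are hypothetical.

```python
def average_linkage(names, similarity, threshold):
    """Merge the two most similar clusters until the best average drops below threshold."""
    clusters = [[n] for n in names]

    def average(c1, c2):
        # Mean pairwise similarity between all members of two clusters.
        return sum(similarity(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (i, j) = max(
            ((average(clusters[i], clusters[j]), (i, j))
             for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda item: item[0],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical pairwise identities; unlisted pairs default to 0.2.
table = {frozenset(("a", "b")): 0.9, frozenset(("c", "d")): 0.8}
similarity = lambda x, y: table.get(frozenset((x, y)), 0.2)
print(average_linkage(["a", "b", "c", "d"], similarity, threshold=0.5))
# prints [['a', 'b'], ['c', 'd']]
```

The similarity values here play the role of either the sequence identity or the normalized alignment score chosen in the cluster window.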
Menu Help -> help opens a small inline
help window.
Menu Help -> manual (official, local) opens
this User's Manual in a Netscape window, from the official
Saturated BLAST site or from the user's local copy.
There are seven action buttons just below the menubar of Saturated BLAST.
They are used to control the BLAST search.
- Button next runs a BLAST search using the
next seed.
- Button cont continuously runs BLAST
searches until a break point is met or no new seed is available.
- Button jump to runs a BLAST search
using the next selected seed.
- Button send batch sends a batch of
BLAST searches to the background. The size of the batch is defined by
menu Set->parameter. These jobs are then
held in a waiting list. While the waiting list is full (it has reached the
defined batch size), no more background jobs can be submitted.
NOTE! If you submit jobs to the NCBI server, please don't define a big
batch size, because NCBI is a public server.
- Button get batch checks the waiting
list to see whether the background jobs are finished. Finished jobs
are parsed and added to Saturated BLAST. If some jobs are still
running, the program returns and leaves them in the waiting list.
- Button run batch runs batch
BLAST automatically; it stops if a break point is met or
no new seed is available.
- Button stop sends a stop signal to the
program, and the BLAST search stops after receiving this signal.
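The behavior of the send batch and get batch buttons can be pictured with a small Python sketch of the waiting list. The `WaitingList` class and its method names are hypothetical and only model the rules stated above: submissions stop when the list is full, and only finished jobs are taken off the list.

```python
class WaitingList:
    """Toy model of the background job list behind send batch / get batch."""

    def __init__(self, batch_size):
        self.batch_size = batch_size
        self.jobs = []

    def send_batch(self, seeds):
        """Submit seeds until the waiting list is full; return those submitted."""
        free = max(0, self.batch_size - len(self.jobs))
        submitted = list(seeds)[:free]
        self.jobs.extend(submitted)
        return submitted

    def get_batch(self, is_finished):
        """Collect finished jobs and leave running ones in the list."""
        done = [job for job in self.jobs if is_finished(job)]
        self.jobs = [job for job in self.jobs if not is_finished(job)]
        return done

waiting = WaitingList(batch_size=2)
first = waiting.send_batch(["s1", "s2", "s3"])        # only two slots available
blocked = waiting.send_batch(["s3"])                  # list is full
finished = waiting.get_batch(lambda job: job == "s1")
later = waiting.send_batch(["s3"])                    # one slot is free again
print(first, blocked, finished, later)
```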
I need your input! Thanks,