clustalw.hlp

clustalw1.83.DOS.ZIP: software for multiple sequence alignment
tree and all branch lengths. The root of the tree can only be inferred by
using an outgroup (a sequence that you are certain, on biological grounds,
branches at the outside of the tree) OR, if you assume a degree of constancy in
the 'molecular clock', by placing the root in the 'middle'
of the tree (roughly equidistant from all tips).
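Placing the root in the 'middle' of the tree is usually done by midpoint rooting: find the longest tip-to-tip path and put the root halfway along it. The sketch below is a textbook version of that idea, not Clustal W's own code; the edge-list representation and the names `midpoint_root` and `_farthest` are hypothetical.

```python
from collections import defaultdict

def _farthest(adj, start):
    """Depth-first search: return (node, distance, path) of the node farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, parent, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nbr, length in adj[node]:
            if nbr != parent:
                stack.append((nbr, node, dist + length, path + [nbr]))
    return best

def midpoint_root(edges):
    """edges: (u, v, length) triples of an unrooted tree with positive lengths.

    Returns (u, v, offset): the root sits on edge u-v, `offset` units from u,
    i.e. halfway along the longest tip-to-tip path.
    """
    adj = defaultdict(list)
    length_of = {}
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
        length_of[frozenset((u, v))] = length
    tip_a, _, _ = _farthest(adj, edges[0][0])    # one end of the longest path
    _, diameter, path = _farthest(adj, tip_a)    # the other end, plus the path itself
    half, walked = diameter / 2.0, 0.0
    for u, v in zip(path, path[1:]):
        length = length_of[frozenset((u, v))]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    raise ValueError("could not place the midpoint")
```

For a tree with tips A, B, C and D, where C sits on a long branch, the root lands on the C branch so that C and the farthest tip on the other side (B) end up equidistant from it. Note that this only makes sense if the clock assumption above is reasonable.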

5) TOGGLE PHYLIP BOOTSTRAP POSITIONS
By default, the bootstrap values are correctly placed on the tree branches of
the phylip format output tree. The toggle allows them to be placed on the
nodes, which is incorrect, but some display packages (e.g. TreeTool, TreeView
and Phylowin) support only node labelling, not branch labelling. Care
should be taken to note which branches and labels go together.
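To make the difference concrete, here is a small sketch that writes the same tree both ways. We assume, without having checked the Clustal W source, that branch labels are written in square brackets after the branch length and node labels are written straight after the closing parenthesis; the tuple tree representation and the name `to_newick` are hypothetical.

```python
def _newick(node, labels):
    if isinstance(node[0], str):                 # leaf: (name, branch_length)
        name, length = node
        return f"{name}:{length:.5f}"
    children, length, boot = node                # internal: (children, length, bootstrap)
    inner = "(" + ",".join(_newick(c, labels) for c in children) + ")"
    if length is None:                           # the top of the tree has no branch
        return inner
    if labels == "node":
        return f"{inner}{boot}:{length:.5f}"     # label attached to the node
    return f"{inner}:{length:.5f}[{boot}]"       # label attached to the branch

def to_newick(tree, labels="branch"):
    if labels not in ("branch", "node"):
        raise ValueError("labels must be 'branch' or 'node'")
    return _newick(tree, labels) + ";"

tree = ([("A", 0.1), ("B", 0.2), ([("C", 0.3), ("D", 0.4)], 0.05, 950)], None, None)
print(to_newick(tree))          # bootstrap count follows the branch length
print(to_newick(tree, "node"))  # bootstrap count precedes the branch length
```

In the node-labelled form a viewer reads `950` as the name of the internal node, which is why the pairing of labels and branches has to be checked by eye.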

6) OUTPUT FORMATS: four different formats are allowed. None of these displays
the tree visually. Useful display programs accepting PHYLIP format include
NJplot (from Manolo Gouy and supplied with Clustal W), TreeView (Mac/PC), and
the tree-drawing programs of the PHYLIP package itself. (Get the PHYLIP package
anyway if you are interested in trees.) The NEXUS format can be read into PAUP
or MacClade.

>>HELP 8 <<      Help for choosing a weight matrix

For protein alignments, you use a weight matrix to determine the similarity of
non-identical amino acids.  For example, Tyr aligned with Phe is usually judged 
to be 'better' than Tyr aligned with Pro.

There are three 'in-built' series of weight matrices offered. Each consists of
several matrices which work differently at different evolutionary distances. To
see the exact details, read the documentation. Crudely, we store several
matrices in memory, spanning the full range of amino acid distance (from almost
identical sequences to highly divergent ones). For very similar sequences, it
is best to use a strict weight matrix which only gives a high score to
identities and the most favoured conservative substitutions. For more divergent
sequences, it is appropriate to use "softer" matrices which give a high score
to many other frequent substitutions.
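The idea of a matrix series can be sketched as: measure how similar a pair of sequences is, then pick the strictest matrix whose range covers that similarity. The percent-identity cut-offs below are invented for illustration and are NOT the values Clustal W uses; only the BLOSUM matrix names come from this help text. `percent_identity` and `choose_matrix` are hypothetical names.

```python
# Hypothetical cut-offs: (lowest % identity for this matrix, matrix name).
BLOSUM_SERIES = [
    (60.0, "BLOSUM80"),   # near-identical sequences: strict matrix
    (40.0, "BLOSUM62"),
    (30.0, "BLOSUM45"),
    (0.0, "BLOSUM30"),    # highly divergent sequences: soft matrix
]

def percent_identity(a, b):
    """Percent identity over aligned positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x.upper() == y.upper())
    return 100.0 * same / len(pairs)

def choose_matrix(a, b, series=BLOSUM_SERIES):
    """Return the first (strictest) matrix whose lower bound the pair reaches."""
    pid = percent_identity(a, b)
    for lower, name in series:
        if pid >= lower:
            return name
    return series[-1][1]
```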

1) BLOSUM (Henikoff). These matrices appear to be the best available for
carrying out database similarity searches (homology searches). The matrices
used are: BLOSUM 80, 62, 45 and 30. (BLOSUM was the default in earlier Clustal W
versions.)

2) PAM (Dayhoff). These have been extremely widely used since the late '70s.
We use the PAM 20, 60, 120 and 350 matrices.

3) GONNET. These matrices were derived using almost the same procedure as the
Dayhoff matrices (above) but are much more up to date and based on a far larger
data set. They appear to be more sensitive than the Dayhoff series. We use the
GONNET 80, 120, 160, 250 and 350 matrices. This series is the default for
Clustal W version 1.8.

We also supply an identity matrix which gives a score of 1.0 to two identical 
amino acids and a score of zero otherwise. This matrix is not very useful.
Alternatively, you can read in your own (just one matrix, not a series).

A new matrix can be read from a file on disk, if the filename consists only
of lower case characters. The values in the new weight matrix must be integers
and the scores should be similarities. You can use negative as well as positive
values if you wish, although the matrix will be automatically adjusted to all
positive scores.
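One simple way to turn a matrix with negative entries into an all-positive one, while keeping every score in the same order, is to subtract the smallest value from every entry. This is an assumption about how such an adjustment could work; the help text does not say exactly what Clustal W does. The dictionary layout and the name `make_positive` are hypothetical.

```python
def make_positive(matrix):
    """matrix: {(res1, res2): integer similarity score}.

    Shift every score up by the same amount so the minimum becomes 0;
    the relative ranking of substitutions is unchanged.
    """
    low = min(matrix.values())
    if low >= 0:
        return dict(matrix)
    return {pair: score - low for pair, score in matrix.items()}
```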

For DNA, a single matrix (not a series) is used. Two hard-coded matrices are 
available:

1) IUB. This is the default scoring matrix used by BESTFIT for the comparison
of nucleic acid sequences. X's and N's are treated as matches to any IUB
ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0.
 
 
2) CLUSTALW(1.6). The previous system used by Clustal W, in which matches score
1.0 and mismatches score 0. All matches for IUB symbols also score 0.
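The two DNA scoring schemes can be written as simple pairwise score functions. This is a simplified reading of the descriptions above: we treat N and X as matching anything, and we give every other non-identical pair (including partially compatible ambiguity codes such as R against A) a score of 0, which may differ from the real BESTFIT IUB table. The function names are hypothetical.

```python
BASES = set("ACGTU")
WILDCARDS = set("NX")

def iub_score(x, y):
    """IUB scheme: identical symbols and any N/X pairing score 1.9, all else 0."""
    x, y = x.upper(), y.upper()
    if x == y or x in WILDCARDS or y in WILDCARDS:
        return 1.9
    return 0.0

def clustalw16_score(x, y):
    """CLUSTALW(1.6) scheme: identical plain bases score 1.0; IUB symbols score 0."""
    x, y = x.upper(), y.upper()
    if x == y and x in BASES:
        return 1.0
    return 0.0
```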

INPUT FORMAT  The format used for a new matrix is the same as that used by the
BLAST program. Any lines beginning with a # character are assumed to be
comments. The first
non-comment line should contain a list of amino acids in any order, using the
1 letter code, followed by a * character. This should be followed by a square
matrix of integer scores, with one row and one column for each amino acid. The
last row and column of the matrix (corresponding to the * character) contain
the minimum score over the whole matrix.
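A reader for this format can be sketched as follows. It also accepts rows that start with a residue letter, as NCBI-distributed matrix files do, since the description above does not say whether row labels are present. `parse_blast_matrix` is a hypothetical name, not a Clustal W function.

```python
def parse_blast_matrix(text):
    """Parse a BLAST-style score matrix into a {(row, col): score} dict."""
    lines = [ln.strip() for ln in text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]  # drop comments and blanks
    header = lines[0].split()
    if header[-1] != "*":
        raise ValueError("residue list must end with '*'")
    n = len(header)
    rows = lines[1:]
    if len(rows) != n:
        raise ValueError(f"expected {n} matrix rows, found {len(rows)}")
    matrix = {}
    for res, line in zip(header, rows):
        fields = line.split()
        if len(fields) == n + 1:        # tolerate a leading row label
            fields = fields[1:]
        if len(fields) != n:
            raise ValueError(f"row {res!r}: expected {n} scores, found {len(fields)}")
        for col, value in zip(header, fields):
            matrix[(res, col)] = int(value)
    low = min(matrix.values())
    if any(matrix[("*", c)] != low or matrix[(c, "*")] != low for c in header):
        raise ValueError("the '*' row and column should hold the matrix minimum")
    return matrix

example = """# toy two-residue matrix
A  R  *
A  4 -1 -4
R -1  5 -4
* -4 -4 -4
"""
scores = parse_blast_matrix(example)
```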

>>HELP 9 <<      Help for command line parameters
                DATA (sequences)

-INFILE=file.ext                             :input sequences.
-PROFILE1=file.ext  and  -PROFILE2=file.ext  :profiles (old alignment).


                VERBS (do things)

-OPTIONS            :list the command line parameters
-HELP  or -CHECK    :outline the command line params.
-ALIGN              :do full multiple alignment 
-TREE               :calculate NJ tree.
-BOOTSTRAP(=n)      :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
-CONVERT            :output the input sequences in a different file format.


                PARAMETERS (set things)

***General settings:****
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE   :use FAST algorithm for the alignment guide tree
-TYPE=       :PROTEIN or DNA sequences
-NEGATIVE    :protein alignment with negative values in matrix
-OUTFILE=    :sequence alignment file name
-OUTPUT=     :GCG, GDE, PHYLIP, PIR or NEXUS
-OUTORDER=   :INPUT or ALIGNED
-CASE=       :LOWER or UPPER (for GDE output only)
-SEQNOS=     :OFF or ON (for Clustal output only)
-SEQNO_RANGE=:OFF or ON (NEW: for all output formats) 
-RANGE=m,n   :sequence range to write starting m to m+n. 

***Fast Pairwise Alignments:***
-KTUPLE=n    :word size
-TOPDIAGS=n  :number of best diags.
-WINDOW=n    :window around best diags.
-PAIRGAP=n   :gap penalty
-SCORE=      :PERCENT or ABSOLUTE


***Slow Pairwise Alignments:***
-PWMATRIX=    :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f  :gap opening penalty        
-PWGAPEXT=f   :gap extension penalty


***Multiple Alignments:***
-NEWTREE=      :file for new guide tree
-USETREE=      :file for old guide tree
-MATRIX=       :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX=    :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f     :gap opening penalty        
-GAPEXT=f      :gap extension penalty
-ENDGAPS       :no end gap separation pen. 
-GAPDIST=n     :gap separation pen. range
-NOPGAP        :residue-specific gaps off  
-NOHGAP        :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res.    
-MAXDIV=n      :% ident. for delay
-TYPE=         :PROTEIN or DNA
-TRANSWEIGHT=f :transitions weighting


***Profile Alignments:***
-PROFILE      :Merge two alignments by profile alignment
-NEWTREE1=    :file for new guide tree for profile1
-NEWTREE2=    :file for new guide tree for profile2
-USETREE1=    :file for old guide tree for profile1
-USETREE2=    :file for old guide tree for profile2


***Sequence to Profile Alignments:***
-SEQUENCES   :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE=    :file for new guide tree
-USETREE=    :file for old guide tree


***Structure Alignments:***
-NOSECSTR1     :do not use secondary structure-gap penalty mask for profile 1 
-NOSECSTR2     :do not use secondary structure-gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE   :output in alignment file
-HELIXGAP=n    :gap penalty for helix core residues 
-STRANDGAP=n   :gap penalty for strand core residues
-LOOPGAP=n     :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n  :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n:number of residues outside strand to be treated as terminal 


***Trees:***
-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n        :seed number for bootstraps.
-KIMURA        :use Kimura's correction.   
-TOSSGAPS      :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display
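When Clustal W is driven from a script, the options above can be assembled into an argument list. This sketch uses only option names from the list above; the executable name `clustalw`, the input file name and the helper names are assumptions.

```python
import subprocess

def clustalw_args(infile, verbs=("ALIGN",), exe="clustalw", **params):
    """Build a Clustal W command line from the options listed above.

    True becomes a bare switch (kimura=True -> -KIMURA), False is skipped,
    and any other value becomes -NAME=value.
    """
    args = [exe, f"-INFILE={infile}"]
    args += [f"-{verb.upper()}" for verb in verbs]
    for name, value in params.items():
        if value is False:
            continue
        if value is True:
            args.append(f"-{name.upper()}")
        else:
            args.append(f"-{name.upper()}={value}")
    return args

def run_clustalw(infile, **kwargs):
    # Needs a Clustal W binary on the PATH; raises CalledProcessError on failure.
    return subprocess.run(clustalw_args(infile, **kwargs), check=True)

cmd = clustalw_args("globins.pir", verbs=("ALIGN", "TREE"),
                    outputtree="phylip", kimura=True, tossgaps=False)
```

Here `cmd` is `['clustalw', '-INFILE=globins.pir', '-ALIGN', '-TREE', '-OUTPUTTREE=phylip', '-KIMURA']`; passing a list rather than a single string avoids shell quoting problems with file names.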

>>HELP 0 <<           Help for tree output format options

Four output formats are offered: 1) Clustal, 2) Phylip, 3) just the distances
and 4) Nexus.

None of these formats displays the results graphically. Many packages can
display trees in the PHYLIP format (2, below). It can also be imported into
the PHYLIP programs RETREE, DRAWTREE and DRAWGRAM for graphical display. 
NEXUS format trees can be read by PAUP and MacClade.

1) Clustal format output. 
This format is verbose and lists all of the distances between the sequences and
the number of alignment positions used for each. The tree is described at the
end of the file. It lists the sequences that are joined at each alignment step
and the branch lengths. After two sequences are joined, the resulting group is
referred to later as a NODE. The number of a NODE is the number of the lowest
sequence in that
NODE.   
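The NODE numbering rule fits naturally into a neighbour-joining loop: whenever two clusters are merged, the new cluster simply keeps the lower of the two numbers. The sketch below is a textbook neighbour-joining implementation with that numbering added; it does not reproduce Clustal W's output layout or its treatment of negative branch lengths, and `neighbor_joining` is a hypothetical name.

```python
def neighbor_joining(dist):
    """Neighbour-joining on a symmetric distance matrix (list of lists).

    Returns (steps, final): each step is (node_a, length_a, node_b, length_b)
    and `final` lists the last three nodes with their branch lengths. Nodes
    are numbered 1..n by the lowest sequence they contain.
    """
    n = len(dist)
    if n < 3:
        raise ValueError("need at least three sequences")
    d = {(i, j): float(dist[i][j]) for i in range(n) for j in range(n)}
    active = list(range(n))  # each cluster is keyed by its lowest 0-based sequence index
    steps = []
    while len(active) > 3:
        r = len(active)
        total = {i: sum(d[i, k] for k in active) for i in active}
        best = None
        for x in range(r):
            for y in range(x + 1, r):
                i, j = active[x], active[y]
                q = (r - 2) * d[i, j] - total[i] - total[j]
                if best is None or q < best[0]:
                    best = (q, i, j)
        _, i, j = best  # i < j because `active` stays sorted
        length_i = 0.5 * d[i, j] + (total[i] - total[j]) / (2 * (r - 2))
        length_j = d[i, j] - length_i
        steps.append((i + 1, length_i, j + 1, length_j))
        for k in active:
            if k not in (i, j):
                d[i, k] = d[k, i] = 0.5 * (d[i, k] + d[j, k] - d[i, j])
        active.remove(j)  # the merged cluster keeps the lower number, i
    a, b, c = active
    final = [
        (a + 1, 0.5 * (d[a, b] + d[a, c] - d[b, c])),
        (b + 1, 0.5 * (d[a, b] + d[b, c] - d[a, c])),
        (c + 1, 0.5 * (d[a, c] + d[b, c] - d[a, b])),
    ]
    return steps, final

distances = [[0, 3, 8, 9], [3, 0, 9, 10], [8, 9, 0, 9], [9, 10, 9, 0]]
steps, final = neighbor_joining(distances)
```

With these four sequences, sequences 1 and 2 are joined first (branch lengths 1.0 and 2.0), and from then on that group is NODE 1, which meets sequences 3 and 4 in the final three-way join.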

2) Phylip format output.
This format is the New Hampshire format, used by many phylogenetic analysis
packages. It consists of a series of nested parentheses, describing the
branching order, with the sequence names and branch lengths. It can be used by
the RETREE, DRAWGRAM and DRAWTREE programs of the PHYLIP package to see the
trees graphically. This is the same format used during multiple alignment for
the guide trees. 

Use this format with NJplot (Manolo Gouy), supplied with Clustal W. Some other
packages that can read and display New Hampshire format are TreeView (Mac/PC),
TreeTool (UNIX), and Phylowin.
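Because the New Hampshire format is just nested parentheses, a small recursive-descent reader is enough to load such a tree (for example a guide tree) into a script. This sketch handles names, branch lengths, and a bracketed label after the branch length; it does not handle quoted names or names containing spaces. `parse_newick` is a hypothetical name.

```python
def parse_newick(text):
    """Parse a simple New Hampshire tree into nested dicts with keys
    'name', 'length', 'label' (bracketed text) and 'children'."""
    s = "".join(text.split())  # drop whitespace and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if s[pos] == "(":
            pos += 1
            children.append(node())
            while s[pos] == ",":
                pos += 1
                children.append(node())
            if s[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        start = pos
        while pos < len(s) and s[pos] not in ":,();[":
            pos += 1
        name, length, label = s[start:pos], None, None
        if pos < len(s) and s[pos] == ":":
            pos += 1
            start = pos
            while pos < len(s) and s[pos] not in ",();[":
                pos += 1
            length = float(s[start:pos])
        if pos < len(s) and s[pos] == "[":  # e.g. a branch-attached bootstrap count
            end = s.index("]", pos)
            label, pos = s[pos + 1:end], end + 1
        return {"name": name, "length": length, "label": label, "children": children}

    tree = node()
    if pos >= len(s) or s[pos] != ";":
        raise ValueError("a tree must end with ';'")
    return tree

t = parse_newick("(A:0.1,B:0.2,\n(C:0.3,D:0.4):0.05[950]);")
```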

3) The distances only.
This format just outputs a matrix of all the pairwise distances in a format
that can be used by the Phylip package. It used to be useful when one could not
produce distances from protein sequences in the Phylip package but is now
redundant (Protdist of Phylip 3.5 now does this).
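A distance matrix in PHYLIP layout can be produced as sketched below: a count of sequences on the first line, then one row per sequence with the name padded to 10 characters. The optional correction is the usual Kimura formula for protein distances, D = -ln(1 - p - 0.2p^2), which is what we assume -KIMURA refers to; pairs too divergent for the formula are simply rejected here. Gap positions are skipped pair by pair, which is a simplification of the -TOSSGAPS option. All function names are hypothetical.

```python
import math

def p_distance(a, b):
    """Fraction of differing residues over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free positions to compare")
    diffs = sum(1 for x, y in pairs if x.upper() != y.upper())
    return diffs / len(pairs)

def kimura_protein(p):
    """Kimura's correction for protein distances: -ln(1 - p - 0.2 p^2)."""
    if p == 0.0:
        return 0.0
    arg = 1.0 - p - 0.2 * p * p
    if arg <= 0.0:
        raise ValueError(f"p = {p:.3f} is too divergent for the Kimura formula")
    return -math.log(arg)

def phylip_distance_matrix(names, seqs, kimura=False):
    """Return a square distance matrix as PHYLIP-style text."""
    n = len(seqs)
    out = [f"{n:5d}"]
    for i in range(n):
        row = []
        for j in range(n):
            d = 0.0 if i == j else p_distance(seqs[i], seqs[j])
            if kimura:
                d = kimura_protein(d)
            row.append(f"{d:10.6f}")
        out.append(f"{names[i][:10]:<10}" + "".join(row))
    return "\n".join(out)
```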

4) NEXUS FORMAT TREE. This format is used by several popular phylogeny programs,
including PAUP and MacClade. The format is described fully in:
Maddison, D. R., D. L. Swofford and W. P. Maddison.  1997.
NEXUS: an extensible file format for systematic information.
Systematic Biology 46:590-621.

5) TOGGLE PHYLIP BOOTSTRAP POSITIONS
By default, the bootstrap values are placed on the nodes of the phylip format
output tree. This is inaccurate as the bootstrap values should be associated
with the tree branches and not the nodes. However, this format can be read and
displayed by TreeTool, TreeView and Phylowin. An option is available to
correctly place the bootstrap values on the branches with which they are
associated.