tree and all branch lengths. The root of the tree can only be inferred by
using an outgroup (a sequence that you are certain, on biological grounds,
branches at the outside of the tree) OR, if you assume a degree of constancy
in the 'molecular clock', you can place the root in the 'middle' of the tree
(roughly equidistant from all tips).

5) TOGGLE PHYLIP BOOTSTRAP POSITIONS

By default, the bootstrap values are correctly placed on the tree branches of
the phylip format output tree. The toggle allows them to be placed on the
nodes, which is incorrect, but some display packages (e.g. TreeTool, TreeView
and Phylowin) support only node labelling, not branch labelling. Care should
be taken to note which branches and labels go together.

6) OUTPUT FORMATS: four different formats are allowed. None of these displays
the tree visually. Useful display programs accepting PHYLIP format include
NJplot (from Manolo Gouy and supplied with Clustal W), TreeView (Mac/PC), and
the tree drawing facilities of the PHYLIP package itself. (Get the PHYLIP
package anyway if you are interested in trees.) The NEXUS format can be read
into PAUP or MacClade.


>>HELP 8 <<             Help for choosing a weight matrix

For protein alignments, you use a weight matrix to determine the similarity of
non-identical amino acids. For example, Tyr aligned with Phe is usually judged
to be 'better' than Tyr aligned with Pro.

There are three 'in-built' series of weight matrices offered. Each consists of
several matrices which work differently at different evolutionary distances. To
see the exact details, read the documentation. Crudely, we store several
matrices in memory, spanning the full range of amino acid distance (from almost
identical sequences to highly divergent ones). For very similar sequences, it
is best to use a strict weight matrix which only gives a high score to
identities and the most favoured conservative substitutions.
For more divergent sequences, it is appropriate to use "softer" matrices which
give a high score to many other frequent substitutions.

1) BLOSUM (Henikoff). These matrices appear to be the best available for
carrying out database similarity (homology searches). The matrices used are:
Blosum 80, 62, 45 and 30. (BLOSUM was the default in earlier Clustal W
versions)

2) PAM (Dayhoff). These have been extremely widely used since the late '70s.
We use the PAM 20, 60, 120 and 350 matrices.

3) GONNET. These matrices were derived using almost the same procedure as the
Dayhoff one (above) but are much more up to date and are based on a far larger
data set. They appear to be more sensitive than the Dayhoff series. We use the
GONNET 80, 120, 160, 250 and 350 matrices. This series is the default for
Clustal W version 1.8.

We also supply an identity matrix which gives a score of 1.0 to two identical
amino acids and a score of zero otherwise. This matrix is not very useful.
Alternatively, you can read in your own (just one matrix, not a series).

A new matrix can be read from a file on disk, if the filename consists only
of lower case characters. The values in the new weight matrix must be integers
and the scores should be similarities. You can use negative as well as positive
values if you wish, although the matrix will be automatically adjusted to all
positive scores.

For DNA, a single matrix (not a series) is used. Two hard-coded matrices are
available:

1) IUB. This is the default scoring matrix used by BESTFIT for the comparison
of nucleic acid sequences. X's and N's are treated as matches to any IUB
ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score 0.

2) CLUSTALW(1.6). The previous system used by Clustal W, in which matches score
1.0 and mismatches score 0. All matches for IUB symbols also score 0.

INPUT FORMAT  The format used for a new matrix is the same as the BLAST program.
Any lines beginning with a # character are assumed to be comments.
The first non-comment line should contain a list of amino acids in any order,
using the 1 letter code, followed by a * character. This should be followed by
a square matrix of integer scores, with one row and one column for each amino
acid. The last row and column of the matrix (corresponding to the * character)
contain the minimum score over the whole matrix.


>>HELP 9 <<             Help for command line parameters

                DATA (sequences)

-INFILE=file.ext    :input sequences.
-PROFILE1=file.ext  and  -PROFILE2=file.ext  :profiles (old alignment).

                VERBS (do things)

-OPTIONS            :list the command line parameters
-HELP  or -CHECK    :outline the command line params.
-ALIGN              :do full multiple alignment
-TREE               :calculate NJ tree.
-BOOTSTRAP(=n)      :bootstrap a NJ tree (n= number of bootstraps; def. = 1000).
-CONVERT            :output the input sequences in a different file format.

                PARAMETERS (set things)

***General settings:***
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE   :use FAST algorithm for the alignment guide tree
-TYPE=       :PROTEIN or DNA sequences
-NEGATIVE    :protein alignment with negative values in matrix
-OUTFILE=    :sequence alignment file name
-OUTPUT=     :GCG, GDE, PHYLIP, PIR or NEXUS
-OUTORDER=   :INPUT or ALIGNED
-CASE        :LOWER or UPPER (for GDE output only)
-SEQNOS=     :OFF or ON (for Clustal output only)
-SEQNO_RANGE=:OFF or ON (NEW: for all output formats)
-RANGE=m,n   :sequence range to write starting m to m+n.
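
As a rough illustration of how the DATA, VERBS and general settings listed
above combine on one command line, the following Python sketch assembles an
argument list for a full multiple alignment. The executable name "clustalw",
the helper name build_clustalw_args and the file names are assumptions made
for this example only; the option names and allowed values are copied from
the list above. The sketch only builds and prints the arguments, it does not
run the program.

```python
# Allowed values copied from the general settings list above.
ALLOWED = {
    "TYPE": {"PROTEIN", "DNA"},
    "OUTPUT": {"GCG", "GDE", "PHYLIP", "PIR", "NEXUS"},
    "OUTORDER": {"INPUT", "ALIGNED"},
}


def build_clustalw_args(infile, outfile, program="clustalw", **settings):
    """Return an argument list for a full multiple alignment (-ALIGN).

    `program` is an assumed executable name. `settings` may contain
    TYPE, OUTPUT and OUTORDER (keys and values are case-insensitive).
    """
    args = [program, f"-INFILE={infile}", "-ALIGN", f"-OUTFILE={outfile}"]
    for key, value in settings.items():
        name, val = key.upper(), str(value).upper()
        if name not in ALLOWED:
            raise ValueError(f"setting not covered by this sketch: {name}")
        if val not in ALLOWED[name]:
            raise ValueError(f"{name} must be one of {sorted(ALLOWED[name])}")
        args.append(f"-{name}={val}")
    return args


if __name__ == "__main__":
    # Hypothetical file names; prints the command line without running it.
    print(" ".join(build_clustalw_args("seqs.fasta", "seqs.phy",
                                       type="protein", output="phylip")))
```

Validating values against the documented choices before building the
command catches misspelt settings early; the returned list could be handed
to a process launcher if the program is installed under that name.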

***Fast Pairwise Alignments:***
-KTUPLE=n    :word size
-TOPDIAGS=n  :number of best diags.
-WINDOW=n    :window around best diags.
-PAIRGAP=n   :gap penalty
-SCORE       :PERCENT or ABSOLUTE

***Slow Pairwise Alignments:***
-PWMATRIX=    :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f  :gap opening penalty
-PWGAPEXT=f   :gap extension penalty

***Multiple Alignments:***
-NEWTREE=      :file for new guide tree
-USETREE=      :file for old guide tree
-MATRIX=       :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX=    :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f     :gap opening penalty
-GAPEXT=f      :gap extension penalty
-ENDGAPS       :no end gap separation pen.
-GAPDIST=n     :gap separation pen. range
-NOPGAP        :residue-specific gaps off
-NOHGAP        :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res.
-MAXDIV=n      :% ident. for delay
-TYPE=         :PROTEIN or DNA
-TRANSWEIGHT=f :transitions weighting

***Profile Alignments:***
-PROFILE      :Merge two alignments by profile alignment
-NEWTREE1=    :file for new guide tree for profile1
-NEWTREE2=    :file for new guide tree for profile2
-USETREE1=    :file for old guide tree for profile1
-USETREE2=    :file for old guide tree for profile2

***Sequence to Profile Alignments:***
-SEQUENCES    :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE=     :file for new guide tree
-USETREE=     :file for old guide tree

***Structure Alignments:***
-NOSECSTR1     :do not use secondary structure-gap penalty mask for profile 1
-NOSECSTR2     :do not use secondary structure-gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE   :output in alignment file
-HELIXGAP=n    :gap penalty for helix core residues
-STRANDGAP=n   :gap penalty for strand core residues
-LOOPGAP=n     :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n  :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n:number of residues outside strand to be treated as terminal

***Trees:***
-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n        :seed number for bootstraps.
-KIMURA        :use Kimura's correction.
-TOSSGAPS      :ignore positions with gaps.
-BOOTLABELS=node OR branch :position of bootstrap values in tree display


>>HELP 0 <<             Help for tree output format options

Four output formats are offered: 1) Clustal, 2) Phylip, 3) Just the distances,
4) Nexus

None of these formats displays the results graphically. Many packages can
display trees in the PHYLIP format, 2) below. It can also be imported into
the PHYLIP programs RETREE, DRAWTREE and DRAWGRAM for graphical display.
NEXUS format trees can be read by PAUP and MacClade.

1) Clustal format output.
This format is verbose and lists all of the distances between the sequences and
the number of alignment positions used for each. The tree is described at the
end of the file. It lists the sequences that are joined at each alignment step
and the branch lengths. After two sequences are joined, the pair is referred to
later as a NODE. The number of a NODE is the number of the lowest sequence in
that NODE.

2) Phylip format output.
This format is the New Hampshire format, used by many phylogenetic analysis
packages. It consists of a series of nested parentheses, describing the
branching order, with the sequence names and branch lengths. It can be used by
the RETREE, DRAWGRAM and DRAWTREE programs of the PHYLIP package to see the
trees graphically. This is the same format used during multiple alignment for
the guide trees.

Use this format with NJplot (Manolo Gouy), supplied with Clustal W. Some other
packages that can read and display New Hampshire format are TreeView (Mac/PC),
TreeTool (UNIX), and Phylowin.

3) The distances only.
This format just outputs a matrix of all the pairwise distances in a format
that can be used by the Phylip package.
It used to be useful when one could not produce distances from protein
sequences in the Phylip package but is now redundant (Protdist of Phylip 3.5
now does this).

4) NEXUS FORMAT TREE.
This format is used by several popular phylogeny programs, including PAUP and
MacClade. The format is described fully in:

Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997.
NEXUS: an extensible file format for systematic information.
Systematic Biology 46:590-621.

5) TOGGLE PHYLIP BOOTSTRAP POSITIONS
By default, the bootstrap values are placed on the nodes of the phylip format
output tree. This is inaccurate as the bootstrap values should be associated
with the tree branches and not the nodes. However, this format can be read and
displayed by TreeTool, TreeView and Phylowin. An option is available to
correctly place the bootstrap values on the branches with which they are
associated.
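
To make the New Hampshire (Phylip) format described under 2) above more
concrete, the following Python sketch pulls the sequence names and their
branch lengths out of a tree string made of nested parentheses. The function
name leaf_branch_lengths and the example tree are invented for illustration.
The sketch assumes simple unquoted names without spaces; it is not a full
parser for the format and does not reflect how Clustal W itself reads trees.

```python
import re

# A leaf is a name that directly follows "(" or "," and is followed by
# ":" and a number. Labels that follow ")" belong to internal nodes
# (for example bootstrap values placed on nodes) and are skipped.
_LEAF = re.compile(
    r"[(,]\s*([^\s():,;]+)\s*:\s*([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)"
)


def leaf_branch_lengths(newick):
    """Map each sequence (leaf) name to its branch length as a float."""
    return {name: float(length) for name, length in _LEAF.findall(newick)}


if __name__ == "__main__":
    # Hypothetical four-sequence tree; 95 is a label on an internal node.
    tree = "(seqA:0.1,seqB:0.2,(seqC:0.3,seqD:0.4)95:0.5);"
    for name, length in leaf_branch_lengths(tree).items():
        print(name, length)
```

A regular expression is enough here because only the tips are wanted;
recovering the branching order itself would need a parser that tracks the
nesting of the parentheses.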