               I      I------ 3  Diagram of the sequence similarity 
          I----I
          I    I------------- 4  relationships shown in the above 
       I--I
       I  I------------------ 1  dendrogram file (branch lengths are
   ----I
       I       I------------- 5  not to scale).
       I-------I
               I------------- 6









MULTIPLE ALIGNMENT PARAMETERS:


Having calculated a dendrogram between a set of sequences, the final 
multiple alignment is carried out by a series of alignments of 
larger and larger groups of sequences.  The order is determined by 
the dendrogram so that the most similar sequences get aligned first.  
Any gaps that are introduced in the early alignments are fixed.  
When two groups of sequences are aligned against each other, a full 
protein weight matrix (such as a Dayhoff PAM 250) is used.  Two gap 
penalties are offered: a "FIXED" penalty for opening up a gap and a 
"FLOATING" penalty for extending a gap.  


 ********* MULTIPLE ALIGNMENT PARAMETERS *********


     1. Fixed Gap Penalty       :10
     2. Floating Gap Penalty    :10
     3. Toggle Transitions (DNA):Weighted
     4. Protein weight matrix   :PAM 250

     H. HELP


Enter number (or [RETURN] to exit): 


FIXED GAP PENALTY:   Reduce this to encourage gaps of all sizes; 
increase it to discourage them.   Terminal gaps are penalised the 
same as all others.  BEWARE of making this too small (approx 5 or 
so); if the penalty is too small, the program may prefer to align 
each sequence opposite one long gap.

FLOATING GAP PENALTY:   Reduce this to encourage longer gaps; 
increase it to shorten them.   Terminal gaps are penalised the same 
as all others.  BEWARE of making this too small (approx 5 or so); if 
the penalty is too small, the program may prefer to align each 
sequence opposite one long gap.
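
As a rough illustration of how the two penalties interact, the 
sketch below scores a single gap as the fixed penalty plus the 
floating penalty times the gap length.  This exact formula is an 
ASSUMPTION made for illustration (the text above only says that one 
penalty opens a gap and the other extends it), and the function name 
gap_cost is hypothetical, not part of Clustal V.

```python
def gap_cost(length, fixed=10, floating=10):
    """Illustrative cost of one gap spanning `length` positions.

    ASSUMPTION: cost = fixed + floating * length. The program's real
    scoring may differ in detail.
    """
    if length <= 0:
        return 0
    return fixed + floating * length


# Compare one gap of 4 positions with two separate gaps of 2 positions.
one_long = gap_cost(4)                  # 10 + 10*4 = 50
two_short = gap_cost(2) + gap_cost(2)   # (10 + 20) * 2 = 60
```

Under this assumed scoring, lowering the fixed penalty makes every 
additional gap cheaper, while lowering the floating penalty mainly 
makes long gaps cheaper.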


DNA TRANSITIONS = WEIGHTED or UNWEIGHTED:   By default, transitions 
(A versus G; C versus T) are weighted more strongly than 
transversions (an A aligned with a G will be preferred to an A 
aligned with a C or a T).  You can make all pairs of nucleotides 
equally weighted with this option.
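
The distinction between transitions and transversions can be 
sketched as below.  This only classifies a pair of bases; the actual 
weights the program gives to each class are not stated here, so none 
are assumed.  The function name substitution_type is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(a, b):
    """Classify an aligned pair of DNA bases.

    Returns "match", "transition" (A<->G or C<->T) or "transversion"
    (a purine aligned with a pyrimidine).
    """
    a, b = a.upper(), b.upper()
    for base in (a, b):
        if base not in PURINES and base not in PYRIMIDINES:
            raise ValueError(f"not an unambiguous DNA base: {base!r}")
    if a == b:
        return "match"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "transition"
    return "transversion"
```

In WEIGHTED mode a pair classed as a transition would score higher 
than one classed as a transversion; in UNWEIGHTED mode all mismatched 
pairs would be treated alike.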

PROTEIN WEIGHT MATRIX:  For protein comparisons, a weight matrix is 
used to differentially weight different pairs of aligned amino 
acids.  The default is the well known Dayhoff PAM 250 matrix.  We 
also offer a PAM 100 matrix, an identity matrix (all weights are the 
same for exact matches) or allow you to give the name of a file with 
your own matrix.  The weight matrices used by Clustal V are shown in 
full in the Algorithms and References section of this documentation.  

If you input a matrix from a file, it must be in the following 
format.  Use a 20x20 matrix only (entries for the 20 "normal" amino 
acids only; no ambiguity codes etc.).  Input the lower left triangle 
of the matrix, INCLUDING the diagonal.  The order of the amino acids 
(rows and columns) must be: CSTPAGNDEQHRKMILVFYW.  The values can be 
in free format separated by spaces (not commas).  The PAM 250 matrix 
is shown below in this format.

  12 
   0  2 
  -2  1  3 
  -3  1  0  6 
  -2  1  1  1  2 
  -3  1  0 -1  1  5 
  -4  1  0 -1  0  0  2 
  -5  0  0 -1  0  1  2  4 
  -5  0  0 -1  0  0  1  3  4 
  -5 -1 -1  0  0 -1  1  2  2  4 
  -3 -1 -1  0 -1 -2  2  1  1  3  6 
  -4  0 -1  0 -2 -3  0 -1 -1  1  2  6 
  -5  0  0 -1 -1 -2  1  0  0  1  0  3  5 
  -5 -2 -1 -2 -1 -3 -2 -3 -2 -1 -2  0  0  6 
  -2 -1  0 -2 -1 -3 -2 -2 -2 -2 -2 -2 -2  2  5 
  -6 -3 -2 -3 -2 -4 -3 -4 -3 -2 -2 -3 -3  4  2  6 
  -2 -1  0 -1  0 -1 -2 -2 -2 -2 -2 -2 -2  2  4  2  4 
  -4 -3 -3 -5 -4 -5 -4 -6 -5 -5 -2 -4 -5  0  1  2 -1  9 
   0 -3 -3 -5 -3 -5 -2 -4 -4 -4  0 -4 -4 -2 -1 -1 -2  7 10 
  -8 -2 -5 -6 -6 -7 -4 -7 -7 -5 -3  2 -3 -4 -5 -2 -6  0  0 17 

Values must be integers and can be all positive or positive and 
negative as above.  These are SIMILARITY values.  
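
To make the file layout concrete, here is a minimal sketch of a 
reader for the format described above: integers in free format, 
lower left triangle including the diagonal, residues in the order 
CSTPAGNDEQHRKMILVFYW.  It is not the program's own input routine; 
the names parse_lower_triangle and PAM250_TEXT are hypothetical.  
The matrix text is copied from the listing above.

```python
AMINO_ORDER = "CSTPAGNDEQHRKMILVFYW"

PAM250_TEXT = """
  12
   0  2
  -2  1  3
  -3  1  0  6
  -2  1  1  1  2
  -3  1  0 -1  1  5
  -4  1  0 -1  0  0  2
  -5  0  0 -1  0  1  2  4
  -5  0  0 -1  0  0  1  3  4
  -5 -1 -1  0  0 -1  1  2  2  4
  -3 -1 -1  0 -1 -2  2  1  1  3  6
  -4  0 -1  0 -2 -3  0 -1 -1  1  2  6
  -5  0  0 -1 -1 -2  1  0  0  1  0  3  5
  -5 -2 -1 -2 -1 -3 -2 -3 -2 -1 -2  0  0  6
  -2 -1  0 -2 -1 -3 -2 -2 -2 -2 -2 -2 -2  2  5
  -6 -3 -2 -3 -2 -4 -3 -4 -3 -2 -2 -3 -3  4  2  6
  -2 -1  0 -1  0 -1 -2 -2 -2 -2 -2 -2 -2  2  4  2  4
  -4 -3 -3 -5 -4 -5 -4 -6 -5 -5 -2 -4 -5  0  1  2 -1  9
   0 -3 -3 -5 -3 -5 -2 -4 -4 -4  0 -4 -4 -2 -1 -1 -2  7 10
  -8 -2 -5 -6 -6 -7 -4 -7 -7 -5 -3  2 -3 -4 -5 -2 -6  0  0 17
"""


def parse_lower_triangle(text, order=AMINO_ORDER):
    """Read a lower-triangle similarity matrix into a symmetric dict.

    Values are whitespace-separated integers, row by row, with row i
    holding i+1 entries (the diagonal is included).
    """
    values = [int(token) for token in text.split()]
    n = len(order)
    expected = n * (n + 1) // 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} values, got {len(values)}")
    matrix = {}
    stream = iter(values)
    for i in range(n):
        for j in range(i + 1):
            score = next(stream)
            # Store both orientations so lookups are order-independent.
            matrix[order[i], order[j]] = score
            matrix[order[j], order[i]] = score
    return matrix


pam250 = parse_lower_triangle(PAM250_TEXT)
```

Because the values are read as one free-format stream, line breaks in 
the file do not matter to this sketch; only the total count (210 for 
a 20x20 triangle) and the order are checked.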




ALIGNMENT OUTPUT OPTIONS:
      
By default, the alignment goes to a file in a self-explanatory 
"blocked" alignment format.  This format is fine for displaying the 
results but requires heavy editing if you wish to use the alignment 
with other software.  To help, we provide 3 other formats which can 
be turned on or off.  If you have a sequence data set or alignment 
in memory, you can also ask for output files to be written 
immediately in whichever formats are turned on.  The menu you use to 
choose the format is shown below.
 
*** 
We draw your attention to NBRF/PIR format in particular.  This 
format is EXACTLY the same as one of the input formats.  Therefore, 
alignments written in this format can be used again as input (to the 
profile alignments or phylogenetic trees).
***


 ********* Format of Alignment Output *********


     1. Toggle CLUSTAL format output   =  ON
     2. Toggle NBRF/PIR format output  =  OFF
     3. Toggle GCG format output       =  OFF
     4. Toggle PHYLIP format output    =  OFF

     5. Create alignment output file(s) now?
     H. HELP


Enter number (or [RETURN] to exit): 



CLUSTAL FORMAT:     This is a self explanatory alignment.  The 
alignment is written out in blocks.  Identities are highlighted and 
(if you use a PAM 250 matrix) positions in the alignment where all 
of the residues are "similar" to each other (PAM 250 score of 8 or 
more) are indicated.

NBRF/PIR FORMAT:    This is the usual NBRF/PIR format with gaps 
indicated by hyphens ("-"). As we have stressed before, this format 
is EXACTLY compatible with the sequence input format.  Therefore you 
can read in these alignments again for profile alignments or for 
calculating phylogenetic trees.  
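
For readers unfamiliar with the layout, the sketch below writes 
aligned sequences in the general NBRF/PIR convention: a ">P1;name" 
header line (P1 marks a protein entry), a free-text title line, then 
the sequence terminated by "*".  The exact line width and title text 
that Clustal V writes are not given here, so the values used (60 
columns, title equal to the name) are ASSUMPTIONS, and the function 
name to_pir is hypothetical.

```python
def to_pir(names, aligned_seqs, code="P1", width=60):
    """Format aligned sequences as NBRF/PIR text, gaps kept as '-'."""
    if len(names) != len(aligned_seqs):
        raise ValueError("need one name per sequence")
    records = []
    for name, seq in zip(names, aligned_seqs):
        body = seq + "*"  # "*" terminates the sequence in PIR format
        lines = [f">{code};{name}", name]  # header line, then title line
        lines.extend(body[i:i + width] for i in range(0, len(body), width))
        records.append("\n".join(lines))
    return "\n".join(records) + "\n"


example = to_pir(["seq1", "seq2"], ["MK-LV", "MKALV"])
```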

GCG FORMAT:         In version 7 of the Wisconsin GCG package, a new 
multiple sequence format was introduced.  This is the MSF (Multiple 
Sequence Format) format.  It can be used as input to the GCG 
sequence editor or any of the GCG programs that make use of multiple 
alignments.   THIS FORMAT IS ONLY SUPPORTED IN VERSION 7 OF THE GCG 
PACKAGE OR LATER.  

PHYLIP FORMAT:      This format can be used by the Phylip package of 
Joe Felsenstein (see the references/algorithms section for details 
of how to get it).  Phylip allows you to do a huge range of 
phylogenetic analyses (we just offer one method in this program) and 
is probably the most widely used set of programs for drawing trees.
It also works on just about every computer you can think of, 
providing you have a decent Pascal compiler.





      ******************************
      *   PROFILE ALIGNMENT MENU.  *
      ******************************



This menu is for taking two old alignments (or single sequences) and 
aligning them with each other.  The result is one bigger alignment.  
The menu is very similar to the multiple alignment menu except that 
there is no mention of dendrograms here (they are not needed) and 
you need to input two sets of sequences.  The menu looks like this:



******Profile*Alignment*Menu******


    1.  Input 1st. profile/sequence
    2.  Input 2nd. profile/sequence
    3.  Do alignment now
    4.  Alignment parameters
    5.  Output format options

    S.  Execute a system command
    H.  HELP
    or press [RETURN] to go back to main menu


Your choice: 


You must input profile number 1 first.   When both profiles are 
loaded, use item 3 (Do alignment now) and the 2 profiles will be 
aligned.  Items 4 and 5 (parameters and output options) are 
identical to the equivalent options on the multiple alignment menu.  

The same input routines that are used for general input are used 
here i.e. sequences must be in NBRF/PIR, EMBL/SwissProt or FASTA 
format, with gaps indicated by hyphens ("-").  This is why we have 
continually drawn your attention to the NBRF/PIR format as a useful 
output format.  

Either profile can consist of just one sequence.   Therefore, if you 
have a favourite alignment of sequences that you are working on and 
wish to add a new sequence, you can use this menu, provided the 
alignment is in the correct format.  

The total number of sequences in the two profiles must be less 
than or equal to the MAXN parameter set in the clustalv.h header 
file.  











      ******************************
      *   PHYLOGENETIC TREE MENU.  *
      ******************************


This menu allows you to input an alignment and calculate a 
phylogenetic tree.  You can also calculate a tree if you have just 
carried out a multiple alignment and the alignment is still in 
memory.  THE SEQUENCES MUST BE ALIGNED ALREADY!!!!!!   The tree will 
look strange if the sequences are not already aligned.  You can also 
"BOOTSTRAP" the tree to show confidence levels for groupings.  This 
is SLOW on microcomputers but works fine on workstations or 
mainframes.
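
Bootstrapping an alignment is conventionally done by resampling its 
columns with replacement and rebuilding the tree from each resampled 
alignment.  The sketch below shows only the resampling step and 
ASSUMES that convention; it is not taken from the program's source, 
and the name bootstrap_alignment is hypothetical.

```python
import random


def bootstrap_alignment(seqs, rng=None):
    """Return one bootstrap replicate of an alignment.

    Columns are drawn at random, with replacement, until the replicate
    has as many columns as the original. Rows stay in the same order.
    """
    rng = rng or random.Random()
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    columns = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in columns) for s in seqs]


replicate = bootstrap_alignment(["ACGT", "A-GT", "TCGA"], random.Random(0))
```

Each replicate needs its own distance matrix and tree, which is why 
the procedure multiplies the running time by the number of 
replicates.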



******Phylogenetic*tree*Menu******


    1.  Input an alignment
    2.  Exclude positions with gaps?        = OFF
    3.  Correct for multiple substitutions? = OFF
    4.  Draw tree now
    5.  Bootstrap tree

    S.  Execute a system command
    H.  HELP
    or press [RETURN] to go back to main menu


Your choice: 




The same input routine that is used for general input is used here 
i.e. sequences must be in NBRF/PIR, EMBL/SwissProt or FASTA format, 
with gaps indicated by hyphens ("-").  This is why we have 
continually drawn your attention to the NBRF/PIR format as a useful 
output format.  

If you have input an alignment, then just use item 4 to draw a tree.  
The method used is the Neighbor Joining method of Saitou and Nei 
(1987).  This is a "distance method". First, percent divergence 
figures are calculated between all pairs of sequences.  These 
divergence figures are then used by the NJ method to give the tree.  
Example trees will be shown below.  
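
The divergence calculation for one pair of aligned sequences can be 
sketched as follows.  It ASSUMES that positions with a gap in either 
sequence of the pair are skipped and that divergence is the fraction 
of the remaining positions that differ; the function name 
pairwise_divergence is hypothetical.

```python
def pairwise_divergence(a, b, gap="-"):
    """Return (divergence, sites_compared) for two aligned sequences.

    Divergence is expressed as a fraction (percent / 100).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumed: gapped positions are not compared
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped positions in common")
    return differences / compared, compared


dist, length = pairwise_divergence("ACGT-A", "ACCTTA")
```

Here five positions are compared and one differs, so dist is 0.2 and 
length is 5.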

There are two options which can be used to control the way the 
distances are calculated.  These are set by options 2 and 3 in the 
menu.  

EXCLUDE POSITIONS WITH GAPS?   This option allows you to ignore all 
alignment positions (columns) where there is a gap in ANY sequence.  
This guarantees that "like" is compared with "like" in all distances 
i.e. the same positions are used to calculate all distances.  It 
also means that the distances will be "metric".  The disadvantage of 
using this option is that you throw away much of the data if there 
are many gaps.  If the total number of gaps is small, it has little 
effect.  
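
A minimal sketch of this option: drop every column that contains a 
gap in any sequence, so that all pairwise distances are computed over 
the same positions.  The function name drop_gap_columns is 
hypothetical.

```python
def drop_gap_columns(seqs, gap="-"):
    """Remove alignment columns that have a gap in ANY sequence."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i, column in enumerate(zip(*seqs)) if gap not in column]
    return ["".join(s[i] for i in keep) for s in seqs]


trimmed = drop_gap_columns(["AC-GT", "A-CGT", "ACCGT"])
```

In this small case two of the five columns contain a gap, so only 
three columns survive, which shows how quickly data is lost when gaps 
are frequent.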
 
CORRECT FOR MULTIPLE SUBSTITUTIONS?    As sequences diverge, 
substitutions accumulate.  It becomes increasingly likely that more 
than one substitution (as a result of a mutation) will have happened 
at a site where you observe just one difference now.  This option 
allows you to use formulae developed by Motoo Kimura to correct for 
this effect.  It has the effect of stretching long branches in trees 
while leaving short ones relatively untouched.  The desired effect 
is to try and make distances proportional to time since divergence.  
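
The stretching effect can be seen with a small sketch.  The text 
above does not say which of Kimura's formulae is applied, so this 
ASSUMES the empirical protein formula from Kimura (1983), 
K = -ln(1 - D - D*D/5), where D is the observed divergence as a 
fraction; DNA sequences would need a different formula.  The function 
name kimura_protein_distance is hypothetical.

```python
import math


def kimura_protein_distance(d):
    """Corrected distance from observed divergence d (0 <= d < ~0.85).

    ASSUMED formula: K = -ln(1 - d - d*d/5).
    """
    arg = 1.0 - d - d * d / 5.0
    if arg <= 0.0:
        raise ValueError("divergence too large for this correction")
    return -math.log(arg)


# Two observed values: one small, one large.
short_branch = kimura_protein_distance(0.0897)
long_branch = kimura_protein_distance(0.7571)
```

Under this assumed formula the small distance is almost unchanged, 
while the large one more than doubles.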

The tree is sent to a file called BLAH.NJ, where BLAH.SEQ is the 
name of the input alignment file.  An example is shown below for 6 
globin sequences.  



 DIST   = percentage divergence (/100)
 Length = number of sites used in comparison

   1 vs.   2  DIST = 0.5683;  length =    139
   1 vs.   3  DIST = 0.5540;  length =    139
   1 vs.   4  DIST = 0.5315;  length =    111
   1 vs.   5  DIST = 0.7447;  length =    141
   1 vs.   6  DIST = 0.7571;  length =    140
   2 vs.   3  DIST = 0.0897;  length =    145
   2 vs.   4  DIST = 0.1391;  length =    115
   2 vs.   5  DIST = 0.7517;  length =    145
   2 vs.   6  DIST = 0.7431;  length =    144
   3 vs.   4  DIST = 0.0957;  length =    115
   3 vs.   5  DIST = 0.7379;  length =    145
   3 vs.   6  DIST = 0.7361;  length =    144
   4 vs.   5  DIST = 0.7304;  length =    115
   4 vs.   6  DIST = 0.7368;  length =    114
   5 vs.   6  DIST = 0.2697;  length =    152


			Neighbor-joining Method

 Saitou, N. and Nei, M. (1987) The Neighbor-joining Method:
 A New Method for Reconstructing Phylogenetic Trees.
 Mol. Biol. Evol., 4(4), 406-425


 This is an UNROOTED tree

 Numbers in parentheses are branch lengths


 Cycle   1     =  SEQ:   5 (  0.13382) joins  SEQ:   6 (  0.13592)