README for Clustal W version 1.7  June 1997

             Clustal W version 1.7 Documentation

This file provides some notes on the latest changes, installation and usage
of the Clustal W multiple sequence alignment program.



Julie Thompson (Thompson@EMBL-Heidelberg.DE)
Toby Gibson    (Gibson@EMBL-Heidelberg.DE)

European Molecular Biology Laboratory
Meyerhofstrasse 1
D 69117 Heidelberg
Germany


Des Higgins (Higgins@ucc.ie)

University College Cork
Cork
Ireland


Please e-mail bug reports/complaints/suggestions (polite if possible)
to Toby Gibson or Des Higgins.  



Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994)
CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment
through sequence weighting, position-specific gap penalties and weight matrix
choice.  Nucleic Acids Research, 22:4673-4680.

--------------------------------------------------------------

What's New (June 1997) in Version 1.7 (since version 1.6).


1. The static arrays used by clustalw for storing the alignment data have been
replaced by dynamically allocated memory. There is now no limit on the number
or length of sequences which can be input.

2. The alignment of DNA sequences now offers a new hard-coded matrix, as well
as the identity matrix used previously. The new matrix is the default scoring
matrix used by the BESTFIT program of the GCG package for the comparison of
nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity
symbol. All matches score 1.9; all mismatches for IUB symbols score 0.0.
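The following C fragment is a simplified sketch of the scoring rule described in item 2. It is not the actual BESTFIT matrix. Identical symbols, or an X or N on either side, score 1.9, and every other pair scores 0.0. Two things are assumptions: plain-base mismatches (e.g. A against G) also score 0.0, and partial overlaps between ambiguity codes (e.g. R against A) are ignored. The function name dna_pair_score is hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

#define DNA_MATCH_SCORE    1.9   /* match score quoted in the README */
#define DNA_MISMATCH_SCORE 0.0   /* mismatch score quoted in the README */

/* X and N act as wildcards that match any IUB symbol. */
static bool is_wildcard(int c)
{
    return c == 'X' || c == 'N';
}

/* Simplified pair score: identical symbols, or a wildcard on either
 * side, count as a match; every other pair scores as a mismatch. */
double dna_pair_score(char a, char b)
{
    int ua = toupper((unsigned char)a);
    int ub = toupper((unsigned char)b);
    if (ua == ub || is_wildcard(ua) || is_wildcard(ub))
        return DNA_MATCH_SCORE;
    return DNA_MISMATCH_SCORE;
}
```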

3. The transition weight option for aligning nucleotide sequences has been
changed from an on/off toggle to a weight between 0 and 1.  A weight of zero
means that the transitions are scored as mismatches; a weight of 1 gives 
transitions the full match score. For distantly related DNA sequences, the
weight should be near to zero; for closely related sequences it can be useful
to assign a higher score.
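Here is a minimal C sketch of the transition weight from item 3. The README only fixes the end points 0 and 1, so the sketch assumes that intermediate weights interpolate linearly between the mismatch and match scores. Transitions are A<->G and C<->T, with U treated as T. The function nt_score and its parameters are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>

/* Upper-case the base and map RNA U onto DNA T. */
static int norm_base(char c)
{
    int u = toupper((unsigned char)c);
    return u == 'U' ? 'T' : u;
}

static bool is_purine(int c)     { return c == 'A' || c == 'G'; }
static bool is_pyrimidine(int c) { return c == 'C' || c == 'T'; }

/* Score two nucleotides. tw is the transition weight in [0, 1]:
 * 0 scores a transition as a mismatch, 1 as a full match, and values
 * in between interpolate linearly (an assumption of this sketch). */
double nt_score(char a, char b, double match, double mismatch, double tw)
{
    assert(tw >= 0.0 && tw <= 1.0);
    int x = norm_base(a), y = norm_base(b);
    if (x == y)
        return match;
    if ((is_purine(x) && is_purine(y)) || (is_pyrimidine(x) && is_pyrimidine(y)))
        return mismatch + tw * (match - mismatch);   /* transition */
    return mismatch;                                  /* transversion or other */
}
```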

4. The RSF sequence alignment file format used by GCG Version 9 can now be
read.

5. The clustal sequence alignment file format has been changed to allow
sequence names longer than 10 characters. The maximum length allowed is set in
clustalw.h by the statement:
#define MAXNAMES	10

For the fasta format, the name is taken as the first string after the '>'
character, stopping at the first white space. (Previously, the first 10
characters were taken, replacing blanks by underscores).
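The FASTA naming rule above might be implemented as in the following C sketch. It skips any blanks directly after '>' and copies characters up to the first white space. It also truncates the name to MAXNAMES characters, which is an assumption not stated in this README. fasta_name is a hypothetical helper.

```c
#include <ctype.h>
#include <stddef.h>

#define MAXNAMES 10   /* value quoted from clustalw.h in this README */

/* Copy the sequence name from a FASTA header line into name, which must
 * hold MAXNAMES + 1 chars. Returns the name length, or -1 if the line
 * does not start with '>'. */
int fasta_name(const char *line, char *name)
{
    size_t n = 0;
    if (line == NULL || line[0] != '>')
        return -1;
    const char *p = line + 1;
    while (*p == ' ' || *p == '\t')          /* skip blanks after '>' */
        p++;
    while (*p != '\0' && !isspace((unsigned char)*p) && n < MAXNAMES)
        name[n++] = *p++;                    /* stop at first white space */
    name[n] = '\0';
    return (int)n;
}
```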

6. The bootstrap values written in the phylip tree file format can be assigned
either to branches or nodes. The default is to write the values on the nodes,
as this format can be read by several commonly used tree display programs.
Note, however, that this can lead to confusion if the tree is rooted; the
bootstrap values may then be better attached to the internal branches.
Software developers should ensure that their programs can read the branch
label format.
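This C sketch illustrates the difference between the two label placements. It formats the text that follows the ')' closing an internal node. Node style writes the bootstrap count before the branch length (95:0.05000). The square-bracket branch style (:0.05000[95]) is an assumption about the branch label format and is not quoted from this README, so check it against the tree files your version actually writes. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical selector for where the bootstrap count is written. */
enum boot_label { BOOT_ON_NODE, BOOT_ON_BRANCH };

/* Write the text that follows the ')' closing an internal node:
 *   node style:   95:0.05000    (label sits on the node)
 *   branch style: :0.05000[95]  (label attached to the branch)
 * Returns snprintf's result (number of characters it would write). */
int format_internal_label(char *buf, size_t size, enum boot_label style,
                          int bootstrap, double branch_len)
{
    assert(buf != NULL && size > 0);
    if (style == BOOT_ON_NODE)
        return snprintf(buf, size, "%d:%.5f", bootstrap, branch_len);
    return snprintf(buf, size, ":%.5f[%d]", branch_len, bootstrap);
}
```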

7. The sequence weighting used during sequence to profile alignments has been
changed. The tree weight is now multiplied by the percent identity of the
new sequence compared with the most closely related sequence in the profile.

8. The sequence weighting used during profile to profile alignments has been
changed. A guide tree is now built for each profile separately and the
sequence weights calculated from the two trees. The weights for each
sequence are then multiplied by the percent identity of the sequence compared
with the most closely related sequence in the opposite profile.
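Items 7 and 8 both scale a tree-derived weight by the best percent identity against a reference set. For item 7 the reference set is the profile; for item 8 it is the opposite profile. Here is a minimal C sketch. It assumes the identities are supplied as fractions between 0 and 1 and that the tree weights have already been computed. adjusted_weight is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Adjusted weight = tree weight x highest identity (as a fraction 0..1)
 * between this sequence and any sequence in the reference set.
 * pid[] holds those identities; computing them is not shown. */
double adjusted_weight(double tree_weight, const double *pid, size_t n)
{
    assert(pid != NULL && n > 0);
    double best = pid[0];
    for (size_t i = 1; i < n; i++)
        if (pid[i] > best)
            best = pid[i];       /* most closely related sequence */
    return tree_weight * best;
}
```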

9. The adjustment of the Gap Opening and Gap Extension Penalties for sequences
of unequal length has been improved.

10. The default order of the sequences in the output alignment file has been
changed. Previously the default was to output the sequences in the same order
as the input file. Now the default is to use the order in which the sequences
were aligned (from the guide tree/dendrogram), thus automatically grouping
closely related sequences.

11. The option to 'Reset Gaps between alignments' has been switched off by
default.

12. The conservation line output in the clustal format alignment file has been
changed. Three characters are now used:
'*' indicates positions which have a single, fully conserved residue
':' indicates that one of the following 'strong' groups is fully conserved:-
                 STA
                 NEQK
                 NHQK
                 NDEQ
                 QHRK
                 MILV
                 MILF
                 HY
                 FYW

'.' indicates that one of the following 'weaker' groups is fully conserved:-
                 CSA
                 ATV
                 SAG
                 STNK
                 STPA
                 SGND
                 SNDEQK
                 NDEQHK
                 NEQHRK
                 FVLIM
                 HFY

These are all the positively scoring groups that occur in the Gonnet PAM 250
matrix. The strong and weak groups are defined as groups scoring >0.5 and
<=0.5 respectively.
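The following C sketch applies the conservation-line rules using the strong and weak groups listed above. Two choices are assumptions of this sketch: any column that contains a gap ('-') is treated as unconserved, and case is ignored. conservation_char is a hypothetical helper, not a ClustalW function.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

static const char *strong_groups[] = {
    "STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"
};
static const char *weak_groups[] = {
    "CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK", "NDEQHK",
    "NEQHRK", "FVLIM", "HFY"
};

/* True if every residue in col[0..n-1] belongs to group. */
static bool all_in_group(const char *col, size_t n, const char *group)
{
    for (size_t i = 0; i < n; i++)
        if (strchr(group, toupper((unsigned char)col[i])) == NULL)
            return false;
    return true;
}

/* True if some group in groups[0..ng-1] fully contains the column. */
static bool any_group(const char *col, size_t n, const char **groups, size_t ng)
{
    for (size_t g = 0; g < ng; g++)
        if (all_in_group(col, n, groups[g]))
            return true;
    return false;
}

/* Conservation character for one column (one residue per sequence). */
char conservation_char(const char *col, size_t n)
{
    assert(col != NULL && n > 0);
    bool identical = true;
    for (size_t i = 0; i < n; i++) {
        if (col[i] == '-')
            return ' ';              /* assumption: gapped columns stay blank */
        if (toupper((unsigned char)col[i]) != toupper((unsigned char)col[0]))
            identical = false;
    }
    if (identical)
        return '*';
    if (any_group(col, n, strong_groups, sizeof strong_groups / sizeof *strong_groups))
        return ':';
    if (any_group(col, n, weak_groups, sizeof weak_groups / sizeof *weak_groups))
        return '.';
    return ' ';
}
```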

13. A bug in the modification of the Myers and Miller alignment algorithm
for residue-specific gap penalties has been fixed. This occasionally caused
new gaps to be opened a few residues away from the optimal position.

14. The GCG/MSF input format no longer needs the word PILEUP on the first
line. Several versions can now be recognised:-
      1.  The word PILEUP as the first word in the file
      2.  The word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
          as the first word in the file
      3.  The characters MSF in the first line, and the characters .. at the
          end of that line.
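Here is a C sketch of the three recognition rules, applied to the first line of a file. It follows the wording above literally: matching is case-sensitive, and rule 3 is checked on the first line only. It ignores trailing white space when looking for the final '..'. The name looks_like_msf is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Does the first word of s equal word? (words end at white space) */
static bool first_word_is(const char *s, const char *word)
{
    while (isspace((unsigned char)*s))
        s++;
    size_t len = strlen(word);
    return strncmp(s, word, len) == 0 &&
           (s[len] == '\0' || isspace((unsigned char)s[len]));
}

/* Apply the three MSF recognition rules to the first line of a file. */
bool looks_like_msf(const char *first_line)
{
    assert(first_line != NULL);
    if (first_word_is(first_line, "PILEUP") ||
        first_word_is(first_line, "!!AA_MULTIPLE_ALIGNMENT") ||
        first_word_is(first_line, "!!NA_MULTIPLE_ALIGNMENT"))
        return true;

    /* Rule 3: "MSF" on the line and ".." at its end (trailing
       white space ignored). */
    size_t end = strlen(first_line);
    while (end > 0 && isspace((unsigned char)first_line[end - 1]))
        end--;
    return strstr(first_line, "MSF") != NULL && end >= 2 &&
           first_line[end - 2] == '.' && first_line[end - 1] == '.';
}
```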

15. The standard command line separator for UNIX systems has been changed from
'/' to '-'. That is, to give options on the command line, you now type

     clustalw input.aln -gapopen=8.0

instead of  clustalw input.aln /gapopen=8.0


                      ATTENTION SOFTWARE DEVELOPERS!!
                      -------------------------------

The CLUSTAL sequence alignment output format has been modified:

1. Names longer than 10 chars are now allowed. (The maximum is specified in
clustalw.h by '#define MAXNAMES'.)

2. The consensus line now consists of three characters: '*',':' and '.'. (Only
the '*' and '.' were previously used.)

3. An option (not the default) has been added, allowing the user to print out
sequence numbers at the end of each line of the alignment output.

4. Both RNA bases (U) and base ambiguities are now supported in nucleic acid
sequences. In the past, all characters (upper or lower case) other than
a,c,g,t or u were converted to N. Now the following characters are recognised 
and retained in the alignment output: ABCDGHKMNRSTUVWXY (upper or lower case).
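The change in item 4 can be expressed as two small C predicates. The first implements the version 1.6 rule, where anything other than a, c, g, t or u became N. The second tests membership of the set that is now retained. How version 1.7 treats characters outside that set is not stated here, so the sketch does not model it. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Version 1.6 rule: anything other than a, c, g, t or u (either case)
 * was converted to N. */
char na_symbol_v16(char c)
{
    int u = toupper((unsigned char)c);
    return (u != '\0' && strchr("ACGTU", u) != NULL) ? c : 'N';
}

/* Version 1.7: the symbols listed above are recognised and retained. */
bool is_recognised_na(char c)
{
    int u = toupper((unsigned char)c);
    return u != '\0' && strchr("ABCDGHKMNRSTUVWXY", u) != NULL;
}
```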

5. A blank line inadvertently added in the version 1.6 header has been taken
out again.


--------------------------------------------------------------

What's New (March 1996) in Version 1.6 (since version 1.5).


1) Improved handling of sequences of unequal length.  Previously, we
increased the gap extension penalties for both sequences if the two sequences
(or groups of previously aligned sequences) were of different lengths.  
Now, we increase the gap opening and extension penalties for the shorter 
sequence only.   This helps prevent short sequences from being stretched out
along longer ones.
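The 1.6 change raises the penalties for the shorter sequence only. The README does not give the scaling formula, so in this C sketch the factor is a hypothetical parameter supplied by the caller. struct gap_pen and adjust_for_length are likewise illustrative names.

```c
#include <assert.h>
#include <stddef.h>

/* Gap penalties for one side of a pairwise or group alignment. */
struct gap_pen { double open, extend; };

/* Raise the opening and extension penalties for the shorter side only.
 * factor (>= 1) is a hypothetical scaling parameter. */
void adjust_for_length(int len_a, int len_b, struct gap_pen *a,
                       struct gap_pen *b, double factor)
{
    assert(a != NULL && b != NULL && factor >= 1.0);
    if (len_a == len_b)
        return;                            /* equal lengths: no change */
    struct gap_pen *shorter = (len_a < len_b) ? a : b;
    shorter->open   *= factor;
    shorter->extend *= factor;
}
```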

2) Added the "Gonnet" series of weight matrices (from Gaston Gonnet and 
co-workers at the ETH in Zurich).  Fixed a bug in the matrix
choice menu; PAM matrices can now be selected correctly.

3) Added secondary structure/gap penalty masks.  These allow you to 
include, in an alignment, a position-specific set of gap penalties.  
You can either set a gap opening penalty at each position or specify
the secondary structure (if protein; alpha helix, beta strand or loop)
and have gap penalties set automatically.   This is mainly used to make 
gaps harder to open inside helices or strands.  

These masks are only used in the "profile alignment" menu.  They may be read in
as part of an alignment in a special format (see the on-line help for
details) or associated with each sequence, if the sequences are in Swiss Prot 
format and secondary structure information is given.   All of the mask 
parameters can be set from the profile alignment menu.  Basically, the
mask is made up of a series of numbers between 1 and 9, one per position.
The gap opening penalty at a position is calculated as the starting penalty
multiplied by the mask value at that site.
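The following C sketch shows the mask arithmetic described above: penalty = starting penalty x mask value. It assumes the mask is stored as a string of digits '1' to '9', one per position. masked_gap_open is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Fill out[] with per-position gap opening penalties from a mask such
 * as "1111999991111". Returns the number of positions written, or -1
 * if the mask contains anything other than the digits 1..9. */
int masked_gap_open(const char *mask, double start_penalty,
                    double *out, size_t out_len)
{
    assert(mask != NULL && out != NULL);
    size_t i = 0;
    for (; mask[i] != '\0' && i < out_len; i++) {
        if (mask[i] < '1' || mask[i] > '9')
            return -1;
        out[i] = start_penalty * (mask[i] - '0');   /* penalty x mask value */
    }
    return (int)i;
}
```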

4) Added command line options /profile and /sequences.
These allow users to choose between normal profile alignment where the
two profiles (pre-existing alignments specified in the files
/profile1= and /profile2=) are merged/aligned with each other (/profile)
and the case where the individual sequences in /profile2 are aligned
sequentially with the alignment in /profile1 (/sequences).

5) Fixed bug in modified Myers and Miller algorithm - gap penalty score
was not always calculated properly for type 2 midpoints.  This is the core
alignment algorithm.

6) Only one output file format can be selected from the command line,
i.e. multiple output alignment files are not allowed.

7) Fixed 'bad calls to ckfree' error during calculation of phylip distance
matrix.

8) Fixed command line options /gapopen /gapext /type=protein /negative.

9) Allowed the user to change the command line separator on UNIX from '/' to '-'.
This allows UNIX users to use the more conventional '-' symbol
for separating command line options.  "/" can then be used in UNIX
file names on the command line.   The symbol that is used
is specified in the file clustalw.h, which must be edited if you 
wish to change it (and the program must then be recompiled).   Find the 
block of code in clustalw.h that corresponds to the operating system you
are using.  These blocks are started by one of the following:

#ifdef VMS 
#elif MAC
#elif MSDOS
#elif UNIX

On the next line after each is the line:

#define COMMANDSEP '/'

Change this in the appropriate block of code (e.g. the UNIX block) to 

#define COMMANDSEP '-'

if you wish to use the "-" character as command separator.
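Once COMMANDSEP is set to '-', an option such as -gapopen=8.0 can be split into a name and a value, as in the C sketch below. This is an illustrative parser, not ClustalW's own command line code. parse_option is a hypothetical name. Options without '=' (such as -negative) are reported as having no value.

```c
#include <assert.h>
#include <string.h>

#define COMMANDSEP '-'   /* as in the UNIX block after the edit above */

/* Parse an argument of the form <COMMANDSEP>name=value. Copies the
 * option name into name[] (truncated to name_len - 1 chars) and returns
 * a pointer to the value, or NULL if arg is not an option with a value. */
const char *parse_option(const char *arg, char *name, size_t name_len)
{
    assert(arg != NULL && name != NULL && name_len > 0);
    if (arg[0] != COMMANDSEP)
        return NULL;                  /* e.g. a file name such as input.aln */
    const char *eq = strchr(arg + 1, '=');
    if (eq == NULL)
        return NULL;                  /* flag without a value */
    size_t n = (size_t)(eq - (arg + 1));
    if (n >= name_len)
        n = name_len - 1;
    memcpy(name, arg + 1, n);
    name[n] = '\0';
    return eq + 1;
}
```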
  

       
--------------------------------------------------------------

What's New (April 1995) in Version 1.5 (since version 1.3).