******************************************************************************

               CLUSTAL W Multiple Sequence Alignment Program
                        (version 1.83, Feb 2003)

******************************************************************************


Please send bug reports, comments etc. to one of:-
        gibson@embl-heidelberg.de
        thompson@igbmc.u-strasbg.fr
        d.higgins@ucc.ie


******************************************************************************

                POLICY ON COMMERCIAL DISTRIBUTION OF CLUSTAL W

Clustal W is freely available to the user community. However, Clustal W is
increasingly being distributed as part of commercial sequence analysis
packages. To help us safeguard future maintenance and development, commercial
distributors of Clustal W must take out a NON-EXCLUSIVE LICENCE. Anyone
wishing to commercially distribute version 1.81 of Clustal W should contact the
authors unless they have previously taken out a licence.

******************************************************************************

Clustal W is written in ANSI-C and can be run on any machine with an ANSI-C
compiler. Executables are provided for several major platforms.


Changes since CLUSTAL X Version 1.82
------------------------------------

1. The FASTA format has been added to the list of alignment output options.

2. It is now possible to save the residue ranges (appended after the sequence
names) when saving a specified range of the alignment.

3. The efficiency of the neighbour-joining algorithm has been improved. This
work was done by Tadashi Koike at the Center for Information Biology and DNA
Data Bank of Japan and FUJITSU Limited.

Some example speedups are given below (timings on a SPARC64 CPU):

No. of sequences        original NJ     new NJ
     200                0' 12"          0.1"
     500                9' 19"          1.4"
    1000                XXXX            0' 31"


Changes since version 1.8
--------------------------

1. ClustalW now returns error codes for some common errors when exiting.
This may be useful for people who run clustalw automatically from within a
script.
Error codes are:
        1       bad command line option
        2       cannot open sequence file
        3       wrong format in sequence file
        4       sequence file contains only 1 sequence (for multiple alignments)

2. Alignments can now be saved in Nexus format, for compatibility with PAUP,
MacClade etc. For a description of the Nexus format, see:
Maddison, D. R., D. L. Swofford and W. P. Maddison. 1997.
NEXUS: an extensible file format for systematic information.
Systematic Biology 46:590-621.

3. Phylogenetic trees can also be saved in nexus format.

4. A ClustalW icon has been designed for MAC and PC systems.


Changes since version 1.74
--------------------------

1. Some work has been done to automatically select the optimal parameters
depending on the set of sequences to be aligned. The Gonnet series of residue
comparison matrices are now used by default. The Blosum series remains as an
option. The default gap extension penalty for proteins has been changed to 0.2
(was 0.05). The 'delay divergent sequences' option has been changed to 30%
residue identity (was 40%).

2. The default parameters used when the 'Negative matrix' option is selected
have been optimised. This option may help when the sequences to be aligned are
not superposable over their whole lengths (e.g. in the presence of N/C terminal
extensions).

3. A bug in the calculation of phylogenetic trees for 2 sequences has been
fixed.

4. A command line option has been added to turn off the sequence weighting
calculation.

5. The phylogenetic tree calculation now ignores any ambiguity codes in the
sequences.

6. A bug in the memory access during the calculation of profiles has been
fixed. (Thanks to Haruna Cofer at SGI).

7. A bug has been fixed in the 'transition weight' option for nucleic acid
sequences. (Thanks to Chanan Rubin at Compugen).

8. An option has been added to read in a series of comparison matrices from a
file. This option is only applicable for protein sequences.
For details of the file format, see the on-line documentation.

9. The MSF output file format has been changed. The sequence weights
calculated by Clustal W are now included in the header.

10. Two bugs in the FAST/APPROXIMATE pairwise alignments have been fixed. One
involved the alignment of new sequences to an existing profile using the fast
pairwise alignment option; the second was caused by changing the default
options for the fast pairwise alignments.

11. A bug in the alignment of a small number of sequences has been fixed.
Previously a Guide Tree was not calculated for fewer than 4 sequences.


Changes since version 1.6
-------------------------

1. The static arrays used by clustalw for storing the alignment data have been
replaced by dynamically allocated memory. There is now no limit on the number
or length of sequences which can be input.

2. The alignment of DNA sequences now offers a new hard-coded matrix, as well
as the identity matrix used previously. The new matrix is the default scoring
matrix used by the BESTFIT program of the GCG package for the comparison of
nucleic acid sequences. X's and N's are treated as matches to any IUB ambiguity
symbol. All matches score 1.9; all mismatches for IUB symbols score 0.0.

3. The transition weight option for aligning nucleotide sequences has been
changed from an on/off toggle to a weight between 0 and 1. A weight of zero
means that the transitions are scored as mismatches; a weight of 1 gives
transitions the full match score. For distantly related DNA sequences, the
weight should be near to zero; for closely related sequences it can be useful
to assign a higher score.

4. The RSF sequence alignment file format used by GCG Version 9 can now be
read.

5. The clustal sequence alignment file format has been changed to allow
sequence names longer than 10 characters.
The maximum length allowed is set in
clustalw.h by the statement:
#define MAXNAMES 10

For the fasta format, the name is taken as the first string after the '>'
character, stopping at the first white space. (Previously, the first 10
characters were taken, replacing blanks by underscores).

6. The bootstrap values written in the phylip tree file format can be assigned
either to branches or nodes. The default is to write the values on the nodes,
as this can be read by several commonly-used tree display programs. But note
that this can lead to confusion if the tree is rooted and the bootstraps may
be better attached to the internal branches: Software developers should ensure
they can read the branch label format.

7. The sequence weighting used during sequence to profile alignments has been
changed. The tree weight is now multiplied by the percent identity of the
new sequence compared with the most closely related sequence in the profile.

8. The sequence weighting used during profile to profile alignments has been
changed. A guide tree is now built for each profile separately and the
sequence weights calculated from the two trees. The weights for each
sequence are then multiplied by the percent identity of the sequence compared
with the most closely related sequence in the opposite profile.

9. The adjustment of the Gap Opening and Gap Extension Penalties for sequences
of unequal length has been improved.

10. The default order of the sequences in the output alignment file has been
changed. Previously the default was to output the sequences in the same order
as the input file. Now the default is to use the order in which the sequences
were aligned (from the guide tree/dendrogram), thus automatically grouping
closely related sequences.

11. The option to 'Reset Gaps between alignments' has been switched off by
default.

12. The conservation line output in the clustal format alignment file has been
changed.
Three characters are now used:

'*' indicates positions which have a single, fully conserved residue

':' indicates that one of the following 'strong' groups is fully conserved:-
                 STA
                 NEQK
                 NHQK
                 NDEQ
                 QHRK
                 MILV
                 MILF
                 HY
                 FYW

'.' indicates that one of the following 'weaker' groups is fully conserved:-
                 CSA
                 ATV
                 SAG
                 STNK
                 STPA
                 SGND
                 SNDEQK
                 NDEQHK
                 NEQHRK
                 FVLIM
                 HFY

These are all the positively scoring groups that occur in the Gonnet Pam250
matrix. The strong and weak groups are defined as strong score >0.5 and weak
score =<0.5 respectively.

13. A bug in the modification of the Myers and Miller alignment algorithm
for residue-specific gap penalties has been fixed. This occasionally caused
new gaps to be opened a few residues away from the optimal position.

14. The GCG/MSF input format no longer needs the word PILEUP on the first
line. Several versions can now be recognised:-
      1. The word PILEUP as the first word in the file
      2. The word !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
         as the first word in the file
      3. The characters MSF on the first line of the file, and the
         characters .. at the end of that line.

15. The standard command line separator for UNIX systems has been changed from
'/' to '-'. i.e. to give options on the command line, you now type

     clustalw input.aln -gapopen=8.0

instead of

     clustalw input.aln /gapopen=8.0


                      ATTENTION SOFTWARE DEVELOPERS!!
                      -------------------------------

The CLUSTAL sequence alignment output format was modified from version 1.7:

1. Names longer than 10 chars are now allowed. (The maximum is specified in
clustalw.h by '#define MAXNAMES'.)

2. The consensus line now consists of three characters: '*',':' and '.'. (Only
the '*' and '.' were previously used.)

3. An option (not the default) has been added, allowing the user to print out
sequence numbers at the end of each line of the alignment output.

4. Both RNA bases (U) and base ambiguities are now supported in nucleic acid
sequences.
In the past, all characters (upper or lower case) other than
a,c,g,t or u were converted to N. Now the following characters are recognised
and retained in the alignment output: ABCDGHKMNRSTUVWXY (upper or lower case).

5. A blank line inadvertently added in the version 1.6 header has been taken
out again.


                          CLUSTAL REFERENCES
                          ------------------

Details of algorithms, implementation and useful tips on usage of Clustal
programs can be found in the following publications:

Jeanmougin,F., Thompson,J.D., Gouy,M., Higgins,D.G. and Gibson,T.J. (1998)
Multiple sequence alignment with Clustal X. Trends Biochem Sci, 23, 403-5.

Thompson,J.D., Gibson,T.J., Plewniak,F., Jeanmougin,F. and Higgins,D.G. (1997)
The ClustalX windows interface: flexible strategies for multiple sequence
alignment aided by quality analysis tools. Nucleic Acids Research,
25:4876-4882.

Higgins,D.G., Thompson,J.D. and Gibson,T.J. (1996) Using CLUSTAL for
multiple sequence alignments. Methods Enzymol., 266, 383-402.

Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence
weighting, positions-specific gap penalties and weight matrix choice. Nucleic
Acids Research, 22:4673-4680.

Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1992) CLUSTAL V: improved software for
multiple sequence alignment. CABIOS 8,189-191.

Higgins,D.G. and Sharp,P.M. (1989) Fast and sensitive multiple sequence
alignments on a microcomputer. CABIOS 5,151-153.

Higgins,D.G. and Sharp,P.M. (1988) CLUSTAL: a package for performing multiple
sequence alignment on a microcomputer. Gene 73,237-244.
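
The error codes listed under "Changes since version 1.8" are intended for
scripts that run clustalw automatically. The following is a minimal Python
sketch of such a wrapper, not part of the Clustal W distribution. The names
describe_exit_code and run_clustalw are hypothetical, and the treatment of
exit status 0 as success is an assumption based on the usual convention; the
README only documents codes 1 to 4.

```python
import subprocess

# Exit codes as listed under "Changes since version 1.8" in this README.
CLUSTALW_ERRORS = {
    1: "bad command line option",
    2: "cannot open sequence file",
    3: "wrong format in sequence file",
    4: "sequence file contains only 1 sequence (for multiple alignments)",
}


def describe_exit_code(code):
    """Translate a ClustalW exit status into a readable message."""
    # Assumption: 0 means success (not stated in the README).
    if code == 0:
        return "success"
    return CLUSTALW_ERRORS.get(code, "unrecognised exit code %d" % code)


def run_clustalw(args, executable="clustalw"):
    """Run ClustalW with the given arguments; return (exit code, message)."""
    result = subprocess.run([executable] + list(args))
    return result.returncode, describe_exit_code(result.returncode)
```

With clustalw on the search path, a call such as
run_clustalw(["input.aln", "-gapopen=8.0"]) would return the exit status
together with its description.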
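
For developers who read or write the clustal format, the three-character
conservation line described in item 12 of "Changes since version 1.6" can be
sketched in Python as below. This is an illustration only, not the code used
by Clustal W. The group lists are copied from that item; the function names
are hypothetical, and leaving a blank under any column that contains a gap
is an assumption, since the README does not say how gaps are handled.

```python
# Groups copied from the conservation-line description in this README.
STRONG_GROUPS = ("STA", "NEQK", "NHQK", "NDEQ", "QHRK",
                 "MILV", "MILF", "HY", "FYW")
WEAK_GROUPS = ("CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY")


def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column (a string)."""
    residues = set(column.upper())
    # Assumption: a column containing a gap gets no conservation mark.
    if "-" in residues:
        return " "
    # A single, fully conserved residue.
    if len(residues) == 1:
        return "*"
    # Every residue in the column belongs to one 'strong' group.
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    # Every residue in the column belongs to one 'weaker' group.
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(sequences):
    """Build the conservation line for equal-length aligned sequences."""
    return "".join(column_symbol("".join(col)) for col in zip(*sequences))
```

For example, the three aligned sequences MKTA, MRSV and MKTA give one fully
conserved column (M), two columns covered by strong groups (K/R in QHRK,
T/S in STA) and one covered only by a weaker group (A/V in ATV).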
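
The three ways of recognising GCG/MSF input listed in item 14 of "Changes
since version 1.6" can be expressed as a small check. The Python sketch below
is one reading of those rules, not the parser used by Clustal W. The function
name looks_like_msf is hypothetical, and two details are assumptions: leading
blank lines are skipped before the "first word" and "first line" are taken,
and rule 3 is read as "MSF appears anywhere on the first line and that line
ends with '..'".

```python
MSF_FIRST_WORDS = ("PILEUP", "!!AA_MULTIPLE_ALIGNMENT",
                   "!!NA_MULTIPLE_ALIGNMENT")


def looks_like_msf(text):
    """Return True if the text starts like a GCG/MSF alignment file."""
    # Assumption: leading whitespace and blank lines are ignored.
    stripped = text.lstrip()
    if not stripped:
        return False
    first_line = stripped.splitlines()[0].rstrip()
    first_word = first_line.split()[0]
    # Rules 1 and 2: a recognised keyword is the first word in the file.
    if first_word in MSF_FIRST_WORDS:
        return True
    # Rule 3: 'MSF' on the first line, and '..' at the end of that line.
    return "MSF" in first_line and first_line.endswith("..")
```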