clustalw.doc
(from the source code of the biological sequence alignment program Clustal W)
1) ported to MAC and PC.  These versions are quite slow unless you
have a nice beefy machine.  On a Power Mac or a Pentium box
it is nice and fast.  Two precompiled versions are supplied for Macs
(Power Mac and old Mac versions).

Mac:       1500 residues by 100 sequences
Power Mac  3000 residues by 100 sequences
PC         1500 residues by 100 sequences

2) alignment of new sequences to an alignment.  Fixed a serious bug
which assigned weights to the wrong sequences.  Now also weights
sequences according to distance from the incoming sequence.  The
new weights are: tree weights * similarity to incoming sequence.
The tree weights are the old weights that we derive from the tree
connecting all the sequences in the existing alignment.

3) for all platforms, output line length = 60.

4) Bootstrap files (*.phb): the "final" node (arbitrary trichotomy
at the end of the neighbor-joining process) is labelled as
TRICHOTOMY in the bootstrap output files.  This is to help
link bootstrap figures with nodes when you reroot the tree.

5) Command line /bootstrap option now more robust.

--------------------------------------------------------------

INTRODUCTION

This document gives some BRIEF notes about usage of the Clustal W
multiple alignment program for UNIX and VMS machines.  Clustal W
is a major update and rewrite of the Clustal V program which was
described in:

Higgins, D.G., Bleasby, A.J. and Fuchs, R. (1992)
CLUSTAL V: improved software for multiple sequence alignment.
Computer Applications in the Biosciences (CABIOS), 8(2):189-191.

The main new features are a greatly improved (more sensitive)
multiple alignment procedure for proteins and improved support
for different file formats.  This software was described in:

Thompson, J.D., Higgins, D.G. and Gibson, T.J.
(1994)
CLUSTAL W: improving the sensitivity of progressive multiple
sequence alignment through sequence weighting, position specific
gap penalties and weight matrix choice.
Nucleic Acids Research, 22(22):4673-4680.

The usage of Clustal W is largely the same as for
Clustal V, details of which are described in clustalv.doc.  Details of the
new alignment algorithms are described in the manuscript by
Thompson et al. above, an ASCII/text version of which is included
(clustalw.ms).  This file lists some of the details not covered by either
of the above documents.

There are brief notes on the following topics:

1) Installation for VMS, UNIX, MAC and PC
2) File input
3) File output
4) Changes to the alignment algorithms
5) Minor modifications to the phylogenetic tree and bootstrapping methods
6) Summary of the command line usage.

-------------------------------------------------------------------

1) INSTALLATION    (for Unix, VAX/VMS, PC and MAC)

*****IMPORTANT*****

If you wish to recompile the program (or compile it for the first
time; you will have to do this with UNIX):
first check the file CLUSTALW.H which needs to be changed if you
move the code between Unix and VMS machines.  At the top
of the file are four lines which define one of VMS, MSDOS, MAC or
UNIX to be 1.  All of these EXCEPT one must be commented out
using an enclosing /* ... */.

*******************

Unix
----

Make files are supplied for Unix machines.  The code was compiled and
tested using Decstation (Ultrix), SUN (GNU C compiler/gcc), Silicon
Graphics (IRIX) and DEC/Alpha (OSF1).  We have not tested the code on any
other systems.  Just use makefile to make on most systems.  For Sun, you
need to have the GNU C (gcc) compiler installed ... use the file
makefile.sun in this case.  You make the program with:

make  (or make -f makefile.sun)

This produces the file clustalw which can be run by typing clustalw and
pressing return.
The help file is called clustalw_help

VMS
---

There is a small DCL command file (VMSLINK.COM) to compile and link the
code for VMS machines (VAX or Alpha).  This procedure just compiles the
source files and links using default settings.  Run it using:

$ @vmslink

This produces Clustalw.exe which can be run using the run command:

$ run clustalw

The intermediate object files can be deleted with:

$ del *.obj;

There is an extensive command line facility.  To use this, you must
create a symbol to run the program (and put this in your login.com file).
e.g.

$ clustalw :== $$drive:[dir.dir]clustalw

where $drive is the drive on which the executable file (clustalw.exe) is
stored and [dir.dir] is the full directory specification.
NOTE THE EXTRA DOLLAR SIGN.

Then the program can be run using the command:

$ clustalw

PC
--

We supply an executable file (Clustalw.exe) which will run using MSDOS.
It will also run under Windows (as a DOS application) *** IF you have a
maths coprocessor ***.  If you do not have a maths chip (e.g. 80387), the
program can only be run under MSDOS.  In the latter case, you must have
the file EMU387.exe in the same directory as CLUSTALW.EXE.  This file
emulates a maths chip if you do not have one.  We generated the executable
file using GNU C for MSDOS.  It will also compile (with about 10,000
warning messages) using Microsoft C but we have not tested it and there
appear to be problems with the executable.  You will need to use a
"memory extender" to allow the program to get at more than 640kb of memory.

MAC
---

The code compiles for Power Mac and older Macs using the Metrowerks
CodeWarrior C compiler.  We supply 2 executable programs (one each for
Power Mac and older Mac): ClustalwPPC and Clustalw68k.  These need up to
10mb of memory to run, which needs to be adjusted with the Get Info (%I)
command from the Finder if you have problems.
Just double click the executable file name or icon and off you go (we hope).

As a special treat for Mac users, we supply an executable and brief readme
file for NJPLOT.  This is a really nice program by Manolo Gouy
(University of Lyon, France) that allows you to import the trees
made by Clustal W and display/manipulate them.  It will properly
display the bootstrap figures from the *.phb files.  It can export the
trees in PICT format which can then be used by MacDraw for example.

-------------------------------------------------------------------------

2) FILE INPUT (sequences to be aligned)

The sequences must all be in one file (or two files for a "profile
alignment") in ONE of the following formats:

FASTA (Pearson), NBRF/PIR, EMBL/Swiss Prot, GDE, CLUSTAL, GCG/MSF, GCG9/RSF.

The program tries to "guess" which format is being used and whether
the sequences are nucleic acid (DNA/RNA) or amino acid (proteins).  The
format is recognised by the first characters in the file.  This is kind
of stupid/crude but works most of the time and it is difficult
to do reliably any other way.

Format           First non blank word or character in the file
................................................................
FASTA            >
NBRF             >P1;  or >D1;
EMBL/SWISS       ID
GDE protein      %
GDE nucleotide   #
CLUSTAL          CLUSTAL (blocked multiple alignments)
GCG/MSF          PILEUP  or !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
                 or MSF on the first line, and '..' at the end of line
GCG9/RSF         !!RICH_SEQUENCE

Note that the only way of spotting that a file is MSF format is if
the word PILEUP appears at the very beginning of the file.
If you produce this format from software other than the GCG pileup program,
then you will have to insert the word PILEUP at the start of the file.
Similarly, if you use clustal format, the word CLUSTAL must appear first.

All of these formats can be used to read in AN EXISTING FULL ALIGNMENT.
With CLUSTAL format, this is just the same as the output format of this
program and Clustal V.  If you use PILEUP or CLUSTAL format, all sequences
must be the same length, INCLUDING GAPS ("-" in clustal format; "." in MSF).
With the other formats, sequences can be gapped with "-" characters.  If you
read in any gaps these are kept during any later alignments.  You can use
this facility to read in an alignment in order to calculate a phylogenetic
tree OR to output the same alignment in a different format (from the
output format options menu of the multiple alignment menu) e.g. read
in a GCG/MSF format alignment and output a PHYLIP format alignment.
This is also useful to read in one reference alignment and to add one or
more new sequences to it using the "profile alignment" facilities.

DNA vs. PROTEIN:  the program will count the number of A,C,G,T,U and N
characters.  If 85% or more of the characters in a sequence are as above,
then DNA/RNA is assumed, protein otherwise.

-------------------------------------------------------------------------

3) FILE OUTPUT

1) The alignments.

In the multiple alignment and profile alignment menus, there is a menu
item to control the output format(s).

The alignment output format can be set to any (or all) of:

CLUSTAL  (a self explanatory blocked alignment)
NBRF/PIR (same as input format but with "-" characters for gaps)
MSF      (the main GCG package multiple alignment format)
PHYLIP   (Joe Felsenstein's phylogeny inference package.  Gaps are set to
         "-" characters.  For some programs (e.g. PROTPARS/DNAPARS) these
         should be changed to "?"
         characters for unknown residues.)
GDE      (Used by Steven Smith's GDE package)

You can also choose between having the sequences in the same order as
in the input file or writing them out in an order that more closely
matches the order used to carry out the multiple alignment.

2) The trees.

Believe it or not, we now use the New Hampshire (nested parentheses)
format as default for our trees.  This format is compatible with e.g. the
PHYLIP package.  If you want to view a tree, you can use the RETREE or
DRAWGRAM/DRAWTREE programs of PHYLIP.  This format is used for all our
trees, even the initial guide trees for deciding the order of multiple
alignment.  The output trees from the phylogenetic tree menu can also be
requested in our old verbose/cryptic format.  This may be more useful
if, for example, you wish to see the bootstrap figures.  The bootstrap
trees in the default New Hampshire format give the bootstrap figures
as extra labels which can be viewed very easily using TREETOOL which is
available as part of the GDE package.  TREETOOL is available from the
RDP project by ftp from rdp.life.uiuc.edu.
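
As an illustration of the "guessing" described in section 2 (file format
from the first non blank characters, DNA vs. protein from the 85% rule),
here is a minimal Python sketch.  It is NOT the code used by Clustal W.
The function names guess_format and guess_sequence_type are made up for
this example, and two details are assumptions: gap characters and other
non-letters are ignored when counting residues, and the "MSF on the first
line" rule is taken to mean that the first line contains MSF and ends
with "..".

```python
def guess_format(text):
    """Guess the file format from the first non-blank word or character."""
    stripped = text.lstrip()
    first_line = stripped.splitlines()[0].rstrip() if stripped else ""
    # NBRF is tested before FASTA because both start with '>'.
    if stripped.startswith((">P1;", ">D1;")):
        return "NBRF"
    if stripped.startswith(">"):
        return "FASTA"
    if stripped.startswith("!!RICH_SEQUENCE"):
        return "GCG9/RSF"
    msf_words = ("PILEUP", "!!AA_MULTIPLE_ALIGNMENT", "!!NA_MULTIPLE_ALIGNMENT")
    # Assumption: "MSF on the first line, and '..' at the end of line".
    if stripped.startswith(msf_words) or (
        "MSF" in first_line and first_line.endswith("..")
    ):
        return "GCG/MSF"
    if stripped.startswith("CLUSTAL"):
        return "CLUSTAL"
    if stripped.startswith("ID"):
        return "EMBL/SWISS"
    if stripped.startswith("%"):
        return "GDE protein"
    if stripped.startswith("#"):
        return "GDE nucleotide"
    return "unknown"


def guess_sequence_type(sequence):
    """Return 'DNA/RNA' if 85% or more of the letters are A, C, G, T, U or N."""
    # Assumption: gaps ('-', '.') and other non-letters are not counted.
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        return "unknown"
    nucleotide_like = sum(1 for c in residues if c in "ACGTUN")
    return "DNA/RNA" if nucleotide_like / len(residues) >= 0.85 else "protein"
```

For example, guess_format(">P1;HBB_HUMAN") gives "NBRF", and
guess_sequence_type("MKVLAAGIVGLLLAQ") gives "protein" because only 5 of
the 15 letters are in the nucleotide set.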
