1) ported to Mac and PC.  These versions are quite slow unless you
have a nice beefy machine.  On a Power Mac or a Pentium box
the program is nice and fast.  Two precompiled versions are supplied for Macs
(Power Mac and older Mac versions).
Mac:        1500 residues by 100 sequences
Power Mac:  3000 residues by 100 sequences
PC:         1500 residues by 100 sequences

2) alignment of new sequences to an alignment.  Fixed a serious bug
which assigned weights to the wrong sequences.  The program now also weights
sequences according to their distance from the incoming sequence.  The
new weights are: tree weights * similarity to incoming sequence.
The tree weights are the old weights that we derive from the tree
connecting all the sequences in the existing alignment.

3) for all platforms, output linelength = 60.

4) Bootstrap files (*.phb): the "final" node (arbitrary trichotomy
at the end of the neighbor-joining process) is labelled as 
TRICHOTOMY in the bootstrap output files.  This is to help
link bootstrap figures with nodes when you reroot the tree.

5) Command line /bootstrap option now more robust.
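
The new weighting described in item 2 above can be sketched in a few lines
of Python.  This is only an illustration: the function name, the list-based
input and the assumption that "similarity" is a fraction between 0 and 1 are
ours, and are not taken from the Clustal W source code.

```python
def profile_weights(tree_weights, similarities):
    """Combine tree weights with similarity to the incoming sequence.

    tree_weights: one weight per sequence in the existing alignment.
    similarities: similarity of each of those sequences to the incoming
                  sequence (assumed here to be a fraction from 0 to 1).
    """
    if len(tree_weights) != len(similarities):
        raise ValueError("need one similarity per tree weight")
    # new weight = tree weight * similarity to incoming sequence
    return [w * s for w, s in zip(tree_weights, similarities)]


if __name__ == "__main__":
    # Made-up numbers for three sequences in an existing alignment.
    print(profile_weights([0.5, 0.25, 0.25], [0.8, 0.4, 1.0]))
```

Sequences that are close to the incoming sequence keep most of their tree
weight; distant ones are scaled down.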

--------------------------------------------------------------
INTRODUCTION



This document gives some BRIEF notes about usage of the Clustal W
multiple alignment program for UNIX and VMS machines.  Clustal W
is a major update and rewrite of the Clustal V program which 
was described in:

Higgins, D.G., Bleasby, A.J. and Fuchs, R. (1992)
CLUSTAL V: improved software for multiple sequence alignment.
Computer Applications in the Biosciences (CABIOS), 8(2):189-191.

The main new features are a greatly improved (more sensitive)
multiple alignment procedure for proteins and improved support
for different file formats.  This software was described in:

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994)
CLUSTAL W: improving the sensitivity of progressive multiple
sequence alignment through sequence weighting, position specific
gap penalties and weight matrix choice.
Nucleic Acids Research, 22(22):4673-4680.


The usage of Clustal W is largely the same as for
Clustal V, details of which are described in clustalv.doc.  Details of the
new alignment algorithms are described in the manuscript by
Thompson et al. above, an ASCII/text version of which is included
(clustalw.ms).  This file lists some of the details not covered by either
of the above documents.


There are brief notes on the following topics:

1) Installation for VMS, UNIX, Mac and PC
2) File input
3) File output
4) Changes to the alignment algorithms
5) Minor modifications to the phylogenetic tree and bootstrapping methods
6) Summary of the command line usage

-------------------------------------------------------------------

1) INSTALLATION    (for Unix, VAX/VMS, PC and MAC)



*****IMPORTANT*****
If you wish to recompile the program (or compile it for the first
time; you will have to do this with UNIX),
first check the file CLUSTALW.H, which needs to be changed if you
move the code between Unix and VMS machines.  At the top
of the file are four lines which define one of VMS, MSDOS, MAC or
UNIX to be 1.  All of these EXCEPT one must be commented out
by enclosing them in /* ... */.
*******************


Unix
-----

Makefiles are supplied for Unix machines.  The code was compiled and
tested using Decstation (Ultrix), SUN (GNU C compiler/gcc), Silicon
Graphics (IRIX) and DEC/Alpha (OSF1).  We have not tested the code on any other
systems.  On most systems, just use the file makefile.  For Sun, you need to
have the GNU C (gcc) compiler installed; use the file makefile.sun in this
case.  You make the program with:
make  (or make -f makefile.sun)

This produces the file clustalw which can be run by typing clustalw and
pressing return.  The help file is called clustalw_help.


VMS
----

There is a small DCL command file (VMSLINK.COM) to compile and link the
code for VMS machines (vax or alpha).  This procedure just compiles the
source files and links using default settings.  Run it using:
$ @vmslink
This produces Clustalw.exe which can be run using the run command:
$ run clustalw

The intermediate object files can be deleted with:
$ del *.obj;

There is an extensive command line facility.  To use this, you must
create a symbol to run the program (and put this in your login.com file).
e.g.
$ clustalw :== $$drive:[dir.dir]clustalw
where $drive is the drive on which the executable file is stored (clustalw.exe)
and [dir.dir] is the full directory specification.  NOTE THE EXTRA DOLLAR SIGN.
Then the program can be run using the command:
$ clustalw


PC
__

We supply an executable file (Clustalw.exe) which will run using MSDOS.
It will also run under Windows (as a DOS application) 
*** IF you have a maths coprocessor***.  If you do not have a maths chip 
(e.g. 80387), the program can only be run under MSDOS.  In the latter case, 
you must have the file EMU387.exe in the same directory as CLUSTALW.EXE.  
This file emulates a maths chip if you do not have one.  


We generated the executable file using GNU C for MSDOS. 
It will also compile (with about 10,000 warning messages)
using Microsoft C but we have not tested it and there appear to be problems
with the executable. 

You will need to use a "memory extender" to allow the program to get at more 
than 640kb of memory.



MAC
---

The code compiles for Power Mac and older Macs using the Metrowerks CodeWarrior
C compiler.  We supply 2 executable programs (one each for Power Mac and
older Macs): ClustalwPPC and Clustalw68k.  These need up to
10 MB of memory to run; if you have problems, adjust the memory allocation
with the Get Info (%I) command from the Finder.  Just double click the
executable file name or icon and off you go (we hope).

As a special treat for Mac users, we supply an executable and brief readme
file for NJPLOT.   This is a really nice program by Manolo Gouy
(University of Lyon, France) that allows you to import the trees
made by Clustal W and display them/manipulate them.  It will properly
display the bootstrap figures from the *.phb files.  It can export the
trees in PICT format which can then be used by MacDraw for example.


-------------------------------------------------------------------------

2) FILE INPUT (sequences to be aligned)



The sequences must all be in one file (or two files for a "profile alignment")
in ONE of the following formats:

FASTA (Pearson), NBRF/PIR, EMBL/Swiss Prot, GDE, CLUSTAL, GCG/MSF, GCG9/RSF.

The program tries to "guess" which format is being used and whether
the sequences are nucleic acid (DNA/RNA) or amino acid (proteins).  The
format is recognised by the first characters in the file.  This is kind
of stupid/crude but works most of the time, and it is difficult
to do reliably any other way.


Format           First non blank word or character in the file.
...............................................................
FASTA            >
NBRF             >P1;  or >D1;
EMBL/SWISS       ID
GDE protein      % 
GDE nucleotide   # 
CLUSTAL          CLUSTAL (blocked multiple alignments)
GCG/MSF          PILEUP  or !!AA_MULTIPLE_ALIGNMENT or !!NA_MULTIPLE_ALIGNMENT
                 or MSF on the first line, and '..' at the end of line
GCG9/RSF         !!RICH_SEQUENCE

Note that the only way of spotting that a file is MSF format is if
the word PILEUP appears at the very beginning of the file.  If you 
produce this format from software other than the GCG pileup program,
then you will have to insert the word PILEUP at the start of the file.
Similarly, if you use clustal format, the word CLUSTAL must appear first.
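
The lookup in the table above can be sketched roughly as follows in Python.
This is a guess at the logic, not the code Clustal W actually uses: the
function name, the returned labels, the order of the tests and the
case-sensitive matching are all assumptions made for this example.

```python
def guess_format(text):
    """Guess the sequence file format from the start of the file.

    Follows the table of first words/characters; returns None when
    nothing matches.
    """
    stripped = text.lstrip()
    if not stripped:
        return None
    first_line = stripped.splitlines()[0]
    first_word = first_line.split()[0]

    # NBRF must be tested before FASTA, since both start with ">".
    if stripped.startswith((">P1;", ">D1;")):
        return "NBRF"
    if stripped.startswith(">"):
        return "FASTA"
    if first_word == "ID":
        return "EMBL/SWISS"
    if stripped.startswith("%"):
        return "GDE protein"
    if stripped.startswith("#"):
        return "GDE nucleotide"
    if first_word == "CLUSTAL":
        return "CLUSTAL"
    if first_word == "!!RICH_SEQUENCE":
        return "GCG9/RSF"
    if first_word in ("PILEUP", "!!AA_MULTIPLE_ALIGNMENT",
                      "!!NA_MULTIPLE_ALIGNMENT"):
        return "GCG/MSF"
    # "MSF" somewhere on the first line, and ".." at the end of that line.
    if "MSF" in first_line and first_line.rstrip().endswith(".."):
        return "GCG/MSF"
    return None


if __name__ == "__main__":
    print(guess_format(">P1;EXAMPLE\n"))
    print(guess_format(">example\nACGT\n"))
```

Because FASTA and NBRF share the leading ">", the more specific NBRF test
has to come first in a scheme like this.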

All of these formats can be used to read in AN EXISTING FULL ALIGNMENT.
With CLUSTAL format, this is just the same as the output format of this
program and Clustal V.  If you use PILEUP or CLUSTAL format, all sequences
must be the same length, INCLUDING GAPS ("-" in clustal format; "." in MSF).
With the other formats, sequences can be gapped with "-" characters.  If you
read in any gaps these are kept during any later alignments.  You can use
this facility to read in an alignment in order to calculate a phylogenetic
tree OR to output the same alignment in a different format (from the
output format options menu of the multiple alignment menu) e.g. read
in a GCG/MSF format alignment and output a PHYLIP format alignment. This is 
also useful to read in one reference alignment and to add one or more new 
sequences to it using the "profile alignment" facilities.
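
The requirement that all sequences in an existing alignment have the same
length, gaps included, is easy to check before handing a file to the program.
The Python sketch below assumes the sequences have already been read into a
dictionary of name to aligned string; the function name and that input layout
are hypothetical.

```python
def alignment_length(sequences):
    """Return the common length of aligned sequences, gaps included.

    sequences: dict mapping sequence name to its aligned string.
    Raises ValueError if the lengths differ.
    """
    lengths = {name: len(seq) for name, seq in sequences.items()}
    distinct = set(lengths.values())
    if len(distinct) > 1:
        raise ValueError(f"sequences differ in length: {lengths}")
    # An empty input has no sequences, so report length 0.
    return distinct.pop() if distinct else 0


if __name__ == "__main__":
    print(alignment_length({"seq1": "AC-GT", "seq2": "ACGGT"}))
```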

DNA vs. PROTEIN:  the program will count the number of A, C, G, T, U and N
characters.  If 85% or more of the characters in a sequence are as above,
then DNA/RNA is assumed, protein otherwise.  
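
The 85% rule can be written down as a short Python sketch.  The details here
are assumptions made for the example and may differ from what the program
does: letters are compared without regard to case, and gaps and other
non-letter characters are left out of the count.

```python
NUCLEOTIDE_CODES = set("ACGTUN")


def looks_like_nucleic_acid(sequence, threshold=0.85):
    """Return True if the sequence looks like DNA/RNA, False for protein."""
    # Keep letters only, so gaps and digits do not affect the fraction.
    residues = [c.upper() for c in sequence if c.isalpha()]
    if not residues:
        return False
    hits = sum(1 for c in residues if c in NUCLEOTIDE_CODES)
    return hits / len(residues) >= threshold


if __name__ == "__main__":
    print(looks_like_nucleic_acid("ACGTACGTNN"))
    print(looks_like_nucleic_acid("MKVLAAGIVG"))
```

A short protein that happens to be rich in A, C, G and T would be
misclassified by a test like this, which is why it only works most of the
time.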

-------------------------------------------------------------------------


3) FILE OUTPUT 


1) The alignments.

In the multiple alignment and profile alignment menus, there is a menu
item to control the output format(s).

The alignment output format can be set to any (or all) of:
CLUSTAL  (a self explanatory blocked alignment)
NBRF/PIR (same as input format but with "-" characters for gaps)
MSF      (the main GCG package multiple alignment format)
PHYLIP   (Joe Felsenstein's phylogeny inference package.  Gaps are set to
         "-" characters.  For some programs (e.g. PROTPARS/DNAPARS) these 
         should be changed to "?" characters for unknown residues.)
GDE      (Used by Steven Smith's GDE package)
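
For the PHYLIP case above, changing gaps to "?" can be done with a few lines
of Python once the alignment is in memory.  This sketch assumes the aligned
sequences are held in a dictionary of name to string (a hypothetical layout);
it deliberately leaves the names alone, since a name may itself contain a "-".

```python
def gaps_to_unknown(sequences):
    """Replace "-" gap characters with "?" in every aligned sequence.

    sequences: dict mapping sequence name to its aligned string.
    Names are not modified.
    """
    return {name: seq.replace("-", "?") for name, seq in sequences.items()}


if __name__ == "__main__":
    print(gaps_to_unknown({"seq-1": "AC--GT", "seq-2": "ACGGGT"}))
```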

You can also choose between having the sequences in the same order as in 
the input file or writing them out in an order that more closely matches the 
order used to carry out the multiple alignment.


2) The trees.

Believe it or not, we now use the New Hampshire (nested parentheses)
format as default for our trees.  This format is compatible with e.g. the
PHYLIP package.  If you want to view a tree, you can use the RETREE or 
DRAWGRAM/DRAWTREE programs of PHYLIP.  This format is used for all our 
trees, even the initial guide trees for deciding the order of multiple
alignment.  The output trees from the phylogenetic tree menu can also be
requested in our old verbose/cryptic format.  This may be more useful
if, for example, you wish to see the bootstrap figures.  The bootstrap
trees in the default New Hampshire format give the bootstrap figures
as extra labels which can be viewed very easily using TREETOOL which is
available as part of the GDE package.  TREETOOL is available from the
RDP project by ftp from rdp.life.uiuc.edu.
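
To give a feel for the nested parentheses format, the Python sketch below
pulls the sequence names out of a small tree string.  The tree is made up for
this example (the layout of a real Clustal W tree file may differ in detail),
the function name is ours, and quoted labels are not handled.

```python
import re

# A leaf label is a name that directly follows "(" or ",".  Labels that
# follow ")" belong to internal nodes (e.g. bootstrap figures) and are
# therefore skipped.
_LEAF = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_names(newick):
    """Return the leaf (sequence) names of a New Hampshire format tree."""
    return _LEAF.findall(newick)


if __name__ == "__main__":
    # Made-up tree: four sequences, branch lengths after ":", and labels
    # on the two internal nodes.
    tree = "(A:0.1,B:0.2,(C:0.3,D:0.4)95:0.5)TRICHOTOMY;"
    print(leaf_names(tree))
```

Each pair of parentheses encloses one group of sequences, so the nesting of
the parentheses mirrors the branching of the tree.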