Clustal V Multiple Sequence Alignments.
Documentation (Installation and Usage).
Des Higgins
European Molecular Biology Laboratory
Postfach 10.2209
D-6900 Heidelberg
Germany.
higgins@EMBL-Heidelberg.DE
*******************************************************************
Contents.
1 Overview
2 Installation
3 Interactive usage
4 Command-line interface
5 Algorithms and references
*******************************************************************
1. Overview
This document describes how to install and use ClustalV on various
machines. ClustalV is a complete upgrade and rewrite of the Clustal
package of multiple alignment programs (Higgins and Sharp, 1988 and
1989). The original programs were written in Fortran for
microcomputers running MSDOS. You carried out a complete alignment
by running 3 programs in succession. Later, these were merged into
a single menu driven program with on-line help, for VAX/VMS.
ClustalV was written in C and has all of the features of the old
programs plus many new ones. It has been compiled and tested using
VAX/VMS C, Decstation ULTRIX C, Gnu C for Sun workstations, Turbo C
for IBM PC's and Think C for Apple Mac's. The original Clustal was
written by Des Higgins while he was a Post-Doc in the lab of Paul
Sharp in the Genetics Department, Trinity College, Dublin 2,
Ireland.
The main feature of the old package was the ability to carry out
reliable multiple alignments of many sequences. The sensitivity of
the program is as good as that of any other program we have tried, with
the exception of the programs of Vingron and Argos (1991), while it
works in reasonable time on a microcomputer. The programs of
Vingron and Argos are specialised for finding distant similarities
between proteins but require mainframes or workstations and are more
difficult to use.
The main new features are: profile alignments (alignments of old
alignments); phylogenetic trees (Neighbor Joining trees calculated
after multiple alignment with a bootstrapping option); better
sequence input (automatically recognise and read NBRF/PIR, Pearson
(Fasta) or EMBL/SwissProt formats); flexible alignment output
(choose one of: old Clustal format, NBRF/PIR, GCG msf format or
Phylip format); full command line interface (everything that you can
do interactively can be specified on the command line).
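As an illustration of what automatic format recognition can look like, here is a minimal sketch in C. It is not taken from the ClustalV source: the helper name guess_format and the exact tests are assumptions, based only on the usual first lines of each format (an "ID" line for EMBL/SwissProt, ">P1;name" style for NBRF/PIR, ">name" for Pearson). The format codes are the ones defined in clustalv.h.

```c
#include <assert.h>
#include <string.h>

/* Format codes, as defined in clustalv.h */
#define UNKNOWN   0
#define EMBLSWISS 1
#define PIR       2
#define PEARSON   3

/* Hypothetical helper (an assumption, not ClustalV's own routine):
   guess the sequence format from the first non-blank line of a file. */
static int guess_format(const char *line)
{
    assert(line != NULL);

    /* EMBL/SwissProt entries open with an "ID" line. */
    if (strncmp(line, "ID ", 3) == 0)
        return EMBLSWISS;

    /* NBRF/PIR entries open with '>', a two-letter type code and ';',
       e.g. ">P1;name". This must be tested before the Pearson case. */
    if (line[0] == '>' && strlen(line) > 3 && line[3] == ';')
        return PIR;

    /* Pearson (Fasta) entries open with '>' followed by the name. */
    if (line[0] == '>')
        return PEARSON;

    return UNKNOWN;
}
```

Called on a line such as ">P1;HBA_HUMAN" this sketch returns PIR, and on ">HBA_HUMAN alpha globin" it returns PEARSON. The order of the tests matters, because every PIR header also begins with '>'.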
In version 7 of the GCG package, there is a program called PILEUP
which uses a very similar algorithm to the one in ClustalV. There
are 2 main differences between the programs: 1) the metric used to
compare the sequences for the initial "guide tree" uses a full
global, optimal alignment in PILEUP instead of the fast, approximate
ones in ClustalV. This makes PILEUP much slower for the comparison
of long sequences. In principle, the distances calculated from
PILEUP will be more sensitive than ours, but in practice it will not
make much difference, except in difficult cases. 2) During the
multiple alignment, terminal gaps are penalised in ClustalV but not
in PILEUP. This will make the PILEUP alignments better when the
sequences are of very different lengths (this has no effect if there are
no large terminal gaps).
This software may be distributed and used freely, provided that you
do not modify it or this documentation in any way without the
permission of the authors.
If you wish to refer to ClustalV, please cite:
Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1991) CLUSTAL V: improved software
for multiple sequence alignment. CABIOS, vol. 8, 189-191.
The overall multiple alignment algorithm was described in:
Higgins,D.G. and Sharp,P.M. (1989). Fast and sensitive multiple
sequence alignments on a microcomputer. CABIOS, vol. 5, 151-153.
ACKNOWLEDGEMENTS.
D.H. would particularly like to thank Paul Sharp, in whose lab. this
work originated. We also thank Manolo Gouy, Gene Myers, Peter Rice
and Martin Vingron for suggestions, bug-fixes and help.
Des Higgins and Rainer Fuchs,
EMBL Data Library, Heidelberg, Germany.
Alan Bleasby,
Daresbury, UK.
JUNE 1991
*******************************************************************
2. Installation.
As far as possible, we have tried to make ClustalV portable to any
machine with a standard C compiler (proposed ANSI C standard). The
source code, as supplied by us, has been compiled and tested using
the following compilers:
VAX/VMS C
Ultrix C (on a Decstation 2100)
Gnu C on a Sun 4 workstation
Think C on an Apple Macintosh SE
Turbo C on an IBM AT.
In each case, one must make 1 change to 1 line of code in 1 header
file. This is described below. The exact capacity of the program
(how many sequences of what length can be aligned) will depend of
course on available memory but can also be set in this header file.
The package comes as 9 C source files; 3 header files; 1 file of on-
line help; this documentation file; 3 make files:
Source code: clustalv.c, amenu.c, gcgcheck.c, myers.c, sequence.c,
showpair.c, trees.c, upgma.c, util.c
Header files: clustalv.h, general.h, matrices.h
On-Line help: clustalv.hlp (must be renamed or defined as
clustalv_help except on PC's)
Documentation: clustalv.doc (this file).
Makefiles: makefile.sun (gnu c on Sun), vmslink.com (vax/vms),
makefile.ult (ultrix).
Before compiling ClustalV you must look at and possibly change
clustalv.h, shown below.
/*******************CLUSTALV.H********************************/
/*
Main header file for ClustalV. Uncomment ONE of the following lines
depending on which compiler you wish to use.
*/
#define VMS 1 /* VAX VMS */
/*#define MAC 1 Think_C for MacIntosh */
/*#define MSDOS 1 Turbo C for PC's */
/*#define UNIX 1 Ultrix for Decstations or Gnu C for Sun */
/*************************************************************/
#include "general.h"
#define MAXNAMES 10
#define MAXTITLES 60
#define FILENAMELEN 256
#define UNKNOWN 0
#define EMBLSWISS 1
#define PIR 2
#define PEARSON 3
#define PAGE_LEN 22
#if VMS
#define DIRDELIM ']'
#define MAXLEN 3000
#define MAXN 150
#define FSIZE 15000
#define LINELENGTH 60
#define GCG_LINELENGTH 50
#elif MAC
#define DIRDELIM ':'
#define MAXLEN 2600
#define MAXN 30
#define FSIZE 10000
#define LINELENGTH 50
#define GCG_LINELENGTH 50
#elif MSDOS
#define DIRDELIM '\\'
#define MAXLEN 1300
#define MAXN 30
#define FSIZE 5000
#define LINELENGTH 50
#define GCG_LINELENGTH 50
#elif UNIX
#define DIRDELIM '/'
#define MAXLEN 3000
#define MAXN 50
#define FSIZE 15000
#define LINELENGTH 60
#define GCG_LINELENGTH 50
#endif
/*****************end*of*CLUSTALV.H***************************/
First, you must remove the comments from one of the first 10 lines.
There are 4 'define' compiler directives here (e.g. #define VMS 1),
and you should use one of these, depending on which system you wish
to work on. So choose one of these, remove its comments (if it is
already commented out) and put comments around any of the others
that are still active. If you wish to use a different system, you
will need to insert a new line with a new keyword (which you must
invent) to identify your system. Most of the rest of this header
file is taken up with a block of 'define' statements for each system
type; e.g. the VAX/VMS block is:
#if VMS
#define DIRDELIM ']'
#define MAXLEN 3000
#define MAXN 150
#define FSIZE 15000
#define LINELENGTH 60
#define GCG_LINELENGTH 50
In this block, you can specify the maximum number of sequences to be
allowed (MAXN); the maximum sequence length, including gaps
(MAXLEN); FSIZE declares the size of some workspace, used by the
fast 2 sequence comparison routines and should be APPROXIMATELY 4
times MAXLEN; LINELENGTH is the length of the blocks of alignment
output in the output files; GCG_LINELENGTH is the same but for the
GCG compatible output only. Finally, DIRDELIM is the character used
to specify directories and subdirectories in file names. It should
be the character used to separate the file name itself from the
directory name (e.g. in VMS, file names are like:
$drive:[dir1.dir2.dir3]filename.ext;2 so ']' is used as DIRDELIM).
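To make the role of DIRDELIM concrete, the following minimal sketch shows one way a program can split off the file name using such a delimiter. The helper name file_part is hypothetical and is not taken from the ClustalV source; it only assumes that the file name is whatever follows the last occurrence of the delimiter.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not from the ClustalV source): return a pointer
   to the file-name part of `path`, i.e. the text after the last
   occurrence of the directory delimiter `dirdelim`. If the delimiter
   does not occur, the whole path is taken to be the file name. */
static const char *file_part(const char *path, char dirdelim)
{
    const char *last;

    assert(path != NULL);
    last = strrchr(path, dirdelim);   /* last delimiter, or NULL */
    return last ? last + 1 : path;
}
```

With ']' as the delimiter, the VMS name above yields "filename.ext;2"; with '/' as the delimiter, a Unix style path is split at its last slash.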
So, if you want to use a system not covered in clustalv.h, you will
have to insert a new block, like the above one. To compile and link
the program, we supply 3 makefiles: one each for VAX/VMS, Ultrix
and GNU C for Sun workstations.
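For a system that is not covered, the new keyword and its block might look like the sketch below. The keyword MYSYS and every value in it are illustrative assumptions, not settings we have tested; choose values that suit your machine's memory. In the real header the block would be added as a further '#elif MYSYS' branch before the final '#endif'; it is written with '#if' here only so that the fragment stands on its own.

```c
/* Hypothetical keyword for a system not covered by clustalv.h.
   The name MYSYS and all values below are illustrative assumptions. */
#define MYSYS 1

#if MYSYS
#define DIRDELIM '/'        /* separates directory name from file name */
#define MAXLEN 2000         /* maximum sequence length, including gaps */
#define MAXN 40             /* maximum number of sequences             */
#define FSIZE 8000          /* workspace, approximately 4 * MAXLEN     */
#define LINELENGTH 60       /* width of alignment blocks in output     */
#define GCG_LINELENGTH 50   /* same, for GCG compatible output         */
#endif
```

Remember to comment out the other system keywords at the top of the header, so that only the new one is active.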
VAX/VMS
Compile and link the program with the
supplied makefile for vms: vmslink.com .
$ @vmslink
This will produce clustalv.exe (and a lot of .obj files which you can delete).
The on-line help file (clustalv.hlp) should be 'defined' as
clustalv_help as follows:
$ def clustalv_help $drive:[dir1.dir2]clustalv.hlp
where $drive is the drive designation and [dir1.dir2] is the
directory where clustalv.hlp is kept.
To make use of the command-line interface, you must make clustalv a
'foreign' command with:
$ clustalv :== $$drive:[dir1.dir2]clustalv
where $drive is the drive designation and [dir1.dir2] is the
directory where clustalv.exe is kept.
IBM PC/MSDOS/TURBO C
Create a makefile (something.prj) with the names of the source files
(clustalv.c, amenu.c etc.) and 'make' this using the HUGE memory
model. You will get half a dozen warnings from the compiler about
pieces of code that look suspicious to it but ignore these. The
help file should remain as clustalv.hlp . To run the program using
the default settings in clustalv.h, you need approximately 500k of
memory. To reduce this, the main influence on memory usage is the
parameter MAXLEN; reduce MAXLEN to reduce memory usage.
Apple Mac/THINK_C version 4.0.2
This version of the program is not at all Mac like. It runs in a
window, the inside of which looks just like a normal character based
terminal. In the future we might put a proper Mac interface on it
but do not have the time right now. With the default settings in
the header file clustalv.h, you need just over 800k of memory to run
the program. To reduce this, reduce MAXLEN; this is easily the
biggest influence on memory usage. To compile the program and save
it as an application you need to 'set the application type'; here
you specify how much memory (in kilobytes (k)) the application will
need. You should set this to 900k to run the application as it is
OR reduce MAXLEN in the header. To compile the program you have to
create a 'project'; you 'add' the names of the 9 source files to the
project AND the name of the ANSI library. The source code is too
large to compile in one compilation unit. You will get a 'link
error: code segment too big' if you try to compile and link as is.
You should compile amenu.c (the biggest source file) as a separate
unit; you will have to read the manual, ask someone or mail me to
find out what this is.
*******************************************************************
3. Interactive usage.
Interactive usage of Clustal V is completely menu driven. On-line
help is provided and defaults are offered for all parameters and file
names. With a little effort it should be completely self
explanatory. The main menu, which appears when you run the
program, is shown below. Each item brings you to a sub menu.
Main menu for Clustal V:
1. Sequence Input From Disc
2. Multiple Alignments