


		Clustal V  Multiple Sequence Alignments.

		Documentation (Installation and Usage).

		Des Higgins
		European Molecular Biology Laboratory
		Postfach 10.2209
		D-6900 Heidelberg
		Germany.

		higgins@EMBL-Heidelberg.DE


*******************************************************************


		Contents.


		1		Overview

		2		Installation

		3		Interactive usage

		4		Command-line interface

		5		Algorithms and references


*******************************************************************

		1.  Overview

This document describes how to install and use ClustalV on various 
machines.  ClustalV is a complete upgrade and rewrite of the Clustal 
package of multiple alignment programs (Higgins and Sharp, 1988 and 
1989).   The original programs were written in Fortran for 
microcomputers running MSDOS.   You carried out a complete alignment 
by running 3 programs in succession.   Later, these were merged into 
a single menu driven program with on-line help, for VAX/VMS.  
ClustalV was written in C and has all of the features of the old 
programs plus many new ones.  It has been compiled and tested using 
VAX/VMS C, Decstation ULTRIX C, Gnu C for Sun workstations, Turbo C 
for IBM PC's and Think C for Apple Mac's.   The original Clustal was 
written by Des Higgins while he was a Post-Doc in the lab of Paul 
Sharp in the Genetics Department, Trinity College, Dublin 2, 
Ireland. 

The main feature of the old package was the ability to carry out 
reliable multiple alignments of many sequences.  The sensitivity of 
the program is as good as that of any other program we have tried, 
with the exception of the programs of Vingron and Argos (1991), while it 
works in reasonable time on a microcomputer.  The programs of 
Vingron and Argos are specialised for finding distant similarities 
between proteins but require mainframes or workstations and are more 
difficult to use.

The main new features are: profile alignments (alignments of old 
alignments); phylogenetic trees (Neighbor Joining trees calculated 
after multiple alignment with a bootstrapping option); better 
sequence input (automatically recognise and read NBRF/PIR, Pearson 
(Fasta) or EMBL/SwissProt formats); flexible alignment output 
(choose one of: old Clustal format, NBRF/PIR, GCG msf format or 
Phylip format); full command line interface (everything that you can 
do interactively can be specified on the command line).

In version 7 of the GCG package, there is a program called PILEUP 
which uses a very similar algorithm to the one in ClustalV.  There 
are 2 main differences between the programs: 1) the metric used to 
compare the sequences for the initial "guide tree" uses a full 
global, optimal alignment in PILEUP instead of the fast, approximate 
ones in ClustalV.  This makes PILEUP much slower for the comparison 
of long sequences.  In principle, the distances calculated from 
PILEUP will be more sensitive than ours, but in practice it will not 
make much difference, except in difficult cases.  2)  During the 
multiple alignment, terminal gaps are penalised in ClustalV but not 
in PILEUP.  This will make the PILEUP alignments better when the 
sequences are of very different lengths (has no effect if there are 
no large terminal gaps).   


This software may be distributed and used freely, provided that you 
do not modify it or this documentation in any way without the 
permission of the authors.  

If you wish to refer to ClustalV, please cite: 
Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1991) CLUSTAL V: improved software
for multiple sequence alignment. CABIOS, vol. 8, 189-191.  

The overall multiple alignment algorithm was described in:
Higgins,D.G. and Sharp,P.M. (1989).  Fast and sensitive multiple 
sequence alignments on a microcomputer.  CABIOS, vol. 5, 151-153.


ACKNOWLEDGEMENTS.

D.H. would particularly like to thank Paul Sharp, in whose lab. this 
work originated.  We also thank Manolo Gouy, Gene Myers, Peter Rice 
and Martin Vingron for suggestions, bug-fixes and help.    

Des Higgins and Rainer Fuchs, 
EMBL Data Library, Heidelberg, Germany.

Alan Bleasby,  
Daresbury, UK.

JUNE 1991
*******************************************************************

		2.  Installation.



As far as possible, we have tried to make ClustalV portable to any 
machine with a standard C compiler (proposed ANSI C standard).  The 
source code, as supplied by us, has been compiled and tested using 
the following compilers:

VAX/VMS C
Ultrix C (on a Decstation 2100)
Gnu C on a Sun 4 workstation
Think C on an Apple Macintosh SE
Turbo C on an IBM AT.

In each case, one must make 1 change to 1 line of code in 1 header 
file.  This is described below.  The exact capacity of the program 
(how many sequences of what length can be aligned) will depend of 
course on available memory but can also be set in this header file.

The package comes as 9 C source files; 3 header files; 1 file of on-
line help; this documentation file; 3 make files:

Source code:	clustalv.c, amenu.c, gcgcheck.c, myers.c, sequence.c, 
			showpair.c, trees.c, upgma.c, util.c

Header files:	clustalv.h, general.h, matrices.h

On-Line help:	clustalv.hlp  (must be renamed or defined as 		
			clustalv_help except on PC's)

Documentation:	clustalv.doc (this file).

Makefiles:	makefile.sun (gnu c on Sun), vmslink.com (vax/vms), 
			makefile.ult (ultrix).







Before compiling ClustalV you must look at and possibly change 
clustalv.h, shown below.  

/*******************CLUSTALV.H********************************/

/*
Main header file for ClustalV. Uncomment ONE of the following lines
depending on which compiler you wish to use.
*/

#define VMS 1             /* VAX VMS */

/*#define MAC 1           Think_C for MacIntosh */

/*#define MSDOS 1         Turbo C for PC's */

/*#define UNIX 1          Ultrix for Decstations or Gnu C for Sun */

/*************************************************************/

#include "general.h"

#define MAXNAMES          10
#define MAXTITLES         60
#define FILENAMELEN      256

#define UNKNOWN   0
#define EMBLSWISS 1
#define PIR       2
#define PEARSON   3

#define PAGE_LEN       22

#if VMS
#define DIRDELIM ']'
#define MAXLEN          3000
#define MAXN             150
#define FSIZE          15000
#define LINELENGTH        60
#define GCG_LINELENGTH    50

#elif MAC
#define DIRDELIM ':'
#define MAXLEN          2600
#define MAXN              30
#define FSIZE          10000
#define LINELENGTH        50
#define GCG_LINELENGTH    50

#elif MSDOS
#define DIRDELIM '\\'
#define MAXLEN          1300
#define MAXN              30
#define FSIZE           5000
#define LINELENGTH        50
#define GCG_LINELENGTH    50

#elif UNIX
#define DIRDELIM '/'
#define MAXLEN         3000
#define MAXN             50
#define FSIZE         15000
#define LINELENGTH       60
#define GCG_LINELENGTH   50
#endif
/*****************end*of*CLUSTALV.H***************************/



First, you must make sure that exactly one of the first 10 lines is 
active.  There are 4 'define' compiler directives here (e.g. #define 
VMS 1), one for each supported system.  Choose the one for the system 
you wish to work on, remove its comments (if it is commented out) and 
put comments around any of the others that are still active.  If you 
wish to use a different system, you will need to insert a new line 
with a new keyword (which you must invent) to identify your system.  
Most of the rest of this header file is taken up with a block of 
'define' statements for each system type; e.g. the VAX/VMS block is:

#if VMS
#define DIRDELIM ']'
#define MAXLEN          3000
#define MAXN             150
#define FSIZE          15000
#define LINELENGTH        60
#define GCG_LINELENGTH    50

In this block, you can specify the maximum number of sequences to be 
allowed (MAXN); the maximum sequence length, including gaps 
(MAXLEN);  FSIZE declares the size of some workspace, used by the 
fast 2 sequence comparison routines and should be APPROXIMATELY 4 
times MAXLEN; LINELENGTH is the length of the blocks of alignment 
output in the output files; GCG_LINELENGTH is the same but for the 
GCG compatible output only.  Finally, DIRDELIM is the character used 
to specify directories and subdirectories in file names.  It should 
be the character used to separate the file name itself from the 
directory name (e.g. in VMS, file names are like: 
$drive:[dir1.dir2.dir3]filename.ext;2  so ']' is used as DIRDELIM).   

So, if you want to use a system not covered in clustalv.h, you will 
have to insert a new block, like the one above.  To compile and link 
the program, we supply 3 makefiles: one each for VAX/VMS, Ultrix 
and GNU C for Sun workstations. 
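
A sketch of what such an addition to the header might look like is 
given here.  The keyword MYSYS and every value in its block are 
invented for illustration and would have to be chosen to suit the 
machine; only the VMS block is copied from the supplied header, and the 
other three supplied blocks are left out to keep the example short.  
The closing #else/#error guard is not in the supplied header either; it 
is an optional extra that stops compilation if no keyword is active.

```c
/* Sketch of clustalv.h extended for a new system.  MYSYS and all of */
/* the values in its block are hypothetical.                         */

#define MYSYS 1           /* invented keyword for the new system */

#if VMS                   /* existing block, unchanged */
#define DIRDELIM ']'
#define MAXLEN          3000
#define MAXN             150
#define FSIZE          15000
#define LINELENGTH        60
#define GCG_LINELENGTH    50

#elif MYSYS               /* new block for the new system */
#define DIRDELIM '/'
#define MAXLEN          2000
#define MAXN              40
#define FSIZE           8000   /* approximately 4 times MAXLEN */
#define LINELENGTH        60
#define GCG_LINELENGTH    50

#else                     /* optional guard, not in the supplied header */
#error "No system keyword is defined in clustalv.h"
#endif
```

In the real header the new block would go alongside the existing MAC, 
MSDOS and UNIX blocks, before the final #endif.  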

 

VAX/VMS

Compile and link the program with the 
supplied makefile for vms: vmslink.com .

$ @vmslink

This will produce clustalv.exe (and a lot of .obj files which you can delete).  

The on-line help file (clustalv.hlp) should be 'defined' as 
clustalv_help as follows:

$ def clustalv_help $drive:[dir1.dir2]clustalv.hlp 

where $drive is the drive designation and [dir1.dir2] is the 
directory where clustalv.hlp is kept.  

To make use of the command-line interface, you must make clustalv a 
'foreign' command with:

$ clustalv :== $$drive:[dir1.dir2]clustalv

where $drive is the drive designation and [dir1.dir2] is the 
directory where clustalv.exe is kept.  



IBM PC/MSDOS/TURBO C

Create a makefile (something.prj) with the names of the source files 
(clustalv.c, amenu.c etc.) and 'make' this using the HUGE memory 
model.  You will get half a dozen warnings from the compiler about 
pieces of code that look suspicious to it; these can be ignored.  The 
help file should remain as clustalv.hlp .   To run the program using 
the default settings in clustalv.h, you need approximately 500k of 
memory.  The main influence on memory usage is the parameter MAXLEN; 
reduce MAXLEN to reduce memory usage.



Apple Mac/THINK_C version 4.0.2

This version of the program is not at all Mac like.  It runs in a 
window, the inside of which looks just like a normal character based 
terminal.  In the future we might put a proper Mac interface on it 
but do not have the time right now.  With the default settings in 
the header file ClustalV.h, you need just over 800k of memory to run 
the program.  To reduce this, reduce MAXLEN; this is easily the 
biggest influence on memory usage.  To compile the program and save 
it as an application you need to 'set the application type'; here 
you specify how much memory (in kilobytes (k)) the application will 
need.  You should set this to 900k to run the application as it is 
OR reduce MAXLEN in the header.  To compile the program you have to 
create a 'project'; you 'add' the names of the 9 source files to the 
project AND the name of the ANSI library.  The source code is too 
large to compile in one compilation unit.  You will get a 'link 
error: code segment too big' if you try to compile and link as is.  
You should compile amenu.c (the biggest source file) as a separate 
unit; you will have to read the manual, ask someone or mail me to 
find out what this means.


*******************************************************************

		3.  Interactive usage.



Interactive usage of Clustal V is completely menu driven.  On-line 
help is provided, and defaults are offered for all parameters and file 
names.  With a little effort it should be completely self 
explanatory.   The main menu, which appears when you run the 
program, is shown below.  Each item brings you to a sub menu.



Main menu for Clustal V:


     1. Sequence Input From Disc
     2. Multiple Alignments