Clustal V Multiple Sequence Alignments.

Documentation (Installation and Usage).

Des Higgins
European Molecular Biology Laboratory
Postfach 10.2209
D-6900 Heidelberg
Germany.

higgins@EMBL-Heidelberg.DE

*******************************************************************

                Contents.

        1       Overview
        2       Installation
        3       Interactive usage
        4       Command-line interface
        5       Algorithms and references

*******************************************************************

1.  Overview

This document describes how to install and use ClustalV on various
machines.  ClustalV is a complete upgrade and rewrite of the Clustal
package of multiple alignment programs (Higgins and Sharp, 1988 and
1989).  The original programs were written in Fortran for
microcomputers running MSDOS.  You carried out a complete alignment by
running 3 programs in succession.  Later, these were merged into a
single menu driven program with on-line help, for VAX/VMS.  ClustalV
was written in C and has all of the features of the old programs plus
many new ones.  It has been compiled and tested using VAX/VMS C,
Decstation ULTRIX C, Gnu C for Sun workstations, Turbo C for IBM PC's
and Think C for Apple Mac's.

The original Clustal was written by Des Higgins while he was a
Post-Doc in the lab of Paul Sharp in the Genetics Department, Trinity
College, Dublin 2, Ireland.

The main feature of the old package was the ability to carry out
reliable multiple alignments of many sequences.  The sensitivity of
the program is as good as that of any other program we have tried,
with the exception of the programs of Vingron and Argos (1991), and it
runs in reasonable time on a microcomputer.
The programs of Vingron and Argos are specialised for finding distant
similarities between proteins but require mainframes or workstations
and are more difficult to use.

The main new features are:

    profile alignments (alignments of old alignments);

    phylogenetic trees (Neighbor Joining trees calculated after
    multiple alignment with a bootstrapping option);

    better sequence input (automatically recognise and read NBRF/PIR,
    Pearson (Fasta) or EMBL/SwissProt formats);

    flexible alignment output (choose one of: old Clustal format,
    NBRF/PIR, GCG msf format or Phylip format);

    full command line interface (everything that you can do
    interactively can be specified on the command line).

In version 7 of the GCG package, there is a program called PILEUP
which uses a very similar algorithm to the one in ClustalV.  There are
2 main differences between the programs:

1) The metric used to compare the sequences for the initial "guide
tree" uses a full global, optimal alignment in PILEUP instead of the
fast, approximate ones in ClustalV.  This makes PILEUP much slower for
the comparison of long sequences.  In principle, the distances
calculated from PILEUP will be more sensitive than ours, but in
practice it will not make much difference, except in difficult cases.

2) During the multiple alignment, terminal gaps are penalised in
ClustalV but not in PILEUP.  This will make the PILEUP alignments
better when the sequences are of very different lengths (it has no
effect if there are no large terminal gaps).

This software may be distributed and used freely, provided that you do
not modify it or this documentation in any way without the permission
of the authors.

If you wish to refer to ClustalV, please cite:

Higgins,D.G. Bleasby,A.J. and Fuchs,R. (1991) CLUSTAL V: improved
software for multiple sequence alignment. CABIOS, vol. 8, 189-191.

The overall multiple alignment algorithm was described in:

Higgins,D.G. and Sharp,P.M. (1989). Fast and sensitive multiple
sequence alignments on a microcomputer.
CABIOS, vol. 5, 151-153.

ACKNOWLEDGEMENTS.

D.H. would particularly like to thank Paul Sharp, in whose lab. this
work originated.  We also thank Manolo Gouy, Gene Myers, Peter Rice
and Martin Vingron for suggestions, bug-fixes and help.

Des Higgins and Rainer Fuchs,
EMBL Data Library, Heidelberg, Germany.

Alan Bleasby,
Daresbury, UK.

JUNE 1991

*******************************************************************

2.  Installation.

As far as possible, we have tried to make ClustalV portable to any
machine with a standard C compiler (proposed ANSI C standard).  The
source code, as supplied by us, has been compiled and tested using the
following compilers:

    VAX/VMS C
    Ultrix C (on a Decstation 2100)
    Gnu C on a Sun 4 workstation
    Think C on an Apple Macintosh SE
    Turbo C on an IBM AT.

In each case, one must make 1 change to 1 line of code in 1 header
file.  This is described below.  The exact capacity of the program
(how many sequences of what length can be aligned) will depend of
course on available memory but can also be set in this header file.

The package comes as 9 C source files; 3 header files; 1 file of
on-line help; this documentation file; 3 make files:

Source code:    clustalv.c, amenu.c, gcgcheck.c, myers.c, sequence.c,
                showpair.c, trees.c, upgma.c, util.c

Header files:   clustalv.h, general.h, matrices.h

On-Line help:   clustalv.hlp (must be renamed or defined as
                clustalv_help except on PC's)

Documentation:  clustalv.doc (this file).

Makefiles:      makefile.sun (gnu c on Sun), vmslink.com (vax/vms),
                makefile.ult (ultrix).

Before compiling ClustalV you must look at and possibly change
clustalv.h, shown below.

/*******************CLUSTALV.H********************************/
/*
Main header file for ClustalV.
Uncomment ONE of the following lines
depending on which compiler you wish to use.
*/
#define VMS 1                /* VAX VMS */
/*#define MAC 1                 Think_C for MacIntosh */
/*#define MSDOS 1               Turbo C for PC's */
/*#define UNIX 1                Ultrix for Decstations or Gnu C for Sun */
/*************************************************************/

#include "general.h"

#define MAXNAMES        10
#define MAXTITLES       60
#define FILENAMELEN     256

#define UNKNOWN         0
#define EMBLSWISS       1
#define PIR             2
#define PEARSON         3

#define PAGE_LEN        22

#if VMS
#define DIRDELIM ']'
#define MAXLEN 3000
#define MAXN 150
#define FSIZE 15000
#define LINELENGTH 60
#define GCG_LINELENGTH 50

#elif MAC
#define DIRDELIM ':'
#define MAXLEN 2600
#define MAXN 30
#define FSIZE 10000
#define LINELENGTH 50
#define GCG_LINELENGTH 50

#elif MSDOS
#define DIRDELIM '\\'
#define MAXLEN 1300
#define MAXN 30
#define FSIZE 5000
#define LINELENGTH 50
#define GCG_LINELENGTH 50

#elif UNIX
#define DIRDELIM '/'
#define MAXLEN 3000
#define MAXN 50
#define FSIZE 15000
#define LINELENGTH 60
#define GCG_LINELENGTH 50
#endif

/*****************end*of*CLUSTALV.H***************************/

First, you must remove the comments from one of the first 10 lines.
There are 4 'define' compiler directives here (e.g. #define VMS 1),
and exactly one of them should be active, depending on which system
you wish to work on.  So choose one of these, remove the comment
markers around it (if it is commented out) and put comment markers
around any of the others that are still active.  If you wish to use a
different system, you will need to insert a new line with a new
keyword (which you must invent) to identify your system.

Most of the rest of this header file is taken up with a block of
'define' statements for each system type; e.g.
the VAX/VMS block is:

#if VMS
#define DIRDELIM ']'
#define MAXLEN 3000
#define MAXN 150
#define FSIZE 15000
#define LINELENGTH 60
#define GCG_LINELENGTH 50

In this block, you can specify the following:

    MAXN            the maximum number of sequences to be allowed;

    MAXLEN          the maximum sequence length, including gaps;

    FSIZE           the size of some workspace, used by the fast
                    2 sequence comparison routines; it should be
                    APPROXIMATELY 4 times MAXLEN;

    LINELENGTH      the length of the blocks of alignment output in
                    the output files;

    GCG_LINELENGTH  the same, but for the GCG compatible output only;

    DIRDELIM        the character used to specify directories and
                    subdirectories in file names.

DIRDELIM should be the character used to separate the file name
itself from the directory name (e.g. in VMS, file names are like
$drive:[dir1.dir2.dir3]filename.ext;2 so ']' is used as DIRDELIM).

So, if you want to use a system not covered in clustalv.h, you will
have to insert a new block, like the above one.

To compile and link the program, we supply 3 makefiles: one each for
VAX/VMS, Ultrix and GNU C for Sun workstations.

VAX/VMS

Compile and link the program with the supplied makefile for vms,
vmslink.com:

$ @vmslink

This will produce clustalv.exe (and a lot of .obj files which you can
delete).

The on-line help file (clustalv.hlp) should be 'defined' as
clustalv_help as follows:

$ def clustalv_help $drive:[dir1.dir2]clustalv.hlp

where $drive is the drive designation and [dir1.dir2] is the directory
where clustalv.hlp is kept.

To make use of the command-line interface, you must make clustalv a
'foreign' command with:

$ clustalv :== $$drive:[dir1.dir2]clustalv

where $drive is the drive designation and [dir1.dir2] is the directory
where clustalv.exe is kept.

IBM PC/MSDOS/TURBO C

Create a makefile (something.prj) with the names of the source files
(clustalv.c, amenu.c etc.) and 'make' this using the HUGE memory model.
You will get half a dozen warnings from the compiler about pieces of
code that look suspicious to it; ignore these.  The help file should
remain as clustalv.hlp.  To run the program using the default settings
in clustalv.h, you need approximately 500k of memory.  The main
influence on memory usage is the parameter MAXLEN; reduce MAXLEN to
reduce memory usage.

Apple Mac/THINK_C version 4.0.2

This version of the program is not at all Mac like.  It runs in a
window, the inside of which looks just like a normal character based
terminal.  In the future we might put a proper Mac interface on it but
do not have the time right now.

With the default settings in the header file clustalv.h, you need just
over 800k of memory to run the program.  To reduce this, reduce
MAXLEN; this is easily the biggest influence on memory usage.  To
compile the program and save it as an application you need to 'set the
application type'; here you specify how much memory (in kilobytes (k))
the application will need.  You should set this to 900k to run the
application as it is OR reduce MAXLEN in the header.

To compile the program you have to create a 'project'; you 'add' the
names of the 9 source files to the project AND the name of the ANSI
library.  The source code is too large to compile in one compilation
unit.  You will get a 'link error: code segment too big' if you try to
compile and link as is.  You should compile amenu.c (the biggest
source file) as a separate unit; you will have to read the manual, ask
someone or mail me to find out what this is.

*******************************************************************

3.  Interactive usage.

Interactive usage of Clustal V is completely menu driven.  On-line
help is provided, and defaults are offered for all parameters and file
names.  With a little effort it should be completely self explanatory.
The main menu, which appears when you run the program, is shown below.
Each item brings you to a sub menu.

Main menu for Clustal V:

    1. Sequence Input From Disc
    2. Multiple Alignments
    3. Profile Alignments
    4. Phylogenetic trees

    S. Execute a system command
    H. HELP
    X. EXIT (leave program)
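
As an illustration of how a single-key menu like the one above can be
handled, here is a minimal sketch in C.  It is NOT taken from the
ClustalV sources: the function name menu_target is hypothetical, and
treating upper and lower case keys alike is an assumption made for the
example.  It only maps a key to the name of the item it selects; the
real program goes on to show the corresponding sub menu.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of ClustalV: map a key typed at the
   main menu to the name of the item it selects.  Returns NULL if the
   key does not appear on the menu. */
const char *menu_target(char key)
{
    /* Assumption: keys are accepted in either case. */
    switch (toupper((unsigned char)key)) {
    case '1': return "Sequence Input From Disc";
    case '2': return "Multiple Alignments";
    case '3': return "Profile Alignments";
    case '4': return "Phylogenetic trees";
    case 'S': return "Execute a system command";
    case 'H': return "HELP";
    case 'X': return "EXIT (leave program)";
    default:  return NULL;   /* not a main menu choice */
    }
}
```

A caller would read one character from the terminal, pass it to
menu_target, and ask again whenever NULL comes back.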