clustalw_help

Further help is offered in the weight matrix menu.

7) In the weight matrices, you can use negative as well as positive values if you wish, although the matrix will be automatically adjusted to all positive scores, unless the NEGATIVE MATRIX option is selected.

8) PROTEIN GAP PARAMETERS displays a menu allowing you to set some Gap Penalty options which are only used in protein alignments.

>>HELP A <<           Help for protein gap parameters.

1) RESIDUE SPECIFIC PENALTIES are amino acid specific gap penalties that reduce or increase the gap opening penalties at each position in the alignment or sequence. See the documentation for details. As an example, positions that are rich in glycine are more likely to have an adjacent gap than positions that are rich in valine.

2) and 3) HYDROPHILIC GAP PENALTIES are used to increase the chances of a gap within a run (5 or more residues) of hydrophilic amino acids; these are likely to be loop or random coil regions where gaps are more common. The residues that are "considered" to be hydrophilic are set by menu item 3.

4) GAP SEPARATION DISTANCE tries to decrease the chances of gaps being too close to each other. Gaps that are less than this distance apart are penalised more than other gaps. This does not prevent close gaps; it makes them less frequent, promoting a block-like appearance of the alignment.

5) END GAP SEPARATION treats end gaps just like internal gaps for the purposes of avoiding gaps that are too close (set by GAP SEPARATION DISTANCE above). If you turn this off, end gaps will be ignored for this purpose. This is useful when you wish to align fragments where the end gaps are not biologically meaningful.

>>HELP 5 <<      Help for output format options.

Six output formats are offered. You can choose any (or all 6 if you wish).

CLUSTAL format output is a self-explanatory alignment format. It shows the sequences aligned in blocks.
It can be read in again at a later date to (for example) calculate a phylogenetic tree or add a new sequence with a profile alignment.

GCG output can be used by any of the GCG programs that can work on multiple alignments (e.g. PRETTY, PROFILEMAKE, PLOTALIGN). It is the same as the GCG .msf format files (multiple sequence file); new in version 7 of GCG.

PHYLIP format output can be used for input to the PHYLIP package of Joe Felsenstein. This is an extremely widely used package for doing every imaginable form of phylogenetic analysis (MUCH more than the modest introduction offered by this program).

NBRF/PIR: this is the same as the standard PIR format with ONE ADDITION. Gap characters "-" are used to indicate the positions of gaps in the multiple alignment. These files can be re-used as input in any part of clustal that allows sequences (or alignments or profiles) to be read in.

GDE: this is the flat file format used by the GDE package of Steven Smith.

NEXUS: the format used by several phylogeny programs, including PAUP and MacClade.

GDE OUTPUT CASE: sequences in GDE format may be written in either upper or lower case.

CLUSTALW SEQUENCE NUMBERS: residue numbers may be added to the end of the alignment lines in clustalw format.

OUTPUT ORDER is used to control the order of the sequences in the output alignments. By default, the order corresponds to the order in which the sequences were aligned (from the guide tree/dendrogram), thus automatically grouping closely related sequences. This switch can be used to set the order to the same as the input file.

PARAMETER OUTPUT: This option allows you to save all your parameter settings in a parameter file. This file can be used subsequently to rerun Clustal W using the same parameters.

>>HELP 6 <<      Help for profile and structure alignments

By PROFILE ALIGNMENT, we mean alignment using existing alignments.
Profile alignments allow you to store alignments of your favourite sequences and add new sequences to them in small bunches at a time. A profile is simply an alignment of one or more sequences (e.g. an alignment output file from CLUSTALW). Each input can be a single sequence. One or both sets of input sequences may include secondary structure assignments or gap penalty masks to guide the alignment. The profiles can be in any of the allowed input formats with "-" characters used to specify gaps (except for MSF/RSF where "." is used).

You have to specify the 2 profiles by choosing menu items 1 and 2 and giving 2 file names. Then menu item 3 will align the 2 profiles to each other. Secondary structure masks in either profile can be used to guide the alignment.

Menu item 4 will take the sequences in the second profile and align them to the first profile, 1 at a time. This is useful to add some new sequences to an existing alignment, or to align a set of sequences to a known structure. In this case, the second profile would not be pre-aligned.

The alignment parameters can be set using menu items 5, 6 and 7. These are EXACTLY the same parameters as used by the general, automatic multiple alignment procedure. The general multiple alignment procedure is simply a series of profile alignments. Carrying out a series of profile alignments on larger and larger groups of sequences allows you to manually build up a complete alignment, if necessary editing intermediate alignments.

SECONDARY STRUCTURE OPTIONS. Menu option 0 allows you to set 2D structure parameters. If a solved structure is available, it can be used to guide the alignment by raising gap penalties within secondary structure elements, so that gaps will preferentially be inserted into unstructured surface loops. Alternatively, a user-specified gap penalty mask can be supplied directly. A gap penalty mask is a series of numbers between 1 and 9, one per position in the alignment.
Each number specifies how much the gap opening penalty is to be raised at that position (raised by multiplying the basic gap opening penalty by the number), i.e. a mask figure of 1 at a position means no change in gap opening penalty; a figure of 4 means that the gap opening penalty is four times greater at that position, making gaps 4 times harder to open. The format for gap penalty masks and secondary structure masks is explained in the help under option 0 (secondary structure options).

>>HELP B <<      Help for secondary structure / gap penalty masks

The use of secondary structure-based penalties has been shown to improve the accuracy of multiple alignment. Therefore CLUSTAL W now allows gap penalty masks to be supplied with the input sequences. The masks work by raising gap penalties in specified regions (typically secondary structure elements) so that gaps are preferentially opened in the less well conserved regions (typically surface loops).

Options 1 and 2 control whether the input secondary structure information or gap penalty masks will be used.

Option 3 controls whether the secondary structure and gap penalty masks should be included in the output alignment.

Options 4 and 5 provide the value for raising the gap penalty at core Alpha Helical (A) and Beta Strand (B) residues. In CLUSTAL format, capital letters (A and B) in the secondary structure notation denote core residues. The basic gap penalties are multiplied by the amount specified.

Option 6 provides the value for the gap penalty in Loops. By default this penalty is not raised. In CLUSTAL format, loops are specified by "." in the secondary structure notation.

Option 7 provides the value for setting the gap penalty at the ends of secondary structures. Ends of secondary structures are observed to grow and/or shrink in related structures. Therefore by default these are given intermediate values, lower than the core penalties.
All secondary structure read in as lower case in CLUSTAL format gets the reduced terminal penalty.

Options 8 and 9 specify the range of structure termini for the intermediate penalties. In the alignment output, these are indicated as lower case. For Alpha Helices, by default, the range spans the end helical turn. For Beta Strands, the default range spans the end residue and the adjacent loop residue, since sequence conservation often extends beyond the actual H-bonded Beta Strand.

CLUSTAL W can read the masks from SWISS-PROT, CLUSTAL or GDE format input files. For many 3-D protein structures, secondary structure information is recorded in the feature tables of SWISS-PROT database entries. You should always check that the assignments are correct; some are quite inaccurate. CLUSTAL W looks for SWISS-PROT HELIX and STRAND assignments, e.g.

FT   HELIX       100    115
FT   STRAND      118    119

The structure and penalty masks can also be read from CLUSTAL alignment format as comment lines beginning "!SS_" or "!GM_", e.g.

!SS_HBA_HUMA    ..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA
!GM_HBA_HUMA    112224444444444222122244444444442222224222111111111222444444
HBA_HUMA        VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGK

Note that the mask itself is a set of numbers between 1 and 9, each of which is assigned to the residue(s) in the same column below. In GDE flat file format, the masks are specified as text and the names must begin with "SS_" or "GM_".

Either a structure or penalty mask or both may be used. If both are included in an alignment, the user will be asked which is to be used.

>>HELP C <<      Help for secondary structure / gap penalty mask output options

The options in this menu let you choose whether or not to include the masks in the CLUSTAL W output alignments. Showing both is useful for understanding how the masks work.
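The HBA_HUMA example shows how a structure mask lines up with a gap penalty mask: loop "." columns carry 1, lower-case terminal columns carry 2 and upper-case core columns carry 4. The Python sketch below reproduces that mapping and then applies a mask by multiplying the basic gap opening penalty at each column, as the help describes. The factors 1, 2 and 4 are read off this one example and are an assumption here, not values taken from CLUSTAL W itself; the function names are hypothetical, and the sketch ignores the termini ranges of options 8 and 9 because the example already marks termini in lower case.

```python
# Sketch only: these multipliers are read off the HBA_HUMA example in this
# help (an assumption), not taken from the CLUSTAL W implementation.
LOOP_FACTOR = 1      # "." loop columns: penalty not raised
TERMINAL_FACTOR = 2  # lower-case helix/strand termini: intermediate value
CORE_FACTOR = 4      # upper-case core A/B columns: highest value


def ss_to_gap_mask(ss: str) -> str:
    """Convert a CLUSTAL-style secondary structure string into a gap penalty mask."""
    digits = []
    for ch in ss:
        if ch == ".":
            digits.append(LOOP_FACTOR)
        elif ch.islower():
            digits.append(TERMINAL_FACTOR)
        elif ch.isupper():
            digits.append(CORE_FACTOR)
        else:
            raise ValueError(f"unexpected structure character: {ch!r}")
    return "".join(str(d) for d in digits)


def masked_gap_open(base_penalty: float, mask: str) -> list[float]:
    """Multiply the basic gap opening penalty by the mask digit at each column."""
    if any(ch not in "123456789" for ch in mask):
        raise ValueError("a gap penalty mask may only contain the digits 1-9")
    return [base_penalty * int(ch) for ch in mask]


if __name__ == "__main__":
    ss = "..aaaAAAAAAAAAAaaa.aaaAAAAAAAAAAaaaaaaAaaa.........aaaAAAAAA"
    gm = "112224444444444222122244444444442222224222111111111222444444"
    print(ss_to_gap_mask(ss) == gm)       # True
    print(masked_gap_open(10.0, gm[:6]))  # [10.0, 10.0, 20.0, 20.0, 20.0, 40.0]
```

With a basic gap opening penalty of 10, the first six columns of the example mask give 10, 10, 20, 20, 20 and 40, i.e. gaps are four times harder to open in the helix core than in the leading loop.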
The secondary structure information is itself very useful in judging the alignment quality and in seeing how residue conservation patterns vary with secondary structure.

>>HELP 7 <<      Help for phylogenetic trees

1) Before calculating a tree, you must have an ALIGNMENT in memory. This can be read in, in any format, or you should have just carried out a full multiple alignment that is still in memory.

*************** Remember YOU MUST ALIGN THE SEQUENCES FIRST!!!! ***************

The method used is the NJ (Neighbour Joining) method of Saitou and Nei. First you calculate distances (percent divergence) between all pairs of sequences from a multiple alignment; second you apply the NJ method to the distance matrix.

2) EXCLUDE POSITIONS WITH GAPS? With this option, any alignment positions where ANY of the sequences have a gap will be ignored. This means that 'like' will be compared to 'like' in all distances, which is highly desirable. It also automatically throws away the most ambiguous parts of the alignment, which are (usually) concentrated around gaps. The disadvantage is that you may throw away much of the data if there are many gaps (which is why it is difficult for us to make it the default).

3) CORRECT FOR MULTIPLE SUBSTITUTIONS? For small divergence (say <10%) this option makes very little difference. For greater divergence, it corrects for the fact that observed distances underestimate actual evolutionary distances. This is because, as sequences diverge, more than one substitution will happen at many sites. However, you only see one difference when you look at the present-day sequences. Therefore, this option has the effect of stretching branch lengths in trees (especially long branches). The corrections used here (for DNA or proteins) are both due to Motoo Kimura. See the documentation for details. Where possible, this option should be used. However, for VERY divergent sequences, the distances cannot be reliably corrected. You will be warned if this happens.
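The distance step described in items 1 to 3 (percent divergence, optional exclusion of gap columns, Kimura's correction) can be sketched in Python as follows. For proteins the sketch uses Kimura's empirical formula d = -ln(1 - p - 0.2p²), where p is the observed fraction of differing positions; the DNA correction is a separate formula and is not shown. When gap exclusion is off, the sketch compares only the columns where both sequences have a residue. That pairwise rule, the function names and raising an error for uncorrectable distances are assumptions made for illustration, not documented CLUSTAL W behaviour.

```python
import math

GAP = "-"


def kept_columns(alignment: list[str], exclude_gaps: bool) -> list[int]:
    """Columns used for distances; with exclude_gaps, drop any column containing a gap."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    if not exclude_gaps:
        return list(range(length))
    return [i for i in range(length) if all(seq[i] != GAP for seq in alignment)]


def p_distance(a: str, b: str, columns: list[int]) -> float:
    """Fraction of compared columns (both residues present) at which a and b differ."""
    pairs = [(a[i], b[i]) for i in columns if a[i] != GAP and b[i] != GAP]
    if not pairs:
        raise ValueError("no positions left to compare")
    differences = sum(1 for x, y in pairs if x.upper() != y.upper())
    return differences / len(pairs)


def kimura_protein(p: float) -> float:
    """Kimura's empirical protein correction: d = -ln(1 - p - 0.2 * p**2)."""
    argument = 1.0 - p - 0.2 * p * p
    if argument <= 0.0:
        # Too divergent: the logarithm is undefined, so no reliable correction exists.
        raise ValueError(f"divergence {p:.3f} is too large to correct")
    return -math.log(argument)


def distance_matrix(alignment: list[str], exclude_gaps: bool = False,
                    correct: bool = True) -> list[list[float]]:
    """Symmetric matrix of pairwise distances, the input to a method such as NJ."""
    columns = kept_columns(alignment, exclude_gaps)
    n = len(alignment)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_distance(alignment[i], alignment[j], columns)
            matrix[i][j] = matrix[j][i] = kimura_protein(p) if correct else p
    return matrix


if __name__ == "__main__":
    aln = ["ACDEFGHIKL", "ACDEFGHIKV", "ACD-FGHIRV"]
    print(distance_matrix(aln, correct=False)[0][1])  # 0.1
    print(round(kimura_protein(0.1), 4))              # 0.1076
```

In this toy alignment the first two sequences differ at 1 of 10 columns; with exclude_gaps=True the gapped fourth column is dropped for every pair, so the same pair is compared over 9 columns (1/9). The correction turns an observed 0.1 into about 0.1076, a small but real stretch, and it grows quickly as p increases until the formula breaks down.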
Even if none of the distances in a data set exceed the reliable threshold, if you bootstrap the data, some of the bootstrap distances may randomly exceed the safe limit.

4) To calculate a tree, use option 4 (DRAW TREE NOW). This gives an UNROOTED
