📄 clustalx_help
>>HELP C << Colors

Clustal X provides a versatile coloring scheme for the sequence alignment
display. The sequences (or profiles) are colored automatically when they are
loaded. Sequences can be colored either by assigning a color to specific
residues, or on the basis of an alignment consensus. In the latter case, the
alignment consensus is calculated automatically, and the residues in each
column are colored according to the consensus character assigned to that
column. In this way, you can choose to highlight, for example, conserved
hydrophilic or hydrophobic positions in the alignment.

The 'rules' used to color the alignment are specified in a COLOR PARAMETER
FILE. Clustal X automatically looks for a file called 'colprot.par' for protein
sequences or 'coldna.par' for DNA, in the current directory. (If you are running
under UNIX, it then looks in your home directory, and finally in the
directories in your PATH environment variable.)

By default, if no color parameter file is found, protein sequences are colored
by residue as follows:

<PRE>
          Color                   Residue Code

          ORANGE                  GPST
          RED                     HKR
          BLUE                    FWY
          GREEN                   ILMV
</PRE>

In the case of DNA sequences, the default colors are as follows:

<PRE>
          Color                   Residue Code

          ORANGE                  A
          RED                     C
          BLUE                    T
          GREEN                   G
</PRE>

The default BACKGROUND COLORING option shows the sequence residues using a
black character on a colored background. It can be switched off to show
residues as a colored character on a white background.

Either BLACK AND WHITE or DEFAULT COLOR options can be selected. The Color
option looks first for the color parameter file (as described above) and, if no
file is found, uses the default residue-specific colors.

You can specify your own coloring scheme by using the LOAD COLOR PARAMETER FILE
option.
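The default residue-specific coloring described above amounts to a simple lookup table. The following is a minimal Python sketch of that idea, not Clustal X's own code: the names `DEFAULT_PROTEIN_COLORS`, `DEFAULT_DNA_COLORS`, `build_lookup` and `color_sequence` are hypothetical, and the handling of unlisted residues, gaps and lower-case letters is an assumption, since the help text does not specify it.

```python
# Default color tables, transcribed from the two tables above.
DEFAULT_PROTEIN_COLORS = {
    "ORANGE": "GPST",
    "RED": "HKR",
    "BLUE": "FWY",
    "GREEN": "ILMV",
}
DEFAULT_DNA_COLORS = {"ORANGE": "A", "RED": "C", "BLUE": "T", "GREEN": "G"}


def build_lookup(table):
    """Invert a color -> residues table into a residue -> color dict."""
    return {res: color for color, residues in table.items() for res in residues}


def color_sequence(sequence, table):
    """Return one color name (or None) per character of `sequence`.

    Assumptions (not stated in the help text): residues missing from the
    table and gap characters are left uncolored (None), and matching is
    case-insensitive.
    """
    lookup = build_lookup(table)
    return [lookup.get(res.upper()) for res in sequence]


if __name__ == "__main__":
    print(color_sequence("GAK-", DEFAULT_PROTEIN_COLORS))
    print(color_sequence("acgt", DEFAULT_DNA_COLORS))
```

Inverting the table once keeps each per-residue lookup to a single dictionary access, which matters little for one sequence but is convenient when coloring every row of a large alignment.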
The format of the color parameter file is described below.

<H4>COLOR PARAMETER FILE</H4>

This file is divided into 3 sections:

1) the names and rgb values of the colors
2) the rules for calculating the consensus
3) the rules for assigning colors to the residues

An example file is given here.

<PRE>
 --------------------------------------------------------------------
@rgbindex
RED          0.9 0.1 0.1
BLUE         0.1 0.1 0.9
GREEN        0.1 0.9 0.1
YELLOW       0.9 0.9 0.0

@consensus
% = 60% w:l:v:i:m:a:f:c:y:h:p
# = 80% w:l:v:i:m:a:f:c:y:h:p
- = 50% e:d
+ = 60% k:r
q = 50% q:e
p = 50% p
n = 50% n
t = 50% t:s

@color
g = RED
p = YELLOW
t = GREEN if t:%:#
n = GREEN if n
w = BLUE if %:#:p
k = RED if +
 --------------------------------------------------------------------
</PRE>

The first section is optional and is identified by the header @rgbindex. If
this section exists, each color used in the file must be named and the rgb
values specified (on a scale from 0 to 1). If the rgb index section is not
found, the following set of hard-coded colors will be used.

<PRE>
RED          0.9 0.1 0.1
BLUE         0.1 0.1 0.9
GREEN        0.1 0.9 0.1
ORANGE       0.9 0.7 0.3
CYAN         0.1 0.9 0.9
PINK         0.9 0.5 0.5
MAGENTA      0.9 0.1 0.9
YELLOW       0.9 0.9 0.0
</PRE>

The second section is optional and is identified by the header @consensus. It
defines how the consensus is calculated.

The format of each consensus parameter is:

<PRE>
              c = n% residue_list

      where
           c is a character used to identify the parameter.
           n is an integer value used as the percentage cutoff point.
           residue_list is a list of residues denoted by a single
           character, delimited by a colon (:).
</PRE>

For example:

              # = 60% w:l:v:i

will assign a consensus character # to any column in the alignment which
contains more than 60% of the residues w, l, v and i.

The third section is identified by the header @color, and defines how colors
are assigned to each residue in the alignment.

The color parameters can take one of two formats:

<PRE>
1) r = color
2) r = color if consensus_list

      where
           r is a character used to denote a residue.
           color is one of the colors in the GDE color lookup table.
           consensus_list is a list of consensus characters, each
           denoted by a single character, delimited by a colon (:).
</PRE>

Examples:

1) g = ORANGE

will color all glycines ORANGE, regardless of the consensus.

2) w = BLUE if w:%:#

will color BLUE any tryptophan which is found in a column with a consensus of
w, % or #.


>>HELP Q << Alignment Quality Analysis

<H3>QUALITY SCORES</H3>
--------------

Clustal X provides an indication of the quality of an alignment by plotting
a 'conservation score' for each column of the alignment. A high score indicates
a well-conserved column; a low score indicates low conservation. The quality
curve is drawn below the alignment.

Two methods are also provided to indicate single residues or sequence segments
which score badly in the alignment.

Low-scoring residues are expected to occur at a moderate frequency in all the
sequences because of their steady divergence due to the natural processes of
evolution. The most divergent sequences are likely to have the most outliers.
However, the highlighted residues are especially useful in pointing to
sequence misalignments. Note that clustering of highlighted residues is a
strong indication of misalignment. This can arise for various reasons, for
example:

     1. Partial or total misalignments caused by a failure in the
        alignment algorithm. Usually only in difficult alignment cases.

     2. Partial or total misalignments because at least one of the
        sequences in the given set is partly or completely unrelated to the
        other sequences. It is up to the user to check that the set of
        sequences is alignable.

     3. Frameshift translation errors in a protein sequence causing local
        mismatched regions to be heavily highlighted. These are surprisingly
        common in database entries. If suspected, a 3-frame translation of
        the source DNA needs to be examined.

Occasionally, highlighted residues may point to regions of some biological
significance.
This might happen, for example, if a protein alignment contains a
sequence which has acquired new functions relative to the main sequence set. It
is important to exclude other explanations, such as error or the natural
divergence of sequences, before invoking a biological explanation.

<H3>LOW-SCORING SEGMENTS</H3>
--------------------

Unreliable regions in the alignment can be highlighted using the Low-Scoring
Segments option. A sequence-weighted profile is used to indicate any segments
in the sequences which score badly. Because the profile calculation may take
some time, an option is provided to calculate LOW-SCORING SEGMENTS. The
segment display can then be toggled on or off without having to repeat the
time-consuming calculations.

For details of the low-scoring segment calculation, see the CALCULATION section
below.

<H4>LOW-SCORING SEGMENT PARAMETERS</H4>
------------------------------

MINIMUM LENGTH OF SEGMENTS: short segments (or even single residues) can be
hidden by increasing the minimum length of segments which will be displayed.

DNA MARKING SCALE is used to remove less significant segments from the
highlighted display. Increase the scale to display more segments; decrease the
scale to remove the least significant.

PROTEIN WEIGHT MATRIX: the scoring table which describes the similarity of each
amino acid to each other. The matrix is used to calculate the
sequence-weighted profile scores. There are four 'in-built' Log-Odds matrices
offered: the Gonnet PAM 80, 120, 250, 350 matrices. A more stringent matrix,
which only gives a high score to identities and the most favoured conservative
substitutions, may be more suitable when the sequences are closely related. For
more divergent sequences, it is appropriate to use "softer" matrices which give
a high score to many other frequent substitutions. This option automatically
recalculates the low-scoring segments.

DNA WEIGHT MATRIX: Two hard-coded matrices are available:

1) IUB.
This is the default scoring matrix used by BESTFIT for the comparison
of nucleic acid sequences. X's and N's are treated as matches to any IUB
ambiguity symbol. All matches score 1.0; all mismatches for IUB symbols score
0.9.

2) CLUSTALW(1.6). The previous system used by ClustalW, in which matches score
1.0 and mismatches score 0. All matches for IUB symbols also score 0.

A new matrix can be read from a file on disk, if the filename consists only
of lower case characters. The values in the new weight matrix should be
similarities and should be NEGATIVE for infrequent substitutions.

INPUT FORMAT. The format used for a new matrix is the same as that used by the
BLAST program. Any lines beginning with a # character are assumed to be
comments. The first non-comment line should contain a list of amino acids in
any order, using the 1 letter code, followed by a * character. This should be
followed by a square matrix of scores, with one row and one column for each
amino acid. The last row and column of the matrix (corresponding to the *
character) contain the minimum score over the whole matrix.

<H4>QUALITY SCORE PARAMETERS</H4>
------------------------

You can customise the column 'quality scores' plotted underneath the alignment
display using the following options.

SCORE PLOT SCALE: this is a scalar value from 1 to 10, which can be used to
change the scale of the quality score plot.

RESIDUE EXCEPTION CUTOFF: this is a scalar value from 1 to 10, which can be
used to change the number of residue exceptions which are highlighted in the
alignment display. (For an explanation of this cutoff, see the CALCULATION OF
RESIDUE EXCEPTIONS section below.)

PROTEIN WEIGHT MATRIX: the scoring table which describes the similarity of
each amino acid to each other.

DNA WEIGHT MATRIX: two hard-coded matrices are available: IUB and
CLUSTALW(1.6).

For more information about the weight matrices, see the help above for
the Low-scoring Segments Weight Matrix.

For details of the quality score calculations, see the CALCULATION section
below.

<STRONG>SHOW LOW-SCORING SEGMENTS</STRONG>

The low-scoring segment display can be toggled on or off. This option does not
recalculate the profile scores.

<STRONG>SHOW EXCEPTIONAL RESIDUES</STRONG>