<P>
DNA MARKING SCALE is used to remove less significant segments from the 
highlighted display. Increase the scale to display more segments; decrease the
scale to remove the least significant.
</P>
<P>
</P>
<P>
PROTEIN WEIGHT MATRIX: the scoring table which describes the similarity of each
amino acid to every other. The matrix is used to calculate the
sequence-weighted profile scores. Four 'in-built' Log-Odds matrices are offered:
the Gonnet PAM 80, 120, 250 and 350 matrices. A more stringent matrix, which
gives a high score only to identities and the most favoured conservative
substitutions, may be more suitable when the sequences are closely related. For
more divergent sequences, it is appropriate to use "softer" matrices which give
a high score to many other frequent substitutions. This option automatically
recalculates the low-scoring segments.
</P>
<P>
</P>
<P>
DNA WEIGHT MATRIX: Two hard-coded matrices are available:
</P>
<P>
1) IUB. This is the default scoring matrix used by BESTFIT for the comparison
of nucleic acid sequences. X's and N's are treated as matches to any IUB
ambiguity symbol. All matches score 1.9; all mismatches for IUB symbols score
0.
</P>
<P>
2) CLUSTALW(1.6). The previous system used by ClustalW, in which matches score
1.0 and mismatches score 0. All matches for IUB symbols also score 0. 
</P>
<P>
A new matrix can be read from a file on disk, if the filename consists only
of lower case characters. The values in the new weight matrix should be
similarities and should be NEGATIVE for infrequent substitutions.
</P>
<P> 
INPUT FORMAT. The format used for a new matrix is the same as the BLAST
program. Any lines beginning with a # character are assumed to be comments. The
first non-comment line should contain a list of amino acids in any order, using
the 1 letter code, followed by a * character. This should be followed by a
square matrix of scores, with one row and one column for each amino acid. The
last row and column of the matrix (corresponding to the * character) contain
the minimum score over the whole matrix.
</P>
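<P>
The input format described above can be read with a few lines of code. The
following is a minimal sketch, not the parser used by the program itself: the
function name parse_matrix and the toy three-residue matrix are invented for
illustration, and the sketch assumes that each matrix row may optionally start
with its residue letter, as BLAST matrix files usually do.
</P>

```python
def parse_matrix(text):
    """Parse a BLAST-style matrix into a dict keyed by (row, column) symbols."""
    symbols = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '#' comments are ignored
        fields = line.split()
        if symbols is None:
            symbols = fields  # first non-comment line: residue list ending in '*'
            continue
        # Assumption: a row may repeat its residue letter in the first column.
        if len(fields) == len(symbols) + 1 and fields[0] in symbols:
            fields = fields[1:]
        rows.append([float(x) for x in fields])
    if symbols is None or len(rows) != len(symbols):
        raise ValueError("matrix is not square")
    if any(len(row) != len(symbols) for row in rows):
        raise ValueError("matrix is not square")
    return {(a, b): rows[i][j]
            for i, a in enumerate(symbols)
            for j, b in enumerate(symbols)}


# Toy matrix with invented values; the '*' row and column hold the minimum score.
EXAMPLE = """# toy matrix for illustration only
A  R  N  *
A  4 -1 -2 -2
R -1  5  0 -2
N -2  0  6 -2
* -2 -2 -2 -2
"""

matrix = parse_matrix(EXAMPLE)
```

<P>
Storing the scores under (row, column) keys keeps the lookup independent of the
order in which the residues are listed in the file.
</P>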
<P>
<H4>
QUALITY SCORE PARAMETERS
</H4>
</P>
<P>
You can customise the column 'quality scores' plotted underneath the alignment
display using the following options.
</P>
<P>
SCORE PLOT SCALE: this is a scalar value from 1 to 10, which can be used to
change the scale of the quality score plot. 
</P>
<P>
RESIDUE EXCEPTION CUTOFF: this is a scalar value from 1 to 10, which can be
used to change the number of residue exceptions which are highlighted in the
alignment display. (For an explanation of this cutoff, see the CALCULATION OF
RESIDUE EXCEPTIONS section below.)
</P>
<P>
PROTEIN WEIGHT MATRIX: the scoring table which describes the similarity of
each amino acid to every other.
</P>
<P> 
DNA WEIGHT MATRIX: two hard-coded matrices are available: IUB and CLUSTALW(1.6).
</P>
<P>
For more information about the weight matrices, see the help above for
the Low-scoring Segments Weight Matrix.
</P>
<P>
For details of the quality score calculations, see the CALCULATION section
below.
</P>
<P>
</P>
<P>
<STRONG>
SHOW LOW-SCORING SEGMENTS
</STRONG>
</P>
<P>                       
The low-scoring segment display can be toggled on or off. This option does not
recalculate the profile scores.
</P>
<P>
</P>
<P>
<STRONG>
SHOW EXCEPTIONAL RESIDUES
</STRONG>
</P>
<P>                       
This option highlights individual residues which score badly in the alignment
quality calculations. Residues which score exceptionally low are highlighted by
using a white character on a grey background.
</P>
<P>
<STRONG>
SAVE QUALITY SCORES TO FILE
</STRONG>
</P>
<P>
The quality scores that are plotted underneath the alignment display can also
be saved in a text file. Each column in the alignment is written on one line in
the output file, with the value of the quality score at the end of the line.
Only the sequences currently selected in the display are written to the file.
One use for quality scores is to color residues in a protein structure by
sequence conservation. In this way conserved surface residues can be
highlighted to locate functional regions such as ligand-binding sites.
</P>
<P>
</P>
<P>
<H3>
CALCULATION OF QUALITY SCORES
</H3>
</P>
<P>
Suppose we have an alignment of m sequences of length n. Then, the alignment
can be written as:
</P>
<P>
<PRE>
        A11 A12 A13 .......... A1n
        A21 A22 A23 .......... A2n
        .
        .
        Am1 Am2 Am3 .......... Amn
</PRE>
</P>
<P>
We also have a residue comparison matrix of size R where C(i,j) is the score
for aligning residue i with residue j.
</P>
<P>
We want to calculate a score for the conservation of the jth position in the
alignment.
</P>
<P>
To do this, we define an R-dimensional sequence space. For the jth position in 
the alignment, each sequence consists of a single residue which is assigned a
point S in the space. S has R dimensions, and for sequence i, the rth dimension
is defined as:
</P>
<P>
<PRE>
	Sr =    C(r,Aij)
</PRE>
</P>
<P>
We then calculate a consensus value for the jth position in the alignment. This
value X also has R dimensions, and the rth dimension is defined as:
</P>
<P>
<PRE>
	Xr = (   SUM   (Fij * C(i,r)) ) / m
               1<=i<=R
</PRE>
</P>
<P>
where Fij is the count of residues i at position j in the alignment.
</P>
<P>
Now we can calculate the distance Di between each sequence i and the consensus 
position X in the R-dimensional space.
</P>
<P>
<PRE>
	Di = SQRT   (   SUM   (Xr - Sr)(Xr - Sr) )
                      1<=r<=R
</PRE>
</P>
<P>
The quality score for the jth position in the alignment is defined as the mean
of the sequence distances Di.
</P>
<P>
The score is normalised by multiplying by the percentage of sequences which
have residues (and not gaps) at this position.
</P>
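<P>
The calculation above can be sketched for a single alignment column as
follows. This is an illustration under stated assumptions rather than the
program's own code: the function name column_quality is hypothetical, gaps are
written as '-', the mean is taken over the sequences that have a residue at
the position, and the normalisation uses the fraction (0 to 1) of non-gap
sequences rather than a percentage.
</P>

```python
import math


def column_quality(column, symbols, C):
    """Return (score, distances) for one alignment column.

    column  : residues at position j, one per sequence ('-' marks a gap)
    symbols : the R residue symbols of the comparison matrix
    C       : dict mapping (residue, residue) -> score
    distances[i] is Di, or None when sequence i has a gap here.
    """
    m = len(column)
    residues = [a for a in column if a != "-"]
    if not residues:
        return 0.0, [None] * m
    # Consensus point: Xr = SUM over residues i of Fij * C(i, r), divided by m.
    X = [sum(C[(a, r)] for a in residues) / m for r in symbols]
    distances = []
    for a in column:
        if a == "-":
            distances.append(None)
            continue
        S = [C[(r, a)] for r in symbols]  # Sr = C(r, Aij)
        distances.append(math.sqrt(sum((x - s) ** 2 for x, s in zip(X, S))))
    present = [d for d in distances if d is not None]
    mean_distance = sum(present) / len(present)
    # Assumption: normalise by the fraction of sequences without a gap.
    return mean_distance * len(present) / m, distances


# Toy identity matrix over two residues, invented for illustration.
symbols = ["A", "R"]
C = {(a, b): (1.0 if a == b else 0.0) for a in symbols for b in symbols}
score, distances = column_quality(["A", "A", "R"], symbols, C)
conserved_score, _ = column_quality(["A", "A", "A"], symbols, C)
```

<P>
With this definition a fully conserved column places every sequence exactly on
the consensus point, so all distances are zero.
</P>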
<P>
<H3>
CALCULATION OF RESIDUE EXCEPTIONS
</H3>
</P>
<P>
The jth residue of the ith sequence is considered an exception if the
distance Di of the sequence from the consensus value X is greater than (Upper
Quartile + Inter Quartile Range * Cutoff). The value used as a cutoff for
displaying exceptions can be set from the SCORE PARAMETERS menu. A high cutoff
value will only display very significant exceptions; a low value will allow
more, less significant, exceptions to be highlighted.
</P>
<P>
(NB. Sequences which contain gaps at this position are not included in the
exception calculation.)
</P>
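<P>
The exception test can be sketched as below. The function name
residue_exceptions is hypothetical, and the help text does not say how the
quartiles are estimated; this sketch assumes the default method of Python's
statistics.quantiles, so the exact threshold may differ from the program's.
</P>

```python
import statistics


def residue_exceptions(distances, cutoff):
    """Return indices of sequences whose Di exceeds UQ + IQR * cutoff.

    distances : Di for each sequence at one position, None for a gap
    cutoff    : the RESIDUE EXCEPTION CUTOFF value
    """
    # Sequences with a gap at this position are left out of the calculation.
    present = [d for d in distances if d is not None]
    if len(present) < 2:
        return []  # quartiles are undefined for fewer than two values
    lower, _, upper = statistics.quantiles(present, n=4)
    threshold = upper + (upper - lower) * cutoff
    return [i for i, d in enumerate(distances)
            if d is not None and d > threshold]


# Invented distances: eight sequences near 1.0, one gap, one clear outlier.
example = [0.8, 0.9, 1.0, None, 1.0, 1.1, 1.2, 1.0, 9.0]
flagged = residue_exceptions(example, cutoff=1.0)
```

<P>
Raising the cutoff raises the threshold, so fewer residues are flagged.
</P>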
<P>
</P>
<P>
<H3>
CALCULATION OF LOW-SCORING SEGMENTS
</H3>
</P>
<P>
Suppose we have an alignment of m sequences of length n. Then, the alignment
can be written as:
</P>
<P>
<PRE>
        A11 A12 A13 .......... A1n
        A21 A22 A23 .......... A2n
        .
        .
        Am1 Am2 Am3 .......... Amn
</PRE>
</P>
<P>
We also have a residue comparison matrix of size R where C(i,j) is the score
for aligning residue i with residue j.
</P>
<P>
We calculate sequence weights by building a neighbour-joining tree, in which
branch lengths are proportional to divergence. Summing the branches by branch
ownership provides the weights. See Thompson et al., CABIOS, 10, 19 (1994) and
Henikoff et al., JMB, 243, 574 (1994).
</P>
<P>
To find the low-scoring segments in a sequence Si, we build a weighted profile
of the remaining sequences in the alignment. Suppose we find residue r at 
position j in the sequence; then the score for the jth position in the sequence
is defined as
</P>
<P>
<PRE>
	Score(Si,j) = Profile(j,r)   where Profile(j,r) is the profile score
                                       for residue r at position j in the
                                       alignment.
</PRE>
</P>
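<P>
The help text does not spell out how Profile(j,r) is built. As one possible
reading, the sketch below scores residue r against a weighted average of the
comparison-matrix scores of the remaining sequences at position j. The
function name profile_score, the averaging formula and the treatment of gaps
are all assumptions made for illustration.
</P>

```python
def profile_score(alignment, weights, C, i, j):
    """Score residue Aij against a weighted profile of the other sequences.

    alignment : list of equal-length strings ('-' marks a gap)
    weights   : one sequence weight per sequence
    C         : dict mapping (residue, residue) -> score
    """
    r = alignment[i][j]
    if r == "-":
        return 0.0  # assumption: a gap contributes no score
    total = 0.0
    weight_sum = 0.0
    for k, seq in enumerate(alignment):
        if k == i or seq[j] == "-":
            continue  # skip sequence i itself and gapped sequences (assumption)
        total += weights[k] * C[(r, seq[j])]
        weight_sum += weights[k]
    return total / weight_sum if weight_sum else 0.0


# Toy data invented for illustration: match scores 1, mismatch scores -1.
toy_C = {(a, b): (1.0 if a == b else -1.0) for a in "AR" for b in "AR"}
toy_alignment = ["AR", "AR", "RR"]
toy_weights = [1.0, 1.0, 2.0]
s = profile_score(toy_alignment, toy_weights, toy_C, 0, 0)
```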
<P>
These residue scores are summed along the sequence in both forward and backward
directions. Whenever the running sum becomes positive, it is reset to zero.
Segments which score negatively in both directions are considered
'low-scoring' and will be highlighted in the alignment display.
</P>
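<P>
The two-directional summation can be sketched as follows. The function name
low_scoring_positions is hypothetical, and the sketch assumes that a position
belongs to a low-scoring segment when the running sum is negative at that
position in both the forward and the backward pass.
</P>

```python
def low_scoring_positions(scores):
    """Flag positions whose running sum is negative in both directions.

    scores : the per-position scores Score(Si, j) of one sequence
    """
    def running(values):
        total, sums = 0.0, []
        for value in values:
            total += value
            if total > 0:
                total = 0.0  # a positive sum is reset to zero
            sums.append(total)
        return sums

    forward = running(scores)
    backward = running(scores[::-1])[::-1]
    return [f < 0 and b < 0 for f, b in zip(forward, backward)]


# Invented scores: a poorly scoring stretch flanked by well-scoring residues.
flags = low_scoring_positions([2, 1, -3, -2, -1, 4, 3])
```

<P>
Requiring a negative sum in both passes trims the well-scoring residues that
follow a poor stretch, which a single forward pass would still include.
</P>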
<P>
</P>
<P>
</P>
<A HREF="#INDEX"> <EM>Back to Index</EM> </A>
<CENTER><H2><A NAME="9">              Command Line Parameters
</A></H2></CENTER>
<CENTER><H3>                DATA (sequences)
</H3></CENTER>
<CENTER><TABLE ALIGN=ABSCENTER BORDER=1 CELLSPACING=1 CELLPADDING=5>
<TR>
<TD><STRONG>Parameter</STRONG></TD>
<TD><STRONG><EM>Description</EM></STRONG></TD>
</TR>
<TR>
<TD><TT>-PROFILE1=file.ext  and  -PROFILE2=file.ext  </TT></TD>
<TD><EM>profiles (aligned sequences)</EM></TD>
</TR>
</TABLE></CENTER>
<CENTER><H3>                VERBS (do things)
</H3></CENTER>
<CENTER><TABLE ALIGN=ABSCENTER BORDER=1 CELLSPACING=1 CELLPADDING=5>
<TR>
<TD><STRONG>Parameter</STRONG></TD>
<TD><STRONG><EM>Description</EM></STRONG></TD>
</TR>
<TR>
<TD><TT>-HELP  or -CHECK    </TT></TD>
<TD><EM>outline the command line parameters</EM></TD>
</TR>
<TR>
<TD><TT>-ALIGN              </TT></TD>
<TD><EM>do full multiple alignment </EM></TD>
</TR>
<TR>
<TD><TT>-TREE               </TT></TD>
<TD><EM>calculate NJ tree</EM></TD>
</TR>
<TR>
<TD><TT>-BOOTSTRAP(=n)      </TT></TD>
<TD><EM>bootstrap an NJ tree (n = number of bootstraps; default = 1000)</EM></TD>
</TR>
<TR>
<TD><TT>-CONVERT            </TT></TD>
<TD><EM>output the input sequences in a different file format</EM></TD>
</TR>
</TABLE></CENTER>
<CENTER><H3>                PARAMETERS (set things)
</H3></CENTER>
<CENTER><P><STRONG>***General settings:***
</STRONG></P></CENTER>
<CENTER><TABLE ALIGN=ABSCENTER BORDER=1 CELLSPACING=1 CELLPADDING=5>
<TR>
<TD><STRONG>Parameter</STRONG></TD>
<TD><STRONG><EM>Description</EM></STRONG></TD>
</TR>
<TR>
<TD><TT>-INTERACTIVE </TT></TD>
<TD><EM>read command line, then enter normal interactive menus</EM></TD>
</TR>
<TR>
<TD><TT>-QUICKTREE   </TT></TD>
<TD><EM>use FAST algorithm for the alignment guide tree</EM></TD>
</TR>
<TR>
<TD><TT>-TYPE=       </TT></TD>
<TD><EM>PROTEIN or DNA sequences</EM></TD>
</TR>
<TR>
<TD><TT>-NEGATIVE    </TT></TD>
<TD><EM>protein alignment with negative values in matrix</EM></TD>
</TR>
<TR>
<TD><TT>-OUTFILE=    </TT></TD>
<TD><EM>sequence alignment file name</EM></TD>
</TR>
<TR>
<TD><TT>-OUTPUT=     </TT></TD>
<TD><EM>GCG, GDE, PHYLIP, PIR or NEXUS</EM></TD>
</TR>
<TR>
<TD><TT>-OUTORDER=   </TT></TD>
<TD><EM>INPUT or ALIGNED</EM></TD>
</TR>
<TR>
<TD><TT>-CASE=       </TT></TD>
<TD><EM>LOWER or UPPER (for GDE output only)</EM></TD>
</TR>
<TR>
<TD><TT>-SEQNOS=     </TT></TD>
<TD><EM>OFF or ON (for Clustal output only)</EM></TD>
</TR>
</TABLE></CENTER>
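<P>
The parameters tabulated above are combined on one command line. The sketch
below only assembles an argument list; it does not run anything. The helper
name build_command, the executable name "clustalw" and the file name
"globins.pep" are placeholders, and the -INFILE= parameter is not listed in
this section and is assumed here as the way to name the input sequences.
</P>

```python
def build_command(executable, infile, **options):
    """Assemble an argument list from verbs and NAME=value parameters.

    A value of True emits a bare switch such as -ALIGN; any other value
    emits -NAME=value.
    """
    # Assumption: -INFILE= names the input file (not listed in this section).
    args = [executable, "-INFILE=" + infile]
    for name, value in options.items():
        flag = "-" + name.upper()
        if value is True:
            args.append(flag)
        else:
            args.append("{}={}".format(flag, value))
    return args


# Placeholder executable and file names; the switches come from the tables above.
cmd = build_command("clustalw", "globins.pep",
                    align=True, type="PROTEIN",
                    output="PHYLIP", outorder="ALIGNED")
```

<P>
The resulting list can be passed to a process launcher such as Python's
subprocess.run once the executable name and input file are known.
</P>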
<CENTER><H3>***Fast Pairwise Alignments:***
</H3></CENTER>
<CENTER><TABLE ALIGN=ABSCENTER BORDER=1 CELLSPACING=1 CELLPADDING=5>
<TR>
<TD><STRONG>Parameter</STRONG></TD>
<TD><STRONG><EM>Description</EM></STRONG></TD>
</TR>
<TR>
<TD><TT>-TOPDIAGS=n  </TT></TD>
<TD><EM>number of best diags.</EM></T