using a white character on a grey background.

<STRONG>
SAVE QUALITY SCORES TO FILE
</STRONG>

The quality scores that are plotted underneath the alignment display can also
be saved in a text file. Each column in the alignment is written on one line in
the output file, with the value of the quality score at the end of the line.
Only the sequences currently selected in the display are written to the file.
One use for quality scores is to color residues in a protein structure by
sequence conservation. In this way conserved surface residues can be
highlighted to locate functional regions such as ligand-binding sites.
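Below is a minimal Python sketch for reading such a file back, for example before mapping the scores onto a structure. It assumes only what is stated above: one alignment column per line, with the score as the last whitespace-separated field. Lines whose last field is not a number (such as a possible header) are skipped. That is a defensive assumption, not documented behaviour. The function name `read_quality_scores` is hypothetical.

```python
def read_quality_scores(lines):
    """Return one quality score per alignment column (sketch).

    Assumes one column per line with the score as the last
    whitespace-separated field. Blank lines are skipped, and so are lines
    whose last field is not numeric (an assumption, e.g. a header line).
    """
    scores = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        try:
            scores.append(float(fields[-1]))
        except ValueError:
            continue  # last field is not a number: not a column line
    return scores
```

For example, `with open(path) as fh: scores = read_quality_scores(fh)`, where `path` is whatever file name was chosen when the scores were saved.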


<H3>
CALCULATION OF QUALITY SCORES
</H3>
-----------------------------

Suppose we have an alignment of m sequences of length n. Then, the alignment
can be written as:

<PRE>
        A11 A12 A13 .......... A1n
        A21 A22 A23 .......... A2n
        .
        .
        Am1 Am2 Am3 .......... Amn
</PRE>

We also have an R x R residue comparison matrix C, where R is the number of
residue types and C(i,j) is the score for aligning residue i with residue j.

We want to calculate a score for the conservation of the jth position in the
alignment.

To do this, we define an R-dimensional sequence space. For the jth position in 
the alignment, each sequence consists of a single residue which is assigned a
point S in the space. S has R dimensions, and for sequence i, the rth dimension
is defined as:

<PRE>
	Sr =    C(r,Aij)
</PRE>

We then calculate a consensus value for the jth position in the alignment. This
value X also has R dimensions, and the rth dimension is defined as:

<PRE>
	Xr = (   SUM   (Fij * C(i,r)) ) / m
               1<=i<=R
</PRE>

where Fij is the number of times residue i occurs at position j in the alignment.

Now we can calculate the distance Di between each sequence i and the consensus 
position X in the R-dimensional space.

<PRE>
	Di = SQRT   (   SUM   (Xr - Sr)(Xr - Sr) )
                      1<=r<=R

</PRE>

The quality score for the jth position in the alignment is defined as the mean
of the sequence distances Di.

The score is normalised by multiplying by the percentage of sequences which
have residues (and not gaps) at this position.
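The steps above can be sketched in Python as follows. This illustrates the formulas as written here and is not the ClustalX source: the function name, the nested-list matrix `C`, the `alphabet` list that indexes it, and the score given to an all-gap column are assumptions. Gaps are excluded from the counts Fij and from the distances Di, while Xr is still divided by m as in the formula. The help text multiplies by a percentage; the sketch uses the fraction of non-gap sequences, which differs only by a constant factor of 100. Any rescaling applied before the scores are plotted is not covered.

```python
import math

def quality_scores(alignment, alphabet, C, gap="-"):
    """Per-column quality scores following the definitions above (sketch).

    alignment: m equal-length strings, one per sequence (row i is A_i1..A_in)
    alphabet:  the R residue symbols; position k in this list indexes C
    C:         R x R nested list, C[a][b] = score for aligning alphabet[a]
               with alphabet[b]
    """
    m = len(alignment)
    R = len(alphabet)
    index = {res: k for k, res in enumerate(alphabet)}
    scores = []
    for j in range(len(alignment[0])):
        # Residue indices of the sequences with a residue (not a gap) here.
        present = [index[seq[j]] for seq in alignment if seq[j] != gap]
        if not present:
            scores.append(0.0)  # assumption: an all-gap column scores 0
            continue
        # F[i]: number of occurrences of residue i in column j.
        F = [0] * R
        for a in present:
            F[a] += 1
        # Consensus point: X_r = sum_i F_ij * C(i, r) / m
        X = [sum(F[i] * C[i][r] for i in range(R)) / m for r in range(R)]
        # Each sequence's point S has S_r = C(r, A_ij); D_i is its distance to X.
        D = [math.sqrt(sum((X[r] - C[r][a]) ** 2 for r in range(R)))
             for a in present]
        mean_D = sum(D) / len(D)
        # Normalise by the fraction of sequences with a residue at this position.
        scores.append(mean_D * len(present) / m)
    return scores
```

With an identity matrix over `["A", "C"]`, a fully conserved column gives 0.0 (every point coincides with the consensus), while a column split evenly between A and C gives about 0.707.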

<H3>
CALCULATION OF RESIDUE EXCEPTIONS
</H3>
---------------------------------

The jth residue of the ith sequence is considered an exception if the distance
Di of the sequence from the consensus value X is greater than (Upper Quartile +
Interquartile Range * Cutoff). The value used as a cutoff for displaying
exceptions can be set from the SCORE PARAMETERS menu. A high cutoff value
displays only very significant exceptions; a low value allows more, less
significant, exceptions to be highlighted.

(NB. Sequences which contain gaps at this position are not included in the
exception calculation.)
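A sketch of the exception test for one column, assuming the distances Di have already been computed for the sequences that have no gap there. The quartile estimator ClustalX uses is not stated in this help text; the sketch uses Python's `statistics.quantiles` with `method="inclusive"` as a stand-in, so borderline cases may differ. The function name `column_exceptions` is hypothetical.

```python
import statistics

def column_exceptions(distances, cutoff):
    """Return the sequence indices whose distance exceeds UQ + IQR * cutoff.

    distances: (sequence_index, D_i) pairs for the sequences WITHOUT a gap
               in this column (gapped sequences are left out, as above).
    cutoff:    the user-set cutoff; larger values flag fewer exceptions.
    """
    values = [d for _, d in distances]
    if len(values) < 2:
        return []  # too few points to estimate quartiles
    # Stand-in quartile estimator (an assumption, see the lead-in).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    threshold = q3 + (q3 - q1) * cutoff
    return [i for i, d in distances if d > threshold]
```

For distances 1, 2, 3, 4 and 20 the quartiles are 2 and 4, so a cutoff of 1 flags the last sequence (20 > 6), while a cutoff of 10 raises the threshold to 24 and flags nothing.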


<H3>
CALCULATION OF LOW-SCORING SEGMENTS
</H3>
-----------------------------------

Suppose we have an alignment of m sequences of length n. Then, the alignment
can be written as:

<PRE>
        A11 A12 A13 .......... A1n
        A21 A22 A23 .......... A2n
        .
        .
        Am1 Am2 Am3 .......... Amn
</PRE>

We also have an R x R residue comparison matrix C, where R is the number of
residue types and C(i,j) is the score for aligning residue i with residue j.

We calculate sequence weights by building a neighbour-joining tree, in which
branch lengths are proportional to divergence. Summing the branch lengths by
branch ownership (each branch is shared equally among the sequences below it)
provides the weights. See Thompson et al., CABIOS, 10, 19 (1994) and Henikoff
et al., JMB, 243, 574 (1994).
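One reading of this "branch ownership" weighting is sketched below: each branch length is divided equally among the sequences beneath it, and a sequence's weight is the sum of its shares along the path to the root. The sketch assumes the neighbour-joining tree has already been built and rooted; the `(label, branch_length, children)` tuple format and the function names are hypothetical, and no normalisation of the weights is applied.

```python
def count_leaves(node):
    """Number of sequences (leaves) at or below this node."""
    label, length, children = node
    return 1 if not children else sum(count_leaves(c) for c in children)


def tree_weights(node, inherited=0.0, weights=None):
    """Map each leaf label to its weight (sketch).

    node: (label, branch_length, children); leaves have an empty children
          list, and branch_length is the length of the branch above the node.
    """
    if weights is None:
        weights = {}
    label, length, children = node
    # This branch is owned jointly by every leaf beneath it.
    share = inherited + length / count_leaves(node)
    if not children:
        weights[label] = share
    else:
        for child in children:
            tree_weights(child, share, weights)
    return weights
```

For a root with leaf A (branch 1.0) and an internal node (branch 2.0) holding B (1.0) and C (3.0), A gets 1.0, B gets 1.0 + 1.0 = 2.0 and C gets 1.0 + 3.0 = 4.0: the shared branch of length 2.0 is split between B and C.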

To find the low-scoring segments in a sequence Si, we build a weighted profile
of the remaining sequences in the alignment. Suppose we find residue r at 
position j in the sequence; then the score for the jth position in the sequence
is defined as

<PRE>
	Score(Si,j) = Profile(j,r)   where Profile(j,r) is the profile score
                                       for residue r at position j in the
                                       alignment.
</PRE>
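The profile itself is not spelled out above, so the following is an assumption-laden sketch: Profile(j,r) is taken to be the weight-averaged comparison score C(a,r) over the residues a found at position j in the other, non-gap sequences. The function names, the argument order and the dict return type are hypothetical, and the real program may treat gaps and scaling differently.

```python
def weighted_profile(alignment, weights, exclude, alphabet, C, gap="-"):
    """Profile(j, r) built from every sequence except `exclude` (sketch).

    Assumption: Profile(j, r) = sum_k w_k * C(A_kj, r) / sum_k w_k over the
    other sequences k that have a residue (not a gap) at position j.
    """
    index = {res: k for k, res in enumerate(alphabet)}
    R = len(alphabet)
    profile = []
    for j in range(len(alignment[0])):
        row = [0.0] * R
        total_w = 0.0
        for k, seq in enumerate(alignment):
            if k == exclude or seq[j] == gap:
                continue
            a = index[seq[j]]
            total_w += weights[k]
            for r in range(R):
                row[r] += weights[k] * C[a][r]
        if total_w > 0:
            row = [v / total_w for v in row]
        profile.append(row)
    return profile


def residue_scores(alignment, weights, i, alphabet, C, gap="-"):
    """Score(S_i, j) = Profile(j, r) for each non-gap position j of sequence i."""
    index = {res: k for k, res in enumerate(alphabet)}
    profile = weighted_profile(alignment, weights, i, alphabet, C, gap)
    return {j: profile[j][index[c]]
            for j, c in enumerate(alignment[i]) if c != gap}
```

With sequences A, A and C weighted 1, 1 and 2 and an identity matrix, the first A is scored against a profile in which A has weight 1 out of 3, giving Score = 1/3.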

These residue scores are summed along the sequence in both the forward and
backward directions. Whenever the running sum becomes positive, it is reset to
zero. Segments which score negatively in both directions are considered
'low-scoring' and are highlighted in the alignment display.
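The forward and backward scan can be sketched as below. The input is the list of residue scores Score(Si,j) for one sequence, in order; a position counts as low-scoring when both running sums are negative there, and consecutive such positions are merged into segments. Any extra filtering the program may apply is not covered, and the function name `low_scoring_segments` is hypothetical.

```python
def low_scoring_segments(scores):
    """Return inclusive (start, end) index pairs of low-scoring segments (sketch).

    scores: per-position residue scores for one sequence, in order.
    """
    n = len(scores)

    def running_sums(order):
        sums = [0.0] * n
        total = 0.0
        for j in order:
            # Add the score, resetting to zero whenever the sum turns positive.
            total = min(0.0, total + scores[j])
            sums[j] = total
        return sums

    forward = running_sums(range(n))
    backward = running_sums(reversed(range(n)))
    # Low-scoring positions: negative running sum in both directions.
    low = [j for j in range(n) if forward[j] < 0 and backward[j] < 0]

    # Merge consecutive low-scoring positions into segments.
    segments = []
    for j in low:
        if segments and segments[-1][1] == j - 1:
            segments[-1][1] = j
        else:
            segments.append([j, j])
    return [tuple(seg) for seg in segments]
```

For scores `[2, -1, -1, 3]` the forward sums are 0, -1, -2, 0 and the backward sums are 0, -2, -1, 0, so only positions 1 to 2 form a low-scoring segment.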


>>HELP 9 <<
              Command Line Parameters

                DATA (sequences)

-INFILE=file.ext                             :input sequences
-PROFILE1=file.ext  and  -PROFILE2=file.ext  :profiles (aligned sequences)


                VERBS (do things)

-OPTIONS            :list the command line parameters
-HELP  or -CHECK    :outline the command line parameters
-ALIGN              :do full multiple alignment 
-TREE               :calculate NJ tree
-BOOTSTRAP(=n)      :bootstrap a NJ tree (n= number of bootstraps; def. = 1000)
-CONVERT            :output the input sequences in a different file format


                PARAMETERS (set things)

***General settings:***
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE   :use FAST algorithm for the alignment guide tree
-TYPE=       :PROTEIN or DNA sequences
-NEGATIVE    :protein alignment with negative values in matrix
-OUTFILE=    :sequence alignment file name
-OUTPUT=     :GCG, GDE, PHYLIP, PIR or NEXUS
-OUTORDER=   :INPUT or ALIGNED
-CASE=       :LOWER or UPPER (for GDE output only)
-SEQNOS=     :OFF or ON (for Clustal output only)


***Fast Pairwise Alignments:***
-KTUPLE=n    :word size
-TOPDIAGS=n  :number of best diags.
-WINDOW=n    :window around best diags.
-PAIRGAP=n   :gap penalty
-SCORE=      :PERCENT or ABSOLUTE


***Slow Pairwise Alignments:***
-PWMATRIX=    :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f  :gap opening penalty
-PWGAPEXT=f   :gap extension penalty


***Multiple Alignments:***
-NEWTREE=       :file for new guide tree
-USETREE=       :file for old guide tree
-MATRIX=        :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX=     :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f      :gap opening penalty
-GAPEXT=f       :gap extension penalty
-ENDGAPS        :no end gap separation pen.
-GAPDIST=n      :gap separation pen. range
-NOPGAP         :residue-specific gaps off
-NOHGAP         :hydrophilic gaps off
-HGAPRESIDUES=  :list hydrophilic res.
-MAXDIV=n       :% ident. for delay
-TYPE=          :PROTEIN or DNA
-TRANSWEIGHT=f  :transitions weighting


***Profile Alignments:***
-PROFILE      :Merge two alignments by profile alignment
-NEWTREE1=    :file for new guide tree for profile1
-NEWTREE2=    :file for new guide tree for profile2
-USETREE1=    :file for old guide tree for profile1
-USETREE2=    :file for old guide tree for profile2


***Sequence to Profile Alignments:***
-SEQUENCES   :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE=    :file for new guide tree
-USETREE=    :file for old guide tree


***Structure Alignments:***
-NOSECSTR1     :do not use secondary structure/gap penalty mask for profile 1 
-NOSECSTR2     :do not use secondary structure/gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE  :output in alignment file
-HELIXGAP=n    :gap penalty for helix core residues 
-STRANDGAP=n   :gap penalty for strand core residues
-LOOPGAP=n     :gap penalty for loop regions
-TERMINALGAP=n :gap penalty for structure termini
-HELIXENDIN=n  :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n :number of residues outside helix to be treated as terminal
-STRANDENDIN=n :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n:number of residues outside strand to be treated as terminal 


***Trees:***
-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n    :seed number for bootstraps
-KIMURA      :use Kimura's correction
-TOSSGAPS  :ignore positions with gaps
-BOOTLABELS=node OR branch :position of bootstrap values in tree display
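When scripting runs, the parameters above can be assembled into an argument list. This is a minimal sketch: it follows only the `-NAME=value` and `-NAME` patterns shown in this list and does not validate names or values. The program name is passed in by the caller (which Clustal executable you have installed is an assumption), and `clustal_args` is a hypothetical helper.

```python
def clustal_args(program, infile, verbs=(), **params):
    """Build a Clustal-style argument list (sketch; no validation).

    verbs:  names such as "align" or "tree", emitted as -ALIGN, -TREE
    params: NAME=value pairs emitted as -NAME=value; True emits a bare
            switch such as -KIMURA, and False/None values are skipped.
    """
    args = [program, f"-INFILE={infile}"]
    args.extend(f"-{verb.upper()}" for verb in verbs)
    for name, value in params.items():
        if value is None or value is False:
            continue
        if value is True:
            args.append(f"-{name.upper()}")
        else:
            args.append(f"-{name.upper()}={value}")
    return args
```

For example, `clustal_args("clustalw", "seqs.fasta", verbs=["align"], output="PHYLIP", kimura=True)` gives `["clustalw", "-INFILE=seqs.fasta", "-ALIGN", "-OUTPUT=PHYLIP", "-KIMURA"]`, which can be passed to `subprocess.run(args, check=True)`. The program and file names here are placeholders.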


>>HELP R <<
                             References

<STRONG>
The ClustalX program is described in the manuscript:
</STRONG>

Thompson,J.D., Gibson,T.J., Plewniak,F., Jeanmougin,F. and Higgins,D.G. (1997)
The ClustalX windows interface: flexible strategies for multiple sequence
alignment aided by quality analysis tools. Nucleic Acids Research, 25:4876-4882.


<STRONG>
The ClustalW program is described in the manuscript:
</STRONG>

Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the
sensitivity of progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix choice. Nucleic
Acids Research, 22:4673-4680.


<STRONG>
The ClustalV program is described in the manuscript:
</STRONG>

Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1992) CLUSTAL V: improved software for
multiple sequence alignment. CABIOS, 8:189-191.


<STRONG>
The original Clustal program is described in the manuscripts:
</STRONG>

Higgins,D.G. and Sharp,P.M. (1989) Fast and sensitive multiple sequence
alignments on a microcomputer. CABIOS, 5:151-153.

Higgins,D.G. and Sharp,P.M. (1988) CLUSTAL: a package for performing multiple
sequence alignment on a microcomputer. Gene, 73:237-244.

-------------------------------------------------------------------------------
<STRONG>
Some tips on using Clustal X:
</STRONG>

Jeanmougin,F., Thompson,J.D., Gouy,M., Higgins,D.G. and Gibson,T.J. (1998)
Multiple sequence alignment with Clustal X. Trends Biochem. Sci., 23:403-405.

<STRONG>
Some tips on using Clustal W:
</STRONG>

Higgins,D.G., Thompson,J.D. and Gibson,T.J. (1996) Using CLUSTAL for
multiple sequence alignments. Methods Enzymol., 266:383-402.

-------------------------------------------------------------------------------
<STRONG>
You can get the latest version of the ClustalX program by anonymous ftp to:
</STRONG>

ftp-igbmc.u-strasbg.fr
ftp.embl-heidelberg.de
ftp.ebi.ac.uk

<STRONG>
Or, have a look at the following WWW site:
</STRONG>

http://www-igbmc.u-strasbg.fr/BioInfo/