This option highlights individual residues which score badly in the alignment quality calculations. Residues which score exceptionally low are highlighted by using a white character on a grey background.

<STRONG>SAVE QUALITY SCORES TO FILE</STRONG>

The quality scores that are plotted underneath the alignment display can also be saved in a text file. Each column in the alignment is written on one line in the output file, with the value of the quality score at the end of the line. Only the sequences currently selected in the display are written to the file.

One use for quality scores is to color residues in a protein structure by sequence conservation. In this way conserved surface residues can be highlighted to locate functional regions such as ligand-binding sites.

<H3>CALCULATION OF QUALITY SCORES</H3>
-----------------------------

Suppose we have an alignment of m sequences of length n. Then, the alignment can be written as:

<PRE>
     A11 A12 A13 .......... A1n
     A21 A22 A23 .......... A2n
      .
      .
     Am1 Am2 Am3 .......... Amn
</PRE>

We also have a residue comparison matrix of size R where C(i,j) is the score for aligning residue i with residue j.

We want to calculate a score for the conservation of the jth position in the alignment.

To do this, we define an R-dimensional sequence space. For the jth position in the alignment, each sequence consists of a single residue which is assigned a point S in the space. S has R dimensions, and for sequence i, the rth dimension is defined as:

<PRE>
     Sr = C(r,Aij)
</PRE>

We then calculate a consensus value for the jth position in the alignment.
This value X also has R dimensions, and the rth dimension is defined as:

<PRE>
     Xr = ( SUM  (Fij * C(i,r)) ) / m
           1<=i<=R
</PRE>

where Fij is the count of residue i at position j in the alignment.

Now we can calculate the distance Di between each sequence i and the consensus position X in the R-dimensional space, where Sr is the rth dimension of the point S for sequence i:

<PRE>
     Di = SQRT ( SUM  (Xr - Sr)(Xr - Sr) )
                1<=r<=R
</PRE>

The quality score for the jth position in the alignment is defined as the mean of the sequence distances Di.

The score is normalised by multiplying by the percentage of sequences which have residues (and not gaps) at this position.

<H3>CALCULATION OF RESIDUE EXCEPTIONS</H3>
---------------------------------

The jth residue of the ith sequence is considered an exception if the distance Di of the sequence from the consensus value X is greater than (Upper Quartile + Inter Quartile Range * Cutoff). The value used as a cutoff for displaying exceptions can be set from the SCORE PARAMETERS menu. A high cutoff value will only display very significant exceptions; a low value will allow more, less significant, exceptions to be highlighted.

(NB. Sequences which contain gaps at this position are not included in the exception calculation.)

<H3>CALCULATION OF LOW-SCORING SEGMENTS</H3>
-----------------------------------

Suppose we have an alignment of m sequences of length n. Then, the alignment can be written as:

<PRE>
     A11 A12 A13 .......... A1n
     A21 A22 A23 .......... A2n
      .
      .
     Am1 Am2 Am3 .......... Amn
</PRE>

We also have a residue comparison matrix of size R where C(i,j) is the score for aligning residue i with residue j.

We calculate sequence weights by building a neighbour-joining tree, in which branch lengths are proportional to divergence. Summing the branches by branch ownership provides the weights. See Thompson et al. (CABIOS, 10, 19, 1994) and Henikoff et al. (JMB, 243, 574, 1994).

To find the low-scoring segments in a sequence Si, we build a weighted profile of the remaining sequences in the alignment.
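The profile-building step can be sketched in Python. This is a minimal, hypothetical sketch, not the ClustalX implementation: it assumes Profile(j,r) is the weighted mean of C(r,Akj) over the other sequences k that have a residue (not a gap) at column j, it takes the sequence weights as given rather than deriving them from a neighbour-joining tree, and it assumes '-' is the gap character. The function names and the toy +1/-1 matrix are illustrative only.

```python
from typing import Dict, List, Optional

GAP = "-"  # assumed gap character

def weighted_profile(
    alignment: List[str],
    weights: List[float],
    matrix: Dict[str, Dict[str, float]],
    exclude: int,
) -> List[Dict[str, float]]:
    """Return one {residue r: Profile(j, r)} dict per alignment column j.

    Assumption: Profile(j, r) is the weighted mean of C(r, A_kj) over the
    sequences k != exclude that have a residue (not a gap) at column j.
    """
    alphabet = list(matrix)
    profile = []
    for j in range(len(alignment[0])):
        # The "remaining" sequences: everything except Si, gaps skipped.
        rows = [k for k in range(len(alignment))
                if k != exclude and alignment[k][j] != GAP]
        total = sum(weights[k] for k in rows)
        column = {}
        for r in alphabet:
            if total == 0:
                column[r] = 0.0  # other sequences are all gaps here
            else:
                column[r] = sum(weights[k] * matrix[r][alignment[k][j]]
                                for k in rows) / total
        profile.append(column)
    return profile

def position_scores(
    alignment: List[str],
    weights: List[float],
    matrix: Dict[str, Dict[str, float]],
    i: int,
) -> List[Optional[float]]:
    """Score(Si, j) = Profile(j, r) for residue r of Si; None at gaps."""
    profile = weighted_profile(alignment, weights, matrix, exclude=i)
    return [None if r == GAP else profile[j][r]
            for j, r in enumerate(alignment[i])]

if __name__ == "__main__":
    # Toy DNA example with a hypothetical +1 match / -1 mismatch matrix.
    toy = {a: {b: (1.0 if a == b else -1.0) for b in "ACGT"} for a in "ACGT"}
    aln = ["ACG-", "ACGT", "AGGT"]
    print(position_scores(aln, [1.0, 1.0, 1.0], toy, 2))  # [1.0, -1.0, 1.0, 1.0]
```

Leaving sequence i out of its own profile follows the text ("the remaining sequences"), so a residue cannot score well simply by matching itself; in the toy run the G at column 2 of the third sequence scores -1.0 because both other sequences have C there.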
Suppose we find residue r at position j in the sequence; then the score for the jth position in the sequence is defined as

<PRE>
     Score(Si,j) = Profile(j,r)
</PRE>

where Profile(j,r) is the profile score for residue r at position j in the alignment.

These residue scores are summed along the sequence in both forward and backward directions. If the sum of the scores is positive, then it is reset to zero. Segments which score negatively in both directions are considered 'low-scoring' and will be highlighted in the alignment display.

>>HELP 9 <<             Command Line Parameters

                DATA (sequences)

-INFILE=file.ext                             :input sequences
-PROFILE1=file.ext  and  -PROFILE2=file.ext  :profiles (aligned sequences)

                VERBS (do things)

-OPTIONS            :list the command line parameters
-HELP  or -CHECK    :outline the command line parameters
-ALIGN              :do full multiple alignment
-TREE               :calculate NJ tree
-BOOTSTRAP(=n)      :bootstrap a NJ tree (n = number of bootstraps; def. = 1000)
-CONVERT            :output the input sequences in a different file format

                PARAMETERS (set things)

***General settings:***
-INTERACTIVE :read command line, then enter normal interactive menus
-QUICKTREE   :use FAST algorithm for the alignment guide tree
-TYPE=       :PROTEIN or DNA sequences
-NEGATIVE    :protein alignment with negative values in matrix
-OUTFILE=    :sequence alignment file name
-OUTPUT=     :CLUSTAL, GCG, GDE, PHYLIP, PIR, NEXUS, FASTA
-OUTORDER=   :INPUT or ALIGNED
-CASE=       :LOWER or UPPER (for GDE output only)
-SEQNOS=     :OFF or ON (for Clustal output only)

***Fast Pairwise Alignments:***
-KTUPLE=n    :word size
-TOPDIAGS=n  :number of best diags.
-WINDOW=n    :window around best diags.
-PAIRGAP=n   :gap penalty
-SCORE=      :PERCENT or ABSOLUTE

***Slow Pairwise Alignments:***
-PWMATRIX=    :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-PWDNAMATRIX= :DNA weight matrix=IUB, CLUSTALW or filename
-PWGAPOPEN=f  :gap opening penalty
-PWGAPEXT=f   :gap extension penalty

***Multiple Alignments:***
-NEWTREE=      :file for new guide tree
-USETREE=      :file for old guide tree
-MATRIX=       :Protein weight matrix=BLOSUM, PAM, GONNET, ID or filename
-DNAMATRIX=    :DNA weight matrix=IUB, CLUSTALW or filename
-GAPOPEN=f     :gap opening penalty
-GAPEXT=f      :gap extension penalty
-ENDGAPS       :no end gap separation pen.
-GAPDIST=n     :gap separation pen. range
-NOPGAP        :residue-specific gaps off
-NOHGAP        :hydrophilic gaps off
-HGAPRESIDUES= :list hydrophilic res.
-MAXDIV=n      :% ident. for delay
-TYPE=         :PROTEIN or DNA
-TRANSWEIGHT=f :transitions weighting

***Profile Alignments:***
-PROFILE   :Merge two alignments by profile alignment
-NEWTREE1= :file for new guide tree for profile1
-NEWTREE2= :file for new guide tree for profile2
-USETREE1= :file for old guide tree for profile1
-USETREE2= :file for old guide tree for profile2

***Sequence to Profile Alignments:***
-SEQUENCES :Sequentially add profile2 sequences to profile1 alignment
-NEWTREE=  :file for new guide tree
-USETREE=  :file for old guide tree

***Structure Alignments:***
-NOSECSTR1      :do not use secondary structure/gap penalty mask for profile 1
-NOSECSTR2      :do not use secondary structure/gap penalty mask for profile 2
-SECSTROUT=STRUCTURE or MASK or BOTH or NONE :output in alignment file
-HELIXGAP=n     :gap penalty for helix core residues
-STRANDGAP=n    :gap penalty for strand core residues
-LOOPGAP=n      :gap penalty for loop regions
-TERMINALGAP=n  :gap penalty for structure termini
-HELIXENDIN=n   :number of residues inside helix to be treated as terminal
-HELIXENDOUT=n  :number of residues outside helix to be treated as terminal
-STRANDENDIN=n  :number of residues inside strand to be treated as terminal
-STRANDENDOUT=n :number of residues outside strand to be treated as terminal

***Trees:***
-OUTPUTTREE=nj OR phylip OR dist OR nexus
-SEED=n      :seed number for bootstraps
-KIMURA      :use Kimura's correction
-TOSSGAPS    :ignore positions with gaps
-BOOTLABELS=node OR branch :position of bootstrap values in tree display

>>HELP R <<             References

<STRONG>The ClustalX program is described in the manuscript:</STRONG>

Thompson, J.D., Gibson, T.J., Plewniak, F., Jeanmougin, F. and Higgins, D.G. (1997) The ClustalX windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Research, 25:4876-4882.

<STRONG>The ClustalW program is described in the manuscript:</STRONG>

Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680.

<STRONG>The ClustalV program is described in the manuscript:</STRONG>

Higgins, D.G., Bleasby, A.J. and Fuchs, R. (1992) CLUSTAL V: improved software for multiple sequence alignment. CABIOS, 8:189-191.

<STRONG>The original Clustal program is described in the manuscripts:</STRONG>

Higgins, D.G. and Sharp, P.M. (1989) Fast and sensitive multiple sequence alignments on a microcomputer. CABIOS, 5:151-153.

Higgins, D.G. and Sharp, P.M. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene, 73:237-244.

-------------------------------------------------------------------------------

<STRONG>Some tips on using Clustal X:</STRONG>

Jeanmougin, F., Thompson, J.D., Gouy, M., Higgins, D.G. and Gibson, T.J. (1998) Multiple sequence alignment with Clustal X. Trends Biochem. Sci., 23:403-405.

<STRONG>Some tips on using Clustal W:</STRONG>

Higgins, D.G., Thompson, J.D. and Gibson, T.J. (1996) Using CLUSTAL for multiple sequence alignments. Methods Enzymol., 266:383-402.

-------------------------------------------------------------------------------

<STRONG>You can get the latest version of the ClustalX program by anonymous ftp to:</STRONG>

ftp-igbmc.u-strasbg.fr
ftp.embl-heidelberg.de
ftp.ebi.ac.uk

<STRONG>Or, have a look at the following WWW site:</STRONG>

http://www-igbmc.u-strasbg.fr/BioInfo/