<!---
##
## $Id: meme-explanation.html 1339 2006-09-21 19:46:28Z tbailey $
##
## $Log$
## Revision 1.2  2006/01/03 01:56:13  tbailey
## Fix documentation of PSSM and PSPM in meme-explanation.html and
## meme-output-example.html.
##
## Revision 1.1  2005/10/13 21:15:51  nadya
## move meme-explanation.html to etc/ to allow html conversion without installing web site
##
##
##
--->
<A NAME=explanation></A>
<CENTER><H3><HR>EXPLANATION OF MEME RESULTS<HR></H3></CENTER>
<H4>The MEME results consist of:</H4>
<UL>
<A NAME=version_doc></A>
<LI>
  The <A HREF=#version><B>version</B></A>
  of MEME and the date it was released.
<A NAME=reference_doc></A>
<LI>
  The <A HREF=#reference><B>reference</B></A>
  to cite if you use MEME in your research.
<A NAME=sequences_doc></A>
<LI>
  A description of the
  <A HREF=#sequences><B>sequences</B></A>
  you submitted (the "training set") showing the name,
  "weight" and length of each sequence.
<A NAME=command_doc></A>
<LI>
  The <A HREF=#command><B>command line summary</B></A>
  detailing the parameters with which you ran MEME.
<A NAME=motifs_doc></A>
<LI>
  Information on each of the
  <A HREF=#motifs><B>motifs</B></A> MEME discovered, including:
<OL>
<LI>
  A <A NAME=summary_doc HREF=#summary_doc2>summary line</A>
  showing the width, number of occurrences, log likelihood ratio
  and statistical significance of the motif.
<LI>
  A <A NAME=simplified_doc HREF=#simplified_doc2>simplified position-specific probability matrix</A>.
<LI>
  A <A NAME=IC_doc HREF=#IC_doc2>diagram</A>
  showing the degree of conservation at each motif position.
<LI>
  A <A NAME=consensus_doc HREF=#consensus_doc2>multilevel consensus sequence</A>
  showing the most conserved letter(s) at each motif position.
<LI>
  The <A NAME=sites_doc HREF=#sites_doc2>occurrences of the motif</A>
  sorted by <I>p</I>-value and aligned with each other.
<LI>
  <A NAME=diagrams_doc HREF=#diagrams_doc2>Block diagrams</A>
  of the occurrences of the motif within each
  sequence in the training set.
<LI>
  The motif in
  <A NAME=BLOCKS_doc HREF=#BLOCKS_doc2>BLOCKS or FASTA format</A>.
<LI>
  A <A NAME=pssm_doc HREF=#pssm_doc2>position-specific scoring matrix (PSSM)</A>
  for use by the
  <A HREF="mast-intro.html">MAST</A> database search program.
<LI>
  The <A NAME=pspm_doc HREF=#pspm_doc2>position-specific probability matrix (PSPM)</A>
  describing the motif.
<LI>
  A <A NAME=regular_expression_doc HREF=#regular_expression_doc2>regular expression</A>
  describing the motif.
</OL>
<A NAME=motif-summary_doc></A>
<LI>
  A <A HREF=#motif-summary><B>summary of motifs</B></A>
  showing an optimized (non-overlapping)
  <A HREF=#motif-summary-doc2>tiling</A> of all of the motifs onto
  each of the sequences in the training set.
<A NAME=stopped_doc></A>
<LI>
  The reason why MEME <A HREF=#stopped>stopped</A>
  and the name of the CPU on which it ran.
<A NAME=explanation_doc></A>
<LI>
  This <B>explanation</B> of how to interpret MEME results.
</UL>
<P>
<A HREF=#summary1 NAME=motifs> MOTIFS </A>
<P>
For each motif that it discovers in the training set,
MEME prints the following information:
<UL>
<P>
<LI> <A NAME=summary_doc2 HREF=#summary1><H4>Summary Line</H4></A>
This line gives the width (`width'), number of occurrences
in the training set (`sites'), log likelihood
ratio (`llr') and <I>E</I>-value of the motif.
Each motif describes a pattern of fixed width; no gaps are allowed in
MEME motifs.
MEME numbers the motifs consecutively from one as it finds them.
MEME usually finds the most statistically significant
(low <I>E</I>-value) motifs first.
The statistical significance of a motif is based on its log likelihood ratio,
its width and number of occurrences, the background letter frequencies
(given in the <A HREF=#command_doc>command line summary</A>),
and the size of the training set.
The <I>E</I>-value is an
estimate of the expected number of motifs with the given log likelihood
ratio (or higher), and with the same width and number of occurrences,
that one would find in a similarly sized set of random sequences.
(In random sequences each position is independent, with letters chosen
according to the background letter frequencies.)  The log likelihood
ratio is the logarithm of the ratio of the probability of the occurrences
of the motif given the motif model (likelihood given the motif) versus
their probability given the background model (likelihood given the
null model).  (Normally the background model is a 0-order Markov model
using the background letter frequencies, but higher order Markov models
may be specified via the <B>-bfile</B> option to MEME.)
Clicking on the <B>buttons</B> to the left of the motif summary line
takes you to the previous motif (P) or next motif (N).
<P>
<LI> <A NAME=simplified_doc2 HREF=#simplified1><H4>Simplified
  Position-Specific Probability Matrix</H4></A>
MEME motifs are represented by position-specific probability matrices
that specify the probability of each possible letter appearing at each
possible position in an occurrence of the motif.  In order to make it
easier to see which letters are most likely in each of the columns of the
motif, the simplified motif shows the letter probabilities multiplied by 10
and rounded to the nearest integer ("a" means 10).  Zeros are replaced by ":"
(the colon) for readability.
<P>
<LI> <A NAME=IC_doc2 HREF=#IC1><H4>Information Content Diagram</H4></A>
The information content diagram provides an idea of which positions
in the motif are most highly conserved.
Each column (position) in a motif can be characterized by the amount of
information it contains (measured in bits).  Highly conserved positions
in the motif have high information; positions where all letters are equally
likely have low information.
(The information content is relative to
the background letter frequencies, which are given in the
<A HREF=#command_doc>command line summary</A> section.)
The diagram is printed so that each column lines up with the same column
in the simplified position-specific probability matrix above it.
Columns in the information content diagram are colored according to the
majority category of the letters occurring in that column of the alignment.
If no letter category has frequency above 0.5, the column in the diagram
is colored black.  For DNA sequences, the letter categories contain one letter
each.  For proteins, the categories are based on the biochemical properties
of the various amino acids.  The categories and their colors are:
<P>
<CENTER>
  <TABLE BORDER>
    <TR> <TH>NUCLEIC ACIDS</TH> <TH ALIGN=LEFT>COLOR</TH> </TR>
    <TR>
      <TD>A</TD>
      <TD><FONT COLOR=RED>RED</FONT></TD></TR>
    <TR>
      <TD>C</TD>
      <TD><FONT COLOR=BLUE>BLUE</FONT></TD></TR>
    <TR>
      <TD>G</TD>
      <TD><FONT COLOR=ORANGE>ORANGE</FONT></TD></TR>
    <TR>
      <TD>T</TD>
      <TD><FONT COLOR=GREEN>GREEN</FONT></TD></TR>
  </TABLE>
  <P>
  <TABLE BORDER>
    <TR> <TH>AMINO ACIDS</TH> <TH ALIGN=LEFT>COLOR</TH>
         <TH ALIGN=LEFT>PROPERTIES</TH> </TR>
    <TR>
      <TD>A, C, F, I, L, V, W and M</TD>
      <TD><FONT COLOR=BLUE>BLUE</FONT></TD>
      <TD>Most hydrophobic [Kyte and Doolittle, 1982]</TD>
    </TR>
    <TR>
      <TD>NQST</TD>
      <TD><FONT COLOR=GREEN>GREEN</FONT></TD>
      <TD>Polar, non-charged, non-aliphatic residues</TD>
    </TR>
    <TR>
      <TD>DE</TD>
      <TD><FONT COLOR=MAGENTA>MAGENTA</FONT></TD>
      <TD>Acidic</TD>
    </TR>
    <TR>
      <TD>KR</TD>
      <TD><FONT COLOR=RED>RED</FONT></TD>
      <TD>Positively charged</TD>
    </TR>
    <TR>
      <TD>H</TD>
      <TD><FONT COLOR=PINK>PINK</FONT></TD> </TR>
    <TR>
      <TD>G</TD>
      <TD><FONT COLOR=ORANGE>ORANGE</FONT></TD> </TR>
    <TR>
      <TD>P</TD>
      <TD><FONT COLOR=YELLOW>YELLOW</FONT></TD> </TR>
