%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%% THE GREEDY EM ALGORITHM FOR MULTIPLE MOTIF DISCOVERY    %%%%%
%%%%%                                                         %%%%%
%%%%% Kostas Blekas, 15 May 2001                              %%%%%
%%%%% Dept. of Computer Science                               %%%%%
%%%%% University of Ioannina, Greece                          %%%%%
%%%%%                                                         %%%%%
%%%%% Contact kblekas@cc.uoi.gr in case of problems           %%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


The GreedyEM.zip file contains the following files:


filename			explanation
-----------------------------------------------------------

readme.txt 			(this file)

artificial_seqs.txt 		the artificial set of 10 sequences with 6 motifs

pr00058.txt			the PRINTS family PR00058 with 16 sequences

artificial_res.txt		the results of applying the GreedyEM algorithm to the artificial_seqs dataset (demo1)

pr00058_res.txt			the results of applying the GreedyEM algorithm to the PR00058 family (demo2)

read_seqs.m			reads the file that stores the training sequences
				and creates the training set of substrings

GreedyEM.m			The basic Greedy EM algorithm

kd_trees.m			Kd-tree technique for partitioning the set
kd_recurse.m			of n substrings into a set of C candidate models
bestpos.m			for the global search phase. The Ksi matrix
Ksi_matrix.m			is calculated for repeated use during candidate selection

candidate_selection.m		Finds the candidate model that maximizes the log-likelihood and
				initializes the trial component parameters (a' and 
				probability matrix)

partial_EMsteps.m		Performs partial EM steps until convergence

Estep.m				Expectation phase and
Mstep.m				Maximization phase of the general EM algorithm for likelihood maximization

take_res.m			Stores the results in the res.txt file

demo1.m				Demo applying the algorithm to the artificial dataset artificial_seqs.txt

demo2.m				Demo applying the algorithm to the PR00058 family
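
For readers unfamiliar with the Expectation and Maximization phases listed
above, the following is a minimal Python sketch of one generic EM loop for a
two-component mixture (one motif model plus a fixed uniform background) over
substrings of length W. It is NOT the code in Estep.m or Mstep.m, and it does
not show the greedy addition of components, the kd-tree search, or the partial
EM steps. The names e_step, m_step, pi, motif, background and pseudo, the
4-letter alphabet, and the toy substrings are all assumptions made for
illustration (a protein family such as PR00058 would need a 20-letter alphabet).

```python
ALPHABET = "ACGT"  # illustrative alphabet (assumption, not taken from the package)


def e_step(windows, pi, motif, background):
    """Expectation phase: posterior probability that each substring
    was generated by the motif component rather than the background."""
    post = []
    for w in windows:
        p_motif, p_back = pi, 1.0 - pi
        for k, ch in enumerate(w):
            p_motif *= motif[k][ch]      # position-specific probability
            p_back *= background[ch]     # position-independent probability
        post.append(p_motif / (p_motif + p_back))
    return post


def m_step(windows, post, W, pseudo=1e-3):
    """Maximization phase: re-estimate the mixing weight and the motif
    probability matrix from the posteriors. The background is kept fixed
    (a simplifying assumption); pseudo is a small count that avoids zeros."""
    pi = sum(post) / len(windows)
    motif = []
    for k in range(W):
        counts = {ch: pseudo for ch in ALPHABET}
        for w, z in zip(windows, post):
            counts[w[k]] += z            # soft count weighted by the posterior
        total = sum(counts.values())
        motif.append({ch: c / total for ch, c in counts.items()})
    return pi, motif


# Toy data (made up): three substrings resemble "ACGT", three do not.
windows = ["ACGT", "ACGT", "ACGA", "TTTT", "GGCA", "CATG"]
W = 4
background = {ch: 0.25 for ch in ALPHABET}
consensus = "ACGT"
# Start from a motif matrix that is only weakly biased toward the consensus.
motif = [{ch: (0.4 if ch == consensus[k] else 0.2) for ch in ALPHABET}
         for k in range(W)]
pi = 0.5

for _ in range(10):
    post = e_step(windows, pi, motif, background)
    pi, motif = m_step(windows, post, W)
```

After the loop, post holds one posterior per substring and motif holds one
probability row per motif position; each row sums to 1.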

=====================

Follow these instructions:

steps
-----

[1]. Run the GreedyEM.m file to discover motifs of length W in the set of sequences.

     Give the following inputs:

	- The motif length (W)
	- The maximum number of motifs to discover
	- The name of the sequence dataset file, in FASTA format
	- The value of the parameter (T) used for partitioning the input substrings

	For better results, choose T = N (the number of training sequences).

[2]. Read the file res.txt for an explanation and statistics of the results (motifs found).
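
To illustrate how the motif length W and the FASTA dataset relate to the
training set of substrings, here is a small Python sketch that parses FASTA
text and collects every window of length W. It is an assumption about the
general idea only, not a port of read_seqs.m; the function names parse_fasta
and substrings and the example sequences are hypothetical.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def substrings(records, W):
    """Collect every length-W window as (sequence index, offset, window)."""
    out = []
    for i, (_, seq) in enumerate(records):
        # A sequence of length L contributes L - W + 1 windows (none if L < W).
        for j in range(len(seq) - W + 1):
            out.append((i, j, seq[j:j + W]))
    return out


# Made-up example input with two short sequences.
example = """>seq1
ACGTAC
>seq2
GGT
ACA
"""
records = parse_fasta(example)
windows = substrings(records, W=4)
```

With this definition, the number n of substrings is the sum of (L - W + 1)
over all sequences of length L >= W.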