⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 ghmm.py

📁 一个通用的隐性马尔可夫C代码库 开发环境:C语言 简要说明:这是一个通用的隐性马尔可夫C代码库
💻 PY
📖 第 1 页 / 共 5 页
字号:
#!/usr/bin/env python2.3##################################################################################       This file is part of GHMM (General Hidden Markov Model library) ##       file:    ghmm.py#       authors: Benjamin Georgi, Wasinee Rungsarityotin, Alexander Schliep##       Copyright (C) 2003-2004, Alexander Schliep and MPI Molekulare Genetik, Berlin#                                   #       Contact: schliep@molgen.mpg.de         ##       Information: http://ghmm.org##       This library is free software; you can redistribute it and/or#       modify it under the terms of the GNU Library General Public#       License as published by the Free Software Foundation; either#       version 2 of the License, or (at your option) any later version.##       This library is distributed in the hope that it will be useful,#       but WITHOUT ANY WARRANTY; without even the implied warranty of#       MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU#       Library General Public License for more details.##       You should have received a copy of the GNU Library General Public#       License along with this library; if not, write to the Free#       Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA#####       This file is version $Revision: 1.65.2.1 $ #                       from $Date: 2004/06/11 13:24:50 $#             last change by $Author: wasinee $.#################################################################################"""The Design of ghmm.py HMMs are stochastic models which encode a probability density oversequences of symbols. These symbols can be discrete letters (A,C,G andT for DNA; 1,2,3,4,5,6 for dice), real numbers (weather measurementover time: temperature) or vectors of either or the combinationthereof (weather again: temperature, pressure, percipitation).Note: We will always talk about emissions, emission sequence and soforth when we refer to the sequence of symbols. Another namefor the same object is observation resp. observation sequence.The objects one has to deal with in HMM modelling are the following1) The domain the emissions come from: the EmissionDomain. Domain   is to be understood mathematically and to encompass both discrete,   finite alphabets and fields such as the real numbers or intervals   of the reals.   For technical reasons there can be two representations of an   emission symbol: an external and an internal. The external   representation is the view of the application using ghmm.py. The   internal one is what is used in both ghmm.py and the ghmm   C-library. Representations can coincide, but this is not   guaranteed. Discrete alphabets of size k are represented as   [0,1,2,...,k-1] internally.  It is the domain objects job to   provide a mapping between representations in both directions.   NOTE: Do not make assumptions about the internal   representations. It might change.   2) Every domain has to afford a distribution, which is usually   parameterized. A distribution associated with a domain   should allow us to compute $\Prob[x| distribution parameters]$   efficiently.   The distribution defines the *type* of distribution which   we will use to model emissions in *every state* of the HMM.   The *type* of distribution will be identical for all states,   their *parameterizations* will differ from state to state.    3) We will consider a Sequence of emissions from the same emission   domain and very often sets of such sequences: SequenceSet4) The HMM: The HMM consists of two major components: A Markov chain   over states (implemented as a weighted directed graph with   adjacency and inverse-adjacency lists) and the emission   distributions per-state. For reasons of efficiency the HMM itself   is *static*, as far as the topology of the underlying Markov chain   (and obviously the EmissionDomain) are concerned. You cannot add or   delete transitions in an HMM.   Transition probabilities and the parameters of the per-state   emission distributions can be easily modified. Particularly,   Baum-Welch reestimation is supported.  While a transition cannot be   deleted from the graph, you can set the transition probability to   zero, which has the same effect from the theoretical point of   view. However, the corresponding edge in the graph is still   traversed in the computation.    States in HMMs are referred to by their integer index. State sequences   are simply list of integers.   If you want to store application specific data for each state you   have to do it yourself.   Subclasses of HMM implement specific types of HMM. The type depends   on the EmissionDomain, the Distribution used, the specific   extensions to the 'standard' HMMs and so forth   5) HMMFactory: This provides a way of constucting HMMs. Classes derived   from HMMFactory allow to read HMMs from files, construct them explicitly   from, for a discrete alphabet, transition matrix, emission matrix and prior   or serve as the basis for GUI-based model building.   There are two ways of using the HMMFactory.   Static construction:   HMMOpen(fileName) # Calls an object of type HMMOpen instantiated in ghmm   HMMOpen(fileName, type=HMM.FILE_XML)   HMMFromMatrices(emission_domain, distribution, A, B, pi) # B is a list of distribution parameters   Dynamic construction: Providing a context for dynamically   editing existing HMMs or creating them from scratch   HMMEditingContext() # Create an object   HMMEditingContext(hmm) # Create an object from existing HMM      Examples:   hmm = HMMOpen('some-hmm.xml')   hmm_context = HMMEditingContext(hmm) # just reads hmm   hmm_context.addTransition(4,5, 0.3) # normalization will occurr   hmm_context.addTransition(5,6, 0.1)   hmm = hmm_context() # Creates a new hmm    hmm.bla ....   """import xmlutilimport ghmmwrapperimport ghmmhelperimport modhmmerimport reimport StringIOimport copyfrom os import pathfrom math import log,ceilghmmwrapper.gsl_rng_init() # Initialize random number generatorghmmwrapper.time_seed()#-------------------------------------------------------------------------------#- Exceptions ------------------------------------------------------------------class GHMMError(Exception):    """Base class for exceptions in this module."""    def __init__(self, message):        self.message = message    def __str__(self):        return repr(self.message)    class UnknownInputType(GHMMError):    def __init__(self,message):       self.message = message    def __str__(self):        return repr(self.message)    class NoValidCDataType(GHMMError):    def __init__(self,message):           self.message = message    def __str__(self):        return repr(self.message)            class badCPointer(GHMMError):    def __init__(self,message):           self.message = message    def __str__(self):        return repr(self.message)            class SequenceCannotBeBuild(GHMMError):    def __init__(self,message):           self.message = message    def __str__(self):        return repr(self.message)    #-------------------------------------------------------------------------------#- EmissionDomain and derived  -------------------------------------------------class EmissionDomain:    """ Abstract base class for emissions produced by an HMM.        There can be two representations for emissions:        1) An internal, used in ghmm.py and the ghmm C-library        2) An external, used in your particular application        Example:        The underlying library represents symbols from a finite,        discrete domain as integers (see Alphabet).        EmissionDomain is the identity mapping    """    def internal(self, emission):        """ Given a emission return the internal representation        """        return emission    def internalSequence(self, emissionSequence):        """ Given a emissionSequence return the internal representation        """        return emissionSequence    def external(self, internal):        """ Given an internal representation return the            external representation        """        return internal    def externalSequence(self, internalSequence):        """ Given a sequence with the internal representation return the            external representation        """        return internalSequence    def isAdmissable(self, emission):        """ Check whether emission is admissable (contained in) the domain            raises GHMMOutOfDomain else        """        return Noneclass Alphabet(EmissionDomain):    """ Discrete, finite alphabet        """    def __init__(self, listOfCharacters):        """ Creates an alphabet out of a listOfCharacters """        self.listOfCharacters = listOfCharacters        self.index = {} # Which index belongs to which character        i = 0        for c in self.listOfCharacters:            self.index[c] = i            i += 1        self.CDataType = "int" # flag indicating which C data type should be used    def internal(self, emission):        """ Given a emission return the internal representation        """        return self.index[emission]    def internalSequence(self, emissionSequence):        """ Given a emission_sequence return the internal representation            Raises KeyError        """        result = copy.deepcopy(emissionSequence)        try:            result = map(lambda i: self.index[i], result)        except IndexError:            raise KeyError        return result    def external(self, internal):        """ Given an internal representation return the            external representation            Raises KeyError        """        if internal < 0 or len(self.listOfCharacters) <= internal:             raise KeyError        return self.listOfCharacters[internal]    def externalSequence(self, internalSequence):        """ Given a sequence with the internal representation return the            external representation        """        result = copy.deepcopy(internalSequence)        try:            result = map(lambda i: self.listOfCharacters[i], result)        except IndexError:            raise KeyError        return result    def isAdmissable(self, emission):        """ Check whether emission is admissable (contained in) the domain            raises GHMMOutOfDomain else        """        return emission in self.listOfCharacters    def size(self):        """ depreciate """        return len(self.listOfCharacters)    def __len__(self):        return len(self.listOfCharacters)DNA = Alphabet(['a','c','g','t'])AminoAcids = Alphabet(['A','C','D','E','F','G','H','I','K','L',                       'M','N','P','Q','R','S','T','V','W','Y'])def IntegerRange(a,b):    return Alphabet(range(a,b))class Float(EmissionDomain):    def __init__(self):        self.CDataType = "double" # flag indicating which C data type should be used    def isAdmissable(self, emission):        """ Check whether emission is admissable (contained in) the domain            raises GHMMOutOfDomain else        """        return isinstance(emission,float)   

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -