readme.hmm
        H I D D E N   M A R K O V   M O D E L
          for automatic speech recognition

                    7/30/95

This code implements in C++ a basic left-right hidden Markov model
and the corresponding Baum-Welch (ML) training algorithm. It is meant
as an example of the HMM algorithms described by L. Rabiner (1) and
others. Serious students are directed to the sources listed below for
a theoretical description of the algorithm. K. F. Lee (4) offers an
especially good tutorial on how to build a speech recognition system
using hidden Markov models.

Jim and I built this code in order to learn how HMM systems work, and
we are now offering it to the net so that others can learn how to use
HMMs for speech recognition. Keep in mind that efficiency was not our
primary concern when we built this code; ease of understanding was.
I expect people to use this code in two different ways. People who
wish to build an experimental speech recognition system can use the
included "train_hmm" and "test_hmm" programs as black box components.
The code can also be used in conjunction with written tutorials on
HMMs to understand how they work.

HOW TO COMPILE IT:

We built this code on a Linux system (8 MB RAM) and it has been
tested under SunOS as well; it should run on any system with GNU C++
and has been tested to be ANSI compliant. To compile and test the
program:

   1) extract the code:              tar -xf hmm.tar
   2) compile the programs:          make all
   3) create test sequences:         generate_seq test.hmm 20 50
   4) train using existing model:    train_hmm test.hmm.seq test.hmm .01
   5) train using random parameters: train_hmm test.hmm.seq 1234 3 3 .01

After steps 4 and 5 you can compare the file test.hmm.seq.hmm with
test.hmm to confirm that the program is working.

FILE FORMATS:

There are two types of files used by these programs. The first is
the hmm model file, which has the following header:

   states: <number of states>
   symbols: <number of symbols>

A series of ordered blocks follows the header, each of which is two
lines long. Each block corresponds to a state in the model.
The first line of each block gives the probability of the model
recurring, followed by the probability of generating each of the
possible output symbols when it recurs. The second line gives the
probability of the model transitioning to the next state, followed by
the probability of generating each of the possible output symbols
when it transitions. The file "test.hmm" gives an example of this
format for a three-state model with three possible output symbols.

The second kind of file is a list of symbol sequences to train or
test the model on. Symbol sequences are space-separated integers
(0 1 2 ...) terminated by a newline ("\n"). Sequences may either be
all of the same length or of different lengths. The algorithm detects
each case and processes it slightly differently. Use the output of
step 3 above for an example of a sequence file. A file containing
sequences which are all of the same length should train slightly
faster.

ASR IN A NUTSHELL:

A complete automatic speech recognition system is likely to include
programs that perform the following tasks:

   1)  convert audio/wave files to sequences of multi-dimensional
       feature vectors (e.g. DFT, PLP, etc.)
   2)  quantize feature vectors into sequences of symbols (e.g. VQ)
   3)  train a model for each recognition object (i.e. word, phoneme)
       from the sequences of symbols (e.g. HMM)
   4?) constrain models using grammar information

Most of the above components are readily available as freeware, and
building a system from them should not be too difficult. Making it
work well, however, could be a major undertaking; the devil is in the
details.

FUTURE:

I would like to eventually put together all of the necessary
components for a complete speech recognition test bench. I envision
something that could be combined with a standard speech database such
as the TIMIT data set. Such a test bench would allow researchers to
swap in and evaluate their own methods at various stages in the
system.
Reported results could be compared against the performance of a
standard non-optimized system which would be publicly available.
This way two methods could be compared while controlling for
different data sets and pre/post processing.

Unfortunately, speech recognition is mostly a sideline to Jim's
graduate work in neural networks, and I currently have a job that has
taken me away from the field of speech recognition. If someone uses
this code in a complete system, we would appreciate hearing about it.

Questions and comments can be directed to:

   Richard Myers (rmyers@isx.com) and Jim Whitson (whitson@ics.uci.edu)

Bibliography:
-------------

1. L. R. Rabiner, B. H. Juang, "Fundamentals of Speech Recognition."
   New Jersey: Prentice Hall, c1993.

2. L. R. Rabiner, "A Tutorial on Hidden Markov Models and Selected
   Applications in Speech Recognition," Proc. of the IEEE, Feb. 1989.

3. L. R. Rabiner, B. H. Juang, "An Introduction to Hidden Markov
   Models," IEEE ASSP Magazine, Jan. 1986.

4. K. F. Lee, "Automatic Speech Recognition: The Development of the
   SPHINX System." Boston: Kluwer Academic Publishers, c1989.