search.c

Speech recognition program for the WinCE platform
Language: C
/* -*- c-basic-offset: 4; indent-tabs-mode: nil -*- */
/* ====================================================================
 * Copyright (c) 1999-2004 Carnegie Mellon University.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * This work was supported in part by funding from the Defense Advanced
 * Research Projects Agency and the National Science Foundation of the
 * United States of America, and the CMU Sphinx Speech Consortium.
 *
 * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND
 * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
 * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * ====================================================================
 *
 */
/*
 * search.c -- HMM-tree version
 *
 *
 * 01-Dec-2004	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Consolidated senone_active updates into senscr.c.
 *
 * 22-Nov-2004	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Changed senone score computation to use senscr module,
 * 		for integrated handling of semi-continuous and continuous
 * 		acoustic model evaluation.
 *
 * Revision 1.13  2004/11/09 19:01:41  egouvea
 * Added Ravi's changes, which add a phone transition probability to the
 * allphone search. Also, when using a start word in the search, do not
 * assume a default if startword not defined.
 *
 * 12-Aug-2004	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Added search_get_current_startwid().
 *
 * Revision 1.12  2004/07/23 23:36:34  egouvea
 * Ravi's merge, with the latest fixes in the FSG code, and making the log files generated by FSG, LM, and allphone have the same 'look and feel', with the backtrace information presented consistently
 *
 * Revision 1.11  2004/07/16 00:57:11  egouvea
 * Added Ravi's implementation of FSG support.
 *
 * Revision 1.7  2004/07/07 13:56:33  rkm
 * Added reporting of (acoustic score - best senone score)/frame
 *
 * Revision 1.6  2004/06/18 17:11:53  rkm
 * *** empty log message ***
 *
 * Revision 1.5  2004/06/16 18:45:54  rkm
 * *** empty log message ***
 *
 * Revision 1.4  2004/06/16 18:32:28  rkm
 * Minor logformat change
 *
 * Revision 1.3  2004/06/16 13:47:58  rkm
 * *** empty log message ***
 *
 *
 * 06-Aug-99	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Added -maxwpf parameter handling to limit number of words exiting per frame.
 *
 * 30-Oct-98	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Generalized the implementation of pscr based allphone.
 * 		Added phone_conf option at the end of FWDTREE search to produce pscr-based
 * 		rescoring of fwdtree result.
 *
 * 24-Mar-98	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Added phone perplexity measure into search_hyp_t structure hypothesis
 * 		for each utterance.
 *
 * 08-Mar-98	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Added lattice density measure into search_hyp_t structure generated
 * 		as hypothesis for each utterance.
 *
 * 04-Apr-97	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Added search_remove_context() call in search_postprocess_bptable.
 *
 * 03-Apr-97	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Changed lm_cache_reset to lm3g_cache_reset, and lm_cache_stats_dump
 * 		to lm3g_cache_stats_dump.
 *
 * 08-Dec-95	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Changed search_hyp_t hyp[] result to contain actual frame ids.
 * 		instead of post-silence-compression ids.
 * 		Added functions search_bptbl_wordlist() and search_bptbl_pred().
 *
 * 12-Jul-95	M K Ravishankar (rkm@cs) at Carnegie Mellon University
 * 		Commented out lm_cache_reset in search_fwdflat_start().
 *
 * 19-Jun-95	M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 * 		Changed strings phone_active to npa (phone_active is too generic).
 *
 * 19-Jun-95	M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 * 		Added bestpscr[] and modified compute_phone_active() to use best phone
 * 		scores (bestpscr) returned by SCVQ.
 *
 * 15-Jun-95	M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 * 		Modified to always rebuild search tree when search_set_current_lm called.
 *
 * 22-May-95	M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *		Changed search_result and search_partial_result interfaces to simplify
 * 		network client interfaces for these two.
 *
 * 09-Dec-94	M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *		Added flat forward pass after tree forward pass.
 *
 * Revision 8.10  94/10/11  12:39:52  rkm
 * Print back trace conditionally, depending on -backtrace argument.
 *
 * Revision 8.9  94/07/29  12:01:40  rkm
 * Added code, in building search tree, to take care of growth in
 * dictionary size owing to dynamic addition of OOVs.
 *
 * Revision 8.8  94/05/26  16:49:41  rkm
 * Rewrote lattice rescoring non-recursively to reduce LM thrashing,
 * and moved that code to a separate file.  Moved some data structure
 * declaration (BPTBL_T) to search.h.
 *
 * Revision 8.7  94/05/19  14:21:36  rkm
 * Reordered LM accesses in last_phone_transition and word_transition for
 * greater efficiency.
 *
 * Revision 8.6  94/05/10  10:48:43  rkm
 * Changed various list memory management routines to use the generic
 * listelem_alloc() and listelem_free() functions.
 * Added last_ltrans array for caching the best LM score info during
 * last_phone_transition().
 *
 * Revision 8.5  94/04/22  13:56:30  rkm
 * Added search_hyp_t for collecting output hypothesis-related info in
 * one place.  Directed output of both passes to this structure, so the
 * match file will contain the final result.
 *
 * Revision 8.4  94/04/14  14:43:52  rkm
 * Added optional second pass for lattice-rescoring.
 * Added option to dump forward pass bptable for postprocessing.
 * Added option to skip inter-channel transitions in alternate frames.
 * Fixed bug in init_search_tree in allocating first_phone_rchan_map.
 *
 * Revision 8.1  94/02/15  15:13:07  rkm
 * Derived from v7.  Includes multiple start symbols for the LISTEN
 * project.  Includes multiple LMs for grammar switching.
 *
 * Revision 1.14  94/02/11  13:12:48  rkm
 * Added multiple start words for the LISTEN project.
 * Corrected minor error in statistics gathering.
 *
 * Revision 1.13  94/02/03  18:38:12  rkm
 * Fixed debugging and tracing code.
 *
 * Revision 1.12  94/02/01  10:46:54  rkm
 * Mark active senones only if topN=4; otherwise SCVQ computes all senone scores.
 *
 * Revision 1.11  94/02/01  10:23:02  rkm
 * Lookup trigram LM values through trigram LM cache instead of directly.
 *
 * Revision 1.10  94/01/31  14:27:21  rkm
 * Added code to mark the active senones in each frame.
 *
 * Revision 1.9  94/01/25  12:36:45  rkm
 * Look up LM values through bigram cache instead of directly.
 *
 * Revision 1.8  94/01/24  10:01:38  rkm
 * Include LM score when entering last phone of any word, rather than on
 * exiting word.  Special case for handling single-phone words.
 *
 * Revision 1.7  94/01/21  15:22:10  rkm
 * Minor changes.
 *
 * Revision 1.6  94/01/21  15:06:45  rkm
 * Bug fix in word_transition in compacting BPTable.
 *
 * Revision 1.5  94/01/21  13:47:48  rkm
 * Bug fix in alloc_all_rc().
 *
 * Revision 1.4  94/01/19  11:25:48  rkm
 * Before rescoring last phone with LM scores.
 *
 */
/*
 * NOTE: this module assumes that the dictionary is organized as follows:
 *     Main, real dictionary words
 *     </s>
 *     <s>... (possibly more than one of these)
 *     <sil>
 *     noise-words...
 * In particular, note that </s> comes before <s> since </s> occurs in the LM, but
 * not <s> (well, there's no transition to <s> in the LM).
 */

/* System includes. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <assert.h>

/* SphinxBase includes. */
#include <ckd_alloc.h>
#include <err.h>
#include <cmd_ln.h>

/* Local includes. */
#include "s2types.h"
#include "basic_types.h"
#include "linklist.h"
#include "list.h"
#include "search_const.h"
#include "dict.h"
#include "msd.h"
#include "lm.h"
#include "lmclass.h"
#include "lm_3g.h"
#include "phone.h"
#include "kb.h"
#include "log.h"
#include "s2_semi_mgau.h"
#include "senscr.h"
#include "fbs.h"
#include "search.h"

/* Turn this on to dump channels for debugging */
#define __CHAN_DUMP__		0

#define ISA_FILLER_WORD(x)	((x) >= SilenceWordId)
#define ISA_REAL_WORD(x)	((x) < FinishWordId)

#define SEARCH_PROFILE			1
#define SEARCH_TRACE_CHAN		0
#define SEARCH_TRACE_CHAN_DETAILED	0
#define SEARCH_SELFTEST_DETAILED	0

/*
 * Search structure of HMM instances (channels; see CHAN_T and ROOT_CHAN_T definitions):
 * The word triphone sequences (HMM instances) are transformed into tree structures,
 * one tree per unique left triphone in the entire dictionary (actually diphone, since
 * its left context varies dynamically during the search process).  The entire set of
 * trees of channels is allocated once and for all during initialization (since
 * dynamic management of active CHANs is time consuming), with one exception: the
 * last phones of words, that need multiple right context modelling, are not maintained
 * in this static structure since there are too many of them and few are active at any
 * time.  Instead they are maintained as linked lists of CHANs, one list per word,
 * and each CHAN in this set is allocated only on demand and freed if inactive.
 */
static ROOT_CHAN_T *root_chan;  /* one per unique root channel */
static int32 n_root_chan_alloc; /* total number of root channels allocated */
static int32 n_root_chan;       /* number of root channels valid for a given utt;
                                   depends on words in the LM for that utt */
static int32 n_nonroot_chan;    /* #non-root channels in search tree */

/* MAX #non-root channels in search tree for allocating active_chan_list[]... */
static int32 max_nonroot_chan = 0;

static int32 n_phone_eval;
static int32 n_root_chan_eval;
static int32 n_nonroot_chan_eval;
static int32 n_last_chan_eval;
static int32 n_word_lastchan_eval;
static int32 n_lastphn_cand_utt;
static int32 n_phn_in_topsen;

/*
 * word_chan[w] = separate linked list of channels for each word w, normally used only
 * to model the last phone of w, with multiple channels representing different right
 * context phones.
 */
static CHAN_T **word_chan;

/* word_active[w] = 1 if word w active in current frame, 0 otherwise */
static int32 *word_active;

/*
 * Each node in the HMM tree structure may point to a set of words whose last phone
 * would follow that node in the tree structure (but is not included in the tree
 * structure for reasons explained above).  The channel node points to one word in this
 * set of words.  The remaining words are linked through homophone_set[].
 *
 * Single-phone words are not represented in the HMM tree; they are kept in word_chan.
 *
 * Specifically, homophone_set[w] = wid of next word in the same set as w.
 */
static WORD_ID *homophone_set;

/*
 * In any frame, only some HMM tree nodes are active.  active_chan_list[f mod 2] =
 * list of nonroot channels in the HMM tree active in frame f.
 * Similarly, active_word_list[f mod 2] = list of word ids for which active channels
 * exist in word_chan in frame f.
 */
static CHAN_T **active_chan_list[2] = { NULL, NULL };
static int32 n_active_chan[2];  /* #entries in active_chan_list */
static WORD_ID *active_word_list[2];
static int32 n_active_word[2];  /* #entries in active_word_list */

static int32 NumWords;          /* Total #words in dictionary */
static int32 NumMainDictWords;  /* #words in main dictionary, excluding fillers
                                   (i.e., <s>, </s>, <sil>, and noise words).
                                   These come first in WordDict. */
static int32 NumCiPhones;

static lm_t *LangModel = NULL;

static int32 StartWordId;
static int32 FinishWordId;
static int32 SilenceWordId;
static int32 SilencePhoneId;

static int32 **LeftContextFwd;
static int32 **RightContextFwd;
static int32 **RightContextFwdPerm;
static int32 *RightContextFwdSize;
static int32 **LeftContextBwd;
static int32 **LeftContextBwdPerm;
static int32 *LeftContextBwdSize;
static int32 **RightContextBwd;

static int32 **sc_scores;       /* SC scores for several frames in advance */

static int32 BestScore;         /* Best among all phones */
static int32 LastPhoneBestScore;        /* Best among last phones only */
static int32 LogBeamWidth;
static int32 DynamicLogBeamWidth;       /* Modified by absolute pruning */
static int32 NewPhoneLogBeamWidth;
static int32 NewWordLogBeamWidth;
static int32 LastPhoneAloneLogBeamWidth;
static int32 LastPhoneLogBeamWidth;
static int32 FwdflatLogBeamWidth;
static int32 FwdflatLogWordBeamWidth;
static int32 FillerWordPenalty = 0;
static int32 SilenceWordPenalty = 0;
static int32 LogInsertionPenalty = 0;
static int32 logPhoneInsertionPenalty = 0;

static lw_t fwdtree_lw = FLOAT2LW(6.5);
static lw_t fwdflat_lw = FLOAT2LW(8.5);
static lw_t bestpath_lw = FLOAT2LW(9.5);
static lw_t bestpath_fwdtree_lw_ratio = FLOAT2LW(9.5 / 6.5);
static lw_t fwdflat_fwdtree_lw_ratio = FLOAT2LW(8.5 / 6.5);

static int32 newword_penalty = 0;

/* BestScoreTable[CurrentFrame] === BestScore */
static int32 BestScoreTable[MAX_FRAMES];

static int32 compute_all_senones = TRUE;

static int32 ChannelsPerFrameTarget = 0;        /* #channels to eval / frame */

static int32 CurrentFrame;
static int32 LastFrame;

static int32 n_senone_active_utt;

static BPTBL_T *BPTable;        /* Forward pass lattice */
static int32 BPIdx;             /* First free BPTable entry */
static int32 BPTableSize;
static int32 *BScoreStack;      /* Score stack for all possible right contexts */
static int32 BSSHead;           /* First free BScoreStack entry */
static int32 BScoreStackSize;
static int32 *BPTableIdx;       /* First BPTable entry for each frame */
static int32 *WordLatIdx;       /* BPTable index for any word in current frame;
                                   cleared before each frame */
static int32 BPTblOflMsg;       /* Whether BPtable overflow msg has been printed */

static int32 *lattice_density;  /* #words/frame in lattice */
static int32 *zeroPermTab;

/* Word-id sequence hypothesized by decoder */
static search_hyp_t hyp[HYP_SZ];        /* no <s>, </s>, or filler words */
static char hyp_str[4096];      /* hyp as string of words sep. by blanks */
static int32 hyp_wid[4096];
static int32 n_hyp_wid;

static int32 HypTotalScore;
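
Because of the dictionary ordering in the NOTE, ISA_REAL_WORD and ISA_FILLER_WORD can each be a single comparison. Real words have ids below FinishWordId, fillers have ids at or above SilenceWordId, and the <s> entries between them are neither. The sketch below shows this idea on a hypothetical seven-entry toy dictionary. Every toy_* name is illustrative and not part of search.c. The sketch also joins a word-id sequence into a blank-separated string while skipping <s>, </s> and fillers, in the spirit of the hyp_str comment. It is not the module's own hypothesis-building code.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical toy dictionary laid out in the order the NOTE requires. */
static const char *const toy_dict[] = {
    "hello", "world", "yes",   /* ids 0..2: main, real dictionary words   */
    "</s>",                    /* id 3: plays the role of FinishWordId    */
    "<s>",                     /* id 4: plays the role of StartWordId     */
    "<sil>",                   /* id 5: plays the role of SilenceWordId   */
    "++NOISE++"                /* id 6: noise words come last             */
};

enum { TOY_FINISH = 3, TOY_START = 4, TOY_SILENCE = 5, TOY_NWORDS = 7 };

/* Same shape as ISA_REAL_WORD / ISA_FILLER_WORD, bound to the toy ids. */
#define TOY_ISA_REAL_WORD(x)   ((x) < TOY_FINISH)
#define TOY_ISA_FILLER_WORD(x) ((x) >= TOY_SILENCE)

/*
 * Write the real words of wids[0..n-1] into out, separated by single blanks.
 * <s>, </s> and fillers are skipped. If a word would not fit, output stops
 * at the previous word boundary. Returns the length of the resulting string.
 */
static size_t toy_hyp_string(const int *wids, int n, char *out, size_t outsz)
{
    size_t len = 0;
    int i;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (i = 0; i < n; i++) {
        const char *w;
        size_t wl, sep;

        assert(wids[i] >= 0 && wids[i] < TOY_NWORDS);
        if (!TOY_ISA_REAL_WORD(wids[i]))
            continue;                   /* </s>, <s>, <sil>, noise words */
        w = toy_dict[wids[i]];
        wl = strlen(w);
        sep = (len > 0) ? 1 : 0;
        if (len + sep + wl + 1 > outsz)
            break;                      /* no room for blank, word and '\0' */
        if (sep)
            out[len++] = ' ';
        memcpy(out + len, w, wl + 1);   /* copies the terminator too */
        len += wl;
    }
    return len;
}
```

For example, the id sequence 4 0 5 1 6 2 3 (<s> hello <sil> world noise yes </s>) yields the string "hello world yes".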

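The homophone_set[] comment describes a singly linked list threaded through a plain array. The tree node stores the id of one word, and homophone_set[w] names the next word in the same set. The minimal sketch below builds and walks such a chain. It assumes -1 terminates a chain, since the sentinel search.c actually uses is not visible in this excerpt. The toy_* names are hypothetical.

```c
#include <assert.h>

#define TOY_NO_WORD (-1)   /* assumed end-of-chain marker */

/*
 * Add word w to the set whose first word is *head. The tree node keeps only
 * *head; every other member is reached through homophone_set[].
 */
static void toy_homset_add(int *homophone_set, int *head, int w)
{
    homophone_set[w] = *head;   /* previous first word becomes w's successor */
    *head = w;
}

/* Copy the members of the set starting at head into out[]; return the count. */
static int toy_homset_list(const int *homophone_set, int head, int *out, int max)
{
    int n = 0;
    int w;

    for (w = head; w != TOY_NO_WORD; w = homophone_set[w]) {
        if (n >= max)
            break;              /* stop if out[] is full */
        out[n++] = w;
    }
    return n;
}
```

With this layout, adding a word to a set costs O(1), and the only storage needed is one entry per dictionary word. That may explain the array-based design, but this rationale is an inference and the source does not state it.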
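
The active_chan_list[f mod 2] arrangement is a double buffer. The channels evaluated in frame f are read from one slot, and the survivors for frame f+1 are written into the other. The sketch below pairs that scheme with beam pruning against the frame's best score. It assumes that a LogBeamWidth-style beam is a non-positive log value and that a channel survives when its score is at least best + beam. toy_chan_t is a hypothetical stand-in for CHAN_T, and all toy_* names are illustrative.

```c
#include <assert.h>

#define TOY_MAX_CHAN 16

typedef struct {
    int score;              /* log-domain path score; larger is better */
} toy_chan_t;               /* hypothetical stand-in for CHAN_T */

/* Two slots indexed by frame % 2, mirroring active_chan_list[2]. */
static toy_chan_t *toy_active[2][TOY_MAX_CHAN];
static int toy_n_active[2];

/*
 * Scan the channels active in frame f and copy those within the beam of
 * best into the list for frame f + 1. beam is assumed to be <= 0, so the
 * threshold best + beam lies at or below the best score.
 */
static void toy_prune_into_next(int f, int best, int beam)
{
    int cur = f % 2;
    int nxt = (f + 1) % 2;
    int i;

    assert(f >= 0 && beam <= 0);
    toy_n_active[nxt] = 0;
    for (i = 0; i < toy_n_active[cur]; i++) {
        toy_chan_t *c = toy_active[cur][i];
        if (c->score >= best + beam)
            toy_active[nxt][toy_n_active[nxt]++] = c;
    }
}
```

Each frame writes only the slot it is not reading, so both arrays can be sized once up front. This matches the comment stating that max_nonroot_chan is used for allocating active_chan_list[].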