main_decode_anytopo.c

From "CMU's renowned SPHINX-3 large-vocabulary continuous speech recognition system" · C source · 1,466 lines in total · page 1 of 4

/* ====================================================================
 * Copyright (c) 1995-2004 Carnegie Mellon University.  All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * This work was supported in part by funding from the Defense Advanced
 * Research Projects Agency and the National Science Foundation of the
 * United States of America, and the CMU Sphinx Speech Consortium.
 *
 * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND
 * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
 * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * ====================================================================
 *
 */
/*
 * main.c -- Main S3 decoder driver.
 *
 * **********************************************
 * CMU ARPA Speech Project
 *
 * Copyright (c) 1996 Carnegie Mellon University.
 * ALL RIGHTS RESERVED.
 * **********************************************
 *
 * HISTORY
 *
 * 26-Jul-04    ARCHAN (archan@cs.cmu.edu) at Carnegie Mellon University
 *              First incorporated from sphinx 3.0 code base to 3.X codebase.
 *
 * $Log: main_decode_anytopo.c,v $
 * Revision 1.7  2004/12/23 21:05:22  arthchan2003
 * Enable compilation of decode_anytopo, change option names from -match to -hyp, it makes the code more consistent.
 *
 * Revision 1.6  2004/12/14 00:50:33  arthchan2003
 * 1, Change the code to accept extension, 2, add timer to livepretend, 3, fixing the s3_astar to separate the bypass variable to bypass and is_filler_bypass.  4, Add some doxygen comments. 5, Don't care about changes in main_decode_anytopo.c. It is still under work, 6, remove option -help and -example from 3.5 releases.
 *
 * Revision 1.5  2004/12/06 11:31:47  arthchan2003
 * Fix brief comments for programs.
 *
 * Revision 1.4  2004/12/06 11:15:11  arthchan2003
 * Enable doxygen in the program directory.
 *
 * Revision 1.3  2004/12/05 12:01:32  arthchan2003
 * 1, move libutil/libutil.h to s3types.h, seems to me not very nice to have it in every files. 2, Remove warning messages of main_align.c 3, Remove warning messages in chgCase.c
 *
 * Revision 1.2  2004/11/16 05:13:19  arthchan2003
 * 1, s3cipid_t is upgraded to int16 because we need that, I already check that there are no magic code using 8-bit s3cipid_t
 * 2, Refactor the ep code and put a lot of stuffs into fe.c (should be renamed to something else.
 * 3, Check-in codes of wave2feat and cepview. (cepview will not dump core but Evandro will kill me)
 * 4, Make the same command line frontends for decode, align, dag, astar, allphone, decode_anytopo and ep . Allow the use a file to configure the application.
 * 5, Make changes in test such that test-allphone becomes a repeatability test.
 * 6, cepview, wave2feat and decode_anytopo will not be installed in 3.5 RCIII
 * (Known bugs after this commit)
 * 1, decode_anytopo has strange bugs in some situations that it cannot find the end of the lattice. This is urgent.
 * 2, default argument file's mechanism is not yet supported, we need to fix it.
 * 3, the bug discovered by SonicFoundry is still not fixed.
 *
 * Revision 1.1  2004/11/14 07:00:08  arthchan2003
 * 1, Finally, a version of working flat decoder is completed. It is not compiled in the standard compilation yet because there are two many warnings. 2, eliminate the statics variables in  fe_sigproc.c
 *
 * Revision 1.2  2002/12/03 23:02:40  egouvea
 * Updated slow decoder with current working version.
 * Added copyright notice to Makefiles, *.c and *.h files.
 * Updated some of the documentation.
 *
 * Revision 1.1.1.1  2002/12/03 20:20:46  robust
 * Import of s3decode.
 *
 *
 * 08-Sep-97    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added .Z compression option to lattice files.
 *
 * 06-Mar-97    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University.
 *              Added .semi. and .cont. options to -senmgaufn flag.
 *
 * 02-Dec-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Restricted MLLR transformation to CD mixture densities only.
 *
 * 15-Nov-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed the meaning of -matchsegfn and, correspondingly, log_hypseg().
 *
 * 11-Nov-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added -min_endfr and -dagfudge arguments.
 *
 * 08-Nov-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added BSTXCT: reporting since that became available from dag_search.
 *
 * 07-Nov-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added ,NODES suffix to -outlatdir argument for dumping only words to
 *              lattice output files.
 *
 * 16-Oct-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added orig_stdout, orig_stderr hack to avoid hanging on exit under Linux.
 *
 * 11-Oct-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added fillpen_init() and removed explicit addition of SILENCE_WORD,
 *              START_WORD and FINISH_WORD to the dictionary.
 *
 * 04-Oct-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added unlimit() call to remove malloc restrictions.
 *
 * 26-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added separate language weight (-bestpathlw) for bestpath DAG search.
 *              Added -mllrctlfn flag and handling.
 *
 * 21-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added -bptblsize argument.
 *
 * 18-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added optional start/end frame specification in control file, for
 *              processing selected segments (utterances) from a large cepfile.
 *              Control spec: cepfile [startframe endframe [uttid]].
 *
 * 13-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Bugfix: added senscale to bestscr before writing best score file.
 *              (Otherwise, the scaled scores are meaningless.)
 *              Added ,EXACT suffix option to -matchfn argument, and correspondingly
 *              added "exact" argument to log_hypstr().  (But running bestpath search
 *              will still cause <sil> and filler words to be removed from matchfile.)
 *
 * 12-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed structure of gauden/senone computation:
 *                  from: foreach (gauden) {eval gauden}; foreach (senone) {senone}
 *                  to:   foreach (gauden) {eval gauden;  foreach (senone in gauden) {...}}
 *              reducing memory space for results of gauden, especially in block mode.
 *              Normalized senone scores (subtracting the best) rather than density scores.
 *              Changed active senone list to flags.
 *
 * 09-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Block-mode gauden computation for improving cache performance: changed
 *                  from: foreach(frame) {foreach(gauden)...}
 *                  to:   foreach(gauden) {foreach(frame)...}
 *              within a block of frames.  Must evaluate all gauden, not just active ones.
 *              But even so the resulting caching performance is better.
 *
 * 29-Aug-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed argument -inlatext to -latext.
 *              Added check to ensure input and output lattice directories are different.
 *              Added reporting of hostname.
 *
 * 23-Aug-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed profiling to use timing_ functions, available on all platforms.
 *              Added write_bestscore() for writing best statescore in each frame,
 *              for helping determine desirable beamwidth.
 *
 * 24-Jun-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added DAG search option and consolidated logging and reporting.
 *              Added backtrace option.
 *
 * 22-Jul-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added absolute (unnormalized) acoustic scores in log file.
 *              Added uttid in log file with each word segmentation.
 *              Compute only active codebooks and senones if multiple codebooks present.
 *
 * 20-Jan-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added specification of input word lattices to limit search (-inlatdir
 *              argument...).  Added computation of active senone and gauden codebook
 *              lists when such a lattice is provided, to minimize computation.
 *              Added -cmn and -agc flags.
 *
 * 10-Jan-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added call to feat_init(); added cepsize variable and initialization;
 *              Changed argument to norm_mean from featlen[0]+1 to cepsize.
 *
 * 13-Dec-95    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed call to senone_eval to senone_eval_all optimized for the
 *              semi-continuous case.
 *              Completed handling multiple mixture-gaussian codebooks.
 *
 * 01-Dec-95    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Created.
 */

/** \file main_decode_anytopo.c
 * \brief Main driver for sphinx 3.0 decoding (or the slow decoder)
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#if (! WIN32)
#include <unistd.h>
#endif
#include <assert.h>

#include <s3types.h>
#include "logs3.h"
#include "tmat.h"
#include "mdef.h"
#include "dict.h"
#include "lm.h"
#include "fillpen.h"
#include "search.h"
#include "feat.h"
#include "bio.h"
#include <wid.h>
#include "cmn.h"
#include "agc.h"
#include "flat_fwd.h"
#include "ms_mllr.h"
#include "ms_gauden.h"
#include "ms_senone.h"
#include "interp.h"
#include "s3_dag.h"

static gauden_t *g;             /* Gaussian density codebooks */
static senone_t *sen;           /* Senones */
static interp_t *interp;        /* CD/CI interpolation */
static tmat_t *tmat;            /* HMM transition matrices */
static feat_t *fcb;             /* Feature type descriptor (Feature Control Block) */
static float32 ***feat = NULL;  /* Speech feature data */
static mdef_t *mdef;            /* Model definition */

extern lm_t *lm;
extern dict_t *dict;
extern fillpen_t *fpen;
extern s3lmwid_t *dict2lmwid;   /* Mapping from decoding dictionary wid's to lm ones.
                                   They may not be the same! */

static s3wid_t startwid, finishwid, silwid;

static int32 *senscale;         /* ALL senone scores scaled by senscale[i] in frame i */
static int32 *bestscr;          /* Best statescore in each frame */

ptmr_t tmr_utt;
ptmr_t tmr_fwdvit;
ptmr_t tmr_bstpth;
ptmr_t tmr_gausen;
ptmr_t tmr_fwdsrch;

pctr_t ctr_nfrm;
pctr_t ctr_nsen;

static int32 tot_nfr;

static char *inlatdir;
static char *outlatdir;
static int32 outlat_onlynodes;

static FILE *matchfp, *matchsegfp;
static int32 matchexact;

/*
 * Command line arguments.
 */
static arg_t defn[] = {
    { "-log3table",
      ARG_INT32,
      "1",
      "Determines whether to use the log3 table or to compute the values at run time." },
    { "-logbase",
      ARG_FLOAT32,
      "1.0001",
      "Base in which all log values are calculated" },
    { "-mdef",
      ARG_STRING,
      NULL,
      "Model definition input file: triphone -> senones/tmat tying" },
    { "-tmat",
      ARG_STRING,
      NULL,
      "Transition matrix input file" },
    { "-mean",
      ARG_STRING,
      NULL,
      "Mixture gaussian codebooks mean parameters input file" },
    { "-var",
      ARG_STRING,
      NULL,
      "Mixture gaussian codebooks variance parameters input file" },
    { "-senmgau",
      ARG_STRING,
      ".cont.",
      "Senone to mixture-gaussian mapping file (or .semi. or .cont.)" },
    { "-mixw",
      ARG_STRING,
      NULL,
      "Senone mixture weights parameters input file" },
    { "-lambda",
      ARG_STRING,
      NULL,
      "Interpolation weights (CD/CI senone) parameters input file" },
    { "-tpfloor",
      ARG_FLOAT32,
      "0.0001",
      "Triphone state transition probability floor applied to -tmat file" },
    { "-varfloor",
      ARG_FLOAT32,
      "0.0001",
      "Codebook variance floor applied to -var file" },
    { "-mwfloor",
      ARG_FLOAT32,
      "0.0000001",
      "Codebook mixture weight floor applied to -mixw file" },
    { "-agc",
      ARG_STRING,
      "max",
      "AGC.  max: C0 -= max(C0) in current utt; none: no AGC" },
    { "-cmn",
      ARG_STRING,
      "current",
      "Cepstral mean norm.  current: C[1..n-1] -= mean(C[1..n-1]) in current utt; none: no CMN" },
    { "-varnorm",
      ARG_STRING,
      "no",
      "Cepstral var norm.  yes: C[0..n-1] /= stddev(C[0..n-1]); no: no norm" },
    { "-feat",
      ARG_STRING,
      "s2_4x",
      "Feature stream:\n"
      "\t\t\t\ts2_4x: Sphinx-II type 4 streams, 12cep, 24dcep, 3pow, 12ddcep\n"
      "\t\t\t\ts3_1x39: Single stream, 12cep+12dcep+3pow+12ddcep\n"
      "\t\t\t\t1s_12c_12d_3p_12dd: Single stream, 12cep+12dcep+3pow+12ddcep\n"
      "\t\t\t\t1s_c: Single stream, given input vector only\n"
      "\t\t\t\t1s_c_d: Feature + Deltas only\n"
      "\t\t\t\t1s_c_dd: Feature + Double deltas only\n"
      "\t\t\t\t1s_c_d_dd: Feature + Deltas + Double deltas\n"
      "\t\t\t\t1s_c_wd_dd: Feature cep+windowed delcep+deldel\n"
      "\t\t\t\t1s_c_d_ld_dd: Feature + delta + long-term delta + double delta" },
/* ADDED BY BHIKSHA: 6 JAN 98 */
    { "-lminmemory",
      ARG_INT32,
      "0",
      "Load language model into memory (default: use disk cache for lm)" },
    { "-ceplen",
      ARG_INT32,
      "13",
      "Length of input feature vector" },
    { "-dict",
      ARG_STRING,
      NULL,
      "Main pronunciation dictionary (lexicon) input file" },
    { "-fdict",
      ARG_STRING,
      NULL,
      "Silence and filler (noise) word pronunciation dictionary input file" },
    { "-lm",
      ARG_STRING,
      NULL,
      "Language model input file (precompiled .DMP file)" },
    { "-lw",
      ARG_FLOAT32,
      "9.5",
      "Language weight: empirical exponent applied to LM probability" },
    { "-ugwt",
      ARG_FLOAT32,
      "0.7",
      "LM unigram weight: unigram probs interpolated with uniform distribution with this weight" },

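The 18-Sep-96 history entry gives the control-file line format as `cepfile [startframe endframe [uttid]]`. The following is a minimal sketch of how such a line could be split into its fields with `sscanf`. The struct `ctl_entry_sketch_t`, the function `parse_ctl_line_sketch`, the 255-character field limit and the use of -1 to mean "whole file" are illustrative assumptions, not the decoder's actual control-file reader.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for one parsed control-file line. */
typedef struct {
    char cepfile[256];
    int sf;            /* start frame, or -1 if absent (whole file) */
    int ef;            /* end frame, or -1 if absent */
    char uttid[256];   /* utterance id, or "" if absent */
} ctl_entry_sketch_t;

/*
 * Parse "cepfile [startframe endframe [uttid]]".
 * Returns the number of fields found (1, 3 or 4), or -1 if the line is
 * malformed (empty, only a start frame, or endframe < startframe).
 */
static int parse_ctl_line_sketch(const char *line, ctl_entry_sketch_t *e)
{
    assert(line != NULL && e != NULL);
    e->sf = e->ef = -1;
    e->uttid[0] = '\0';

    int n = sscanf(line, "%255s %d %d %255s", e->cepfile, &e->sf, &e->ef, e->uttid);
    if (n == 1)
        return 1;                       /* whole cepfile is one utterance */
    if ((n == 3 || n == 4) && e->sf <= e->ef)
        return n;                       /* a segment of a larger cepfile */
    return -1;
}
```

Returning the field count lets a caller distinguish "process the whole file" from "process frames sf..ef" without a separate flag.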
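The `senscale` and `bestscr` declarations, together with the 12-Sep-96 and 13-Sep-96 history entries, describe per-frame normalization. Each frame's senone scores have that frame's best score subtracted, and the best score is kept so absolute scores can be recovered later. This is a minimal sketch, assuming integer log-domain scores where larger is better and a path that consumes exactly one senone score per frame. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Subtract the best senone score of one frame from every senone score in
 * that frame (integer log domain, larger is better).  The returned best
 * score is what a caller would store as senscale[frame].
 */
static int32_t normalize_frame_sketch(int32_t *senscr, int n_sen)
{
    assert(n_sen > 0);
    int32_t best = senscr[0];
    for (int s = 1; s < n_sen; s++)
        if (senscr[s] > best)
            best = senscr[s];
    for (int s = 0; s < n_sen; s++)
        senscr[s] -= best;          /* best senone now scores 0, others <= 0 */
    return best;
}

/*
 * Recover an absolute score from a normalized one for a path covering
 * frames first..last, assuming the path used one senone score per frame.
 */
static int64_t unscale_sketch(int64_t normalized, const int32_t *senscale,
                              int first, int last)
{
    int64_t total = normalized;
    for (int f = first; f <= last; f++)
        total += senscale[f];       /* add back what normalization removed */
    return total;
}
```

All paths covering the same frames are shifted by the same total, so normalization does not change which of them wins. It only keeps the accumulated integers small. Scores that are reported or written out, as with `bestscr`, need the scales added back to be meaningful.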
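The 09-Sep-96 entry reorders the Gaussian-codebook loops from frame-major to codebook-major within a block of frames. The aim is to keep one codebook's parameters in cache while it is applied to several frames. The sketch below only demonstrates that the loop interchange computes the same results, with every codebook evaluated, as the entry requires. `toy_eval_sketch` is a hypothetical stand-in for evaluating codebook `g` on frame `f`, and the cache benefit itself is not measured here.

```c
#include <assert.h>

/* Hypothetical stand-in for evaluating Gaussian codebook g on frame f. */
static int toy_eval_sketch(int g, int f)
{
    return g * 1000 + f;
}

/* Original order: foreach(frame) {foreach(gauden) ...} */
static void eval_frame_major_sketch(int *out, int n_gau, int n_frm)
{
    for (int f = 0; f < n_frm; f++)
        for (int g = 0; g < n_gau; g++)
            out[g * n_frm + f] = toy_eval_sketch(g, f);
}

/* Block mode: within each block of frames, foreach(gauden) {foreach(frame) ...},
 * so one codebook's parameters are reused across the whole block. */
static void eval_block_mode_sketch(int *out, int n_gau, int n_frm, int blk)
{
    assert(blk > 0);
    for (int f0 = 0; f0 < n_frm; f0 += blk) {
        int f1 = (f0 + blk < n_frm) ? f0 + blk : n_frm;   /* last block may be short */
        for (int g = 0; g < n_gau; g++)
            for (int f = f0; f < f1; f++)
                out[g * n_frm + f] = toy_eval_sketch(g, f);
    }
}
```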
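The `-agc` and `-cmn` help strings define the default per-utterance normalizations. `max` subtracts the utterance maximum of C0, and `current` subtracts the utterance mean from C[1..n-1]. Below is a minimal sketch of those two formulas, applied to a row-major `nfr x ceplen` array of cepstra. The function names and the float layout are assumptions, and the decoder's own `cmn`/`agc` modules may differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* -agc max: C0 -= max(C0) over the current utterance.
 * feat is row-major, nfr frames x ceplen coefficients; C0 is column 0. */
static void agc_max_sketch(float *feat, size_t nfr, size_t ceplen)
{
    assert(nfr > 0 && ceplen > 0);
    float maxc0 = feat[0];
    for (size_t f = 1; f < nfr; f++)
        if (feat[f * ceplen] > maxc0)
            maxc0 = feat[f * ceplen];
    for (size_t f = 0; f < nfr; f++)
        feat[f * ceplen] -= maxc0;
}

/* -cmn current: C[1..n-1] -= mean(C[1..n-1]) over the current utterance;
 * C0 is left alone here, matching the help string. */
static void cmn_current_sketch(float *feat, size_t nfr, size_t ceplen)
{
    assert(nfr > 0);
    for (size_t i = 1; i < ceplen; i++) {
        double sum = 0.0;
        for (size_t f = 0; f < nfr; f++)
            sum += feat[f * ceplen + i];
        float mean = (float)(sum / (double)nfr);
        for (size_t f = 0; f < nfr; f++)
            feat[f * ceplen + i] -= mean;
    }
}
```

Both functions need the whole utterance before they can normalize any frame, which is why these modes are described as operating on the "current utt".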
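The `-ugwt` and `-lw` help strings describe two language-model adjustments. Unigram probabilities are interpolated with a uniform distribution using weight `ugwt`. The LM probability is raised to the power `lw`, which in the log domain is a multiplication. The sketch below shows both, assuming `n_words` is the vocabulary size used for the uniform term and that log probabilities are already available. The function names are hypothetical.

```c
#include <assert.h>

/* -ugwt: interpolate a unigram probability with the uniform distribution
 * over an n_words vocabulary: p' = ugwt * p + (1 - ugwt) / n_words. */
static double ug_interp_sketch(double p_ug, double ugwt, int n_words)
{
    assert(n_words > 0 && ugwt >= 0.0 && ugwt <= 1.0);
    return ugwt * p_ug + (1.0 - ugwt) / (double)n_words;
}

/* -lw: raising P to the power lw is a multiplication in the log domain. */
static double lw_scale_sketch(double lm_logprob, double lw)
{
    return lw * lm_logprob;
}
```

With the defaults (`-ugwt 0.7`, `-lw 9.5`), a word with unigram probability 0.5 in a 10-word vocabulary gets 0.7 × 0.5 + 0.3 / 10 = 0.38 before weighting. The interpolated unigrams still sum to 1, because the original probabilities contribute `ugwt` and the uniform term contributes `1 - ugwt`.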