main_decode_anytopo.c
From "CMU's renowned SPHINX-3 large-vocabulary continuous speech recognition system" · C code · 1,466 lines total · page 1 of 4
/* ====================================================================
 * Copyright (c) 1995-2004 Carnegie Mellon University. All rights
 * reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * This work was supported in part by funding from the Defense Advanced
 * Research Projects Agency and the National Science Foundation of the
 * United States of America, and the CMU Sphinx Speech Consortium.
 *
 * THIS SOFTWARE IS PROVIDED BY CARNEGIE MELLON UNIVERSITY ``AS IS'' AND
 * ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
 * THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
 * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL CARNEGIE MELLON UNIVERSITY
 * NOR ITS EMPLOYEES BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
 * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
 * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 *
 * ====================================================================
 *
 */
/*
 * main.c -- Main S3 decoder driver.
 *
 * **********************************************
 * CMU ARPA Speech Project
 *
 * Copyright (c) 1996 Carnegie Mellon University.
 * ALL RIGHTS RESERVED.
 * **********************************************
 *
 * HISTORY
 *
 * 26-Jul-04    ARCHAN (archan@cs.cmu.edu) at Carnegie Mellon University
 *              First incorporated from sphinx 3.0 code base to 3.X codebase.
 *
 * $Log: main_decode_anytopo.c,v $
 * Revision 1.7  2004/12/23 21:05:22  arthchan2003
 * Enable compilation of decode_anytopo, change option names from -match to -hyp, it makes the code more consistent.
 *
 * Revision 1.6  2004/12/14 00:50:33  arthchan2003
 * 1, Change the code to accept extension, 2, add timer to livepretend, 3, fixing the s3_astar to separate the bypass variable to bypass and is_filler_bypass. 4, Add some doxygen comments. 5, Don't care about changes in main_decode_anytopo.c. It is still under work, 6, remove option -help and -example from 3.5 releases.
 *
 * Revision 1.5  2004/12/06 11:31:47  arthchan2003
 * Fix brief comments for programs.
 *
 * Revision 1.4  2004/12/06 11:15:11  arthchan2003
 * Enable doxygen in the program directory.
 *
 * Revision 1.3  2004/12/05 12:01:32  arthchan2003
 * 1, move libutil/libutil.h to s3types.h, seems to me not very nice to have it in every files. 2, Remove warning messages of main_align.c 3, Remove warning messages in chgCase.c
 *
 * Revision 1.2  2004/11/16 05:13:19  arthchan2003
 * 1, s3cipid_t is upgraded to int16 because we need that, I already check that there are no magic code using 8-bit s3cipid_t
 * 2, Refactor the ep code and put a lot of stuffs into fe.c (should be renamed to something else.
 * 3, Check-in codes of wave2feat and cepview. (cepview will not dump core but Evandro will kill me)
 * 4, Make the same command line frontends for decode, align, dag, astar, allphone, decode_anytopo and ep . Allow the use a file to configure the application.
 * 5, Make changes in test such that test-allphone becomes a repeatability test.
 * 6, cepview, wave2feat and decode_anytopo will not be installed in 3.5 RCIII
 * (Known bugs after this commit)
 * 1, decode_anytopo has strange bugs in some situations that it cannot find the end of the lattice. This is urgent.
 * 2, default argument file's mechanism is not yet supported, we need to fix it.
 * 3, the bug discovered by SonicFoundry is still not fixed.
 *
 * Revision 1.1  2004/11/14 07:00:08  arthchan2003
 * 1, Finally, a version of working flat decoder is completed. It is not compiled in the standard compilation yet because there are too many warnings. 2, eliminate the statics variables in fe_sigproc.c
 *
 * Revision 1.2  2002/12/03 23:02:40  egouvea
 * Updated slow decoder with current working version.
 * Added copyright notice to Makefiles, *.c and *.h files.
 * Updated some of the documentation.
 *
 * Revision 1.1.1.1  2002/12/03 20:20:46  robust
 * Import of s3decode.
 *
 *
 * 08-Sep-97    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added .Z compression option to lattice files.
 *
 * 06-Mar-97    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University.
 *              Added .semi. and .cont. options to -senmgaufn flag.
 *
 * 02-Dec-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Restricted MLLR transformation to CD mixture densities only.
 *
 * 15-Nov-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed the meaning of -matchsegfn and, correspondingly, log_hypseg().
 *
 * 11-Nov-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added -min_endfr and -dagfudge arguments.
 *
 * 08-Nov-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added BSTXCT: reporting since that became available from dag_search.
 *
 * 07-Nov-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added ,NODES suffix to -outlatdir argument for dumping only words to
 *              lattice output files.
 *
 * 16-Oct-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added orig_stdout, orig_stderr hack to avoid hanging on exit under Linux.
 *
 * 11-Oct-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added fillpen_init() and removed explicit addition of SILENCE_WORD,
 *              START_WORD and FINISH_WORD to the dictionary.
 *
 * 04-Oct-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added unlimit() call to remove malloc restrictions.
 *
 * 26-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added separate language weight (-bestpathlw) for bestpath DAG search.
 *              Added -mllrctlfn flag and handling.
 *
 * 21-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added -bptblsize argument.
 *
 * 18-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added optional start/end frame specification in control file, for
 *              processing selected segments (utterances) from a large cepfile.
 *              Control spec: cepfile [startframe endframe [uttid]].
 *
 * 13-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Bugfix: added senscale to bestscr before writing best score file.
 *              (Otherwise, the scaled scores are meaningless.)
 *              Added ,EXACT suffix option to -matchfn argument, and correspondingly
 *              added "exact" argument to log_hypstr(). (But running bestpath search
 *              will still cause <sil> and filler words to be removed from matchfile.)
 *
 * 12-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed structure of gauden/senone computation:
 *                  from: foreach (gauden) {eval gauden}; foreach (senone) {senone}
 *                  to:   foreach (gauden) {eval gauden; foreach (senone in gauden) {...}}
 *              reducing memory space for results of gauden, specially in block mode.
 *              Normalized senone scores (subtracting the best) rather than density scores.
 *              Changed active senone list to flags.
 *
 * 09-Sep-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Block-mode gauden computation for improving cache performance: changed
 *                  from: foreach(frame) {foreach(gauden)...}
 *                  to:   foreach(gauden) {foreach(frame)...}
 *              within a block of frames. Must evaluate all gauden, not just active ones.
 *              But even so the resulting caching performance is better.
 *
 * 29-Aug-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed argument -inlatext to -latext.
 *              Added check to ensure input and output lattice directories are different.
 *              Added reporting of hostname.
 *
 * 23-Aug-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed profiling to use timing_ functions, available on all platforms.
 *              Added write_bestscore() for writing best statescore in each frame,
 *              for helping determine desirable beamwidth.
 *
 * 24-Jun-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added DAG search option and consolidated logging and reporting.
 *              Added backtrace option.
 *
 * 22-Jul-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added absolute (unnormalized) acoustic scores in log file.
 *              Added uttid in log file with each word segmentation.
 *              Compute only active codebooks and senones if multiple codebooks present.
 *
 * 20-Jan-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added specification of input word lattices to limit search (-inlatdir
 *              argument...). Added computation of active senone and gauden codebook
 *              lists when such a lattice is provided, to minimize computation.
 *              Added -cmn and -agc flags.
 *
 * 10-Jan-96    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Added call to feat_init(); added cepsize variable and initialization;
 *              Changed argument to norm_mean from featlen[0]+1 to cepsize.
 *
 * 13-Dec-95    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Changed call to senone_eval to senone_eval_all optimized for the
 *              semi-continuous case.
 *              Completed handling multiple mixture-gaussian codebooks.
 *
 * 01-Dec-95    M K Ravishankar (rkm@cs.cmu.edu) at Carnegie Mellon University
 *              Created.
 */

/** \file main_decode_anytopo.c
 * \brief Main driver for sphinx 3.0 decoding (or the slow decoder)
 */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#if (! WIN32)
#include <unistd.h>
#endif
#include <assert.h>

#include <s3types.h>
#include "logs3.h"
#include "tmat.h"
#include "mdef.h"
#include "dict.h"
#include "lm.h"
#include "fillpen.h"
#include "search.h"
#include "feat.h"
#include "bio.h"
#include <wid.h>
#include "search.h"
#include "cmn.h"
#include "agc.h"
#include "flat_fwd.h"
#include "ms_mllr.h"
#include "ms_gauden.h"
#include "ms_senone.h"
#include "interp.h"
#include "s3_dag.h"

static gauden_t *g;             /* Gaussian density codebooks */
static senone_t *sen;           /* Senones */
static interp_t *interp;        /* CD/CI interpolation */
static tmat_t *tmat;            /* HMM transition matrices */

static feat_t *fcb;             /* Feature type descriptor (Feature Control Block) */
static float32 ***feat = NULL;  /* Speech feature data */

static mdef_t *mdef;            /* Model definition */

extern lm_t* lm;
extern dict_t* dict;
extern fillpen_t* fpen;
extern s3lmwid_t *dict2lmwid;   /* Mapping from decoding dictionary wid's to lm ones. They may not be the same! */

static s3wid_t startwid, finishwid, silwid;

static int32 *senscale;         /* ALL senone scores scaled by senscale[i] in frame i */
static int32 *bestscr;          /* Best statescore in each frame */

ptmr_t tmr_utt;
ptmr_t tmr_fwdvit;
ptmr_t tmr_bstpth;
ptmr_t tmr_gausen;
ptmr_t tmr_fwdsrch;

pctr_t ctr_nfrm;
pctr_t ctr_nsen;

static int32 tot_nfr;

static char *inlatdir;
static char *outlatdir;
static int32 outlat_onlynodes;
static FILE *matchfp, *matchsegfp;
static int32 matchexact;

/*
 * Command line arguments.
 */
static arg_t defn[] = {
    { "-log3table",
      ARG_INT32,
      "1",
      "Determines whether to use the log3 table or to compute the values at run time." },
    { "-logbase",
      ARG_FLOAT32,
      "1.0001",
      "Base in which all log values calculated" },
    { "-mdef",
      ARG_STRING,
      NULL,
      "Model definition input file: triphone -> senones/tmat tying" },
    { "-tmat",
      ARG_STRING,
      NULL,
      "Transition matrix input file" },
    { "-mean",
      ARG_STRING,
      NULL,
      "Mixture gaussian codebooks mean parameters input file" },
    { "-var",
      ARG_STRING,
      NULL,
      "Mixture gaussian codebooks variance parameters input file" },
    { "-senmgau",
      ARG_STRING,
      ".cont.",
      "Senone to mixture-gaussian mapping file (or .semi. or .cont.)" },
    { "-mixw",
      ARG_STRING,
      NULL,
      "Senone mixture weights parameters input file" },
    { "-lambda",
      ARG_STRING,
      NULL,
      "Interpolation weights (CD/CI senone) parameters input file" },
    { "-tpfloor",
      ARG_FLOAT32,
      "0.0001",
      "Triphone state transition probability floor applied to -tmat file" },
    { "-varfloor",
      ARG_FLOAT32,
      "0.0001",
      "Codebook variance floor applied to -var file" },
    { "-mwfloor",
      ARG_FLOAT32,
      "0.0000001",
      "Codebook mixture weight floor applied to -mixw file" },
    { "-agc",
      ARG_STRING,
      "max",
      "AGC. max: C0 -= max(C0) in current utt; none: no AGC" },
    { "-cmn",
      ARG_STRING,
      "current",
      "Cepstral mean norm. current: C[1..n-1] -= mean(C[1..n-1]) in current utt; none: no CMN" },
    { "-varnorm",
      ARG_STRING,
      "no",
      "Cepstral var norm. yes: C[0..n-1] /= stddev(C[0..n-1]); no: no norm" },
    { "-feat",
      ARG_STRING,
      "s2_4x",
      "Feature stream:\n\t\t\t\ts2_4x: Sphinx-II type 4 streams, 12cep, 24dcep, 3pow, 12ddcep"
      "\n\t\t\t\ts3_1x39: Single stream, 12cep+12dcep+3pow+12ddcep"
      "\n\t\t\t\t1s_12c_12d_3p_12dd: Single stream, 12cep+12dcep+3pow+12ddcep"
      "\n\t\t\t\t1s_c: Single stream, given input vector only"
      "\n\t\t\t\t1s_c_d: Feature + Deltas only"
      "\n\t\t\t\t1s_c_dd: Feature + Double deltas only"
      "\n\t\t\t\t1s_c_d_dd: Feature + Deltas + Double deltas"
      "\n\t\t\t\t1s_c_wd_dd: Feature cep+windowed delcep+deldel "
      "\n\t\t\t1s_c_d_ld_dd: Feature + delta + longterm delta + doubledelta" },
    /* ADDED BY BHIKSHA: 6 JAN 98 */
    { "-lminmemory",
      ARG_INT32,
      "0",
      "Load language model into memory (default: use disk cache for lm)" },
    { "-ceplen",
      ARG_INT32,
      "13",
      "Length of input feature vector" },
    { "-dict",
      ARG_STRING,
      NULL,
      "Main pronunciation dictionary (lexicon) input file" },
    { "-fdict",
      ARG_STRING,
      NULL,
      "Silence and filler (noise) word pronunciation dictionary input file" },
    { "-lm",
      ARG_STRING,
      NULL,
      "Language model input file (precompiled .DMP file)" },
    { "-lw",
      ARG_FLOAT32,
      "9.5",
      "Language weight: empirical exponent applied to LM probability" },
    { "-ugwt",
      ARG_FLOAT32,
      "0.7",
      "LM unigram weight: unigram probs interpolated with uniform distribution with this weight" },