📄 factoring_sub.c
/**
 * @file   factoring_sub.c
 *
 * <EN>
 * @brief  LM factoring on 1st pass.
 * </EN>
 *
 * This file contains functions for language score factoring on the 1st
 * pass.  They build the successor lists, which hold the words contained in
 * each sub-tree of the tree lexicon, and provide the factored LM
 * probability at each node of the tree lexicon.
 *
 * A "successor list" is assigned to a lexicon tree node and lists the
 * words that exist in its sub-tree and therefore share that node.
 * In practice a list is assigned only where its contents change, i.e. to
 * the nodes at the branching points of the tree.
 * Below is an example of successor lists on a tree lexicon, in which
 * the lists are assigned to the numbered nodes.
 *
 * <pre>
 *
 *        2-o-o - o-o-o - o-o-o          word "A"
 *       /
 *  1-o-o
 *       \       4-o-o                   word "B"
 *        \     /
 *         3-o-o - 5-o-o - 7-o-o         word "C"
 *              \        \
 *               \        8-o-o          word "D"
 *                6-o-o                  word "E"
 * </pre>
 *
 * The contents of the successor lists are the following:
 *
 * <pre>
 *   node  | successor list (wchmm->sclist[wchmm->state[node].scid])
 *   =======================
 *     1   | A B C D E
 *     2   | A
 *     3   |   B C D E
 *     4   |   B
 *     5   |     C D
 *     6   |         E
 *     7   |     C
 *     8   |       D
 * </pre>
 *
 * During the 1st pass, when a token moves to a node that has a successor
 * list, the 2-gram scores of all the words in that list are computed, and
 * the LM value carried by the token is replaced by the maximum of those
 * scores when the token is copied to the next node.  If the successor
 * list holds only one word, the word is determined at that point and its
 * exact 2-gram value is assigned as is.  In the example above, word "A"
 * is already the only successor at node 2, so its exact LM score is fixed
 * there without waiting for the word end.  The factoring computation
 * itself is performed in beam.c.
 *
 * When using 1-gram factoring, the computation is slightly different.
 * Since the factoring value (the maximum 1-gram score on each successor
 * list) is independent of the word context, it can be computed statically
 * before the search.
 * Thus, for all the successor lists that have two or more words, the
 * maximum 1-gram value is computed and stored in the "fscore" member of
 * the tree lexicon, and those successor lists are freed.
 * The successor lists with only one word remain in the tree lexicon,
 * so that the exact 2-gram score can be computed for that word when a
 * path reaches the node during the search.
 *
 * When using a DFA grammar, Julian by default builds a separate lexicon
 * tree for every word category, to express the category-pair constraint
 * statically.  Thus the factoring schemes above are not used by default.
 * However, you can still force Julian to use the grammar-based
 * deterministic factoring scheme by undefining CATEGORY_TREE.
 * If CATEGORY_TREE is undefined, the word connection constraint is
 * applied at the successor lists in the middle of the tree lexicon: a
 * transition into a node with a successor list is allowed only if at
 * least one word in the list can follow the preceding word under the
 * word-pair constraint.  This enables single-tree search on Julian.
 * This function is left only for technical reference.
 *
 * @author Akinobu LEE
 * @date   Mon Mar  7 23:20:26 2005
 *
 * $Revision: 1.3 $
 *
 */
/*
 * Copyright (c) 1991-2007 Kawahara Lab., Kyoto University
 * Copyright (c) 2000-2005 Shikano Lab., Nara Institute of Science and Technology
 * Copyright (c) 2005-2007 Julius project team, Nagoya Institute of Technology
 * All rights reserved
 */

#include <julius/julius.h>

/*----------------------------------------------------------------------*/

/**
 * <EN>
 * @brief  Add a word to the successor list on a node in tree lexicon.
 * A word that is already on the list is not added again.
 * Words in lists are kept ordered by ID.
 *
 * @param wchmm [i/o] tree lexicon
 * @param node [in] node id
 * @param w [in] word id
 * </EN>
 */
static void
add_successor(WCHMM_INFO *wchmm, int node, WORD_ID w)
{
  S_CELL *sctmp, *sc;

  /* malloc a new successor list element */
  sctmp=(S_CELL *) mymalloc(sizeof(S_CELL));
  /* assign word ID to the new element */
  sctmp->word = w;
  /* add the new element to existing list (keeping order) */
  if (wchmm->state[node].scid == 0) {
    j_internal_error("add_successor: sclist id not assigned to branch node?\n");
  }
  sc = wchmm->sclist[wchmm->state[node].scid];
  if (sc == NULL || sctmp->word < sc->word) {
    /* empty list, or the new word sorts before the head: insert at head */
    sctmp->next = sc;
    wchmm->sclist[wchmm->state[node].scid] = sctmp;
  } else {
    for(;sc;sc=sc->next) {
      if (sc->next == NULL || sctmp->word < (sc->next)->word) {
        if (sctmp->word == sc->word) {
          /* avoid duplication: release the unused element */
          free(sctmp);
          break;
        }
        sctmp->next = sc->next;
        sc->next = sctmp;
        break;
      }
    }
  }
}

/**
 * <EN>
 * Check if successor lists on two nodes are the same.
 *
 * @param wchmm [in] tree lexicon
 * @param node1 [in] 1st node id
 * @param node2 [in] 2nd node id
 *
 * @return TRUE if they have the same successor list, or FALSE if they differ.
 * </EN>
 */
static boolean
match_successor(WCHMM_INFO *wchmm, int node1, int node2)
{
  S_CELL *sc1,*sc2;

  /* assume successor is sorted by ID */
  if (wchmm->state[node1].scid == 0 || wchmm->state[node2].scid == 0) {
    j_internal_error("match_successor: sclist id not assigned to branch node?\n");
  }
  sc1 = wchmm->sclist[wchmm->state[node1].scid];
  sc2 = wchmm->sclist[wchmm->state[node2].scid];
  for (;;) {
    if (sc1 == NULL || sc2 == NULL) {
      /* equal only if both lists end at the same time */
      if (sc1 == NULL && sc2 == NULL) {
        return TRUE;
      } else {
        return FALSE;
      }
    } else if (sc1->word != sc2->word) {
      return FALSE;
    }
    sc1 = sc1->next;
    sc2 = sc2->next;
  }
}

/**
 * <EN>
 * Free the successor list with the given id.
 *
 * @param wchmm [i/o] tree lexicon
 * @param scid [in] successor list id
 * </EN>
 */
static void
free_successor(WCHMM_INFO *wchmm, int scid)
{
  S_CELL *sc;
  S_CELL *sctmp;

  /* free sclist */
  sc = wchmm->sclist[scid];
  while (sc != NULL) {
    sctmp = sc;
    sc = sc->next;
    free(sctmp);
  }
}

/**
 * <EN>
 * Garbage collection of the successor lists: remove the lists whose link
 * from the lexicon tree has been deleted, and pack the remaining ones.
 *
 * @param wchmm [i/o] tree lexicon
 * </EN>
 */
static void
compaction_successor(WCHMM_INFO *wchmm)
{
  int src, dst;

  dst = 1;
  for(src=1;src<wchmm->scnum;src++) {
    if (wchmm->state[wchmm->sclist2node[src]].scid <= 0) {
      /* already freed, skip */
      continue;
    }
    if (dst != src) {
      wchmm->sclist[dst] = wchmm->sclist[src];
      wchmm->sclist2node[dst] = wchmm->sclist2node[src];
      wchmm->state[wchmm->sclist2node[dst]].scid = dst;
    }
    dst++;
  }
  if (debug2_flag) {
    jlog("DEBUG: successor list shrinked from %d to %d\n", wchmm->scnum, dst);
  }
  wchmm->scnum = dst;
}

/**
 * <EN>
 * Shrink the memory area that has been allocated for building successor list
 * to the length actually in use.  This releases the memory of the lists
 * removed after the initial build or for 1-gram factoring.
 *
 * @param wchmm [i/o] tree lexicon
 * </EN>
 */
static void
shrink_successor(WCHMM_INFO *wchmm)
{
  if (wchmm->sclist) {
    wchmm->sclist = (S_CELL **)myrealloc(wchmm->sclist, sizeof(S_CELL *) * wchmm->scnum);
  }
  if (wchmm->sclist2node) {
    wchmm->sclist2node = (int *)myrealloc(wchmm->sclist2node, sizeof(int) * wchmm->scnum);
  }
}

/**
 * <EN>
 * Main function to build whole successor list to lexicon tree.
 *
 * @param wchmm [i/o] tree lexicon
 * </EN>
 *
 * @callgraph
 * @callergraph
 *
 */
void
make_successor_list(WCHMM_INFO *wchmm)
{
  int node;
  WORD_ID w;
  int i;
  boolean *freemark;
  int s;

  jlog("STAT: make successor lists for factoring\n");

  /* 1. initialize */
  /* initialize node->sclist index on wchmm tree */
  for (node=0;node<wchmm->n;node++) wchmm->state[node].scid = 0;

  /* parse the tree to get the maximum size of successor list */
  s = 1;
  for (w=0;w<wchmm->winfo->num;w++) {
    for (i=0;i<wchmm->winfo->wlen[w];i++) {
      if (wchmm->state[wchmm->offset[w][i]].scid == 0) {
        wchmm->state[wchmm->offset[w][i]].scid = s;
        s++;
      }
    }
    if (wchmm->state[wchmm->wordend[w]].scid == 0) {
      wchmm->state[wchmm->wordend[w]].scid = s;
      s++;
    }
  }
  wchmm->scnum = s;
  if (debug2_flag) {
    jlog("DEBUG: initial successor list size = %d\n", wchmm->scnum);
  }

  /* allocate successor list for the maximum size */
  wchmm->sclist = (S_CELL **)mymalloc(sizeof(S_CELL *) * wchmm->scnum);
  for (i=1;i<wchmm->scnum;i++) wchmm->sclist[i] = NULL;
  wchmm->sclist2node = (int *)mymalloc(sizeof(int) * wchmm->scnum);

  /* allocate misc. work area */
  freemark = (boolean *)mymalloc(sizeof(boolean) * wchmm->scnum);
  for (i=1;i<wchmm->scnum;i++) freemark[i] = FALSE;

  /* 2. make initial successor list: assign at all possible nodes */
  for (w=0;w<wchmm->winfo->num;w++) {
    /* at each start node of phonemes */
    for (i=0;i<wchmm->winfo->wlen[w];i++) {
      wchmm->sclist2node[wchmm->state[wchmm->offset[w][i]].scid] = wchmm->offset[w][i];
      add_successor(wchmm, wchmm->offset[w][i], w);
    }
    /* at word end */
    wchmm->sclist2node[wchmm->state[wchmm->wordend[w]].scid] = wchmm->wordend[w];
    add_successor(wchmm, wchmm->wordend[w], w);
  }

  /* 3. erase unnecessary successor list */
  /* a successor list that is the same as that of the previous node is */
  /* not needed, so parse the lexicon tree from every leaf to find them */
  for (w=0;w<wchmm->winfo->num;w++) {
    node = wchmm->wordend[w];   /* begin from the word end node */
    i = wchmm->winfo->wlen[w]-1;
    while (i >= 0) {            /* for each phoneme start node */
      if (node == wchmm->offset[w][i]) {
        /* word with only 1 state: skip */
        i--;
        continue;
      }
      if (match_successor(wchmm, node, wchmm->offset[w][i])) {
        freemark[wchmm->state[node].scid] = TRUE;       /* mark the node */
      }
/*
 *       if (freemark[wchmm->offset[w][i]] != FALSE) {
 *         break;
 *       }
 */
      node = wchmm->offset[w][i];
      i--;
    }
  }

  /* really free */