makelm

From "It is the Speech recognition software." · code · 76 lines in the original file

#
# Generates the language model for a 5k general business vocabulary
#
# uses:
#       wsj5k.txt      - transcript of wsj
#       5k_words.vocab - hand prepared vocabulary list
#
# generates:
#       wsj5k.lm  - arpa format of the language model
#       wsj5k.DMP - CMU binary format of the language model
#
# requires:
#       CMU language model toolkit:
#               http://www.speech.cs.cmu.edu/SLM_info.html
#       lm3g2dmp - utility to generate DMP format models:
#               http://cmusphinx.sourceforge.net/webpage/html/download.php#utilities
#
# unix commands:
#       gawk mv rmdir rm tr
#
# All commands should be in your path
#

# Convert transcript to all lower case
tr "[A-Z]" "[a-z]" < wsj5k.txt > wsj5k.lc.tmp

# normalize the tags
gawk -f prep.awk < wsj5k.lc.tmp > wsj5k.prep.tmp

#
# Generate the word frequencies
#
text2wfreq < wsj5k.prep.tmp > wsj5k.wfreq.tmp

#
# Generate the vocabulary (this should be a subset wsj5k.vocab)
#
wfreq2vocab -top 5000 < wsj5k.wfreq.tmp > wsj5k.vocab.tmp

#
# Generate the idngram
#text2idngram -vocab $file.tmp.vocab < $file > $file.tmp.idngram
# uses the custom word list
text2idngram -vocab 5k_words.vocab < wsj5k.prep.tmp > wsj5k.idngram.tmp

#
# generates the language model
#idngram2lm -vocab_type 0 -idngram $file.tmp.idngram -vocab $file.tmp.vocab -arpa $file.arpa
#idngram2lm -idngram $file.tmp.idngram -vocab $file.tmp.vocab -arpa $file.arpa
#idngram2lm -witten_bell -idngram $file.tmp.idngram -vocab $file.tmp.vocab -arpa $file.arpa
#idngram2lm -two_byte_alphas -idngram $file.tmp.idngram -vocab $file.tmp.vocab -arpa $file.arpa
idngram2lm -vocab_type 0 -idngram wsj5k.idngram.tmp -vocab 5k_words.vocab -arpa wsj5k.lm

#
# generate the DMP version of the language model
#
mkdir dmp
lm3g2dmp wsj5k.lm dmp
mv dmp/wsj5k.lm.DMP wsj5k.DMP

#
# cleanup
#
rmdir dmp
rm *.tmp
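
The header of makelm says that all commands should be in your path, but the script itself never checks this. A minimal sketch of a pre-flight check follows. It is not part of the original file: the function name `missing_tools` and the `required` list are assumptions made for illustration, with the tool names taken from the commands the script invokes.

```shell
#!/bin/sh
# Hypothetical pre-flight check; not part of the original makelm script.
# Prints the name of every command in its argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools named in the makelm header or invoked in its body (assumed list).
required="gawk mv mkdir rmdir rm tr text2wfreq wfreq2vocab text2idngram idngram2lm lm3g2dmp"

# Report only; the caller decides whether to abort before running makelm.
missing=$(missing_tools $required)
if [ -n "$missing" ]; then
    echo "makelm prerequisites missing from PATH:" $missing >&2
fi
```

Running this before makelm would surface a missing CMU toolkit binary or `lm3g2dmp` up front, rather than part-way through the pipeline after some `.tmp` files have already been written.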
