htklat-vocab.gawk

From "This Is a Very Handy Toolkit" · GAWK code · 51 lines

#!/usr/local/bin/gawk -f
#
# htklat-vocab --
#	extract vocabulary used in an HTK lattice
#
# usage: htklat-vocab HTK-LATTICE ... > VOCAB
#
# $Header: /home/srilm/devel/utils/src/RCS/htklat-vocab.gawk,v 1.3 2004/02/27 21:42:28 stolcke Exp $
#

BEGIN {
	# HTK null-node label; never added to the vocabulary
	null = "!NULL";
	# HTK quote/backslash decoding is off by default
	quotes = 0;
}

{
	for (i = 1; i <= NF; i ++) {
		# skip comments
		if ($i ~ /^#/) next;

		# Note: this doesn't handle quoted spaces
		# (as SRILM generally doesn't)
		if ($i ~ /^W=/ || $i ~ /^WORD=/) {
		    word = substr($i, index($i, "=") + 1);

		    if (quotes) {
			# HTK quoting conventions
			if (word ~ /^['"]/) {
			    word = substr(word, 2, length(word)-2);
			}
			if (word ~ /\\/) {
			    # a backslash pair becomes one backslash;
			    # all other backslashes are dropped
			    gsub(/\\\\/, "@QuOtE@", word);
			    gsub(/\\/, "", word);
			    gsub(/@QuOtE@/, "\\", word);
			}
		    }

		    if (word != null) {
			is_word[word] = 1;
		    }
		}
	}
}

END {
	# print each distinct word once; output order is unspecified
	for (word in is_word) {
		print word;
	}
}
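When `quotes` is set, the script decodes HTK quoting before it records a word. The following is a minimal sketch of those same steps pulled into a function so they can be tried on a few tokens. The helper name `htk_unquote` and the sample tokens are hypothetical and are not part of SRILM.

```awk
# Sketch only: the HTK unquoting steps from htklat-vocab, moved into a
# hypothetical helper function so they can be tried on sample tokens.

function htk_unquote(word) {
	# strip a leading quote and, like the original script, assume the
	# last character is the matching closing quote
	if (word ~ /^['"]/) {
		word = substr(word, 2, length(word) - 2)
	}
	if (word ~ /\\/) {
		gsub(/\\\\/, "@QuOtE@", word)	# protect backslash pairs
		gsub(/\\/, "", word)		# drop all remaining backslashes
		gsub(/@QuOtE@/, "\\", word)	# restore each pair as one backslash
	}
	return word
}

BEGIN {
	# made-up W= values (in awk string literals, \\ is one backslash)
	n = split("'it\\'s' don\\'t a\\\\b plain", tok, " ")
	for (i = 1; i <= n; i++) {
		print tok[i] " -> " htk_unquote(tok[i])
	}
}
```

The script's `BEGIN` block sets `quotes = 0`, so a `-v quotes=1` option would be overwritten before any input is read. An operand assignment is applied after `BEGIN` runs. Something like `htklat-vocab quotes=1 LATTICE` therefore enables the decoding (the lattice name is illustrative).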