make-lm-subset.gawk

From "This is a very useful toolkit" · GAWK code · 33 lines

#!/usr/local/bin/gawk -f
#
# filter a backoff model with a count file, so that only ngrams
# in the countfile are represented in the output
#
# usage: make-lm-subset count-file bo-file
#
# $Header: /home/srilm/devel/utils/src/RCS/make-lm-subset,v 1.3 1999/10/17 06:10:10 stolcke Exp $
#
ARGIND==1 {
	ngram = $0;
	sub("[ 	]*[0-9]*$", "", ngram);
	count[ngram] = 1;
	next;
}
ARGIND==2 && /^$/ {
	print; next;
}
ARGIND==2 && /^\\/ {
	print; next;
}
ARGIND==2 && /^ngram / {
	print; next;
}
ARGIND==2 {
	ngram = $0;
	# strip numeric stuff
	sub("^[-.e0-9]*[ 	]*", "", ngram);
	sub("[ 	]*[-.e0-9]*$", "", ngram);
	if (count[ngram]) print;
	next;
}
