continuous-ngram-count.gawk

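#!/usr/local/bin/gawk -f
#
# continuous-ngram-count --
#	Generate ngram counts ignoring line breaks
#
# usage: continuous-ngram-count order=ORDER textfile | ngram-count -read -
#
# $Header: /home/srilm/devel/utils/src/RCS/continuous-ngram-count,v 1.1 1998/08/24 00:52:30 stolcke Exp $
#
BEGIN {
	order = 3;	# default n-gram order; override with order=ORDER

	head = 0;	# next position in ring buffer
}

# Store w in the ring buffer, then print every n-gram (lengths 1..order)
# that ends at w, each with a count of 1.
function process_word(w) {
	buffer[head] = w;

	ngram = "";
	for (j = 0; j < order; j ++) {
		# walk backwards from the newest word
		w1 = buffer[(head + order - j) % order];
		if (w1 == "") {
			# slot not filled yet: fewer than order words seen so far
			break;
		}
		ngram = w1 " " ngram;
		# ngram ends in a space, so this prints "w1 ... wn 1"
		print ngram 1;
	}
	head = (head + 1) % order;
}

# The buffer is never reset between records, so n-grams span line breaks.
{
	for (i = 1; i <= NF; i ++) {
		process_word($i);
	}
}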
