continuous-ngram-count.gawk
From "This is a very handy toolkit" (这是一款很好用的工具包) · GAWK code · 36 lines
#!/usr/local/bin/gawk -f
#
# continuous-ngram-count --
#    Generate ngram counts ignoring line breaks
#
# usage: continuous-ngram-count order=ORDER textfile | ngram-count -read -
#
# $Header: /home/srilm/devel/utils/src/RCS/continuous-ngram-count,v 1.1 1998/08/24 00:52:30 stolcke Exp $
#
BEGIN {
    order = 3;    # default ngram order; override with order=ORDER

    head = 0;     # next position in ring buffer
}

# Store w in the ring buffer and print every ngram (length 1..order) ending at w.
function process_word(w) {
    buffer[head] = w;

    ngram = "";
    for (j = 0; j < order; j ++) {
        w1 = buffer[(head + order - j) % order];
        if (w1 == "") {    # slot not filled yet: fewer than order words seen
            break;
        }
        ngram = w1 " " ngram;
        print ngram 1;     # ngram ends in a space, so this prints "<ngram> 1"
    }
    head = (head + 1) % order;
}

{
    for (i = 1; i <= NF; i ++) {
        process_word($i);
    }
}