lengthbins.java

来自「mallet是自然语言处理、机器学习领域的一个开源项目。」· Java 代码 · 共 53 行

JAVA
53
字号
package edu.umass.cs.mallet.share.upenn.ner;import edu.umass.cs.mallet.base.types.*;import edu.umass.cs.mallet.base.pipe.*;import java.util.*;/** * A feature approximating string length. */public class LengthBins extends Pipe implements java.io.Serializable {    String name;    int[] bins;    String[] binNames;    /**     * <p>bins contains the maximum sizes of elements in each bin.     * <p>For example, passing in {1,3,7} would produce 4 bins, for strings     * of lengths 1, 2-3, 4-7, and 8+.     */    public LengthBins (String featureName, int[] binMaxes) {		this.name = featureName;        this.bins = binMaxes;        Arrays.sort(bins);                int min = 1;        binNames = new String[bins.length+1];        for (int i=0; i<bins.length; i++) {            binNames[i] = (min == bins[i] ? "["+min+"]" :                           "["+min+"-"+bins[i]+"]");            min = bins[i]+1;        }        binNames[bins.length] = "["+min+"+]";	}    public Instance pipe (Instance carrier) {        TokenSequence ts = (TokenSequence) carrier.getData();        tokens:        for (int i=0; i < ts.size(); i++) {            Token t = ts.getToken(i);            int length = t.getText().length();            for (int j=0; j<bins.length; j++)                if (length <= bins[j]) {                    t.setFeatureValue(name+"="+binNames[j], 1.0);                    continue tokens;                }            t.setFeatureValue(name+"="+binNames[bins.length], 1.0);        }        return carrier;    }}

⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?