simpletagger.java

  /**
   * Test a transducer on the given test data, evaluating accuracy
   * with the given evaluator.
   *
   * @param model a <code>Transducer</code>
   * @param eval accuracy evaluator
   * @param testing test data
   */
  public static void test(Transducer model, TransducerEvaluator eval,
                          InstanceList testing)
  {
    eval.test(model, testing, "Testing", null);
  }

  /**
   * Apply a transducer to an input sequence to produce the highest-scoring
   * output sequence.
   *
   * @param model the <code>Transducer</code>
   * @param input the input sequence
   * @return the best-scoring output sequence
   */
  public static Sequence apply(Transducer model, Sequence input)
  {
    return model.viterbiPath(input).output();
  }

  /**
   * Command-line wrapper to train, test, or run a generic CRF-based tagger.
   *
   * @param args the command line arguments. Options (shell and Java quoting should be added as needed):
   *<dl>
   *<dt><code>--help</code> <em>boolean</em></dt>
   *<dd>Print this command line option usage information. Give <code>true</code> for longer documentation. Default is <code>false</code>.</dd>
   *<dt><code>--prefix-code</code> <em>Java-code</em></dt>
   *<dd>Java code to run before any other interpreted code. Note that the text is interpreted without modification, so unlike some other Java code options, you need to include any necessary 'new's. Default is null.</dd>
   *<dt><code>--gaussian-variance</code> <em>positive-number</em></dt>
   *<dd>The Gaussian prior variance used for training. Default is 10.0.</dd>
   *<dt><code>--train</code> <em>boolean</em></dt>
   *<dd>Whether to train. Default is <code>false</code>.</dd>
   *<dt><code>--iterations</code> <em>positive-integer</em></dt>
   *<dd>Number of training iterations.
   * Default is 500.</dd>
   *<dt><code>--test</code> <code>lab</code> or <code>seg=</code><em>start-1</em><code>.</code><em>continue-1</em><code>,</code>...<code>,</code><em>start-n</em><code>.</code><em>continue-n</em></dt>
   *<dd>Test, measuring labeling accuracy (<code>lab</code>) or segmentation accuracy for the given (<em>start-i</em>, <em>continue-i</em>) label pairs. Default is no testing.</dd>
   *<dt><code>--training-proportion</code> <em>number-between-0-and-1</em></dt>
   *<dd>Fraction of data to use for training in a random split. Default is 0.5.</dd>
   *<dt><code>--model-file</code> <em>filename</em></dt>
   *<dd>The filename for reading (train/run) or saving (train) the model. Default is null.</dd>
   *<dt><code>--random-seed</code> <em>integer</em></dt>
   *<dd>The random seed for randomly selecting a proportion of the instance list for training. Default is 0.</dd>
   *<dt><code>--orders</code> <em>comma-separated-integers</em></dt>
   *<dd>List of label Markov orders (main and backoff). Default is 1.</dd>
   *<dt><code>--forbidden</code> <em>regular-expression</em></dt>
   *<dd>If <em>label-1</em><code>,</code><em>label-2</em> matches the expression, the corresponding transition is forbidden. Default is <code>\\s</code> (nothing forbidden).</dd>
   *<dt><code>--allowed</code> <em>regular-expression</em></dt>
   *<dd>If <em>label-1</em><code>,</code><em>label-2</em> does not match the expression, the corresponding transition is forbidden. Default is <code>.*</code> (everything allowed).</dd>
   *<dt><code>--default-label</code> <em>string</em></dt>
   *<dd>Label for initial context and uninteresting tokens. Default is <code>O</code>.</dd>
   *<dt><code>--viterbi-output</code> <em>boolean</em></dt>
   *<dd>Print Viterbi output periodically during training. Default is <code>false</code>.</dd>
   *<dt><code>--fully-connected</code> <em>boolean</em></dt>
   *<dd>Include all allowed transitions, even those not in training data.
   * Default is <code>true</code>.</dd>
   *</dl>
   * Remaining arguments:
   *<ul>
   *<li><em>training-data-file</em>, if training</li>
   *<li><em>training-and-test-data-file</em>, if training and testing with a random split</li>
   *<li><em>training-data-file</em> <em>test-data-file</em>, if training and testing from separate files</li>
   *<li><em>test-data-file</em>, if testing</li>
   *<li><em>input-data-file</em>, if applying to new (unlabeled) data</li>
   *</ul>
   * @exception Exception if an error occurs
   */
  public static void main (String[] args) throws Exception
  {
    Reader trainingFile = null, testFile = null;
    InstanceList trainingData = null, testData = null;
    int numEvaluations = 0;
    int iterationsBetweenEvals = 16;
    int restArgs = commandOptions.processOptions(args);
    if (restArgs == args.length)
    {
      commandOptions.printUsage(true);
      throw new IllegalArgumentException("Missing data file(s)");
    }
    // Open the data file(s) named by the remaining (non-option) arguments.
    if (trainOption.value)
    {
      trainingFile = new FileReader(new File(args[restArgs]));
      if (testOption.value != null && restArgs < args.length - 1)
        testFile = new FileReader(new File(args[restArgs+1]));
    } else
      testFile = new FileReader(new File(args[restArgs]));

    Pipe p = null;
    CRF4 crf = null;
    TransducerEvaluator eval = null;
    // Reuse a saved model (and its input pipe) when continuing training
    // or when not training at all; otherwise start with a fresh pipe.
    if (continueTrainingOption.value || !trainOption.value)
    {
      if (modelOption.value == null)
      {
        commandOptions.printUsage(true);
        throw new IllegalArgumentException("Missing model file option");
      }
      ObjectInputStream s =
        new ObjectInputStream(new FileInputStream(modelOption.value));
      crf = (CRF4) s.readObject();
      s.close();
      p = crf.getInputPipe();
    }
    else
    {
      p = new SimpleTaggerSentence2FeatureVectorSequence();
      p.getTargetAlphabet().lookupIndex(defaultOption.value);
    }

    // Build the evaluator requested by --test, if any.
    if (testOption.value != null)
    {
      if (testOption.value.startsWith("lab"))
        eval = new TokenAccuracyEvaluator(viterbiOutputOption.value);
      else if (testOption.value.startsWith("seg="))
      {
        String[] pairs = testOption.value.substring(4).split(",");
        if (pairs.length < 1)
        {
          commandOptions.printUsage(true);
          throw new IllegalArgumentException(
            "Missing segment start/continue labels: " + testOption.value);
        }
        String[] startTags = new String[pairs.length];
        String[] continueTags = new String[pairs.length];
        for (int i = 0; i < pairs.length; i++)
        {
          String[] pair = pairs[i].split("\\.");
          if (pair.length != 2)
          {
            commandOptions.printUsage(true);
            throw new IllegalArgumentException(
              "Incorrectly-specified segment start and continue labels: " +
              pairs[i]);
          }
          startTags[i] = pair[0];
          continueTags[i] = pair[1];
        }
        eval = new MultiSegmentationEvaluator(startTags, continueTags,
                                              viterbiOutputOption.value);
      }
      else
      {
        commandOptions.printUsage(true);
        throw new IllegalArgumentException("Invalid test option: " +
                                           testOption.value);
      }
    }

    // Read blank-line-separated sequences. With --test but no separate
    // test file, the training data is split at random instead.
    if (trainOption.value)
    {
      p.setTargetProcessing(true);
      trainingData = new InstanceList(p);
      trainingData.add(
        new LineGroupIterator(trainingFile,
                              Pattern.compile("^\\s*$"), true));
      logger.info
        ("Number of features in training data: "+p.getDataAlphabet().size());
      if (testOption.value != null)
      {
        if (testFile != null)
        {
          testData = new InstanceList(p);
          testData.add(
            new LineGroupIterator(testFile,
                                  Pattern.compile("^\\s*$"), true));
        } else
        {
          Random r = new Random (randomSeedOption.value);
          InstanceList[] trainingLists =
            trainingData.split(
              r, new double[] {trainingFractionOption.value,
                               1-trainingFractionOption.value});
          trainingData = trainingLists[0];
          testData = trainingLists[1];
        }
      }
    } else if (testOption.value != null)
    {
      p.setTargetProcessing(true);
      testData = new InstanceList(p);
      testData.add(
        new LineGroupIterator(testFile,
                              Pattern.compile("^\\s*$"), true));
    } else
    {
      p.setTargetProcessing(false);
      testData = new InstanceList(p);
      testData.add(
        new LineGroupIterator(testFile,
                              Pattern.compile("^\\s*$"), true));
    }
    logger.info ("Number of predicates: "+p.getDataAlphabet().size());
    if (p.isTargetProcessing())
    {
      Alphabet targets = p.getTargetAlphabet();
      StringBuffer buf = new StringBuffer("Labels:");
      for (int i = 0; i < targets.size(); i++)
        buf.append(" ").append(targets.lookupObject(i).toString());
      logger.info(buf.toString());
    }
    // Train, then save the model if --model-file was given.
    if (trainOption.value)
    {
      crf = train(trainingData, testData, eval,
                  ordersOption.value, defaultOption.value,
                  forbiddenOption.value, allowedOption.value,
                  connectedOption.value, iterationsOption.value,
                  gaussianVarianceOption.value, crf);
      if (modelOption.value != null)
      {
        ObjectOutputStream s =
          new ObjectOutputStream(new FileOutputStream(modelOption.value));
        s.writeObject(crf);
        s.close();
      }
    }
    else
    {
      if (crf == null)
      {
        if (modelOption.value == null)
        {
          commandOptions.printUsage(true);
          throw new IllegalArgumentException("Missing model file option");
        }
        ObjectInputStream s =
          new ObjectInputStream(new FileInputStream(modelOption.value));
        crf = (CRF4) s.readObject();
        s.close();
      }
      if (eval != null)
        test(crf, eval, testData);
      else
      {
        // No evaluator: print each token's Viterbi label followed by its features.
        for (int i = 0; i < testData.size(); i++)
        {
          Sequence input =
            (Sequence)testData.getInstance(i).getData();
          Sequence output = apply(crf, input);
          if (output.size() != input.size())
            System.out.println("Failed to decode input sequence " + i);
          else
          {
            for (int j = 0; j < output.size(); j++)
            {
              FeatureVector fv = (FeatureVector)input.get(j);
              System.out.println(output.get(j).toString() + " " +
                                 fv.toString(true));
            }
            System.out.println();
          }
        }
      }
    }
  }
}
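
`apply` delegates decoding to `model.viterbiPath(input).output()`. To show what that kind of call computes, here is a minimal first-order Viterbi sketch over plain score tables. It is a generic illustration under stated assumptions, not this library's implementation. The class `ViterbiSketch`, its method `decode`, and the log-space `start`/`trans`/`emit` arrays are hypothetical. A real CRF derives those scores from weighted features and may use higher label orders (see `--orders`).

```java
import java.util.Arrays;

/**
 * Hypothetical first-order Viterbi decoder over log-space scores.
 * Illustrative only; this is not the Transducer.viterbiPath implementation.
 */
final class ViterbiSketch
{
  /**
   * @param start start[s]: score of beginning in state s
   * @param trans trans[a][b]: score of moving from state a to state b
   * @param emit  emit[t][s]: score of state s at position t
   * @return the highest-scoring state index for each position
   */
  static int[] decode(double[] start, double[][] trans, double[][] emit)
  {
    int n = emit.length, k = start.length;
    int[] path = new int[n];
    if (n == 0)
      return path;
    double[][] best = new double[n][k]; // best score of any path ending in s at t
    int[][] back = new int[n][k];       // predecessor state on that best path
    for (int s = 0; s < k; s++)
      best[0][s] = start[s] + emit[0][s];
    for (int t = 1; t < n; t++)
      for (int s = 0; s < k; s++)
      {
        best[t][s] = Double.NEGATIVE_INFINITY;
        for (int prev = 0; prev < k; prev++)
        {
          double cand = best[t - 1][prev] + trans[prev][s];
          if (cand > best[t][s])
          {
            best[t][s] = cand;
            back[t][s] = prev;
          }
        }
        best[t][s] += emit[t][s];
      }
    // Pick the best final state, then follow back-pointers to the start.
    for (int s = 1; s < k; s++)
      if (best[n - 1][s] > best[n - 1][path[n - 1]])
        path[n - 1] = s;
    for (int t = n - 1; t > 0; t--)
      path[t - 1] = back[t][path[t]];
    return path;
  }

  public static void main(String[] args)
  {
    // Two hypothetical labels (0 = "O", 1 = "PER") over three tokens.
    double[] start = {0.0, -1.0};
    double[][] trans = {{0.0, -1.0}, {-2.0, 0.0}};
    double[][] emit = {{0.0, -3.0}, {-2.0, 0.0}, {-1.0, 0.0}};
    System.out.println(Arrays.toString(decode(start, trans, emit))); // prints [0, 1, 1]
  }
}
```

Scores are added because they are in log space. Keeping one back-pointer per state and position lets the best path be recovered in O(n·k²) time for n tokens and k labels, instead of enumerating all kⁿ label sequences.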

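
The `--forbidden` and `--allowed` options are documented only in terms of matching the string `label-1,label-2`. The sketch below shows one way to read that rule, assuming whole-string matching with `Matcher.matches()`. Under that assumption, the default `\s` forbids nothing, because no comma-joined label pair is a single whitespace character. `TransitionFilterSketch` and its methods are hypothetical names for illustration. The example labels `B-PER`/`I-PER` are likewise assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical reading of --forbidden / --allowed, assuming whole-string
 * matching against "label-1,label-2". Not the library's own code.
 */
final class TransitionFilterSketch
{
  static boolean isAllowed(String from, String to, Pattern forbidden, Pattern allowed)
  {
    String pair = from + "," + to;
    if (forbidden.matcher(pair).matches())
      return false;                          // explicitly forbidden
    return allowed.matcher(pair).matches();  // must also match --allowed
  }

  /** Lists every permitted "from->to" transition among the given labels. */
  static List<String> allowedTransitions(String[] labels, Pattern forbidden, Pattern allowed)
  {
    List<String> out = new ArrayList<>();
    for (String from : labels)
      for (String to : labels)
        if (isAllowed(from, to, forbidden, allowed))
          out.add(from + "->" + to);
    return out;
  }

  public static void main(String[] args)
  {
    String[] labels = {"O", "B-PER", "I-PER"};
    // Forbid entering an inside tag directly from the outside label.
    Pattern forbidden = Pattern.compile("O,I-.*");
    Pattern allowed = Pattern.compile(".*"); // the documented default
    // Prints 8 of the 9 pairs; O->I-PER is the one left out.
    System.out.println(allowedTransitions(labels, forbidden, allowed));
  }
}
```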
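
When training and testing from a single file, `main` calls `trainingData.split(...)` with a `Random` seeded by `--random-seed` and the proportions `{p, 1 - p}` taken from `--training-proportion`. A generic seeded split looks roughly like the sketch below. It is hypothetical: `RandomSplitSketch.split` is not a library method, and for a given seed it will not select the same instances as `InstanceList.split`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical seeded train/test split in the spirit of
 * --training-proportion and --random-seed.
 */
final class RandomSplitSketch
{
  /** Returns {training, testing}; training gets floor(fraction * n) items. */
  static <T> List<List<T>> split(List<T> items, double fraction, long seed)
  {
    if (fraction < 0.0 || fraction > 1.0)
      throw new IllegalArgumentException("fraction must be in [0, 1]: " + fraction);
    List<T> shuffled = new ArrayList<>(items);
    Collections.shuffle(shuffled, new Random(seed)); // same seed, same order
    int cut = (int) Math.floor(fraction * shuffled.size());
    List<List<T>> parts = new ArrayList<>();
    parts.add(new ArrayList<>(shuffled.subList(0, cut)));
    parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    return parts;
  }

  public static void main(String[] args)
  {
    List<String> sentences = new ArrayList<>();
    for (int i = 0; i < 10; i++)
      sentences.add("sentence-" + i);
    // 0.5 and 0 are the documented defaults for the two options.
    List<List<String>> parts = split(sentences, 0.5, 0L);
    System.out.println("train=" + parts.get(0).size() + " test=" + parts.get(1).size()); // prints train=5 test=5
  }
}
```

Fixing the seed makes the split reproducible across runs. The documented default seed is 0.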