
📄 duc2004mtarticleparser.java

📁 dragontoolkit for machine learning
💻 JAVA
package dragon.onlinedb.trec;

import dragon.onlinedb.*;

/**
 * <p>DUC 2004 MT article parser </p>
 * <p></p>
 * <p>Copyright: Copyright (c) 2005</p>
 * <p>Company: IST, Drexel University</p>
 * @author Davis Zhou
 * @version 1.0
 */

public class DUC2004MTArticleParser implements ArticleParser {
    // Assembling an article back into raw text is not implemented.
    public String assemble(Article article){
        return null;
    }

    public Article parse(String content){
        BasicArticle article;
        StringBuffer body;
        String sentence;
        int start, end;

        article=null;
        try{
            article=new BasicArticle();

            //get the document number (DOCNO), used as the article key
            start=content.indexOf("<DOCNO>");
            if(start<0)
                return null; //no key can be extracted without a DOCNO tag
            start=start+7;
            end=content.indexOf("<",start);
            article.setKey(content.substring(start, end).trim());

            //Body: join the text of every <s num...> sentence element with spaces
            body=null;
            start=content.indexOf("<s num",end);
            while(start>=0){
                //skip to the end of the opening tag, then read up to </s>
                start=content.indexOf(">",start+6)+1;
                end=content.indexOf("</s>",start);
                sentence=content.substring(start,end);
                //drop everything up to and including the "(AFP) -" dateline marker
                start=sentence.indexOf("(AFP) -");
                if(start>=0)
                    sentence=sentence.substring(start+7);
                if(body==null)
                    body=new StringBuffer(sentence);
                else{
                    body.append(' ');
                    body.append(sentence);
                }
                start=content.indexOf("<s num",end+4);
            }
            //body stays null when the document has no sentence elements
            if(body!=null)
                article.setBody(body.toString());
            return article;
        }
        catch(Exception e){
            e.printStackTrace();
            //return a partially parsed article only if its key was already set
            if(article!=null && article.getKey()!=null)
                return article;
            else
                return null;
        }
    }
}
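The sentence-extraction loop in parse() can be tried without the dragon toolkit on the classpath. The sketch below is only an illustration: the class name DucBodyExtractionSketch, the helper extractBody, and the sample document are all hypothetical and not part of the toolkit, and the sample markup is made up to match the tags the parser searches for rather than copied from real DUC 2004 data.

```java
public class DucBodyExtractionSketch {

    // Hypothetical stand-alone helper mirroring the body loop of parse():
    // joins the text of every <s num...>...</s> element with single spaces.
    static String extractBody(String content) {
        StringBuilder body = new StringBuilder();
        int start = content.indexOf("<s num");
        while (start >= 0) {
            int textStart = content.indexOf('>', start + 6) + 1;
            int textEnd = content.indexOf("</s>", textStart);
            if (textStart == 0 || textEnd < 0) {
                break; // malformed sentence element: stop instead of throwing
            }
            String sentence = content.substring(textStart, textEnd);
            // Drop everything up to and including the "(AFP) -" marker (7 characters).
            int dateline = sentence.indexOf("(AFP) -");
            if (dateline >= 0) {
                sentence = sentence.substring(dateline + 7);
            }
            if (body.length() > 0) {
                body.append(' ');
            }
            body.append(sentence);
            start = content.indexOf("<s num", textEnd + 4);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Made-up sample input, for illustration only.
        String sample = "<DOC><DOCNO> SAMPLE.0001 </DOCNO><TEXT>"
                + "<s num=\"1\">PARIS, Oct 1 (AFP) - First sentence.</s>"
                + "<s num=\"2\">Second sentence.</s></TEXT></DOC>";
        System.out.println("[" + extractBody(sample) + "]");
    }
}
```

Like the original loop, this sketch keeps the space that follows the "(AFP) -" marker, so callers that need clean text may want to trim the result.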
