/* RobotsDirectives.java
 *
 * $Id: PrefixSet.java 4947 2007-03-01 04:47:24Z gojomo $
 *
 * Created April 29, 2008
 *
 * Copyright (C) 2008 Internet Archive.
 *
 * This file is part of the Heritrix web crawler (crawler.archive.org).
 *
 * Heritrix is free software; you can redistribute it and/or modify
 * it under the terms of the GNU Lesser Public License as published by
 * the Free Software Foundation; either version 2.1 of the License, or
 * any later version.
 *
 * Heritrix is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Lesser Public License for more details.
 *
 * You should have received a copy of the GNU Lesser Public License
 * along with Heritrix; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 */
package org.archive.crawler.datamodel;

import java.io.Serializable;

import org.archive.util.PrefixSet;

/**
 * Represents the robots.txt directives (allowed and disallowed path
 * prefixes, plus an optional crawl delay) that apply to a user-agent
 * or set of user-agents.
 */
public class RobotsDirectives implements Serializable {
    private static final long serialVersionUID = 5386542759286155383L;
    
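    // Path prefixes named in Disallow lines
    PrefixSet disallows = new PrefixSet();
    // Path prefixes named in Allow lines
    PrefixSet allows = new PrefixSet();
    // Crawl delay; stays at -1 until setCrawlDelay() is called
    float crawlDelay = -1;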

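    /**
     * Tests whether the given path may be fetched under these directives.
     * A path that matches no disallow prefix is always allowed; a path that
     * matches a disallow prefix is allowed only if it also matches an
     * allow prefix.
     *
     * @param path path to test
     * @return true if the path is allowed
     */
    public boolean allows(String path) {
        if (disallows.containsPrefixOf(path)) {
            return allows.containsPrefixOf(path);
        }
        return true;
    }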

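    /**
     * Adds a disallowed path prefix. Empty strings are ignored.
     *
     * @param path path prefix to disallow
     */
    public void addDisallow(String path) {
        if (path.length() == 0) {
            // ignore empty-string disallows
            // (they really mean allow, when alone)
            return;
        }
        disallows.add(path);
    }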

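    /**
     * Adds an allowed path prefix, which overrides any matching disallow.
     *
     * @param path path prefix to allow
     */
    public void addAllow(String path) {
        allows.add(path);
    }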

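    /**
     * Sets the crawl delay for these directives.
     *
     * @param delay crawl delay to record
     */
    public void setCrawlDelay(float delay) {
        crawlDelay = delay;
    }

    /**
     * @return the crawl delay, or -1 if none has been set
     */
    public float getCrawlDelay() {
        return crawlDelay;
    }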
}
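
The allow/disallow precedence in `allows(String)` can be exercised without the rest of Heritrix. The following is a minimal, self-contained sketch, not the Heritrix implementation: `RobotsDirectivesSketch`, `SimplePrefixSet`, and `Directives` are hypothetical names, and `SimplePrefixSet` is a stand-in that assumes `PrefixSet.containsPrefixOf` returns true whenever any stored string is a prefix of its argument. The real `org.archive.util.PrefixSet` may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch; none of these names exist in Heritrix.
class RobotsDirectivesSketch {

    /** Stand-in for org.archive.util.PrefixSet (assumed behavior: prefix match). */
    static final class SimplePrefixSet {
        private final List<String> prefixes = new ArrayList<>();

        void add(String prefix) {
            prefixes.add(prefix);
        }

        /** Returns true if any stored string is a prefix of the given path. */
        boolean containsPrefixOf(String path) {
            for (String prefix : prefixes) {
                if (path.startsWith(prefix)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Mirrors the allow/disallow logic of RobotsDirectives. */
    static final class Directives {
        private final SimplePrefixSet disallows = new SimplePrefixSet();
        private final SimplePrefixSet allows = new SimplePrefixSet();

        void addDisallow(String path) {
            if (path.length() == 0) {
                return; // an empty Disallow is ignored, as in the original
            }
            disallows.add(path);
        }

        void addAllow(String path) {
            allows.add(path);
        }

        boolean allows(String path) {
            if (disallows.containsPrefixOf(path)) {
                // a disallowed path is fetched only if an allow prefix also matches
                return allows.containsPrefixOf(path);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        Directives d = new Directives();
        d.addDisallow("/private/");
        d.addAllow("/private/public/");
        d.addDisallow(""); // ignored

        System.out.println(d.allows("/index.html"));            // true
        System.out.println(d.allows("/private/data.html"));     // false
        System.out.println(d.allows("/private/public/a.html")); // true
    }
}
```

In this sketch an allow prefix only matters for paths that some disallow prefix already matches, which is the same ordering the original method uses.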
