
// aralesettings.java: part of a web crawler (Java)
package org.flaviotordini.arale;

import java.util.*;
import java.text.*;
import java.io.*;
import java.net.*;

/**
 *  Configuration settings for the Arale crawler.
 *
 * @author     Flavio Tordini
 * @created    16 December 2001
 */
public class AraleSettings {

    boolean cleanupLocalpath;
    File localpath = new File("output");
    URL homepage;

    List scanTokens;
    List downloadTokens;
    List globalTokens;

    int domainDepth = 1;
    int maxThreads = 4;

    int minimumFileSize = -1;
    int maximumFileSize = -1;
    boolean downloadUnknownFileSize = true;

    boolean renameDynamicFiles = true;

    long writtenBytesCount;

    String lLimits;
    String rLimits;

    boolean forceHtmlScanning;
    boolean ensureHtmlScanning;

    int pauseMilliseconds;


    /**
     *  Creates an AraleSettings object with empty token lists.
     *
     * @since    9 January 2002
     */
    public AraleSettings() {
        downloadTokens = new ArrayList();
        scanTokens = new ArrayList();
        globalTokens = new ArrayList();
    }


    /**
     *  Reads parameters from a properties file.
     *
     * @param  propfile       The properties file to read
     * @exception  Exception  If the file cannot be read or a parameter is invalid
     * @since                 9 January 2002
     */
    public void setParameters(File propfile) throws Exception {

        Properties properties = new Properties();
        FileInputStream fis = new FileInputStream(propfile);
        try {
            properties.load(fis);
        } finally {
            // close the stream even if loading fails
            fis.close();
        }

        setParameters((Hashtable) properties);

    }


    /**
     *  Reads parameters from a table of property names and string values.
     *
     * @param  parameters     The parameters to apply
     * @exception  Exception  If a required parameter is missing or malformed
     * @since                 9 January 2002
     */
    public void setParameters(Hashtable parameters) throws Exception {
        /*
            String logfile = (String) parameters.get("log.file");
            if (logfile != null) {
            logger.setLogFile(logfile);
            }
          */
        if (homepage == null) {
            try {
                homepage = new URL((String) parameters.get("URL"));
            } catch (MalformedURLException e) {
                Arale.logger.log("invalid URL: " + e);
                return;
            }
        }

        // cleanupLocalpath = Boolean.valueOf((String) parameters.get("clean.output.directory")).booleanValue();
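        // localpath defaults to "output", so the output.directory parameter
        // is only read when a caller has reset localpath to null
        if (localpath == null) {
            localpath = new File((String) parameters.get("output.directory") + File.separator);
        }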
        if (!localpath.exists() || !localpath.isDirectory()) {
            // Arale.logger.log("creating " + localpath.getCanonicalPath());
            localpath.mkdirs();
        }
        /*
            else if (cleanupLocalpath) {
            logger.log("cleaning " + localpath.getCanonicalPath());
            localpath.delete();
            localpath.mkdirs();
            }
          */
        // token lists are whitespace-separated; a missing parameter adds no tokens
        String tokenstring = (String) parameters.get("scan.tokens");
        if (tokenstring != null) {
            for (StringTokenizer st = new StringTokenizer(tokenstring); st.hasMoreTokens(); ) {
                scanTokens.add(st.nextToken());
            }
        }

        tokenstring = (String) parameters.get("download.tokens");
        if (tokenstring != null) {
            for (StringTokenizer st = new StringTokenizer(tokenstring); st.hasMoreTokens(); ) {
                downloadTokens.add(st.nextToken());
            }
        }

        // globalTokens is filled from a TreeSet with reverse sorting order.
        // Order matters: with both .js and .jsp tokens, the more specific
        // .jsp must be searched BEFORE the more general .js, and reverse
        // lexicographic order puts a longer token ahead of its prefix.
        Set reversesorter = new TreeSet(Collections.reverseOrder());
        reversesorter.addAll(downloadTokens);
        reversesorter.addAll(scanTokens);

        // .html and .htm go first, ahead of the reverse-sorted remainder
        if (reversesorter.remove(".html")) {
            globalTokens.add(".html");
        }
        if (reversesorter.remove(".htm")) {
            globalTokens.add(".htm");
        }
        globalTokens.addAll(reversesorter);


        System.out.println("globalTokens: " + globalTokens);

        /*
            for (Iterator i = globalTokens.iterator(); i.hasNext(); ) {
            System.out.println((String) i.next());
            }
          */
        lLimits = (String) parameters.get("url.leftdelimiters");
        rLimits = (String) parameters.get("url.rightdelimiters");

        domainDepth = Integer.parseInt((String) parameters.get("domain.depth"));
        maxThreads = Integer.parseInt((String) parameters.get("thread.count"));
        minimumFileSize = Integer.parseInt((String) parameters.get("file.minsize"));
        maximumFileSize = Integer.parseInt((String) parameters.get("file.maxsize"));

        downloadUnknownFileSize = Boolean.valueOf((String) parameters.get("file.download.unknown.size")).booleanValue();
        renameDynamicFiles = Boolean.valueOf((String) parameters.get("rename.dynamic.files")).booleanValue();

        forceHtmlScanning = Boolean.valueOf((String) parameters.get("force.html.scanning")).booleanValue();
        ensureHtmlScanning = Boolean.valueOf((String) parameters.get("ensure.html.scanning")).booleanValue();

        pauseMilliseconds = Integer.parseInt((String) parameters.get("pause.milliseconds"));

    }

}
