getsourcetool.java

Project: a data-collection system that crawls data for vertical search engines
Language: Java
/*
 * *****************************************************
 * Copyright (c) 2005 IIM Lab. All Rights Reserved.
 * Created by xuehao at Dec 14, 2005
 * Contact: zxuehao@mail.ustc.edu.cn
 * *****************************************************
 */
package org.indigo.util;

import java.io.*;
import java.net.HttpURLConnection;
import java.net.MalformedURLException;
import java.net.URL;
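/**
 * Experimental utility: fetches the page at the URL given on the command line
 * and saves its source to source.txt. It plays no part in the actual crawler.
 * @author wbz
 *
 */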
public class GetSourceTool
{

    public static void main(String[] args)
    {
        String url = null, str = null;
        if (args.length < 1)
        {
            // The argument must be a full URL including the scheme:
            // new URL("www.sample.com") throws MalformedURLException
            System.out.println("Usage: GetSourceTool http://www.sample.com");
            return;
        }
        url = args[0];
        System.out.println("Get source from " + url);

        URL itsUrl = null;
        HttpURLConnection itsConn = null;
        try
        {
            itsUrl = new URL(url);
            itsConn = (HttpURLConnection) itsUrl.openConnection();
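        } catch (MalformedURLException e)
        {
            e.printStackTrace();
        } catch (IOException e1)
        {
            e1.printStackTrace();
        }

        // Give up if the URL was invalid or the connection could not be opened;
        // otherwise itsConn would be null below
        if (itsConn == null)
            return;

        BufferedReader rd = null;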

        itsConn.setRequestProperty("User-Agent",
                "Mozilla/4.0 (compatible; MSIE 6.0; Windows 2000)");
        try
        {
            itsConn.connect();
            rd = new BufferedReader(new InputStreamReader(itsConn
                    .getInputStream(), "gb2312"));

        } catch (IOException e3)
        {
            e3.printStackTrace();
        }

        if (rd == null)
            return;

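        File file = null;
        BufferedWriter out = null;
        try
        {
            file = new File("source.txt");
            // Name the output encoding explicitly so the saved text does not
            // depend on the platform's default charset
            out = new BufferedWriter(new OutputStreamWriter(
                    new FileOutputStream(file), "UTF-8"));
        } catch (IOException e)
        {
            // FileNotFoundException or UnsupportedEncodingException
            e.printStackTrace();
        }

        if (out == null)
        {
            // source.txt could not be opened: release the reader and the connection
            try
            {
                rd.close();
            } catch (IOException e)
            {
                e.printStackTrace();
            }
            itsConn.disconnect();
            return;
        }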

        int i = 0;
        try
        {
            while ((str = rd.readLine()) != null)
            {
                i++;
                out.write(str + "\n");
            }
        } catch (IOException e1)
        {
            e1.printStackTrace();
        }
        try
        {
            rd.close();
        } catch (IOException e2)
        {
            e2.printStackTrace();
        }
        itsConn.disconnect();

        try
        {
            out.flush();
            out.close();
        } catch (IOException e1)
        {
            e1.printStackTrace();
        }

    }
}
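
GetSourceTool decodes every response as gb2312, so a page served in another encoding (UTF-8, for instance) is saved garbled. A minimal sketch, assuming the server names its encoding in the Content-Type header (e.g. text/html; charset=utf-8): the hypothetical helper charsetFromContentType extracts that charset parameter and falls back to a default when the header has none or names a charset the JVM does not support. In main, its result for itsConn.getContentType() could replace the literal "gb2312" passed to InputStreamReader. It ignores encodings declared only in a <meta> tag inside the page.

```java
import java.nio.charset.Charset;
import java.util.Locale;

public class ContentTypeCharset
{
    /**
     * Hypothetical helper: returns the charset named in a Content-Type header value,
     * or the fallback when the header is missing, has no charset parameter,
     * or names a charset this JVM does not support.
     */
    public static String charsetFromContentType(String contentType, String fallback)
    {
        if (contentType == null)
            return fallback;
        // Parameters follow the media type and are separated by ';'
        for (String param : contentType.split(";"))
        {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset="))
            {
                String name = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes, e.g. charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\""))
                    name = name.substring(1, name.length() - 1);
                try
                {
                    if (Charset.isSupported(name))
                        return name;
                } catch (IllegalArgumentException e)
                {
                    // Illegal charset name (IllegalCharsetNameException): use the fallback
                }
                return fallback;
            }
        }
        return fallback;
    }

    public static void main(String[] args)
    {
        System.out.println(charsetFromContentType("text/html; charset=UTF-8", "gb2312"));
        System.out.println(charsetFromContentType("text/html", "gb2312"));
    }
}
```

Defaulting to gb2312 keeps the original behaviour for servers that omit the header or send no charset parameter.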

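In main, the reader and writer are closed in separate try blocks after the copy loop, so an unchecked exception thrown while copying would skip those calls. A sketch of the same line-by-line copy using try-with-resources (Java 7 or later), which closes both streams on every exit path; the names CopyLines and copyLines are hypothetical, and returning the line count (what main accumulates in i but never prints) is an illustrative choice. As in the original, every line is written with a trailing "\n", so "\r\n" endings become "\n" and a last line without a newline gains one.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class CopyLines
{
    /**
     * Copies text line by line from in to out, ending every line with "\n"
     * (as GetSourceTool does), and returns the number of lines copied.
     * Both streams are closed when the method returns, even on error.
     */
    public static int copyLines(Reader in, Writer out) throws IOException
    {
        int lines = 0;
        try (BufferedReader rd = new BufferedReader(in);
             BufferedWriter wr = new BufferedWriter(out))
        {
            String str;
            while ((str = rd.readLine()) != null)
            {
                wr.write(str);
                wr.write('\n');
                lines++;
            }
        } // resources close in reverse order: wr is flushed and closed, then rd
        return lines;
    }

    public static void main(String[] args) throws IOException
    {
        StringWriter sink = new StringWriter();
        int n = copyLines(new StringReader("<html>\r\n<body></body>\r\n</html>"), sink);
        System.out.println(n + " lines copied");
        System.out.print(sink.toString());
    }
}
```

In GetSourceTool it would be called with the InputStreamReader built on itsConn.getInputStream() and the OutputStreamWriter opened on source.txt, with itsConn.disconnect() still called afterwards.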