
📄 pageservice.java

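📁 Web Page Collection System ================= Installation and Configuration ------- 1 I won't go over the program itself 2 The configuration file applicationContext.xml contains detailed comments 3 Already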
💻 JAVA
package com.laozizhu.search.util;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ConnectException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.zip.GZIPInputStream;

/**
 * Utility for reading text from a URL.
 *
 * @author 赵学庆 www.java2000.net
 */
public class PageService {
  private static final String BR = "\r\n";

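  /**
   * Reads the text of a page, using UTF-8 encoding by default.
   *
   * @param page the URL of the page, for example http://www.java2000.net
   * @return the text that was read, or null if every attempt failed
   */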
  public static String getPage(String page) {
    return getPage(page, "UTF-8");
  }

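  /**
   * Reads the text of a page. A failed read is retried up to 3 times,
   * with a one-second pause before each retry.
   *
   * @param page the URL of the page, for example http://www.java2000.net
   * @param charset the character encoding of the page
   * @return the text that was read, or null if every attempt failed
   */
  public static String getPage(String page, String charset) {
    String str = null;
    int count = 3; // number of retries allowed after the first attempt
    do {
      str = _getPage(page, charset);
      if (str == null && count > 0) {
        try {
          Thread.sleep(1000); // wait one second before retrying
        } catch (InterruptedException e) {
          // Preserve the interrupt status and stop retrying
          Thread.currentThread().interrupt();
          return null;
        }
      }
    } while (str == null && count-- > 0);
    return str;
  }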

  private static String _getPage(String page, String charset) {
    try {
      URL url = new URL(page);
      HttpURLConnection con = (HttpURLConnection) url.openConnection();
      // Send a browser-like User-Agent (this string identifies as MSIE 6.0)
      con.setRequestProperty("User-Agent",
          "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)");
      // Derive Host from the parsed URL so that https:// URLs are handled too
      int port = url.getPort();
      con.setRequestProperty("Host", port == -1 ? url.getHost() : url.getHost() + ":" + port);
      InputStream is = con.getInputStream();
      // Decompress the body when the server sends it gzip-encoded
      if ("gzip".equalsIgnoreCase(con.getContentEncoding())) {
        is = new GZIPInputStream(is);
      }
      // try-with-resources closes the reader (and the underlying stream) on every path
      try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, charset))) {
        StringBuilder b = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
          b.append(line);
          b.append(BR);
        }
        return b.toString();
      }
    } catch (FileNotFoundException ex) {
      System.out.println("NOT FOUND:" + page);
      return null;
    } catch (ConnectException ex) {
      System.out.println("Connect failed:" + page);
      return null;
    } catch (Exception ex) {
      ex.printStackTrace();
      return null;
    }
  }

  /**
   * Sends a form-encoded POST request and reads the response as UTF-8.
   *
   * @param page the URL to post to
   * @param msg the request body, already URL-encoded
   * @return the response text
   */
  public static String postPage(String page, String msg) throws Exception {
    URL url = new URL(page);
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    con.setDoOutput(true); // enable a request body, as POST requires
    con.setRequestMethod("POST");
    con.setRequestProperty("User-Agent",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)");
    // Derive Host from the parsed URL so that https:// URLs are handled too
    int port = url.getPort();
    con.setRequestProperty("Host", port == -1 ? url.getHost() : url.getHost() + ":" + port);
    con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    // Write the request body and close the stream before reading the response
    try (OutputStream os = con.getOutputStream()) {
      os.write(msg.getBytes("UTF-8"));
    }
    InputStream is = con.getInputStream();
    // Decompress the body when the server sends it gzip-encoded
    if ("gzip".equalsIgnoreCase(con.getContentEncoding())) {
      is = new GZIPInputStream(is);
    }
    // Read the response; try-with-resources closes the reader on every path
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"))) {
      StringBuilder b = new StringBuilder();
      String line;
      while ((line = reader.readLine()) != null) {
        b.append(line);
        b.append(BR);
      }
      return b.toString();
    }
  }
}
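
The gzip handling in PageService can be tried without a network connection. The following is a minimal sketch, not part of the original file: the class GzipBodyDemo and its helpers readBody and gzip are hypothetical names introduced for illustration, and an in-memory byte array stands in for a server response sent with Content-Encoding: gzip.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBodyDemo {
  /**
   * Hypothetical helper: reads a body line by line, unwrapping gzip when the
   * given Content-Encoding value says so, and joins the lines with CRLF.
   */
  static String readBody(InputStream raw, String contentEncoding, String charset)
      throws IOException {
    InputStream is = "gzip".equalsIgnoreCase(contentEncoding) ? new GZIPInputStream(raw) : raw;
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(is, charset))) {
      StringBuilder b = new StringBuilder();
      String line;
      while ((line = reader.readLine()) != null) {
        b.append(line).append("\r\n");
      }
      return b.toString();
    }
  }

  /** Hypothetical helper: gzip-compresses text to simulate a compressed response. */
  static byte[] gzip(String text, String charset) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
      out.write(text.getBytes(charset));
    }
    return buf.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] compressed = gzip("hello\nworld", "UTF-8");
    // Decodes the simulated response and prints the two lines it contains
    System.out.print(readBody(new ByteArrayInputStream(compressed), "gzip", "UTF-8"));
  }
}
```

Because every line is re-terminated with CRLF, the returned text always ends with a line break, even when the source body did not.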

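The retry behaviour of getPage (one attempt, then a bounded number of retries with a pause in between) can be isolated from the HTTP code. The sketch below assumes a hypothetical generic helper named retry in a hypothetical class RetryDemo; neither appears in the original file, and a counter-based lambda stands in for the page fetch.

```java
import java.util.function.Supplier;

class RetryDemo {
  /**
   * Hypothetical helper: calls fetch once, then up to `retries` more times
   * while the result is null, sleeping pauseMillis before each retry.
   */
  static <T> T retry(Supplier<T> fetch, int retries, long pauseMillis) {
    T result = fetch.get();
    for (int i = 0; i < retries && result == null; i++) {
      try {
        Thread.sleep(pauseMillis);
      } catch (InterruptedException e) {
        // Preserve the interrupt status and give up
        Thread.currentThread().interrupt();
        return null;
      }
      result = fetch.get();
    }
    return result;
  }

  public static void main(String[] args) {
    int[] calls = {0};
    // Simulated fetch: returns null on the first two calls, then succeeds
    String page = retry(() -> ++calls[0] < 3 ? null : "ok", 3, 10);
    System.out.println(page + " after " + calls[0] + " attempts");
  }
}
```

Passing the fetch as a Supplier keeps the retry policy separate from the I/O, so the pause and retry count can be changed without touching the code that reads the page.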