📄 mql_webdownloader.java

📁 Source code for extracting text comments from web pages
package cn.casia.ailab.ldy.cmt;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.SocketException;
import java.net.SocketTimeoutException;
import java.net.URL;
import java.net.URLConnection;
/* This class is adapted from Thunder's work. */
public class Mql_WebDownloader {
	public boolean downloadedFlag = false; // whether the most recent download succeeded

	public Mql_WebDownloader() {

	}

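	/**
	 * Downloads the page at the given URL and decodes it with the given
	 * character encoding.
	 * 
	 * @param urlName  URL of the page to download
	 * @param encoding name of the charset used to decode the response, e.g. "UTF-8"
	 * @return the page content, or null if the download failed;
	 *         downloadedFlag records the outcome
	 * @throws IOException if closing the reader fails after a download error
	 * @throws InterruptedException declared for callers; never thrown by this method
	 */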
	public String webpageDownload(String urlName, String encoding)
			throws IOException, InterruptedException {

		BufferedReader buffRead = null;
		StringBuilder htmlContent = new StringBuilder(); // accumulates the downloaded page content
		try {
			// Download the page, decoding it with the supplied encoding
			URL u = new URL(urlName);
			URLConnection uc = u.openConnection();

			uc.setRequestProperty(
					"User-Agent",
					"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)");
			uc.setReadTimeout(60000); // read timeout in milliseconds; provisionally 60 seconds

			buffRead = new BufferedReader(
					new InputStreamReader(uc.getInputStream(), encoding));

			// Read one character at a time until end of stream (-1)
			int c;
			while ((c = buffRead.read()) != -1) {
				htmlContent.append((char) c);
			}
			buffRead.close();

			downloadedFlag = true; // download succeeded

		} catch (IOException ex) {
			// Also covers SocketTimeoutException and SocketException,
			// which are subclasses of IOException
			downloadedFlag = false;
		} finally {
			if (buffRead != null) {
				buffRead.close();
			}
		}
		// Return null when the download failed
		return downloadedFlag ? htmlContent.toString() : null;
		
	}// webpageDownload ends

}
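
The read loop in webpageDownload closes the reader by hand in two places. As a rough sketch of an alternative, the reading step could be isolated in a helper that uses try-with-resources, so the reader is closed automatically however the block exits. The class name StreamReadSketch and the method readAll are hypothetical and not part of the original file; the input here is an in-memory stream rather than a real URLConnection, so the sketch runs without network access.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Mql_WebDownloader.
class StreamReadSketch {

    // Decodes the whole stream with the named charset and returns it as a String.
    static String readAll(InputStream in, String encoding) throws IOException {
        StringBuilder content = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream) on exit
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(in, encoding))) {
            char[] buf = new char[4096];
            int n;
            // Read in chunks instead of one character at a time
            while ((n = reader.read(buf)) != -1) {
                content.append(buf, 0, n);
            }
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for uc.getInputStream(): a small UTF-8 encoded page held in memory
        byte[] page = "<html>\u8bc4\u8bba</html>".getBytes(StandardCharsets.UTF_8);
        String html = readAll(new ByteArrayInputStream(page), "UTF-8");
        System.out.println(html);
    }
}
```

In webpageDownload, the same helper could in principle be called with uc.getInputStream() and the encoding argument, leaving the existing catch block to set downloadedFlag.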
