
📄 spurl.java

📁 A spider written in Java. It works well and holds up under heavy load, collecting roughly 100 million (10000万) items per day.
💻 JAVA
package cn.yicha.subject.spdier.url;

import java.sql.*;
import java.util.*;

import cn.yicha.common.db.DAOFactory;
import cn.yicha.common.db.DataAccessObject;
import cn.yicha.common.util.StringParser;


	
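/**
* The crawler encounters several kinds of addresses, including reverse-subscription
* addresses and ordinary subscription addresses, in the following forms:
*   1) http://wap.monternet.com/?userType=B&serviceID=10111000
*   2) http://wap.monternet.com/reversesubscribe?SPID=100501&ServiceID=900503&SPURL=http://...
*   3) http://wap.monternet.com/reversesubscribe?SPID=100501&ServiceID=900503
* This class converts subscription addresses into addresses that can be crawled from the Internet.
*/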
public class SpUrl
{
	private String _sp_id = "";
	private String _service_id = "";
	private String _charge_url = "";
	private String _free_url = "";

	private static Hashtable _hash_sp_urls = null;

	/**
	* Initializes the lookup table of charge addresses for all Monternet (移动梦网) services,
	* keyed by ServiceID.
	*/
	public static void initSpUrls()
	{
		Hashtable ht = new Hashtable();
		String sql = "select SpID, ServiceID, FreeUrl, ChargeUrl from SpElement";

		DataAccessObject dao = null;
		try
		{
			// Fetch the rows through the DAO
			dao = DAOFactory.getDAO();
			ResultSet rs = dao.getResult(sql);
			while (rs.next()) {
				SpUrl su = new SpUrl();
				su.setSpID(rs.getString("SpID"));
				su.setServiceID(rs.getString("ServiceID"));
				su.setFreeUrl(rs.getString("FreeUrl"));
				su.setChargeUrl(rs.getString("ChargeUrl"));

				if (su.getServiceID() != null) {
					ht.put(su.getServiceID(), su);
				}
			}
			rs.close();
		}
		catch (Exception ex) {
			ex.printStackTrace();
		}
		finally {
			if (dao != null)
				dao.dispose();
		}

		_hash_sp_urls = ht;
	}

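	/**
	* Returns whether the URL is a Monternet address.
	*/
	public static boolean isMonternetUrl(String url)
	{
		// Dots are escaped so that each one matches only a literal "." in the host
		String pattern = "^http://wap\\.monternet\\.com|^http://218\\.200\\.244\\.114|^http://218\\.200\\.244\\.81";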
		url = url.trim();
		return StringParser.matchPattern(url, pattern);
	}

	/**
	* Converts a Monternet address into the charge entry address.
	*/
	public static String transSpUrl(String url)
	{
		// For a reverse-subscription address that carries an SPURL parameter, return that entry address
		String transUrl = getSpUrlTagValue(url);
		if (!transUrl.equals("")) {
			return transUrl;
		}

		// Otherwise extract the ServiceID and look up the charge entry address by ServiceID
		String serviceID = extractTagValueFromUrl(url, "ServiceID");
		if (serviceID.equals("") || _hash_sp_urls == null) {
			return url;
		}

		SpUrl su = (SpUrl) _hash_sp_urls.get(serviceID);
		if (su != null) {
			return su.getChargeUrl();
		}
		else {
			return url;
		}
	}

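	/**
	* Looks up the ServiceID in the cache by URL. Returns null when the cache is not
	* initialized or no cached free URL contains the given URL.
	*/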
	public static String getServiceIDFromCache(String url)
	{
		if (_hash_sp_urls == null) {
			return null;
		}
		
		Iterator it = _hash_sp_urls.values().iterator();
		while (it.hasNext()) {
			SpUrl su = (SpUrl) it.next();
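			// FreeUrl comes from the database and may be null, so guard before searching it
			if (url != null && su.getFreeUrl() != null && su.getFreeUrl().indexOf(url) >= 0) {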
				return su.getServiceID();
			}
		}
		
		return null;
	}
	
	/**
	* Extracts the ServiceID from the URL and returns it.
	*/
	public static String getServiceIDFromUrl(String url)
	{
		return extractTagValueFromUrl(url, "ServiceID");
	}

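	/**
	* Extracts the SPURL parameter from the URL: everything after "spurl=" (matched
	* case-insensitively), or an empty string when the parameter is absent.
	*/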
	private static String getSpUrlTagValue(String url)
	{
		final String tag = "spurl=";
		
		String result = "";
		int pos = url.toLowerCase().indexOf(tag);
		if (pos >= 0) {
			pos += tag.length();
			result = url.substring(pos);
		}

		return result;
	}
	
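	/**
	* Extracts a parameter value from a reverse-subscription address.
	*/
	private static String extractTagValueFromUrl(String url, String tag)
	{
		// First try the case where the parameter is followed by another one (terminated by '&')
		String pattern = tag + "\\s*=\\s*(.*?)\\s*&";
		String value = StringParser.getMatchedElement(url, pattern);
		if (!value.equals("")) {
			return value;
		}

		// Otherwise the parameter is the last one in the URL, so match up to the end
		pattern = tag + "\\s*=\\s*(.*?)$";
		return StringParser.getMatchedElement(url, pattern);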
	}
	
	public void setSpID(String _sp_id) {
		this._sp_id = _sp_id;
	}

	public String getSpID() {
		return _sp_id;
	}

	public void setServiceID(String _service_id) {
		this._service_id = _service_id;
	}

	public String getServiceID() {
		return _service_id;
	}

	public void setChargeUrl(String _charge_url) {
		this._charge_url = _charge_url;
	}

	public String getChargeUrl() {
		return _charge_url;
	}

	public void setFreeUrl(String _free_url) {
		this._free_url = _free_url;
	}

	public String getFreeUrl() {
		return _free_url;
	}
	
	public static void main(String[] args)
	{
		// SpUrl.initSpUrls();
		String url = "http://218.200.244.81?userType=B";
		String pattern = "^http://wap\\.monternet\\.com|^http://218\\.200\\.244\\.114|^http://218\\.200\\.244\\.81";
		url = url.trim();
		if (StringParser.matchPattern(url, pattern)) {
			System.out.println("match pattern");
		}
		else {
			System.out.println("not match pattern");
		}
	}
}
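
The class above depends on StringParser, whose source is not shown on this page. As a rough illustration only, the host check and parameter extraction could be sketched with java.util.regex alone. The class name SpUrlSketch and its method names are hypothetical and do not come from the original project. Matching parameter names case-insensitively is also an assumption, made because the sample addresses spell both "serviceID" and "ServiceID".

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; not part of the original cn.yicha code base.
final class SpUrlSketch {
    // Dots are escaped so that "." matches only a literal dot in the host.
    private static final Pattern MONTERNET = Pattern.compile(
        "^http://(wap\\.monternet\\.com|218\\.200\\.244\\.114|218\\.200\\.244\\.81)");

    private SpUrlSketch() {}

    /** Returns true when the URL starts with one of the known Monternet hosts. */
    static boolean isMonternetUrl(String url) {
        return url != null && MONTERNET.matcher(url.trim()).find();
    }

    /** Returns the value of the named query parameter, or "" when it is absent. */
    static String extractParam(String url, String name) {
        // Assumption: names are compared case-insensitively (serviceID vs ServiceID).
        Pattern p = Pattern.compile(
            "[?&]" + Pattern.quote(name) + "\\s*=\\s*([^&]*)", Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(url);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Returns everything after "SPURL=" (any letter case), or "" when it is absent. */
    static String extractSpUrl(String url) {
        String tag = "spurl=";
        int pos = url.toLowerCase(Locale.ROOT).indexOf(tag);
        return pos >= 0 ? url.substring(pos + tag.length()) : "";
    }

    public static void main(String[] args) {
        String reverse = "http://wap.monternet.com/reversesubscribe?SPID=100501&ServiceID=900503";
        String ordinary = "http://wap.monternet.com/?userType=B&serviceID=10111000";
        System.out.println(isMonternetUrl(reverse));              // true
        System.out.println(extractParam(reverse, "ServiceID"));   // 900503
        System.out.println(extractParam(ordinary, "ServiceID"));  // 10111000
    }
}
```

Anchoring the parameter name on "?" or "&" keeps a lookup for one name from matching the tail of a longer name. Whether StringParser behaves the same way cannot be told from this page.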
