
📄 collectedpage.java

📁 A collection system that crawls data for a vertical search engine
/*
 * *****************************************************
 * Copyright (c) 2005 IIM Lab. All  Rights Reserved.
 * Created by xuehao at 2005-10-12
 * Contact: zxuehao@mail.ustc.edu.cn
 * *****************************************************
 */

package org.indigo.pages;

import java.util.ArrayList;

import org.indigo.parser.Parser;

public class CollectedPage extends BeginPage
{
    private IdStrategy itsStrategy = null;

    protected boolean IsByPost=false;
    public CollectedPage(String key)
    {
        super(key);
    }
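    /**
     * Added by wbz for home pages that follow the same pattern as 一站通.
     * For each entry of the list, opens the collected URL and parses it
     * with the given front and back strings.
     *
     * @param al    list of ids (Strings), each resolved through getCollectedUrl
     * @param front first argument passed to Parser.parseWith (presumably the leading delimiter)
     * @param back  second argument passed to Parser.parseWith (presumably the trailing delimiter)
     * @return list of the parsed strings, in the same order as al
     */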
    public ArrayList getIDs(ArrayList al, String front, String back)
    {
        ArrayList aa = new ArrayList();
        int l = al.size();
        for (int i = 0; i < l; i++)
        {
            Parser p = new Parser();
            p.setUrl(getCollectedUrl((String) al.get(i)));
            p.open();
            String temp = p.parseWith(front, back);
            aa.add(temp);
        }
        return aa;
    }

    public void setIsByPost(boolean IsByPost)
    {
    	this.IsByPost=IsByPost;
    }
    /**
     * Builds the URL to collect for the given id.
     */
    public String getCollectedUrl(String id)
    {
        // In POST mode the id is returned unchanged.
        if (IsByPost)
        {
            return id;
        }
        String url = null;
        if (itsStrategy == null)
        {
            // No id given: fall back to the link for the current page number.
            if (id == null)
                return getSelfCurrentLink();
            id = id.trim();
            url = buildLink(id, itsKeyForLink);
        }
        else
        {
            // A strategy is set: let it build the URL from the id.
            url = itsStrategy.getStrategy(id);
        }
        return url;
    }

    public void setIdStrategy(IdStrategy s)
    {
        itsStrategy = s;
        itsStrategy.setBeginUrl( this.itsBeginUrl );
    }
    
    private String getSelfCurrentLink()
    {
        int i;
        i = getCurrentPNum();
        String currentLink = null;
        currentLink = buildLink(i, itsKeyForLink);
        pNum++;
        return currentLink;
    }

    /**
     * Inserts id into the begin URL right after the first occurrence of key,
     * replacing any digits that directly follow key. If key does not occur
     * in the begin URL, id is appended to it.
     */
    private String buildLink(String id, String key)
    {
        String link = getBeginUrl();

        int i = link.indexOf(key);
        if (i == -1)
            return link + id;

        i = i + key.length();
        String subStr1 = link.substring(0, i);
        String subStr2 = link.substring(i);

        // Skip the run of digits (the previous id) that follows the key.
        for (i = 0; i < subStr2.length(); i++)
        {
            if (!Character.isDigit(subStr2.charAt(i)))
                break;
        }
        subStr2 = subStr2.substring(i);

        return subStr1 + id + subStr2;
    }
}
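
The id substitution performed by `buildLink` can be tried in isolation. The following is only a sketch under stated assumptions: `BuildLinkSketch` is a hypothetical class, not part of the original project; the begin URL is passed in as a parameter instead of coming from `getBeginUrl()`; and the example URLs are made up.

```java
// Hypothetical standalone sketch; not part of org.indigo.pages.
class BuildLinkSketch {

    // Same substitution rule as CollectedPage.buildLink, with the begin URL
    // passed in explicitly (an assumption made to keep the sketch self-contained).
    static String buildLink(String beginUrl, String id, String key) {
        int i = beginUrl.indexOf(key);
        if (i == -1) {
            // Key not present: append the id to the begin URL.
            return beginUrl + id;
        }
        i += key.length();
        String head = beginUrl.substring(0, i);
        String tail = beginUrl.substring(i);

        // Skip the digits that directly follow the key (the previous id).
        int j = 0;
        while (j < tail.length() && Character.isDigit(tail.charAt(j))) {
            j++;
        }
        return head + id + tail.substring(j);
    }

    public static void main(String[] args) {
        // Key followed by an old numeric id: the digits are replaced.
        System.out.println(buildLink("http://example.com/show.jsp?id=100&lang=en", "42", "id="));
        // Key missing: the id is appended.
        System.out.println(buildLink("http://example.com/item/", "7", "id="));
    }
}
```

With these inputs the first call replaces `100` with `42` and keeps `&lang=en`, and the second call simply appends `7`.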
