testcollectedpage1.java

From: a collection system that crawls data for a vertical search engine (Java)
/*
 * *****************************************************
 * Copyright (c) 2005 IIM Lab. All  Rights Reserved.
 * Created by xuehao at 2005-10-12
 * Contact: zxuehao@mail.ustc.edu.cn
 * *****************************************************
 */

package org.indigo.tests.pages;

import java.util.ArrayList;

import junit.framework.TestCase;

import org.indigo.pages.CollectedIdsPage;
import org.indigo.pages.CollectedPage;
import org.indigo.pages.VisitPage;

public class TestCollectedPage1 extends TestCase
{
    public void testCollectedPage1()
    {
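        // Detail pages: presumably addressed by appending a record id to this base URL.
        CollectedPage colPage = new CollectedPage( "id" );
        colPage.setBeginUrl( "http://www.ahnw.gov.cn/scxx/gqrx/content.asp?id=" );

        // Listing pages: presumably paged through the "page" query parameter.
        VisitPage visitPage = new VisitPage( "page" );
        visitPage.setBeginUrl( "http://www.ahnw.gov.cn/scxx/gqrx/?page=1&lb=1%C5%A9%B8%B1%B2%FA%C6%B7&key=&r=a&SortName=" );
        // Assumed meaning of the arguments: first page 1, last page 3, step 1.
        visitPage.setParameters( 1, 3, 1 );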
        
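        // Delimiters that surround each record id in the listing page's HTML.
        // A first pair was assigned here and immediately overwritten, so it had no effect:
        //   front = "<a href=\"javascript:opencontent(";  back = ");";
        String front = "<a href=\"count.asp?id=";
        String back = "\"";
        CollectedIdsPage vIdsPage = new CollectedIdsPage( front, back );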
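        // Walk the listing pages; the loop ends when getNextVisitLink() returns null.
        String url = visitPage.getCurrentLink();
        while( url != null )
        {
            vIdsPage.setUrl( url );
            ArrayList ids = vIdsPage.getIds();
            assertNotNull( "getIds() returned null for " + url, ids );

            for( int i = 0; i < ids.size(); i++ )
            {
                String id = (String) ids.get( i );
                // Keep the detail URL in its own variable so the listing URL is not overwritten.
                String collectedUrl = colPage.getCollectedUrl( id );
                System.out.println( collectedUrl );
            }

            url = visitPage.getNextVisitLink();
        }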
        System.out.println( "TestCollectedPage over." );
        
    }

}
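
The test above hands `CollectedIdsPage` a `front` and a `back` delimiter and reads ids back, but the class itself is not shown here. As a rough illustration of what delimiter-based extraction could look like, here is a minimal sketch. `DelimitedIdExtractor` and `extractIds` are hypothetical names, not part of `org.indigo`. The sample HTML is made up, and building the detail URL by appending the id to the begin URL is an assumption about `getCollectedUrl`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of org.indigo: collects every substring
// that sits between a front delimiter and the next back delimiter.
public class DelimitedIdExtractor
{
    public static List<String> extractIds( String html, String front, String back )
    {
        List<String> ids = new ArrayList<String>();
        int from = 0;
        while( true )
        {
            int start = html.indexOf( front, from );
            if( start < 0 )
            {
                break; // no further front delimiter
            }
            start += front.length();
            int end = html.indexOf( back, start );
            if( end < 0 )
            {
                break; // unterminated match, stop scanning
            }
            ids.add( html.substring( start, end ) );
            from = end + back.length();
        }
        return ids;
    }

    public static void main( String[] args )
    {
        // Made-up listing fragment that uses the same delimiters as the test.
        String html = "<a href=\"count.asp?id=101\">A</a> <a href=\"count.asp?id=102\">B</a>";
        List<String> ids = extractIds( html, "<a href=\"count.asp?id=", "\"" );

        // Assumption: the detail URL is the begin URL with the id appended.
        String beginUrl = "http://www.ahnw.gov.cn/scxx/gqrx/content.asp?id=";
        for( String id : ids )
        {
            System.out.println( beginUrl + id );
        }
    }
}
```

Scanning with `indexOf` keeps the sketch free of regular-expression escaping, which matters here because the delimiters contain characters such as `(` and `?`.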
