
mainmethod.java

Crawler for a music forum (URL-matching patterns provided)
Language: Java
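This listing does not include `ProcURL`, the class that turns a page into lists of board-page and topic-page URLs. The sketch below is a rough illustration of matching links against URL patterns. The hypothetical `LinkMatcherSketch.matchLinks` pulls `href` values out of a page's HTML with a regex, resolves them against the page URL, and keeps the ones that match a pattern. Several things here are assumptions rather than the project's own code: the class and method names, both regexes, and the `read.php?tid=` topic-URL format. A regex is also only a simplification of real HTML parsing.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkMatcherSketch {

    // Matches href="..." or href='...' attributes (simplified; not a full HTML parser).
    private static final Pattern HREF = Pattern.compile("href\\s*=\\s*[\"']([^\"']+)[\"']");

    /**
     * Hypothetical helper: returns the absolute URLs of all links in html whose resolved
     * form matches the wanted pattern, in document order and without duplicates.
     */
    static List<String> matchLinks(String pageUrl, String html, Pattern wanted) {
        URL base;
        try {
            base = new URL(pageUrl);
        } catch (MalformedURLException e) {
            return new ArrayList<>(); // unusable page URL: nothing can be resolved
        }
        Set<String> found = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Ampersands in hrefs are often HTML-escaped.
            String href = m.group(1).replace("&amp;", "&");
            try {
                String absolute = new URL(base, href).toString();
                if (wanted.matcher(absolute).find()) {
                    found.add(absolute);
                }
            } catch (MalformedURLException e) {
                // Skip links that cannot be resolved (e.g. unknown protocols).
            }
        }
        return new ArrayList<>(found);
    }

    public static void main(String[] args) {
        String board = "http://bbs.breezecn.com/thread.php?fid=5";
        String html = "<a href=\"thread.php?fid=5&amp;page=2\">2</a>"
                + "<a href=\"read.php?tid=101\">topic</a>"
                + "<a href='read.php?tid=101'>same topic</a>"
                + "<a href=\"thread.php?fid=7&amp;page=2\">other board</a>";
        // Hypothetical patterns: further pages of board fid=5, and topic pages.
        Pattern boardPages = Pattern.compile("thread\\.php\\?fid=5&page=\\d+");
        Pattern topicPages = Pattern.compile("read\\.php\\?tid=\\d+");
        System.out.println(matchLinks(board, html, boardPages));
        System.out.println(matchLinks(board, html, topicPages));
    }
}
```

Using a `LinkedHashSet` drops links that appear more than once on a page while preserving document order. Order matters if pages are later saved in sequence.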
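package main;

import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;

import procURL.ProcURL;
import savePage.SavePage;

/**
 * Entry point of the forum crawler: saves every page of one board (forum section),
 * then every topic page linked from those board pages.
 */
public class MainMethod {

	public static void main(String[] args) {
		SavePage saver = new SavePage();
		ProcURL procURL = new ProcURL();

		// Start from a clean state: empty board-page list, topic list and parser state.
		procURL.getBlockList().clear();
		procURL.getTopicList().clear();
		procURL.clear();

		try {
			// Entry URL: first page of the board with fid=5.
			String boardUrl = "http://bbs.breezecn.com/thread.php?fid=5";
			// Query string of the entry URL ("fid=5"), passed to SavePage with every page.
			String boardQuery = boardUrl.substring(boardUrl.indexOf("?") + 1);

			// Parse the first board page and collect the URLs of all pages of the board.
			procURL.processURL(new URL(boardUrl));
			procURL.getBlockPageURL();
			procURL.clear();

			// Save all pages of the board locally.
			ArrayList<?> blockList = (ArrayList<?>) procURL.getBlockList();
			for (int j = 0; j < blockList.size(); j++) {
				String blockPage = blockList.get(j).toString();
				// Page key: query text up to the first '&' plus everything from the last '&';
				// the whole query string if there is no '&' (the unguarded original threw here).
				int q = blockPage.indexOf("?");
				int firstAmp = blockPage.indexOf("&", q + 1);
				String pageKey = (firstAmp < 0)
						? blockPage.substring(q + 1)
						: blockPage.substring(q + 1, firstAmp) + blockPage.substring(blockPage.lastIndexOf("&"));
				saver.getBlockContent(blockPage, boardQuery, pageKey);
			}

			// Starting from the board's first page, collect the topic-page URLs of every board page.
			for (int i = 0; i < ((ArrayList<?>) procURL.getBlockList()).size(); i++) {
				String blockPage = ((ArrayList<?>) procURL.getBlockList()).get(i).toString();
				procURL.processURL(new URL(blockPage));
				procURL.getTopicPageURL();
				procURL.clear();
			}

			// Save every collected topic page locally; its key is the URL's full query string.
			ArrayList<?> topicList = (ArrayList<?>) procURL.getTopicList();
			for (int j = 0; j < topicList.size(); j++) {
				String topicPage = topicList.get(j).toString();
				String pageKey = topicPage.substring(topicPage.indexOf("?") + 1);
				saver.getTopicContent(topicPage, boardQuery, pageKey);
			}
			System.out.println("All pages extracted.");
		} catch (MalformedURLException e) {
			System.out.println("Found malformed URL: " + e.getMessage());
		}
	}
}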
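This listing does not show `SavePage` either. Judging only by the calls made in `main`, `getBlockContent` and `getTopicContent` take three arguments: a page URL, the board's query string (`fid=5`), and a per-page key. A minimal hypothetical stand-in for one such save could copy the page's bytes into `<outputDir>/<boardKey>/<pageKey>.html`. The `savePage` name, the directory layout, and the `.html` suffix are all assumptions. The original class may behave differently, for example in how it handles the page's character encoding. So that the demo in `main` runs offline, it reads a local `file:` URL instead of the forum.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class PageSaverSketch {

    /**
     * Hypothetical helper: downloads pageUrl and stores its raw bytes as
     * outputDir/boardKey/pageKey.html, replacing any earlier copy.
     * Returns the path of the written file.
     */
    static Path savePage(String pageUrl, Path outputDir, String boardKey, String pageKey)
            throws IOException {
        Path dir = outputDir.resolve(boardKey);
        Files.createDirectories(dir);
        Path target = dir.resolve(pageKey + ".html");
        // Bytes are copied unchanged, so no charset has to be chosen at save time.
        try (InputStream in = new URL(pageUrl).openStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Offline demo: a temporary local file stands in for a forum page.
        Path source = Files.createTempFile("page", ".html");
        Files.write(source, "<html>demo</html>".getBytes(StandardCharsets.UTF_8));
        Path outDir = Files.createTempDirectory("crawl");
        Path saved = savePage(source.toUri().toString(), outDir, "fid=5", "fid=5&page=2");
        System.out.println("Saved " + saved);
    }
}
```

Grouping files under a directory named after the board query keeps pages from different boards apart when the crawler is pointed at another `fid`.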
