GetAllItemPage.java

A crawler for Light in the box, using HttpClient.
package com.blogool.crawl;

import java.io.*;

import org.flytinge.ContentHandle;
import org.flytinge.HttpGet;

import com.blogool.crawl.lib.Cat;

public class GetAllItemPage {
	public static void main(String[] args) {
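		File pagesDir = new File("d:/libox1/pages");
		// Back up pages from a previous run by renaming the directory with a timestamp suffix.
		if (pagesDir.exists()) {
			File backup = new File("d:/libox1/pages" + System.currentTimeMillis());
			if (!pagesDir.renameTo(backup)) {
				System.err.println("Could not back up " + pagesDir + " to " + backup);
			}
		}
		if (!pagesDir.exists())
			pagesDir.mkdirs();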

		// Load the category tree, then fetch the page of every second-level category.
		Cat root = Util.loadCat(new File("d:/libox1/cats1.xml"));
		for (int i = 0; i < root.getCats().size(); i++) {
			Cat c = root.getCats().get(i);

			for (int j = 0; j < c.getCats().size(); j++) {
				Cat cat = c.getCats().get(j);
				String url = cat.getUrl();
				SaveFileContentHandle sfch = new SaveFileContentHandle(
						new File(pagesDir, Util.modifyName(cat.getCatName())));
				HttpGet hg = new HttpGet(url, "</html>", sfch);
				
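				// Pause one second between requests so the server is not flooded.
				try {
					Thread.sleep(1000);
				} catch (InterruptedException e) {
					// Restore the interrupt flag and stop scheduling further downloads.
					Thread.currentThread().interrupt();
					return;
				}

				hg.start();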
			}
		}

	}



	public static class SaveFileContentHandle implements ContentHandle {
		private File f;

		public SaveFileContentHandle(File f) {
			this.f = f;
		}

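		public void handle(String content) {
			BufferedWriter bw = null;
			if (f.exists()) {
				// The directory starts empty, so an existing file most likely means
				// two categories map to the same file name; the earlier page is lost.
				System.err.println("ERROR: file already exists and will be overwritten: " + f);
			}

			try {
				bw = new BufferedWriter(new FileWriter(f));
				bw.write(content);
			} catch (IOException e) {
				// Report write failures instead of silently dropping the page.
				e.printStackTrace();
			} finally {
				if (bw != null) {
					try {
						bw.close();
					} catch (IOException e) {
						e.printStackTrace();
					}
				}
			}
		}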

	}
}
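For comparison, here is a minimal sketch of the same fetch-and-save flow written against the JDK's built-in `java.net.http.HttpClient` (Java 11+) instead of the `org.flytinge` `HttpGet`/`ContentHandle` classes, whose API is not shown in this listing. Everything in it is hypothetical: the class `PageSaverSketch`, its methods `resetDir`, `savePage` and `fetchAll`, and the `Map` of file names to URLs that stands in for the `Cat` category tree. It fetches pages one at a time with a pause between requests, writes them as UTF-8, and, unlike `SaveFileContentHandle`, skips a file that already exists rather than overwriting it.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

/** Hypothetical sketch: a sequential, throttled page saver (not the original project's API). */
public class PageSaverSketch {
    private final HttpClient client =
            HttpClient.newBuilder().version(HttpClient.Version.HTTP_1_1).build();
    private final long delayMillis;

    public PageSaverSketch(long delayMillis) {
        this.delayMillis = delayMillis;
    }

    /** Moves an existing directory aside with a timestamp suffix, then creates it fresh. */
    public static Path resetDir(Path dir) throws IOException {
        if (Files.exists(dir)) {
            Path backup = dir.resolveSibling(dir.getFileName().toString() + System.currentTimeMillis());
            Files.move(dir, backup);
        }
        return Files.createDirectories(dir);
    }

    /** Writes content as UTF-8; returns false and writes nothing if the file already exists. */
    public static boolean savePage(Path file, String content) throws IOException {
        if (Files.exists(file)) {
            System.err.println("File already exists, skipping: " + file);
            return false;
        }
        Files.writeString(file, content, StandardCharsets.UTF_8);
        return true;
    }

    /** Fetches each (file name -> URL) entry in the map's iteration order and saves it under dir. */
    public int fetchAll(Map<String, String> pages, Path dir) throws IOException, InterruptedException {
        int saved = 0;
        boolean first = true;
        for (Map.Entry<String, String> e : pages.entrySet()) {
            if (!first) {
                Thread.sleep(delayMillis); // throttle requests to the server
            }
            first = false;
            HttpRequest req = HttpRequest.newBuilder(URI.create(e.getValue())).GET().build();
            HttpResponse<String> resp = client.send(req, HttpResponse.BodyHandlers.ofString());
            // Only successful responses are saved; other status codes are skipped.
            if (resp.statusCode() == 200 && savePage(dir.resolve(e.getKey()), resp.body())) {
                saved++;
            }
        }
        return saved;
    }
}
```

Calling `new PageSaverSketch(1000).fetchAll(pages, PageSaverSketch.resetDir(Path.of("d:/libox1/pages")))` would roughly mirror the original's one-second spacing, with the difference that each request finishes before the next one starts, whereas the original starts a new `HttpGet` every second regardless of whether earlier ones have completed.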
