urldatabase.java

From "Multithreaded crawler in Java (thread count given as input)" · Java code · 56 lines

package crawler;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;

public class UrlDatabase {
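	// URLs already handed out to a crawler thread.
	HashSet<String> crawledList = new HashSet<String>();
	// URLs waiting to be crawled, kept in insertion order.
	LinkedHashSet<String> toCrawlList = new LinkedHashSet<String>();
	// Number of URLs currently waiting in toCrawlList.
	int n = 0;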
	
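	public UrlDatabase(ArrayList<String> startUrls)
	{
		toCrawlList.addAll(startUrls);
		// Count the set, not the input list, so duplicate start URLs are not counted twice.
		n = toCrawlList.size();
	}
	// Empties the pending queue. Synchronized like the other accessors, and resets n to match.
	public synchronized void clearAll()
	{
		toCrawlList.clear();
		n = 0;
	}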
	public synchronized void addUrls(ArrayList<String> links)
	{
		for (int i = 0; i < links.size(); i++)
		{
			String link = links.get(i);
			// add() returns false when the URL is already queued, so n only counts new entries.
			if (!crawledList.contains(link) && toCrawlList.add(link))
			{
				n++;
			}
		}
		notifyAll();
	}
	public synchronized String getUrl()
	{
		// Returns null when nothing is pending; callers must handle that case.
		if (toCrawlList.isEmpty()) {
			notifyAll();
			return null;
		}
		// Take the oldest pending URL and mark it as crawled.
		String temp = toCrawlList.iterator().next();
		toCrawlList.remove(temp);
		crawledList.add(temp);
		n--;
		notifyAll();
		return temp;
	}
	public synchronized int getNum()
	{
		System.out.println(n);
		notifyAll();
		return n;
	}
}
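
The listing does not show how crawler threads call this class, so the following is only a minimal sketch of one plausible usage pattern: several worker threads pull URLs until the queue reports empty. FrontierDemo, Frontier and drain are hypothetical names that do not appear in the original project; Frontier is a compact stand-in for UrlDatabase so that the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class FrontierDemo {
    // Hypothetical stand-in for UrlDatabase: same crawled/pending split, no counter.
    static class Frontier {
        private final Set<String> crawled = new HashSet<>();
        private final Set<String> toCrawl = new LinkedHashSet<>();

        synchronized void addUrls(List<String> links) {
            for (String link : links) {
                if (!crawled.contains(link)) {
                    toCrawl.add(link); // the set silently ignores URLs already queued
                }
            }
        }

        synchronized String getUrl() {
            if (toCrawl.isEmpty()) {
                return null; // nothing pending
            }
            String next = toCrawl.iterator().next();
            toCrawl.remove(next);
            crawled.add(next);
            return next;
        }
    }

    // Starts threadCount workers, waits for them, and returns every URL that was handed out.
    static Set<String> drain(List<String> seeds, int threadCount) throws InterruptedException {
        Frontier frontier = new Frontier();
        frontier.addUrls(seeds);
        Set<String> handedOut = Collections.synchronizedSet(new HashSet<String>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            Thread worker = new Thread(() -> {
                String url;
                while ((url = frontier.getUrl()) != null) {
                    handedOut.add(url); // a real worker would fetch and parse the page here
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        return handedOut;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> seeds = Arrays.asList(
                "http://example.com/a", "http://example.com/b", "http://example.com/a");
        System.out.println(drain(seeds, 4).size()); // prints 2: the duplicate seed is queued once
    }
}
```

In this sketch a worker exits as soon as getUrl() returns null. A real crawler whose workers also call addUrls() would need a different stop condition, because the queue can be empty for a moment while another thread is still parsing a page.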
