// getcontent1.java: parse an HTML web page (Java)
package com.unison.learn.http.wxx.main;

import java.io.*;
import java.net.*;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetContent1 {

	/**
	 * Downloads the page at a fixed URL and returns its raw HTML.
	 * The reader decodes bytes with the platform default charset; if the
	 * page is served in another encoding, pass that charset to
	 * InputStreamReader explicitly or non-ASCII text will be garbled.
	 */
	public static String content() throws IOException {
		URL url = new URL("http://youxi.zol.com.cn/pc/index4869.html");
		URLConnection connection = url.openConnection();

		StringBuilder readBuffer = new StringBuilder();
		// try-with-resources closes the stream even if reading fails
		try (BufferedReader in = new BufferedReader(
				new InputStreamReader(connection.getInputStream()))) {
			int readRst;
			// read one char at a time until end of stream (-1)
			while ((readRst = in.read()) != -1) {
				readBuffer.append((char) readRst);
			}
		}
		return readBuffer.toString();
	}
	
	/**
	 * Returns the first <div class="Ar mt0"  style="padding:0 0 0 8px">...</div>
	 * block in the HTML, or null if there is none.
	 * DOTALL lets ".*?" match across line breaks (the downloaded page keeps
	 * its newlines); the lazy match stops at the first closing </div>, so a
	 * nested div would cut the block short.
	 */
	public String getList(final String s) {
		final String regex = "<div class=\"Ar mt0\"  style=\"padding:0 0 0 8px\">.*?</div>";
		final Pattern pa = Pattern.compile(regex, Pattern.DOTALL);
		final Matcher ma = pa.matcher(s);
		return ma.find() ? ma.group() : null;
	}
	
	
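	/**
	 * Strips every HTML tag from s, leaving only the text between tags.
	 * "<.*?>" stops at the first '>', so it does not handle '>' inside
	 * attribute values or tags that span several lines.
	 */
	public String outTag(final String s) {
		return s.replaceAll("<.*?>", "");
	}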

	
	public static void main(String[] args) throws Exception {
		GetContent1 gc = new GetContent1();
		// fetch the page once and reuse the result
		String htmlContent = content();
		System.out.println("==================================================");
		String listHtml = gc.getList(htmlContent);
		System.out.println(listHtml);
	}
}
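content() decodes the response with the platform default charset, which can garble Chinese text when the server uses a different encoding. A minimal sketch of one way to pick the charset from the response's Content-Type header follows; the class and method names are hypothetical, and falling back to UTF-8 when no charset is declared is an assumption, not something the original code does.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetSketch {

    /**
     * Hypothetical helper: extracts the charset parameter from a Content-Type
     * value such as "text/html; charset=GBK". Returns the fallback when the
     * header is null, declares no charset, or names an unknown charset.
     */
    static Charset charsetFromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // drop optional quotes around the charset name
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // covers IllegalCharsetNameException and UnsupportedCharsetException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        // assumed fallback: UTF-8
        System.out.println(charsetFromContentType("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetFromContentType("text/html", StandardCharsets.UTF_8));
    }
}
```

Inside content(), the result could be passed as the second argument of `new InputStreamReader(connection.getInputStream(), ...)`, using `connection.getContentType()` as the header value. Pages that declare their encoding only in a `<meta>` tag would still need separate handling.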

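To see why getList needs a flag that lets "." cross line breaks, and how outTag reduces the matched block to text, the sketch below runs the same regular expressions on a small in-memory HTML string instead of the live page. The sample HTML and the names ExtractSketch, firstDiv and stripTags are made up for illustration; a regex like this is only a quick hack and is not a general HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractSketch {

    // Same target element as getList (note the two spaces before "style").
    static final String DIV_REGEX =
            "<div class=\"Ar mt0\"  style=\"padding:0 0 0 8px\">.*?</div>";

    /** Returns the first matching div block, or null; flags allows comparing with and without DOTALL. */
    static String firstDiv(String html, int flags) {
        Matcher m = Pattern.compile(DIV_REGEX, flags).matcher(html);
        return m.find() ? m.group() : null;
    }

    /** Removes every tag, as outTag does, then trims surrounding whitespace. */
    static String stripTags(String html) {
        return html.replaceAll("<.*?>", "").trim();
    }

    public static void main(String[] args) {
        // hypothetical sample page; the real page's markup may differ
        String html = "<html><body>\n"
                + "<div class=\"Ar mt0\"  style=\"padding:0 0 0 8px\">\n"
                + "  <a href=\"/a.html\">Game A</a>\n"
                + "</div>\n"
                + "</body></html>";

        // Without DOTALL, '.' does not match '\n', so the multi-line block is missed.
        System.out.println(firstDiv(html, 0));   // prints "null"

        String block = firstDiv(html, Pattern.DOTALL);
        System.out.println(stripTags(block));    // prints "Game A"
    }
}
```

The lazy `.*?` keeps the match from running on to a later `</div>`, but it also means a div nested inside the target block would end the match early.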