
📄 savehtmltodb.java

📁 Web page scraping: fetches the content of the web page at a specified URL.
💻 JAVA
package com.snoics.reptile.parse;

import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.Date;
import java.io.Writer;
import java.sql.*;

import com.snoics.base.interfaces.log.Log;
import com.snoics.base.util.StringClass;
import com.snoics.reptile.file.CreateHTMLFile;
import com.snoics.reptile.file.ICreateFile;
import com.snoics.reptile.link.createUrl.BuildUrl;
import com.snoics.reptile.link.createUrl.IBuildUrl;
import com.snoics.reptile.regex.url.IFilterAllUrl;
import com.snoics.reptile.regex.url.IMakeUpUrl;
import com.snoics.reptile.regex.url.impl.FilterAllUrl;
import com.snoics.reptile.regex.url.impl.MakeUpUrl;
import com.snoics.reptile.system.common.Common;
import com.snoics.reptile.system.common.CommonObject;
import com.snoics.reptile.util.UrlUtil;
import com.snoics.useclass.SnoicsClass;

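/**
 * Parses a scraped HTML page and stores its title, date, and body in the portal's Oracle database.
 */
public class SaveHtmlToDB{
	private CommonObject commonObject=new CommonObject();
	public SaveHtmlToDB(){
		
	}
	
	/**
	 * Extracts the article fields from htmlstr and inserts them under category cateid.
	 * The did parameter is not used by this method.
	 */
	public boolean savetodb(String htmlstr,String did,String cateid){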
		
		String sql = "";
		Connection con = null ;
		Statement stmt = null ;
		Statement linkst =null;
		Statement st =null;
		String username=commonObject.getConfigInfo(Common.CONFIGFILE_NODE_DBUSER);
		String password=commonObject.getConfigInfo(Common.CONFIGFILE_NODE_DBPWD);
		String dbip=commonObject.getConfigInfo(Common.CONFIGFILE_NODE_DBIP);
		String dbname=commonObject.getConfigInfo(Common.CONFIGFILE_NODE_DBNAME);
		String jdbcurl="jdbc:oracle:thin:@"+dbip+":"+dbname;
		boolean exflag=false;
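		try{
			Class.forName("oracle.jdbc.driver.OracleDriver");
			con = DriverManager.getConnection(jdbcurl, username, password);
			// Disable auto-commit before the first insert so the inserts share one transaction.
			con.setAutoCommit(false);
			stmt = con.createStatement();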
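			// The fixed offsets below depend on the source page's exact markup. If a marker is
			// missing, substring throws StringIndexOutOfBoundsException and control passes to the catch block.
			// Header block: from "<center><h3>" through the first "</center>" (9 characters long).
			String tmpstr=htmlstr.substring(htmlstr.indexOf("<center><h3>"), htmlstr.indexOf("</center>")+9);
			// Title: the text between "<center><h3>" (12 characters long) and "</h3>".
			String title=tmpstr.substring(tmpstr.indexOf("<center><h3>")+12, tmpstr.indexOf("</h3>"));
			// Date: starts 28 characters after "i_insert_time" and ends 5 characters before the next "<font ".
			String datestr=tmpstr.substring(tmpstr.indexOf("i_insert_time")+28,tmpstr.indexOf("i_author"));
			datestr=datestr.substring(0,datestr.indexOf("<font ")-5);
			datestr=datestr.trim();
			// Author: parsed from the header block but not written to the database below.
			String author=tmpstr.substring(tmpstr.indexOf("i_author")+23,tmpstr.indexOf("</center>")-18);
			author=author.replaceAll("<br>", " ");
			author=author.trim();
			// Body: from the first "<div " through the second-to-last "</div>" (the page's final "</div>" is dropped).
			String bodytmpstr=htmlstr.substring(htmlstr.indexOf("<div "),htmlstr.lastIndexOf("</div>"));
			String bodystr=bodytmpstr.substring(bodytmpstr.indexOf("<div "),bodytmpstr.lastIndexOf("</div>")+6);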
			
			// Bind title and date as parameters so quotes in the scraped text cannot break or alter the SQL.
			sql ="insert into IN_INFO(ID,INFOTYPE,TITLE,SOURCE,IMPORTANCE,SUMMARY,STATUS,AUTHOR,REFERENCEFLAG,TIME,IPACCESS) values(IN_S_INFO.nextval,0,?,'总行',null,' ','1001xxxx','系统','0',?,0)";
			PreparedStatement ps=con.prepareStatement(sql);
			try{
				ps.setString(1, title);
				ps.setString(2, datestr);
				ps.executeUpdate();
			}finally{
				ps.close();
			}
			// Read back the ID that the sequence just generated in this session.
			ResultSet rs=stmt.executeQuery("select IN_S_INFO.currval from dual");
			int key=0;
			if ( rs.next() ) { 
				key = rs.getInt(1);
			} 
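			// Insert the row that links the category to the new info record
			String sqllink="insert into IN_CATEINFOLINK(ID,REFERENCED,CATEGORYID,INFOID,IMPORTANCE) values(IN_S_CATEINFOLINK.nextval,0,?,"+key+",0)";
			PreparedStatement linkps=con.prepareStatement(sqllink);
			linkst = linkps; // closed with the other statements in the finally block
			linkps.setString(1, cateid);
			linkps.executeUpdate();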
			// Insert the body text into the CLOB column
			con.setAutoCommit(false);
		    st = con.createStatement();
			String sqlcontent="insert into IN_INFOCONTENT(ID,PAGEINDEX,SUBTITLE,STATUS,INFO_ID,CONTENT) values(IN_S_INFOCONTENT.nextval,1,' ','1',"+key+",empty_clob())";
			exflag =st.execute(sqlcontent);
			ResultSet ckrs=st.executeQuery("select IN_S_INFOCONTENT.currval from dual");
			int contentkey=0;
			if ( ckrs.next() ) { 
				contentkey = ckrs.getInt(1);
			}
			ResultSet crs = st.executeQuery("select CONTENT from IN_INFOCONTENT where ID="+contentkey+" for update");
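			if (crs.next())
			{
				// Cast the java.sql.Clob to oracle.sql.CLOB to obtain a character output stream
				oracle.sql.CLOB clob = (oracle.sql.CLOB) crs.getClob("CONTENT");
				Writer outStream = clob.getCharacterOutputStream();
				try{
					// Write the extracted body HTML into the CLOB
					char[] c = bodystr.toCharArray();
					outStream.write(c, 0, c.length);
					outStream.flush();
				}finally{
					outStream.close();
				}
			}
			con.commit();
			// execute() returns false for an INSERT, so report success explicitly after the commit
			exflag = true;
			System.out.println("Scraped content inserted into the portal database, category ID="+cateid);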
		}
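		catch (Exception e) {
			e.printStackTrace();
			// Undo any uncommitted inserts before the connection is closed
			try {
				if (con!=null) con.rollback();
			} catch (SQLException re) {
				re.printStackTrace();
			}
		}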
		finally{
			try {
				if (stmt!=null)
				{
					stmt.close();
				}
				if (linkst!=null)
				{
					linkst.close();
				}
				if (st!=null)
				{
					st.close();
				}
				if (con!=null)
				{
					con.close();
				}
			} catch (Exception e) {
				e.printStackTrace();
			}
		}
		return exflag;
	}
}
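
The parsing in savetodb relies on fixed character offsets and throws when a marker is missing from the page. As a rough sketch of a more defensive alternative, the helper below returns the text between two markers, or an empty Optional when either marker is absent. The class MarkerExtract and the method between are hypothetical names that are not part of the original project, and the sketch assumes Java 8 or later for java.util.Optional.

```java
import java.util.Optional;

// Hypothetical helper, not part of the original com.snoics project.
final class MarkerExtract {
    private MarkerExtract() {}

    /**
     * Returns the trimmed text between the first occurrence of open and the
     * next occurrence of close after it, or an empty Optional if either
     * marker is missing or any argument is null.
     */
    static Optional<String> between(String text, String open, String close) {
        if (text == null || open == null || close == null) {
            return Optional.empty();
        }
        int start = text.indexOf(open);
        if (start < 0) {
            return Optional.empty();
        }
        start += open.length(); // skip past the opening marker itself
        int end = text.indexOf(close, start);
        if (end < 0) {
            return Optional.empty();
        }
        return Optional.of(text.substring(start, end).trim());
    }

    public static void main(String[] args) {
        // Illustrative input only; the real page markup may differ.
        String html = "<center><h3> Sample title </h3></center>";
        System.out.println(between(html, "<center><h3>", "</h3>").orElse("(no title)"));
    }
}
```

A caller could then skip or log a page whose markers are missing instead of relying on the exception from substring.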
