<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3c.org/TR/1999/REC-html401-19991224/loose.dtd">
<!-- saved from url=(0042)http://mysheji.com/blog/article.asp?id=401 -->
<HTML lang=UTF-8 xmlns="http://www.w3.org/1999/xhtml"><HEAD><TITLE>A Brief Look at Automatic Scraping Programs and Database Import - ┢┦aρpy bullet╯blog</TITLE>
<META http-equiv=Content-Type content="text/html; charset=UTF-8">
<META http-equiv=Content-Language content=UTF-8>
<META content=all name=robots>
<META content=hmilygood@gmail.com,Hmily name=author>
<META content="PJBlog2 CopyRight 2005" name=Copyright>
<META
content=PuterJam,Blog,ASP,designing,with,web,standards,xhtml,css,graphic,design,layout,usability,accessibility,w3c,w3,w3cn
name=keywords>
<META content="┢┦aρpy bullet╯blog - Focused on web technology, with an eye on the little things in life..." name=description><LINK
title="Subscribe to all ┢┦aρpy bullet╯blog - Development posts (rss2)"
href="http://www.mysheji.com/blog/feed.asp?cateID=15" type=application/rss+xml
rel=alternate><LINK title="Subscribe to all ┢┦aρpy bullet╯blog - Development posts (atom)"
href="http://www.mysheji.com/blog/atom.asp?cateID=15" type=application/atom+xml
rel=alternate><LINK rev=stylesheet media=all href="浅谈自动采集程序及入库.files/global.css"
type=text/css rel=stylesheet><!--global stylesheet--><LINK rev=stylesheet media=all
href="浅谈自动采集程序及入库.files/layout.css" type=text/css
rel=stylesheet><!--layout stylesheet--><LINK rev=stylesheet media=all
href="浅谈自动采集程序及入库.files/typography.css" type=text/css rel=stylesheet><!--local stylesheet--><LINK rev=stylesheet media=all
href="浅谈自动采集程序及入库.files/link.css" type=text/css
rel=stylesheet><!--hyperlink stylesheet--><LINK rev=stylesheet media=all
href="浅谈自动采集程序及入库.files/editor.css" type=text/css
rel=stylesheet><!--UBB editor code--><LINK href="favicon.ico" type=image/x-icon
rel=icon><LINK href="favicon.ico" type=image/x-icon rel="shortcut icon">
<SCRIPT src="浅谈自动采集程序及入库.files/common.js" type=text/javascript></SCRIPT>
<!--<script type="text/javascript" src="common/nicetitle.js"></script>-->
<META content="MSHTML 6.00.2900.3243" name=GENERATOR></HEAD>
<BODY onkeydown=PressKey() onload=initJS()><A accessKey=i
href="http://mysheji.com/blog/default.asp"></A><A accessKey=z
href="javascript:history.go(-1)"></A>
<DIV id=container><!--top-->
<DIV id=header>
<DIV id=blogname>┢┦aρpy bullet╯blog
<DIV id=blogTitle>Focused on web technology, with an eye on the little things in life...</DIV></DIV>
<DIV id=menu>
<DIV id=Left></DIV>
<DIV id=Right></DIV>
<UL>
<LI class=menuL></LI>
<LI class=menuR></LI></UL></DIV></DIV><!--content-->
<DIV id=Tbody>
<DIV id=mainContent>
<DIV id=innermainContent>
<DIV id=mainContent-topimg></DIV>
<DIV class=content-width id=Content_ContentList><A accessKey=B
href="http://mysheji.com/blog/article.asp?id=401#body" name=body></A>
<DIV class=pageContent>
<DIV style="FLOAT: right; WIDTH: auto"><A title="Previous post: Vicissitudes of Time" accessKey=,
href="http://mysheji.com/blog/article.asp?id=394"><IMG alt=""
src="浅谈自动采集程序及入库.files/Cprevious.gif" border=0>Previous</A> | <A
title="Next post: Two Ways to Batch-Delete Records" accessKey=.
href="http://mysheji.com/blog/article.asp?id=402"><IMG alt=""
src="浅谈自动采集程序及入库.files/Cnext.gif" border=0>Next</A></DIV><IMG
style="MARGIN: 0px 2px -4px 0px" alt="" src="浅谈自动采集程序及入库.files/27.gif">
<STRONG><A title="View all posts in Web Development"
href="http://mysheji.com/blog/default.asp?cateID=15">Web
Development</A></STRONG> <A title="Subscribe to all posts in Web Development" accessKey=O
href="http://mysheji.com/blog/feed.asp?cateID=15" target=_blank><IMG
style="MARGIN-BOTTOM: -1px" alt="Subscribe to all posts in Web Development"
src="浅谈自动采集程序及入库.files/rss.png" border=0></A> </DIV>
<DIV class=Content>
<DIV class=Content-top>
<DIV class=ContentLeft></DIV>
<DIV class=ContentRight></DIV>
<H1 class=ContentTitle><STRONG>A Brief Look at Automatic Scraping Programs and Database Import</STRONG></H1>
<H2 class=ContentAuthor>Author: Hmily Date: 2006-04-01</H2></DIV>
<DIV class=Content-Info>
<DIV class=InfoOther>Font size: <A accessKey=1
href="javascript:SetFont('12px')">Small</A> <A accessKey=2
href="javascript:SetFont('14px')">Medium</A> <A accessKey=3
href="javascript:SetFont('16px')">Large</A></DIV>
<DIV class=InfoAuthor><IMG style="MARGIN: 0px 2px -6px 0px" alt=""
src="浅谈自动采集程序及入库.files/hn2_sunny.gif"><IMG alt=""
src="浅谈自动采集程序及入库.files/hn2_t_sunny.gif"> <IMG style="MARGIN: 0px 2px -1px 0px"
alt="" src="浅谈自动采集程序及入库.files/level3.gif"></DIV></DIV>
<DIV class=Content-body
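id=logPanel>Scraping ("auto-collection") programs have become popular online lately, and plenty of people are hawking them for sale. Many people who don't know the technology well look at those programs with envy. In fact, if you know a little ASP and understand how an automatic scraper works, you will find that this kind of automation is simple to build.<BR>How it works and why it is useful: the program uses the XMLHTTP component from Microsoft's XML library to request pages from another website, extracts or replaces the pieces of the page it needs in bulk, puts them into variables, and then stores them in a database one by one. The main advantage is that you no longer have to add large amounts of information by hand: you can point the program at one site, extract the information you want, and import it in bulk, which saves time and effort. Unlike a plain ASP "thief" program, which displays content pulled from the target site, a scraper keeps its own copy of the data, so it no longer depends on the target site once the data is stored.<BR>A simple example:<BR><BR>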
<DIV class=UBBPanel>
<DIV class=UBBTitle><IMG style="MARGIN: 0px 2px -3px 0px" alt=Code
src="浅谈自动采集程序及入库.files/code.gif"> Code</DIV>
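<DIV class=UBBContent>&lt;%<BR>
' Fetch the target page through the XMLHTTP component.<BR>
Function GetURL(url)<BR>
Dim Retrieval<BR>
Set Retrieval = CreateObject("Microsoft.XMLHTTP")<BR>
With Retrieval<BR>
.Open "GET", url, False<BR>
.Send<BR>
' Sanity check: a response body shorter than 100 bytes is treated as a failed fetch.<BR>
If LenB(.responseBody) &lt; 100 Then<BR>
Response.Write "Failed to fetch remote file &lt;a href=""" &amp; url &amp; """ target=_blank&gt;" &amp; url &amp; "&lt;/a&gt;."<BR>
Response.End<BR>
End If<BR>
GetURL = bytes2bstr(.responseBody)<BR>
End With<BR>
Set Retrieval = Nothing<BR>
End Function<BR>
<BR>
' Convert the binary response body to a string; without this, double-byte text comes out garbled.<BR>
Function bytes2bstr(vin)<BR>
Dim strreturn, i, thischarcode, nextcharcode<BR>
strreturn = ""<BR>
For i = 1 To LenB(vin)<BR>
thischarcode = AscB(MidB(vin, i, 1))<BR>
If thischarcode &lt; &amp;H80 Then<BR>
' Single-byte (ASCII) character<BR>
strreturn = strreturn &amp; Chr(thischarcode)<BR>
Else<BR>
' Lead byte of a double-byte character: combine it with the next byte<BR>
nextcharcode = AscB(MidB(vin, i + 1, 1))<BR>
strreturn = strreturn &amp; Chr(CLng(thischarcode) * &amp;H100 + CInt(nextcharcode))<BR>
i = i + 1<BR>
End If<BR>
Next<BR>
bytes2bstr = strreturn<BR>
End Function<BR>
<BR>
' Return the text between the markers Start and Last, or "" if either marker is missing.<BR>
Function GetKey(HTML, Start, Last)<BR>
Dim filearray, filearray2<BR>
GetKey = ""<BR>
filearray = Split(HTML, Start)<BR>
If UBound(filearray) &lt; 1 Then Exit Function<BR>
filearray2 = Split(filearray(1), Last)<BR>
If UBound(filearray2) &lt; 1 Then Exit Function<BR>
GetKey = filearray2(0)<BR>
End Function<BR>
<BR>
Dim SoftId, Url, Html, Title<BR>
<BR>
' ID of the page to fetch<BR>
SoftId = Request("Id")<BR>
Url = "http://www3.skycn.com/soft/" &amp; SoftId &amp; ".html"<BR>
Html = GetURL(Url)<BR>
<BR>
' Example: extract the software name from a skycn.com (天空软件) page<BR>
Title = GetKey(Html, "&lt;font color='#004FC6' size='3'&gt;", "&lt;/font&gt;&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;")<BR>
<BR>
' Open the database and store the result<BR>
Dim connstr, conn, rs, sql<BR>
connstr = "DBQ=" + Server.MapPath("db1.mdb") + ";DefaultDir=;DRIVER={Microsoft Access Driver (*.mdb)};"<BR>
Set conn = Server.CreateObject("ADODB.CONNECTION")<BR>
conn.Open connstr<BR>
Set rs = Server.CreateObject("ADODB.Recordset")<BR>
' [column_name] and [table_name] are placeholders; single quotes in the title are doubled so they cannot break the SQL string<BR>
sql = "select [column_name] from [table_name] where [column_name]='" &amp; Replace(Title, "'", "''") &amp; "'"<BR>
rs.Open sql, conn, 3, 3<BR>
' Insert only when no row with this title exists yet<BR>
If rs.EOF And rs.BOF Then<BR>
rs.AddNew<BR>
rs("column_name") = Title<BR>
rs.Update<BR>
End If<BR>
rs.Close<BR>
Set rs = Nothing<BR>
conn.Close<BR>
Set conn = Nothing<BR>
<BR>
Response.Write "Scraping finished!"<BR>
%&gt;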
</DIV></DIV><BR><BR><BR></DIV>
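<DIV class=Content-body>The marker-based extraction step can be tried without touching the network or a database. The following is only a minimal sketch under stated assumptions: <CODE>GetKeySafe</CODE> is a hypothetical helper name chosen so the snippet can run on its own page, the <CODE>Sample</CODE> markup is made up for illustration, and returning an empty string when a marker is missing is one possible way to handle failures, not necessarily the author's.<BR><BR>
<DIV class=UBBPanel>
<DIV class=UBBTitle><IMG style="MARGIN: 0px 2px -3px 0px" alt=Code
src="浅谈自动采集程序及入库.files/code.gif"> Code</DIV>
<DIV class=UBBContent>&lt;%<BR>
' Hypothetical helper: text between Start and Last, or "" when either marker is missing<BR>
Function GetKeySafe(HTML, Start, Last)<BR>
Dim parts, rest<BR>
GetKeySafe = ""<BR>
parts = Split(HTML, Start)<BR>
If UBound(parts) &lt; 1 Then Exit Function<BR>
rest = Split(parts(1), Last)<BR>
If UBound(rest) &lt; 1 Then Exit Function<BR>
GetKeySafe = rest(0)<BR>
End Function<BR>
<BR>
Dim Sample<BR>
' Made-up markup, for illustration only<BR>
Sample = "&lt;td&gt;&lt;b&gt;&lt;font color='#004FC6' size='3'&gt;Example Tool 1.0&lt;/font&gt;&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;"<BR>
<BR>
' Both markers are present: writes "Example Tool 1.0"<BR>
Response.Write Server.HTMLEncode(GetKeySafe(Sample, "&lt;font color='#004FC6' size='3'&gt;", "&lt;/font&gt;&lt;/b&gt;&lt;/td&gt;&lt;/tr&gt;")) &amp; "&lt;br&gt;"<BR>
<BR>
' The start marker is absent: the result is an empty string, so this writes 0<BR>
Response.Write Len(GetKeySafe(Sample, "&lt;h1&gt;", "&lt;/h1&gt;"))<BR>
%&gt;
</DIV></DIV><BR>Because the markers are plain strings copied from the target page, any change to that page's markup makes the extraction come back empty, so checking for an empty result before writing to the database is a cheap safeguard.<BR><BR></DIV>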
<DIV class=Content-body><IMG style="MARGIN: 0px 2px -4px 0px" alt=""
src="浅谈自动采集程序及入库.files/From.gif"><STRONG>Source:</STRONG> <A
href="http://www.mysheji.com/blog/" target=_blank>Original to this site</A><BR><IMG
style="MARGIN: 4px 2px -4px 0px" alt=""
src="浅谈自动采集程序及入库.files/icon_trackback.gif"><STRONG>Trackbacks:</STRONG> <A
href="http://mysheji.com/blog/trackback.asp?tbID=401&action=view"
target=_blank>View all trackbacks</A> | <A title="Get the trackback link for this post" onclick=getTrackbackURL(401)
href="javascript:;">Trackback this post</A><BR><IMG style="MARGIN: 4px 2px -4px 0px" alt=""
src="浅谈自动采集程序及入库.files/tag.gif"><STRONG>Tags:</STRONG> <A
href="http://mysheji.com/blog/default.asp?tag=ASP">ASP</A><A
style="DISPLAY: none" href="http://technorati.com/tag/ASP" rel=tag>ASP</A>
<BR></DIV>
<DIV class=Content-bottom>
<DIV class=ContentBLeft></DIV>
<DIV class=ContentBRight></DIV>Comments: 0 | <A
href="http://mysheji.com/blog/trackback.asp?tbID=401&action=view"
target=_blank>Trackbacks: 2</A> | Views: 1524</DIV></DIV></DIV><A accessKey=C
href="http://mysheji.com/blog/article.asp?id=401#comm_top" name=comm_top></A>
<DIV id=MsgContent style="WIDTH: 94%">
<DIV id=MsgHead>Post a comment</DIV>
<DIV id=MsgBody>
<SCRIPT type=text/javascript>
function checkCommentPost(){
if (!CheckPost) return false // fallback method
return true
}
</SCRIPT>
<FORM style="MARGIN: 0px" name=frm onsubmit="return checkCommentPost()"
action=blogcomm.asp method=post>
<TABLE cellSpacing=0 cellPadding=0 width="100%">
<TBODY>
<TR>
<TD align=right width=70><STRONG>Nickname:</STRONG></TD>
<TD
style="PADDING-RIGHT: 3px; PADDING-LEFT: 3px; PADDING-BOTTOM: 3px; PADDING-TOP: 3px"
align=left><INPUT class=userpass maxLength=24 size=18 name=username></TD></TR>
<TR>
<TD align=right width=70><STRONG>Password:</STRONG></TD>
<TD
style="PADDING-RIGHT: 3px; PADDING-LEFT: 3px; PADDING-BOTTOM: 3px; PADDING-TOP: 3px"
align=left><INPUT class=userpass type=password maxLength=24 size=18
name=password> Guests do not need a password to comment.</TD></TR>
<TR>
<TD align=right width=70><STRONG>Verification code:</STRONG></TD>
<TD
style="PADDING-RIGHT: 3px; PADDING-LEFT: 3px; PADDING-BOTTOM: 3px; PADDING-TOP: 3px"
align=left><INPUT class=userpass maxLength=4 size=4 name=validate> <SPAN
style="MARGIN-RIGHT: 40px">=3+5</SPAN></TD></TR>
<TR>
<TD vAlign=top align=right width=70><STRONG>Content:</STRONG><BR></TD>
<TD
style="PADDING-RIGHT: 2px; PADDING-LEFT: 2px; PADDING-BOTTOM: 2px; PADDING-TOP: 2px">
<SCRIPT language=javascript src="浅谈自动采集程序及入库.files/UBBCode.js"
type=text/javascript></SCRIPT>
<SCRIPT language=javascript src="浅谈自动采集程序及入库.files/UBBCode_help.js"
type=text/javascript></SCRIPT>
<DIV class=UBBSmiliesPanel id=UBBSmiliesPanel>
<TABLE cellSpacing=2 cellPadding=0>
<TBODY>
<TR>
<TD><A class=Smilie title=[smile]
href="javascript:AddSmiley('[smile]')"><IMG alt=""
src="浅谈自动采集程序及入库.files/smile.gif" border=0></A></TD>
<TD><A class=Smilie title=[scared]
href="javascript:AddSmiley('[scared]')"><IMG alt=""
src="浅谈自动采集程序及入库.files/scared.gif" border=0></A></TD>
<TD><A class=Smilie title=[sadclown]
href="javascript:AddSmiley('[sadclown]')"><IMG alt=""
src="浅谈自动采集程序及入库.files/sadclown.gif" border=0></A></TD>
<TD><A class=Smilie title=[rolleyes]
href="javascript:AddSmiley('[rolleyes]')"><IMG alt=""
src="浅谈自动采集程序及入库.files/rolleyes.gif" border=0></A></TD>
<TD><A class=Smilie title=[right]
href="javascript:AddSmiley('[right]')"><IMG alt=""
src="浅谈自动采集程序及入库.files/right.gif" border=0></A></TD>