From: yaomc (白头翁&山东大汉), Board: DataMining
Title: [Collection] What is Web Mining?
Posted at: Nanjing University Lily BBS (南京大学小百合站) (Tue Nov 27 11:30:14 2001), local post

greenflower (小呆) wrote on Thu Aug 23 16:37:11 2001:

What is Web Mining?

Web Mining is the extraction of interesting and potentially useful patterns and implicit information from artifacts or activity related to the World Wide Web. There are roughly three knowledge discovery domains that pertain to web mining: Web Content Mining, Web Structure Mining, and Web Usage Mining. Web content mining is the process of extracting knowledge from the content of documents or their descriptions. Web document text mining, resource discovery based on concept indexing, and agent-based technology may also fall into this category. Web structure mining is the process of inferring knowledge from the organization of the World Wide Web and from the links between references and referents in the Web. Finally, web usage mining, also known as Web Log Mining, is the process of extracting interesting patterns from web access logs.

Web Content Mining 

Web content mining is an automatic process that goes beyond keyword extraction. Since the content of a text document carries no machine-readable semantics, some approaches have suggested restructuring the document content into a representation that machines can exploit. The usual approach to exploiting known structure in documents is to use wrappers to map documents to some data model. Techniques using lexicons for content interpretation are yet to come.
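As a minimal sketch of the wrapper idea, the hypothetical ProductWrapper below maps a page with a known, assumed layout (titles and prices inside `<span class="title">` and `<span class="price">` elements) onto a list of records, using Python's standard html.parser module. Real wrappers are usually written or induced per site; this hand-written one only illustrates the mapping from markup to a data model.

```python
from html.parser import HTMLParser


class ProductWrapper(HTMLParser):
    """Hypothetical wrapper: turns title/price spans into one dict per item."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._field = None  # name of the field whose <span> we are inside, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if tag == "span" and cls in ("title", "price"):
            self._field = cls
            if cls == "title":
                self.records.append({})  # each title starts a new record

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        # Only text inside a recognized span (after a title) becomes a field value.
        if self._field and self.records:
            self.records[-1][self._field] = data.strip()


def wrap(html):
    parser = ProductWrapper()
    parser.feed(html)
    parser.close()
    return parser.records


page = ('<ul><li><span class="title">Data Mining</span> <span class="price">45.00</span></li>'
        '<li><span class="title">OLAP Basics</span><span class="price">30.50</span></li></ul>')
print(wrap(page))
```

The layout assumption is the whole wrapper: if the site changes its markup, the wrapper silently stops extracting fields, which is why wrapper maintenance is a recurring cost.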

There are two groups of web content mining strategies: those that directly mine the content of documents, and those that improve on the content search of other tools, such as search engines.

Web Structure Mining 

The World Wide Web can reveal more information than just the information contained in documents. For example, links pointing to a document indicate the popularity of the document, while links coming out of a document indicate the richness or perhaps the variety of topics covered in the document. This can be compared to bibliographical citations: when a paper is cited often, it ought to be important. The PageRank and CLEVER methods take advantage of this information conveyed by the links to find pertinent web pages. By means of counters, higher levels accumulate the number of artifacts subsumed by the concepts they hold. Counters of hyperlinks into and out of documents retrace the structure of the summarized web artifacts.
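To make the link-counting idea concrete, here is a small sketch over a hypothetical three-page web. It computes in-link and out-link counts and a basic PageRank by power iteration; the damping factor 0.85 and the fixed 50 iterations are common illustrative choices, not values taken from this post. CLEVER (which builds on hub/authority scoring) is not shown.

```python
from collections import Counter


def link_counts(links):
    """Count in-links and out-links per page; `links` maps page -> pages it links to."""
    out_degree = {page: len(targets) for page, targets in links.items()}
    in_degree = Counter(t for targets in links.values() for t in targets)
    return in_degree, out_degree


def pagerank(links, damping=0.85, iterations=50):
    """Basic PageRank by power iteration over the same page -> targets mapping."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page passes its rank in equal shares to the pages it links to.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no out-links spreads its rank evenly over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[p] / n
        rank = new_rank
    return rank


# Tiny hypothetical web: a and b link to each other, c links only to b.
links = {"a": ["b"], "b": ["a"], "c": ["b"]}
in_degree, out_degree = link_counts(links)
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, in_degree[page], out_degree.get(page, 0), round(ranks[page], 3))
```

Raw in-link counts already rank b first, but PageRank also weights each incoming link by the rank of the page it comes from, which is the "citation from an important paper counts more" intuition.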

Web Usage Mining 

Web servers record and accumulate data about user interactions whenever requests for resources are received. Analyzing the web access logs of different web sites can help us understand user behaviour and the web structure, thereby improving the design of this colossal collection of resources. There are two main tendencies in Web Usage Mining, driven by the applications of the discoveries: General Access Pattern Tracking and Customized Usage Tracking.
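Web servers typically write one line per request. As a sketch, assuming the Apache Common Log Format (host, identity, user, timestamp, request line, status, bytes), a single line can be parsed as follows; the sample line is modeled on the Apache httpd logging documentation, and the function name parse_line is hypothetical.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "METHOD path protocol" status bytes
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Parse one access-log line into a dict, or return None if it is malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # "-" means no body
    return rec


sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
record = parse_line(sample)
print(record["host"], record["path"], record["status"], record["time"].isoformat())
```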

General access pattern tracking analyzes the web logs to understand access patterns and trends. These analyses can shed light on better structure and grouping of resource providers. Many web analysis tools exist, but they are limited and usually unsatisfactory. We have designed a web log data mining tool, WebLogMiner, and proposed techniques for using data mining and On-Line Analytical Processing (OLAP) on treated and transformed web access files. Applying data mining techniques to access logs unveils interesting access patterns that can be used to restructure sites into more efficient groupings, pinpoint effective advertising locations, and target specific users with specific ads.
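General access pattern tracking often starts with plain aggregation. The sketch below is not WebLogMiner; it is a toy group-by over hypothetical, already-cleaned records of the form (host, hour of day, path). It counts hits per page, then rolls pages up to their top-level site section per hour, in the spirit of an OLAP roll-up along a page-to-section hierarchy.

```python
from collections import Counter

# Hypothetical cleaned log records: (host, hour_of_day, path)
records = [
    ("10.0.0.1", 9, "/products/a.html"),
    ("10.0.0.2", 9, "/products/b.html"),
    ("10.0.0.1", 10, "/news/today.html"),
    ("10.0.0.3", 10, "/products/a.html"),
    ("10.0.0.2", 14, "/products/a.html"),
]


def section(path):
    """Map a path to its top-level directory, e.g. '/products/a.html' -> '/products'."""
    parts = path.split("/")
    return "/" + parts[1] if len(parts) > 2 else "/"


# Finest granularity: hits per individual page.
page_hits = Counter(path for _, _, path in records)

# Roll-up: aggregate pages into site sections, keeping the hour dimension.
section_by_hour = Counter((section(path), hour) for _, hour, path in records)

print(page_hits.most_common(2))
print(sorted(section_by_hour.items()))
```

A real OLAP setup would precompute such aggregates along several dimensions (time, section, host domain) so analysts can drill down and roll up interactively; the Counter here only shows a single roll-up step.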

Customized usage tracking analyzes individual trends. Its purpose is to customize web sites to users. The information displayed, the depth of the site structure, and the format of the resources can all be dynamically customized for each user over time based on their access patterns.
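Tracking one user's behaviour over time usually requires grouping that user's requests into visits first. A common heuristic, assumed here rather than taken from the post, is to start a new session whenever the gap between consecutive requests exceeds 30 minutes. The function name sessionize and the (user, timestamp, path) record shape are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group (user, timestamp, path) records into per-user lists of sessions.

    A new session starts when the gap since the user's previous request exceeds `timeout`.
    """
    by_user = defaultdict(list)
    for user, ts, path in sorted(records, key=lambda r: (r[0], r[1])):
        by_user[user].append((ts, path))

    sessions = defaultdict(list)
    for user, hits in by_user.items():
        current = [hits[0][1]]
        # Compare each request with the one just before it.
        for (prev_ts, _), (ts, path) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions[user].append(current)
                current = []
            current.append(path)
        sessions[user].append(current)
    return dict(sessions)


t0 = datetime(2001, 8, 23, 9, 0)
log = [
    ("u1", t0, "/"),
    ("u1", t0 + timedelta(minutes=5), "/products"),
    ("u1", t0 + timedelta(hours=2), "/news"),
    ("u2", t0, "/news"),
]
print(sessionize(log))
```

Per-user sessions like these are the raw material for customization, for example choosing which sections to surface for a user based on what their recent sessions contain.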

While it is encouraging and exciting to see the various potential applications of web log file analysis, it is important to know that the success of such applications depends on what and how much valid and reliable knowledge one can discover from the large raw log data. Current web servers store limited information about the accesses. Some scripts custom-tailored for some sites may store additional information. However, for effective web usage mining, an important cleaning and data transformation step may be needed before analysis.
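As one illustration of such a cleaning step, the sketch below drops failed requests, embedded resources such as images and stylesheets, and likely crawler traffic. The suffix and robot-keyword lists are hypothetical and would need tuning per site. The records are assumed to carry a user-agent field, as in the Combined Log Format; plain Common Log Format lines do not include one, in which case only the path-based robot check applies.

```python
# Hypothetical cleaning rules; a real site would tune these lists to its own content.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")
ROBOT_HINTS = ("bot", "crawler", "spider")


def clean(records):
    """Keep non-error page requests from (apparently) human visitors."""
    kept = []
    for r in records:
        path = r["path"].split("?", 1)[0].lower()  # ignore query strings for matching
        agent = r.get("agent", "").lower()
        if r["status"] >= 400:
            continue  # failed requests say little about navigation
        if path.endswith(IGNORED_SUFFIXES):
            continue  # embedded resources, not pages the user chose to visit
        if path == "/robots.txt" or any(h in agent for h in ROBOT_HINTS):
            continue  # likely crawler traffic
        kept.append(r)
    return kept


raw = [
    {"path": "/index.html", "status": 200, "agent": "Mozilla/4.0"},
    {"path": "/logo.gif", "status": 200, "agent": "Mozilla/4.0"},
    {"path": "/products/a.html?ref=ad", "status": 200, "agent": "Mozilla/4.0"},
    {"path": "/old.html", "status": 404, "agent": "Mozilla/4.0"},
    {"path": "/index.html", "status": 200, "agent": "Googlebot/2.1"},
]
print([r["path"] for r in clean(raw)])
```

Keyword matching on user agents is a crude filter: well-behaved crawlers identify themselves, but others do not, so real pipelines often add behavioural checks as well.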



ccipt (北方的狼) wrote on Thu Aug 23 17:29:41 2001:

For more, see:

http://www.cs.ualberta.ca/~tszhu/webmining.htm



