d products or services offered, such as insurance and financial institutions,
where human marketers can write a small number of rules and walk away.
Other personalization systems, such as Andromedia LikeMinds, emphasize automatic real-time selection of items to be offered or suggested. Systems that use the idea that "people like you make good predictors for what you will do" are called "collaborative filters." These systems are usually deployed in situations where there are many items offered, such as clothing, entertainment, office supplies, and consumer goods. Human marketers go insane trying to determine what to offer to whom when there are thousands of items to offer. As a result, automatic systems are usually more effective in these environments. Personalizing from large inventories is complex and unintuitive, and requires processing huge amounts of data.
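To make the "people like you" idea concrete, here is a minimal sketch of a user-based collaborative filter. It is not how LikeMinds or any other product works; the visitors, items, ratings, and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical ratings: visitor -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"jeans": 5, "jacket": 4, "boots": 1},
    "bob": {"jeans": 5, "jacket": 5, "scarf": 4},
    "cat": {"jeans": 1, "boots": 5, "scarf": 2, "hat": 5},
}


def similarity(a, b):
    """Cosine similarity between two rating dicts (unrated items count as 0)."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(r * r for r in a.values()))
    norm_b = sqrt(sum(r * r for r in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(visitor, ratings=RATINGS):
    """Rank items the visitor has not rated by similarity-weighted ratings."""
    mine = ratings[visitor]
    scores = {}
    for other, theirs in ratings.items():
        if other == visitor:
            continue
        weight = similarity(mine, theirs)
        for item, rating in theirs.items():
            if item not in mine:
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)


print(recommend("ann"))
```

Ann's tastes resemble Bob's far more than Cat's, so Bob's opinion of the scarf outweighs Cat's enthusiasm for the hat. A production system would have to do this across thousands of items and millions of visitors, which is where the real cost lies.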
Association. Also called market-basket analysis, association identifies items that are likely to be purchased or viewed in the same session. If you place references to these items together on the same page in a Web catalog, you may remind your visitor to purchase or view something otherwise forgotten. If you hold a promotion on one item in an association group, you're likely to increase purchases of other items in that group.
Association can be deployed even in situations where you have static catalog pages. In this case, you rely on the visitor to select the first catalog page to view, and then serve up related items as cross-sells. Association is the data-mining solution Amazon uses when it says, "Customers who bought The Grapes of Wrath also bought The Great Gatsby."
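A rough sketch of the counting behind a "customers who bought X also bought Y" list follows. The sessions are invented, and the confidence measure (the share of sessions containing X that also contain Y) is one common choice, not necessarily what any particular store uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: the set of items bought or viewed together.
SESSIONS = [
    {"grapes_of_wrath", "great_gatsby"},
    {"grapes_of_wrath", "great_gatsby", "moby_dick"},
    {"grapes_of_wrath", "moby_dick"},
    {"great_gatsby"},
]

item_counts = Counter()
pair_counts = Counter()
for session in SESSIONS:
    item_counts.update(session)
    pair_counts.update(frozenset(p) for p in combinations(sorted(session), 2))


def confidence(item, other):
    """Fraction of sessions containing `item` that also contain `other`."""
    return pair_counts[frozenset((item, other))] / item_counts[item]


def also_bought(item, min_confidence=0.5):
    """Items associated with `item`, strongest association first."""
    scored = [(o, confidence(item, o)) for o in item_counts if o != item]
    keep = [(o, c) for o, c in scored if c >= min_confidence]
    return sorted(keep, key=lambda pair: (-pair[1], pair[0]))


print(also_bought("moby_dick"))
```

The table of pair counts can be built offline, which is why this technique suits static catalog pages: the cross-sell list for each page only changes when you rerun the counts.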
Knowledge Management. These systems seek to identify and leverage patterns in natural-language documents. A more specific term is "text analysis," since the vast majority operate on text. The first step is associating words and context with high-level concepts. This can be done in a directed way by training a system with documents that have been tagged by a human with the relevant concepts. The system then builds a pattern matcher for each concept. When presented with a new document, the pattern matcher decides how strongly the document relates to the concept.
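The training step can be sketched in a few lines. This toy version builds one word-count profile per concept from human-tagged documents and scores a new document by cosine similarity against each profile. The documents and concept names are made up, and commercial products use far richer notions of "context" than bare word counts.

```python
from collections import Counter
from math import sqrt

# Hypothetical training set: (concept tag assigned by a human, document text).
TAGGED_DOCS = [
    ("sports", "the team won the match with a late goal"),
    ("sports", "the coach praised the team after the match"),
    ("finance", "the bank raised interest rates on loans"),
    ("finance", "investors sold stock as interest rates rose"),
]


def tokenize(text):
    return text.lower().split()


# One "pattern matcher" per concept: here, simply a bag of word counts.
profiles = {}
for concept, text in TAGGED_DOCS:
    profiles.setdefault(concept, Counter()).update(tokenize(text))


def relevance(text, concept):
    """How strongly a document relates to a concept, from 0.0 to 1.0."""
    doc = Counter(tokenize(text))
    profile = profiles[concept]
    dot = sum(count * profile[word] for word, count in doc.items())
    norm = sqrt(sum(c * c for c in doc.values())) * sqrt(
        sum(c * c for c in profile.values())
    )
    return dot / norm if norm else 0.0


def classify(text):
    """Pick the concept whose profile the document matches best."""
    return max(profiles, key=lambda concept: relevance(text, concept))


print(classify("the bank cut interest rates"))
```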
This approach can be used to sort incoming documents into predefined categories. Companies use this approach to build automatic site indices for visitors. News and portal sites use this to reduce the cost of categorizing and selecting news from syndicators. Some systems also provide automatic summaries of key points, and cross-reference documents to related material.
Knowledge management systems can be used to personalize online publications. Imagine a pattern matcher for the "what Dan Greening likes" concept. This system would find new documents that contain words and context also contained in articles that I've read before. Products in this area include Autonomy and HNC SelectResponse. (Also see "Mining Camps".)
Knowledge management systems can assist in creating automatic responses to help requests. For example, inbound requests to a customer-support email address can be categorized, and an automatic response can be sent from a library of FAQs. Vendors in this area include Kana and eGain. (See the box "Knowledge Management" in the November 1999 article "You Asked For It: Solving the Customer Support Dilemma.")
One of the most interesting applications in this area is Abuzz Beehive, which creates a "knowledge network" within a community of experts. If you send a question to Beehive, it first tries to find a good answer in its archive. If it doesn't have a good answer, it redirects the question to an expert it thinks can properly respond. If the expert does respond, it squirrels the response away in case the question is asked again. In this way, it builds up a permanent, adapting knowledge base.
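The archive-then-expert flow described above might be sketched as follows. This is only a guess at the control flow, not Abuzz's implementation: the expert directory, the topic labels, and the exact-match lookup are hypothetical stand-ins for whatever matching Beehive really does.

```python
# Hypothetical expert directory: topic -> function returning an answer or None.
EXPERTS = {
    "networking": lambda question: "Check the DNS settings first.",
    "databases": lambda question: None,  # this expert never responds
}

archive = {}  # normalized question -> stored answer


def normalize(question):
    """Crude stand-in for real question matching: lowercase, collapse spaces."""
    return " ".join(question.lower().split())


def ask(question, topic):
    """Return (source, answer); source is 'archive', 'expert', or 'none'."""
    key = normalize(question)
    if key in archive:
        return "archive", archive[key]
    expert = EXPERTS.get(topic)
    answer = expert(question) if expert else None
    if answer is None:
        return "none", None
    archive[key] = answer  # squirrel the response away for next time
    return "expert", answer


print(ask("Why is my site slow?", "networking"))
print(ask("why is my  site slow?", "networking"))
```

The second call is answered from the archive without bothering the expert, which is the whole point: each human answer is paid for once.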
Abuzz has created something I find both exciting and spooky: a more informative organism bred from machine and human. Beehive is a computer broker that brings together human experts with different specializations. Students of biology will note this parallels important evolutionary events, such as the aggregation and differentiation of single-celled organisms into more effective multicelled organisms.
Clustering. Sometimes called segmentation, clustering identifies people who share common characteristics, and averages those characteristics to form a "characteristic vector" or "centroid." Clustering systems usually let you specify how many clusters to identify within a group of profiles, and then try to find the set of clusters that best represents the most profiles.
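One well-known way to find centroids for a requested number of clusters is k-means, sketched below. The article doesn't say which algorithm any vendor uses, so treat this as one possible technique. The two-number visitor profiles (visits per month, average order value) are invented, and seeding the centroids from the first k profiles is a simplification.

```python
def mean(points):
    """Average a list of equal-length tuples, dimension by dimension."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def dist2(a, b):
    """Squared Euclidean distance between two profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, iterations=10):
    """Return (centroids, clusters) after a fixed number of refinement passes."""
    centroids = list(profiles[:k])  # naive seeding, for illustration only
    clusters = []
    for _ in range(iterations):
        # Assign every profile to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in profiles:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the average of its members.
        centroids = [
            mean(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical profiles: (visits per month, average order value in dollars).
PROFILES = [(1, 10), (20, 200), (2, 12), (22, 180), (3, 8), (18, 220)]
centroids, clusters = kmeans(PROFILES, k=2)
print(centroids)
```

Here the two centroids describe an occasional bargain shopper and a frequent big spender. In practice you would rescale the dimensions first, since dollars would otherwise swamp visit counts in the distance calculation.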
Clustering is used directly by some vendors to provide reports on general characteristics of different visitor groups. These techniques require training, and suffer from drift on Web sites with dynamic Web pages. (Again, see the article "Tracking Users," Web Techniques, July 1999.)
Estimation and Prediction. Estimation guesses an unknown value, such as income, when you know other things about a person. Prediction guesses a future value, such as the probability of buying a car next year, when a person hasn't done it yet, or the expected number of stocks that a person will trade in the coming year. The same algorithms can perform estimation and prediction.
Estimation is often used in demographics to fill in the blanks. If you don't know what income a person has, an estimator can identify other variables that correlate well with income -- such as location, car preference, job title -- then find other people with similar traits and use them to estimate income and a confidence value.
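The "find similar people and use them" step resembles a nearest-neighbor estimate, which can be sketched as below. The numeric trait encoding and the incomes are hypothetical, and the standard deviation of the neighbors' incomes is used here only as a crude stand-in for a confidence value.

```python
from statistics import mean, pstdev

# Hypothetical people with known income.
# Traits are encoded as (location code, car price tier, job level).
KNOWN = [
    ((1, 3, 3), 95000),
    ((1, 3, 2), 85000),
    ((1, 2, 3), 90000),
    ((2, 1, 1), 30000),
    ((2, 1, 2), 40000),
]


def distance2(a, b):
    """Squared distance between two trait tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def estimate_income(traits, k=3):
    """Return (estimate, spread) from the k most similar known people."""
    neighbors = sorted(KNOWN, key=lambda rec: distance2(rec[0], traits))[:k]
    incomes = [income for _, income in neighbors]
    # A small spread means the neighbors agree, so the estimate is more trustworthy.
    return mean(incomes), pstdev(incomes)


estimate, spread = estimate_income((1, 3, 3))
print(estimate, round(spread))
```

Prediction works the same way: swap the income column for a future quantity you have observed in other people, such as next year's purchases.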
Prediction can compute important future attributes of a person -- such as lifetime monetary value, next visit interval, learning speed, promotion susceptibility, and so on -- based on the same approach. These values can be used in personalization applications.
Marketers often aggregate information to understand groups of customers. Even adding up or averaging past events over different dimensions -- such as visitor category, content category, referrer, and time -- can provide useful information. This simple aggregation is called OLAP, online analytic processing: online because the marketer uses an online reporting engine to interactively move through the data; analytic because the marketer is passively looking through past data, not trying to change it.
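Adding up and averaging over chosen dimensions amounts to a group-by, as in this small sketch. The event records and field names are invented. A real OLAP engine precomputes these totals so that the marketer can change dimensions interactively.

```python
from collections import defaultdict

# Hypothetical past events, one record per purchase.
EVENTS = [
    {"visitor": "new", "content": "shoes", "referrer": "search", "revenue": 40.0},
    {"visitor": "new", "content": "shirts", "referrer": "banner", "revenue": 20.0},
    {"visitor": "repeat", "content": "shoes", "referrer": "search", "revenue": 60.0},
    {"visitor": "repeat", "content": "shoes", "referrer": "email", "revenue": 80.0},
]


def roll_up(events, dimensions, measure="revenue"):
    """Total and average of `measure` for each combination of `dimensions`."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event in events:
        key = tuple(event[d] for d in dimensions)
        totals[key] += event[measure]
        counts[key] += 1
    return {
        key: {"total": totals[key], "average": totals[key] / counts[key]}
        for key in totals
    }


print(roll_up(EVENTS, ["visitor"]))
print(roll_up(EVENTS, ["content", "referrer"]))
```

Changing the list of dimensions is the "slice and dice": the same events, summarized along a different axis.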
Prediction can be applied in combination with OLAP techniques to generalize properties of groups of people visiting a Web site. This can help a marketer to slice and dice the data to find which item attributes or site characteristics appeal to the most valuable customers.
Decision Trees. A decision tree is essentially a flow chart of questions or data points that ultimately leads to a decision. For example, a car-buying decision tree might start by asking whether you want a 1999 or 2000 model year car, then ask what type of car, then ask whether you prefer power or economy, and so on, until it determines what might be the best car for you. Decision tree systems try to create optimized paths, ordering the questions so a decision can be made in the fewest steps.
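One standard way to order the questions is to ask first whichever one removes the most uncertainty about the outcome, measured as information gain. The sketch below picks the first question for a tiny, invented car catalog. It illustrates the idea and is not a description of any vendor's product.

```python
from collections import Counter
from math import log2

# Hypothetical catalog: (answers to each question, recommended car).
CARS = [
    ({"year": "1999", "type": "sedan", "priority": "economy"}, "economy sedan"),
    ({"year": "2000", "type": "sedan", "priority": "power"}, "sport sedan"),
    ({"year": "1999", "type": "truck", "priority": "power"}, "pickup"),
    ({"year": "2000", "type": "truck", "priority": "economy"}, "pickup"),
]


def entropy(labels):
    """Uncertainty, in bits, about which label applies."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, question):
    """How much asking `question` reduces uncertainty about the decision."""
    before = entropy([label for _, label in rows])
    groups = {}
    for answers, label in rows:
        groups.setdefault(answers[question], []).append(label)
    after = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return before - after


def best_first_question(rows):
    """The question to ask first: the one with the highest information gain."""
    questions = rows[0][0].keys()
    return max(questions, key=lambda q: information_gain(rows, q))


print(best_first_question(CARS))
```

Asking about the type of car settles the matter at once for truck buyers, so it goes first. Repeating the same choice within each branch yields the rest of the tree.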
Decision tree systems are incorporated in product-selection systems offered by many vendors. They're great for situations in which a visitor comes to a Web site with a particular need. But once the decision has been made, the answers to the questions contribute little to targeting or personalization for that visitor in the future.
For example, decision trees are used in the "paper clip" office assistant in Microsoft Office: It watches what you click on, and observes your mistakes. It may decide you need help and bring up a help page with more information. Some of us find the paper clip helpful. Others wish we could strangle it.*
Picking a Solution
Data mining isn't for the faint of heart. You face three major problems. First, many good data-mining professionals are serious nerds who speak the foreign language of statistics. Second, there are few plug-and-play solutions. And third, everything useful is expensive.
I wrote this article to strengthen your resolve.
The previous sections showed you how to determine the data you should collect, the metrics you hope to improve, and the framework of the problem. If you know these things, you can communicate more fluently with data-mining professionals.
Use caution when listening to traditional offline data-mining professionals. It's likely that your Web site operates at a faster rate, involves more data, and is more mission-critical than anything they've done. Traditionalists are familiar with a more relaxed world: where data mining is used once per month, rather than once per click; where data accumulates in gigabytes per year rather than gigabytes per month; and where a crashed application needs to be fixed in the morning, rather than instantly by redundant machines and fail-safe rollover.
Data-mining algorithms overlap in the problems they can solve, but for a given problem there's usually a "best algorithm." When you buy a product, make sure the algorithm it uses is appropriate for the task you're trying to perform. The box titled "Picks, Pans, & Dynamite" discusses the most common data-mining techniques used on the Web.
Though data-mining applications are expensive, everything is relative. Andromedia's LikeMinds personalization system increased average spend rate on the Levi-Strauss online store by 33 percent and increased repeat visitation by 225 percent. This adds up to a lot of revenue.
The world of Web data mining is simultaneously a minefield and a gold mine. By saving data associated with visitors, content, and interactions, you can at least ensure you'll be able to use it later. Despite the difficulties, you might consider evaluating and incorporating data-mining applications now. The sooner you start learning from your data, the sooner you can leave your competitors in the dust.
------------------------------------------------------------------------------
Dan holds a Ph.D. in computer science from UCLA, emphasizing parallel statistical optimization. He is currently chief technology officer at Andromedia. He can be reached at greening@andromedia.com.
※ Source: Nanjing University Lily BBS http://bbs.nju.edu.cn [FROM: 202.100.5.132]