Posted by: ashun (阿顺), Board: DataMining
Title: A Brief Introduction to Data Mining Terminology (Part 8)
Posted at: Nanjing University Lily BBS (Thu Aug 30 12:15:02 2001)
range
The range of the data is the difference between the maximum value and the minimum value. Alternatively, range can refer to the minimum and maximum themselves, as in "The value ranges from 2 to 8."
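Both senses of "range" can be computed in a few lines. The helper names data_range and range_bounds below are hypothetical and exist only for illustration.

```python
def data_range(values):
    """Range as a single number: maximum minus minimum."""
    if not values:
        raise ValueError("the range of empty data is undefined")
    return max(values) - min(values)


def range_bounds(values):
    """Range in the second sense: the (minimum, maximum) pair itself."""
    if not values:
        raise ValueError("the range of empty data is undefined")
    return min(values), max(values)


data = [2, 5, 8, 3]
print(data_range(data))    # 6
print(range_bounds(data))  # (2, 8)
```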
RDBMS
Relational Database Management System.
regression tree
A decision tree that predicts values of continuous variables.
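The following is a toy sketch of how a regression tree can be grown. It assumes a single numeric input and a CART-style greedy rule: at each node it picks the split threshold that minimizes the summed squared error of the two children, and each leaf predicts the mean of its training targets. Real tools also handle many inputs and pruning. The functions build_tree and predict are hypothetical names.

```python
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean; 0.0 for an empty list."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(xs, ys, depth=0, max_depth=2, min_size=2):
    """Grow a regression tree on one numeric feature.

    A leaf is a number (the mean target of its rows). An internal node is a
    tuple (threshold, left, right); rows with x <= threshold go left.
    """
    if depth >= max_depth or len(ys) < 2 * min_size:
        return mean(ys)
    best = None  # (cost, threshold)
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_size or len(right) < min_size:
            continue  # skip splits that leave a child too small
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t)
    if best is None:
        return mean(ys)
    t = best[1]
    lx = [x for x in xs if x <= t]
    ly = [y for x, y in zip(xs, ys) if x <= t]
    rx = [x for x in xs if x > t]
    ry = [y for x, y in zip(xs, ys) if x > t]
    return (t,
            build_tree(lx, ly, depth + 1, max_depth, min_size),
            build_tree(rx, ry, depth + 1, max_depth, min_size))


def predict(tree, x):
    """Walk from the root to a leaf and return that leaf's mean."""
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
tree = build_tree(xs, ys)
print(tree)              # (3, 1.0, 5.0)
print(predict(tree, 5))  # 5.0
```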
resubstitution error
The estimate of error based on the differences between the predicted values of
a trained model and the observed values in the training set.
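To make the idea concrete, the sketch below fits a least-squares line and then scores it on the same rows it was trained on. It assumes mean squared error as the error measure, and the helper names are hypothetical. Because the model has already seen these rows, resubstitution error usually understates the error on new data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def resubstitution_mse(xs, ys):
    """Train on (xs, ys), then measure mean squared error on that same data."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)


print(resubstitution_mse([0, 1, 2], [0, 2, 1]))  # 0.5
```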
right-hand side
When an association between two variables is defined, the second item is called the right-hand side (or consequent). For example, in the relationship "When a prospector buys a pick, he buys a shovel 14% of the time," "buys a shovel" is the right-hand side.
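In this sketch a rule is stored as a (left-hand side, right-hand side) pair of item sets. The "14% of the time" figure in such a rule is usually called its confidence: among transactions containing the left-hand side, it is the share that also contain the right-hand side. The baskets and the function name are hypothetical.

```python
def confidence(transactions, lhs, rhs):
    """Share of transactions containing all of lhs that also contain all of rhs."""
    lhs, rhs = set(lhs), set(rhs)
    matching = [t for t in transactions if lhs <= set(t)]
    if not matching:
        return 0.0
    return sum(1 for t in matching if rhs <= set(t)) / len(matching)


# Rule "pick => shovel": {"pick"} is the left-hand side, {"shovel"} the right-hand side.
rule = ({"pick"}, {"shovel"})
baskets = [{"pick", "shovel"}, {"pick"}, {"pick", "gloves"}, {"shovel", "rope"}]
lhs, rhs = rule
print(f"{sorted(lhs)} => {sorted(rhs)}")  # ['pick'] => ['shovel']
print(confidence(baskets, lhs, rhs))      # 0.3333333333333333
```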
r-squared
A number between 0 and 1 that measures how well a model fits its training data. A value of 1 is a perfect fit, while 0 implies the model has no predictive ability. It is computed as the square of the correlation between the predicted and observed values, where the correlation is their covariance divided by the product of their standard deviations.
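The computation above is short enough to show directly. The sketch uses population (divide-by-n) moments; the divisor cancels in the correlation. For a least-squares line with an intercept, this squared correlation equals 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name r_squared is hypothetical.

```python
import math


def r_squared(predicted, observed):
    """Square of the Pearson correlation between predicted and observed values."""
    n = len(predicted)
    mp = sum(predicted) / n
    mo = sum(observed) / n
    cov = sum((p - mp) * (o - mo) for p, o in zip(predicted, observed)) / n
    sd_p = math.sqrt(sum((p - mp) ** 2 for p in predicted) / n)
    sd_o = math.sqrt(sum((o - mo) ** 2 for o in observed) / n)
    if sd_p == 0 or sd_o == 0:
        raise ValueError("correlation is undefined when a series is constant")
    r = cov / (sd_p * sd_o)  # correlation = covariance / product of std devs
    return r * r


# The least-squares line fitted to x = [0, 1, 2], y = [0, 2, 1] predicts [0.5, 1.0, 1.5].
print(round(r_squared([0.5, 1.0, 1.5], [0, 2, 1]), 6))  # 0.25
```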
sampling
Creating a subset of data from the whole. Random sampling attempts to represent the whole by choosing the sample through a random mechanism.
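A minimal example of simple random sampling without replacement, using Python's standard random module. Every subset of size k is equally likely to be drawn. The customer ids and the wrapper name are hypothetical, and the optional seed only makes a draw repeatable.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Choose k distinct items so that every size-k subset is equally likely."""
    rng = random.Random(seed)  # private generator; a fixed seed makes draws repeatable
    return rng.sample(population, k)


customers = list(range(1, 1001))  # hypothetical customer ids
sample = simple_random_sample(customers, 10, seed=7)
print(sample)
```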
sensitivity analysis
Varying the parameters of a model to assess the change in its output.
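One simple form of sensitivity analysis is "one at a time": raise each parameter by a fixed relative step while holding the others fixed, then record how much the output moves. The sketch assumes a 10% step, and the profit model is a hypothetical toy chosen only to make the numbers easy to follow.

```python
def one_at_a_time(model, params, rel_step=0.1):
    """Raise each parameter by rel_step (relative) with the others held fixed,
    and return the resulting change in the model's output for each parameter."""
    base = model(**params)
    effects = {}
    for name, value in params.items():
        bumped = dict(params)
        bumped[name] = value * (1 + rel_step)
        effects[name] = model(**bumped) - base
    return effects


def profit(price, units, unit_cost):
    """Hypothetical toy model used only to illustrate the procedure."""
    return (price - unit_cost) * units


effects = one_at_a_time(profit, {"price": 10.0, "units": 100.0, "unit_cost": 6.0})
print({name: round(delta, 6) for name, delta in effects.items()})
# {'price': 100.0, 'units': 40.0, 'unit_cost': -60.0}
```

Here the output is most sensitive to price. A 10% change in unit cost moves profit more than a 10% change in volume.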
sequence discovery
The same as association, except that the time sequence of events is also considered. For example, "Twenty percent of the people who buy a VCR buy a camcorder within four months."
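The VCR/camcorder statement can be checked against time-stamped purchase histories. The sketch assumes "within four months" means a later purchase no more than 120 days after a VCR purchase. A same-day purchase does not count as "after". The data layout, the function name and the histories are all hypothetical.

```python
from datetime import date, timedelta


def share_followed_within(histories, first, then, window):
    """Among customers who bought `first`, the share who bought `then` strictly
    after some purchase of `first` and no more than `window` later."""
    bought_first = 0
    followed = 0
    for events in histories.values():
        first_dates = [d for item, d in events if item == first]
        if not first_dates:
            continue
        bought_first += 1
        if any(item == then and any(f < d <= f + window for f in first_dates)
               for item, d in events):
            followed += 1
    return followed / bought_first if bought_first else 0.0


# Hypothetical histories: customer id -> list of (item, purchase date).
histories = {
    "c1": [("vcr", date(2001, 1, 10)), ("camcorder", date(2001, 3, 1))],  # within window
    "c2": [("vcr", date(2001, 1, 10)), ("camcorder", date(2001, 9, 1))],  # too late
    "c3": [("vcr", date(2001, 2, 1))],                                    # never followed
    "c4": [("camcorder", date(2001, 1, 5)), ("vcr", date(2001, 2, 1))],  # wrong order
    "c5": [("tv", date(2001, 1, 1))],                                     # no VCR at all
}
print(share_followed_within(histories, "vcr", "camcorder", timedelta(days=120)))  # 0.25
```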
significance
A probability measure of how strongly the data support a certain result (usually of a statistical test). If the significance of a result is said to be .05, it means that, if chance alone were at work, there would be only a .05 probability of obtaining a result at least as extreme as the one observed. Very low significance (less than .05) is usually taken as evidence that the effect found by the data mining model is real and the model should be accepted, since events with very low probability seldom occur by chance. So if the estimate of a parameter in a model showed a significance of .01, that would be evidence that the parameter belongs in the model.
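One concrete way to obtain a significance level is an exact permutation test. This is a swapped-in technique chosen for illustration, not necessarily the test the glossary has in mind. If group labels were arbitrary ("chance alone"), every split of the pooled values into groups of the original sizes would be equally likely. The significance is then the share of splits whose difference in means is at least as large as the observed one. The function name is hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact one-sided permutation test of whether b's mean exceeds a's."""
    observed = mean(b) - mean(a)
    pooled = list(a) + list(b)
    indices = range(len(pooled))
    as_extreme = total = 0
    for chosen in combinations(indices, len(b)):
        chosen_set = set(chosen)
        group_b = [pooled[i] for i in chosen]
        group_a = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # small tolerance so float round-off cannot exclude the observed split
        if mean(group_b) - mean(group_a) >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / total


print(permutation_p_value([1, 2, 3], [7, 8, 9]))  # 0.05 (1 of the 20 splits)
```

With six values there are 20 ways to choose the "b" group. Only the observed split gives a difference as large as 6, so the significance is 1/20 = .05.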
SMP
Symmetric multi-processing is a computer configuration in which many CPUs share a common operating system, main memory and disks. The CPUs can work on different parts of a problem at the same time.
standardize
A collection of numeric data is standardized by subtracting a measure of central location (such as the mean or median) and dividing by a measure of spread (such as the standard deviation, interquartile range or range). This yields data whose histogram has the same shape as before but whose values are centered around 0. It is sometimes useful to do this with the inputs to neural nets and to other regression models. (Also see normalize.)
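The recipe above takes a choice of center and a choice of spread, so the sketch leaves both as parameters. It defaults to the mean and the sample standard deviation, which gives the familiar z-scores. The median with the interquartile range or the range is an alternative that is less affected by outliers. The helper names are hypothetical. statistics.quantiles (Python 3.8+) with n=4 returns the three quartile cut points.

```python
import statistics


def standardize(values, center=statistics.mean, spread=statistics.stdev):
    """Subtract a measure of location, then divide by a measure of spread."""
    c = center(values)
    s = spread(values)
    if s == 0:
        raise ValueError("spread is zero; the data cannot be standardized")
    return [(v - c) / s for v in values]


def value_range(values):
    return max(values) - min(values)


def interquartile_range(values):
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return q3 - q1


data = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(data)  # mean 0, standard deviation 1
robust = standardize(data, center=statistics.median, spread=interquartile_range)
by_range = standardize(data, center=statistics.median, spread=value_range)
print([round(v, 3) for v in z])
```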
supervised learning
The collection of techniques where analysis uses a well-defined (known) dependent variable. All regression and classification techniques are supervised.
support
The measure of how often the collection of items in an association occurs together, as a percentage of all the transactions. For example, "In 2% of the purchases at the hardware store, both a pick and a shovel were bought."
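Support is a plain count over the transaction list. This sketch divides by all transactions, as the definition says, and expresses the result as a fraction rather than a percentage. The baskets and the function name are hypothetical.

```python
def support(transactions, itemset):
    """Share of all transactions that contain every item in itemset."""
    if not transactions:
        return 0.0
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


purchases = [  # hypothetical hardware-store baskets
    {"pick", "shovel"},
    {"nails"},
    {"shovel", "rope"},
    {"pick", "gloves"},
]
print(support(purchases, {"pick", "shovel"}))  # 0.25
```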
--
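Mastery comes from diligence and is lost through idleness; deeds succeed through reflection and fail through carelessness. (Han Yu)
Rather than envying the fish from the water's edge, go home and weave a net. (Ban Gu)
Do not do evil because it is small; do not neglect good because it is small. (Liu Bei)
※ Source: Nanjing University Lily BBS http://bbs.nju.edu.cn [FROM: 202.119.80.20]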