Sender: ashun (阿顺), Board: DataMining
Title: A Brief Introduction to Data Mining Terms (Part 7)
Posted from: Nanjing University Lily BBS (Thu Aug 30 12:14:19 2001)
OLAP
On-Line Analytical Processing tools give the user the capability to perform
multi-dimensional analysis of the data.
optimization criterion
A positive function of the difference between predictions and data. Estimates
are chosen so as to optimize this function or criterion. Least squares and
maximum likelihood are examples.
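As a rough illustration of the least squares case, the sketch below fits a
straight line to a handful of made-up points. The data and the function names
(sum_squared_error, least_squares_line) are assumptions for this example only.

```python
# Hypothetical toy data: y is roughly 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

def sum_squared_error(intercept, slope):
    """The optimization criterion: a non-negative function of the
    differences between the predictions and the data."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line():
    """Closed-form estimates that minimize sum_squared_error for a line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = least_squares_line()
best = sum_squared_error(intercept, slope)
```

Nudging either estimate away from the fitted values should only increase the
criterion, which is what "chosen so as to optimize" means here.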
outliers
Technically, outliers are data items that did not (or are thought not to have)
come from the assumed population of data -- for example, a non-numeric value
when you are expecting only numeric values. A more casual usage refers to data
items that fall outside the boundaries that enclose most other data items in
the data set.
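Both senses can be sketched in a few lines. The column of values is invented,
and the choice of boundaries (1.5 times the interquartile range beyond the
quartiles, often called Tukey's rule) is an assumption; the definition above
does not say how the boundaries should be drawn.

```python
import statistics

# Hypothetical raw column: mostly small numbers, one stray string,
# one very large value.
raw = [10, 12, "n/a", 11, 13, 12, 11, 95, 12]

# Technical sense: items that cannot come from the assumed numeric population.
non_numeric = [v for v in raw if not isinstance(v, (int, float))]
values = [v for v in raw if isinstance(v, (int, float))]

# Casual sense: items outside boundaries that enclose most of the data.
# Assumption: boundaries are 1.5 * IQR below Q1 and above Q3.
q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in values if v < low or v > high]
```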
overfitting
A tendency of some modeling techniques to assign importance to random
variations in the data by declaring them important patterns.
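A small, contrived sketch of the effect: a model that memorizes every training
point (here a 1-nearest-neighbour lookup, picked only for illustration) treats
the noise as pattern, while a straight line does not. The data, the noise
values, and all names are assumptions for this example.

```python
# Hypothetical training data: the true relation is y = 2*x, plus fixed noise.
train_x = [0, 1, 2, 3, 4, 5]
noise = [1, -1, 1, -1, 1, -1]
train_y = [2 * x + e for x, e in zip(train_x, noise)]

# Held-out points with noise-free targets.
test_x = [0.2, 1.2, 2.2, 3.2, 4.2]
test_y = [2 * x for x in test_x]

def memorizer(x):
    """Overfit model: return the y of the nearest training point."""
    nearest = min(zip(train_x, train_y), key=lambda p: abs(p[0] - x))
    return nearest[1]

def fit_line():
    """Least squares straight line through the training data."""
    n = len(train_x)
    mx, my = sum(train_x) / n, sum(train_y) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(train_x, train_y))
    sxx = sum((x - mx) ** 2 for x in train_x)
    slope = sxy / sxx
    return my - slope * mx, slope

def mse(model, xs, ys):
    """Mean squared error of a model over the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

intercept, slope = fit_line()
line = lambda x: intercept + slope * x

memorizer_train = mse(memorizer, train_x, train_y)
memorizer_test = mse(memorizer, test_x, test_y)
line_train = mse(line, train_x, train_y)
line_test = mse(line, test_x, test_y)
```

The memorizer has zero error on the data it has seen but does worse than the
line on the held-out points, which is the usual symptom of overfitting.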
overlay
Data not collected by the organization, such as data from a proprietary
database, that is combined with the organization's own data.
parallel processing
Several computers or CPUs linked together so that each can be computing
simultaneously.
pattern
Analysts and statisticians spend much of their time looking for patterns in
data. A pattern can be a relationship between two variables. Data mining
techniques include automatic pattern discovery that makes it possible to
detect complicated non-linear relationships in data. Patterns are not the same
as causality.
precision
The precision of an estimate of a parameter in a model is a measure of how
variable the estimate would be over other similar data sets. A very precise
estimate would be one that did not vary much over different data sets.
Precision does not measure accuracy. Accuracy is a measure of how close the
estimate is to the real value of the parameter. It is measured by the average
distance, over different data sets, of the estimate from the real value.
Estimates can be accurate but not precise, or precise but not accurate. A
precise but inaccurate estimate is usually biased, with the bias equal to the
average distance from the real value of the parameter.
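The distinction can be made concrete with two invented estimators, each applied
to five similar data sets. The true value, the estimates, and the summarize
helper are all hypothetical. Spread is measured here with the population
standard deviation, which is one reasonable choice rather than the only one.

```python
import statistics

TRUE_VALUE = 10.0  # hypothetical real value of the parameter

# Hypothetical estimates of the same parameter from five similar data sets.
estimator_a = [9.0, 11.0, 8.5, 11.5, 10.0]    # scattered, centred on the truth
estimator_b = [12.0, 12.1, 11.9, 12.0, 12.0]  # tightly clustered, off target

def summarize(estimates):
    """Return (spread, bias, mean distance from the true value)."""
    spread = statistics.pstdev(estimates)  # small spread = high precision
    bias = statistics.mean(estimates) - TRUE_VALUE
    mean_distance = statistics.mean(abs(e - TRUE_VALUE) for e in estimates)
    return spread, bias, mean_distance

a_spread, a_bias, a_distance = summarize(estimator_a)
b_spread, b_bias, b_distance = summarize(estimator_b)
```

Estimator B varies far less from one data set to the next, so it is the more
precise of the two, yet it sits about 2 units from the true value on average:
precise but biased. Estimator A has essentially no bias but a much larger
spread.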
predictability
Some data mining vendors use predictability of associations or sequences to
mean the same as confidence.
prevalence
The measure of how often the collection of items in an association occurs
together, as a percentage of all the transactions. For example, "In 2% of the
purchases at the hardware store, both a pick and a shovel were bought."
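The hardware store example can be reproduced with a short sketch. The 50
transactions below are made up so that exactly one of them contains both items,
and the prevalence function is a hypothetical helper, not part of any
particular tool.

```python
def prevalence(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= set(t))
    return hits / len(transactions)

# Hypothetical purchase log: 50 transactions, one with both pick and shovel.
transactions = (
    [{"pick", "shovel"}]
    + [{"hammer"}] * 30
    + [{"pick"}] * 10
    + [{"shovel", "nails"}] * 9
)

both = prevalence(transactions, {"pick", "shovel"})
percent = both * 100  # prevalence expressed as a percentage
```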
pruning
Eliminating lower level splits or entire sub-trees in a decision tree. This
term is also used to describe algorithms that adjust the topology of a neural
net by removing (i.e., pruning) hidden nodes.
--
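Mastery comes from diligence and is lost through idleness; deeds succeed through thought and fail through carelessness. -- Han Yu
Rather than stand by the water coveting the fish, go home and weave a net. -- Ban Gu
Do not do evil because it is small; do not neglect good because it is small. -- Liu Bei
※ Source: Nanjing University Lily BBS http://bbs.nju.edu.cn [FROM: 202.119.80.20]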