Posted by: ashun (阿顺), Board: DataMining
Subject: A Brief Introduction to Data Mining Terms (6)
Posted at: Nanjing University Lily BBS (Thu Aug 30 12:12:15 2001)
MARS
Multivariate Adaptive Regression Splines. MARS is a generalization of a decision tree.
maximum likelihood
Another training or estimation method. The maximum likelihood estimate of a parameter is the value of the parameter that maximizes the probability of the observed data, given the population defined by that parameter.
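A minimal sketch of the idea, assuming the data are coin flips (1 = heads) from a Bernoulli population with unknown parameter p. The sample and the function names are made up for illustration. The code scans candidate values of p and keeps the one under which the observed data are most probable; for this simple case the result agrees with the sample proportion of heads.

```python
import math

def log_likelihood(p, data):
    """Log-probability of the observed 0/1 data under a Bernoulli(p) population."""
    return sum(math.log(p) if x == 1 else math.log(1.0 - p) for x in data)

def mle_bernoulli(data, steps=999):
    """Grid search over p in (0, 1) for the value that maximizes the likelihood."""
    candidates = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(candidates, key=lambda p: log_likelihood(p, data))

# Hypothetical sample: 7 heads out of 10 flips.
flips = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
p_hat = mle_bernoulli(flips)
```

A grid search is used here only because it makes the "maximize" step visible; real estimation methods usually rely on a closed form or a numerical optimizer.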
mean
The arithmetic average value of a collection of numeric data.
median
The value in the middle of a collection of ordered data. In other words, the value with the same number of items above and below it.
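As a small illustration of mean and median, here is a sketch using Python's standard statistics module on a made-up sample. The even-count case shows one common convention (averaging the two middle values), which the definition above does not itself specify.

```python
import statistics

values = [3, 1, 4, 1, 5, 9, 2]            # hypothetical sample

mean_value = statistics.mean(values)      # sum of the values divided by their count
median_value = statistics.median(values)  # middle item of the sorted values

# With an even number of items there is no single middle value;
# statistics.median averages the two middle items.
even_median = statistics.median([1, 2, 3, 4])
```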
missing data
Data values can be missing because they were not measured, not answered, unknown, or lost. Data mining methods vary in the way they treat missing values. Typically, they ignore the missing values, omit any records containing missing values, replace missing values with the mode or mean, or infer missing values from existing values.
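Two of these treatments can be sketched for a single column, assuming missing values are represented by None. The column and the helper names drop_missing and impute are hypothetical.

```python
import statistics

def drop_missing(values):
    """Omit the missing entries entirely."""
    return [v for v in values if v is not None]

def impute(values, strategy="mean"):
    """Replace each missing entry with the mean or mode of the known entries."""
    known = drop_missing(values)
    if strategy == "mean":
        fill = statistics.mean(known)
    elif strategy == "mode":
        fill = statistics.mode(known)
    else:
        raise ValueError("strategy must be 'mean' or 'mode'")
    return [fill if v is None else v for v in values]

# Hypothetical column with two missing values.
ages = [30, None, 40, 30, None, 50]
filled_with_mean = impute(ages, "mean")
filled_with_mode = impute(ages, "mode")
```

Inferring missing values from existing values would need a predictive model and is not shown here.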
mode
The most common value in a data set. If more than one value shares the highest count, the data is multi-modal.
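To illustrate the multi-modal case, the sketch below uses statistics.multimode (available in Python 3.8 and later) on made-up data; it returns every value tied for the highest count.

```python
import statistics

single = statistics.multimode([1, 2, 2, 3])        # one mode
several = statistics.multimode([1, 2, 2, 3, 3, 4])  # two values tie for most common

# More than one mode means the data is multi-modal.
is_multimodal = len(several) > 1
```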
model
An important function of data mining is the production of a model. A model can be descriptive or predictive. A descriptive model helps in understanding underlying processes or behavior. For example, an association model describes consumer behavior. A predictive model is an equation or set of rules that makes it possible to predict an unseen or unmeasured value (the dependent variable or output) from other, known values (independent variables or input). The form of the equation or rules is suggested by mining data collected from the process under study. Some training or estimation technique is used to estimate the parameters of the equation or rules.
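As a minimal sketch of a predictive model, assume the equation is a straight line, y = slope * x + intercept, and that ordinary least squares is the estimation technique. Both are choices made for this example, and the data are invented.

```python
def fit_line(xs, ys):
    """Estimate slope and intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Predict the dependent variable for a new input x."""
    slope, intercept = model
    return slope * x + intercept

# Hypothetical training data (input, output) that follows y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
model = fit_line(xs, ys)
unseen = predict(model, 5)   # prediction for an input not in the training data
```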
MPP
Massively parallel processing, a computer configuration that is able to use hundreds or thousands of CPUs simultaneously. In MPP each node may be a single CPU or a collection of SMP CPUs. An MPP collection of SMP nodes is sometimes called an SMP cluster. Each node has its own copy of the operating system, memory, and disk storage, and there is a data or process exchange mechanism so that each computer can work on a different part of a problem. Software must be written specifically to take advantage of this architecture.
neural network
A complex nonlinear modeling technique based on a model of a human neuron. A neural net is used to predict outputs (dependent variables) from a set of inputs (independent variables) by taking linear combinations of the inputs and then making nonlinear transformations of the linear combinations using an activation function. It can be shown theoretically that such combinations and transformations can approximate virtually any type of response function. Thus, neural nets use large numbers of parameters to approximate any model. Neural nets are often applied to predict future outcomes based on prior experience. For example, a neural net application could be used to predict who will respond to a direct mailing.
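The "linear combination, then activation function" step can be sketched as a forward pass through a tiny network. The layer sizes, the sigmoid activation, and all weights below are assumptions chosen for illustration; in practice the weights would be estimated by training, which is not shown.

```python
import math

def sigmoid(z):
    """A common activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases, activation):
    """Each node takes a linear combination of its inputs, then applies the activation."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def predict(inputs, params):
    """Forward pass: 2 inputs -> 2 hidden nodes -> 1 output."""
    hidden = layer(inputs, params["w_hidden"], params["b_hidden"], sigmoid)
    return layer(hidden, params["w_out"], params["b_out"], sigmoid)[0]

# Hypothetical, untrained parameters.
params = {
    "w_hidden": [[0.5, -1.0], [1.5, 0.25]],
    "b_hidden": [0.0, -0.5],
    "w_out": [[1.0, -2.0]],
    "b_out": [0.1],
}
score = predict([0.2, 0.7], params)   # e.g. a response score for one customer
```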
node
A decision point in a classification (i.e., decision) tree. Also, a point in a neural net that combines input from other nodes and produces an output through application of an activation function.
noise
The difference between a model's predictions and the observed data. Sometimes data is referred to as noisy when it contains errors, such as many missing or incorrect values, or when there are extraneous columns.
non-applicable data
Missing values that would be logically impossible (e.g., pregnant males) or are obviously not relevant.
normalize
A collection of numeric data is normalized by subtracting the minimum value from all values and dividing by the range of the data. This yields data with a similarly shaped histogram but with all values between 0 and 1. It is useful to do this for all inputs into neural nets and also for inputs into other regression models. (Also see standardize.)
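The rule above can be written as a short function. The name normalize and the sample values are illustrative, and raising an error for constant data (where the range is zero) is an assumption this sketch adds.

```python
def normalize(values):
    """Rescale values to the 0..1 interval: (value - minimum) / range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # All values are equal, so the range is zero and the rule is undefined.
        raise ValueError("cannot normalize constant data")
    return [(v - lo) / span for v in values]

# Hypothetical input column.
scaled = normalize([10, 20, 15, 30])
```

The minimum maps to 0 and the maximum to 1, and the other values keep their relative spacing.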
--
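Mastery comes from diligence and is lost through idleness; success comes from thought and is ruined by carelessness. (Han Yu)
Rather than stand by the water longing for fish, go home and weave a net. (Ban Gu)
Do not do evil because it is small; do not neglect good because it is small. (Liu Bei)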
※ Source: Nanjing University Lily BBS http://bbs.nju.edu.cn [FROM: 202.119.80.20]