From: ashun (阿顺), Board: DataMining
Subject: A Brief Introduction to Data Mining Terms (5)
Site: Nanjing University Lily BBS (南京大学小百合站) (Thu Aug 30 12:11:27 2001)
genetic algorithms
A computer-based method of generating and testing combinations of possible
input parameters to find the optimal output. It uses processes based on
natural evolution concepts such as genetic combination, mutation and natural
selection.
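As a rough illustration of the generate-and-test loop described above, here is
a minimal sketch of a genetic algorithm over bit strings. The function name
evolve, the bit-string encoding, the parameter defaults and the "keep the
fitter half" selection rule are all assumptions chosen for brevity, not a
standard or a particular product's method.

```python
import random

def evolve(fitness, n_params=5, pop_size=30, generations=60,
           mutation_rate=0.1, seed=0):
    """Toy genetic algorithm over bit strings; returns the best one found.

    Requires n_params >= 2 and pop_size >= 4.
    """
    rng = random.Random(seed)
    # Start from a random population of candidate parameter combinations.
    pop = [[rng.randint(0, 1) for _ in range(n_params)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Natural selection: only the fitter half may reproduce.
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Genetic combination: one-point crossover of two parents.
            cut = rng.randint(1, n_params - 1)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        # Remember the best individual seen in any generation.
        best = max(pop + [best], key=fitness)
    return best

# Toy fitness (an assumption): the number of 1 bits in the string.
print(evolve(sum, n_params=8))
```

Real applications would replace the toy fitness function with a measure of how
good the model output is for a given parameter combination.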
GUI
Graphical User Interface.
hidden nodes
The nodes in the hidden layers in a neural net. Unlike input and output nodes,
the number of hidden nodes is not predetermined. The accuracy of the
resulting model is affected by the number of hidden nodes. Since the number
of hidden nodes directly affects the number of parameters in the model, a
neural net needs a sufficient number of hidden nodes to enable it to properly
model the underlying behavior. On the other hand, a net with too many hidden
nodes will overfit the data. Some neural net products include algorithms that
search over a number of alternative neural nets by varying the number of
hidden nodes, in the end choosing the model that gets the best results
without overfitting.
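To make the link between hidden nodes and parameter count concrete, the sketch
below counts the weights and biases of a fully connected net with a single
hidden layer. The single-layer architecture, the bias terms and the function
name mlp_param_count are assumptions for illustration only.

```python
def mlp_param_count(n_inputs, n_hidden, n_outputs):
    """Weights plus biases in a fully connected net with one hidden layer."""
    hidden_params = (n_inputs + 1) * n_hidden    # +1: bias of each hidden node
    output_params = (n_hidden + 1) * n_outputs   # +1: bias of each output node
    return hidden_params + output_params

# 10 inputs, 1 output: the parameter count grows linearly with hidden nodes.
for h in (2, 8, 32):
    print(h, mlp_param_count(10, h, 1))   # prints 25, then 97, then 385
```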
independent variable
The independent variables (inputs or predictors) of a model are the variables
used in the equation or rules of the model to predict the output (dependent)
variable.
induction
A technique that infers generalizations from the information in the data.
interaction
Two independent variables interact when a change in the value of one changes
the effect that the other has on the dependent variable.
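A small numeric sketch may help here. It assumes a linear model with a product
term, y = b0 + b1*x1 + b2*x2 + b12*x1*x2; the coefficient values are made up
for illustration. The effect of raising x1 from 0 to 1 differs depending on
the value of x2, which is what interaction means.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=0.5, b12=3.0):
    """Linear model with an interaction term (hypothetical coefficients)."""
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2

def effect_of_x1(x2):
    """Change in the prediction when x1 goes from 0 to 1, holding x2 fixed."""
    return predict(1, x2) - predict(0, x2)

print(effect_of_x1(0))   # 2.0: only b1 contributes when x2 is 0
print(effect_of_x1(1))   # 5.0: b1 + b12, because x2 changes x1's effect
```

With b12 set to 0 the two printed effects would be equal, i.e. the variables
would not interact.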
internal data
Data collected by an organization, such as operating and customer data.
k-nearest neighbor
A classification method that classifies a point by calculating the distances
between the point and points in the training data set. Then it assigns the
point to the class that is most common among its k-nearest neighbors (where k
is an integer).
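The procedure above can be sketched in a few lines. Euclidean distance, the
function name knn_classify and the tiny training set are assumptions made for
the example; other distance measures are equally valid.

```python
import math
from collections import Counter

def knn_classify(point, training, k=3):
    """Classify point by majority vote among its k nearest training points.

    training is a list of (features, label) pairs.
    """
    # Sort training points by Euclidean distance to the query point.
    by_distance = sorted(training,
                         key=lambda item: math.dist(point, item[0]))
    # Count the class labels of the k closest points.
    votes = Counter(label for _, label in by_distance[:k])
    # On a tie, Counter returns the label it encountered first.
    return votes.most_common(1)[0][0]

training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
            ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_classify((0.5, 0.5), training))   # a
print(knn_classify((5.5, 5.5), training))   # b
```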
Kohonen feature map
A type of neural network that uses unsupervised learning to find patterns in
data. In data mining it is employed for cluster analysis.
layer
Nodes in a neural net are usually grouped into layers, with each layer
described as input, output or hidden. There are as many input nodes as there
are input (independent) variables and as many output nodes as there are
output (dependent) variables. Typically, there are one or two hidden layers.
leaf
A node not further split -- the terminal grouping -- in a classification or
decision tree.
learning
Training models (estimating their parameters) based on existing data.
least squares
The most common method of training (estimating) the weights (parameters) of a
model by choosing the weights that minimize the sum of the squared deviations
of the predicted values of the model from the observed values of the data.
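For the simplest case, a straight line y = intercept + slope*x, the weights
that minimize the sum of squared deviations have a closed form. The sketch
below assumes that one-variable model; the function names and the sample data
are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx            # requires at least two distinct x values
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_squared_error(xs, ys, intercept, slope):
    """The quantity that least squares minimizes."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs, ys = [0, 1, 2, 3], [1, 2, 2, 4]
a, b = fit_line(xs, ys)
# Any other pair of weights gives an error at least as large.
print(sum_squared_error(xs, ys, a, b) <= sum_squared_error(xs, ys, a + 0.1, b))
```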
left-hand side
When an association between two variables is defined, the first item is
called the left-hand side (or antecedent). For example, in the relationship
"When a prospector buys a pick, he buys a shovel 14% of the time", "buys a
pick" is the left-hand side.
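The percentage in such a rule is the share of transactions containing the
left-hand side that also contain the second item, commonly called the rule's
confidence. A minimal sketch follows; the baskets are invented for the
example and do not reproduce the 14% figure, and rule_confidence is a
hypothetical helper name.

```python
def rule_confidence(transactions, lhs, rhs):
    """Fraction of transactions containing lhs that also contain rhs."""
    with_lhs = [t for t in transactions if lhs in t]
    if not with_lhs:
        return 0.0   # rule never applies; no evidence either way
    return sum(1 for t in with_lhs if rhs in t) / len(with_lhs)

# Made-up purchase baskets (illustrative data only).
baskets = [{"pick", "shovel"}, {"pick"}, {"pick", "pan"},
           {"shovel"}, {"pick", "shovel", "pan"}]
print(rule_confidence(baskets, "pick", "shovel"))   # 0.5
```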
logistic regression (logistic discriminant analysis)
A generalization of linear regression. It is used for predicting a binary
variable (with values such as yes/no or 0/1). An example of its use is
modeling the odds that a borrower will default on a loan based on the
borrower's
--
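Mastery comes from diligence and is lost through idleness; deeds succeed
through reflection and fail through carelessness. -- Han Yu (韩愈)
Better to go home and weave a net than to stand by the pool longing for fish.
-- Ban Gu (班固)
Do not do evil because it is small; do not neglect good because it is small.
-- Liu Bei (刘备)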
※ Source: Nanjing University Lily BBS (南京大学小百合站) http://bbs.nju.edu.cn [FROM: 202.119.80.20]