…the application purpose of the new knowledge to be discovered (e.g., accurate classification or structure uncovering), pattern language, evaluation, and extraction effort (granularity and extent of search).
INTERESTINGNESS of a pattern instance measures its quality and has several dimensions. The main dimensions are validation on the sample set, reliability in the universe, degree of redundancy with respect to other already known pattern instances, generality, simplicity, and usefulness.
VALIDATION checks a pattern instance (or a component of an instance) against the subset of data in the sample set that is connected with the instance. To check an instance, a statistical test is usually applied or some other criteria are evaluated. In addition to deciding whether an instance is valid, an evidence measure is often calculated as well.
EVIDENCE measures the statistical significance or some other kind of conspicuousness of a pattern instance.
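A minimal sketch of validation combined with an evidence measure, assuming a binary target attribute: the share p of target objects in the instance's subset (size n) is compared with the share p0 in the whole sample using the binomial z-statistic z = (p - p0)·√n / √(p0(1 - p0)). The function name `validate` and the 1.96 cut-off (a common two-sided 5% level) are illustrative assumptions, not taken from the text.

```python
import math


def validate(p_subset: float, n_subset: int, p_sample: float,
             z_crit: float = 1.96) -> tuple[bool, float]:
    """Return (is_valid, evidence) for a pattern instance.

    Evidence here is the binomial z-statistic of the subset's target share
    against the sample's target share (an illustrative choice of measure).
    """
    if n_subset <= 0 or not 0.0 < p_sample < 1.0:
        raise ValueError("need a non-empty subset and 0 < p_sample < 1")
    z = (p_subset - p_sample) * math.sqrt(n_subset) / math.sqrt(p_sample * (1.0 - p_sample))
    # Validation decision: is the deviation significant at the chosen level?
    return abs(z) >= z_crit, z


print(validate(0.6, 100, 0.5))
```

With 100 objects and a target share of 0.6 against 0.5 in the sample, z is about 2.0, so the instance passes; a share of 0.55 would give z of about 1.0 and fail.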
APPLICATION TEST is a filter that encodes preconditions a pattern instance must satisfy before it is evaluated as interesting. The filter avoids possibly extensive evaluation effort when the interestingness of an instance can already be excluded by the preconditions.
RELIABILITY is an estimate of whether a pattern instance discovered in the sample set is also valid in the universe. Cross-validation methods can be applied to estimate the correctness of the statement (pattern instance) in the universe.
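A minimal k-fold cross-validation sketch, assuming the caller supplies a `fit` function that discovers a pattern on training data and a `score` function that checks it on held-out data; both names and the toy majority-class "pattern" are hypothetical illustrations, not part of the text.

```python
from collections import Counter
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
M = TypeVar("M")


def k_fold_estimate(data: Sequence[T],
                    fit: Callable[[List[T]], M],
                    score: Callable[[M, List[T]], float],
                    k: int = 5) -> float:
    """Estimate how well patterns found on training folds hold on unseen folds."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    # Round-robin assignment of records to k disjoint folds.
    folds = [list(data[i::k]) for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = fit(train)                  # discover on the other k-1 folds
        scores.append(score(model, test))   # check on the held-out fold
    return sum(scores) / k


# Toy pattern (hypothetical): "the majority label" and its accuracy.
def fit_majority(train: List[str]) -> str:
    return Counter(train).most_common(1)[0][0]


def accuracy(label: str, test: List[str]) -> float:
    return sum(x == label for x in test) / len(test)


labels = ["a"] * 8 + ["b"] * 2
print(k_fold_estimate(labels, fit_majority, accuracy, k=5))  # 0.8
```

The averaged held-out score serves as the reliability estimate; a pattern that only fits the sample set by chance scores noticeably lower on the held-out folds.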
REDUNDANCY relates to several pattern instances or to several knowledge components of a complex pattern instance (e.g., nodes in a tree). Redundancy is given if one instance or component follows (logically) from another one. Redundancy can be quantified by the conditional probability of one instance or component given the other one.
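A minimal sketch of quantified redundancy, assuming each pattern instance is represented by the set of sample objects it describes; the conditional probability P(b | a) is then estimated as the share of a's objects that b also describes. The function names and the threshold are hypothetical.

```python
def conditional_coverage(a: set, b: set) -> float:
    """Estimate P(b | a): share of objects described by a that b also describes."""
    if not a:
        raise ValueError("instance a describes no objects")
    return len(a & b) / len(a)


def is_redundant(a: set, b: set, threshold: float = 1.0) -> bool:
    """Treat b as redundant given a if (almost) all of a's objects are covered by b."""
    return conditional_coverage(a, b) >= threshold


a = {1, 2, 3}
b = {1, 2, 3, 4, 5}
print(conditional_coverage(a, b), conditional_coverage(b, a))  # 1.0 0.6
```

The measure is asymmetric: on this sample b never fails where a holds, but not vice versa. Lowering `threshold` below 1.0 gives a graded notion of redundancy.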
GENERALITY measures the strength of a pattern instance in terms of the size of the subset of objects which are described by the statement.
SIMPLICITY measures the syntactical complexity of a statement (pattern instance).
USEFULNESS quantifies the possible usefulness of a statement (pattern instance). The usefulness can be related to a task the user of a discovery system has to perform, or to a task that a computer system can perform on the basis of the discovered new knowledge.
PATTERN TYPES are classes of patterns. The main classes are logic-numerical patterns, elementary patterns, and complex patterns.
LOGIC-NUMERICAL PATTERN has the subclasses tree, rule, functional relation, and statistical pattern.
ELEMENTARY PATTERNS work on aggregations of data. Typical aggregations are given by reports (e.g., based on group-by SQL operations) or multi-dimensional tabulations. Aggregation operations include count, sum, max, min, average, etc. Elementary patterns do not involve a complex search process; they investigate rows or columns of these tabulations, e.g., for monotonicity, convexity, concavity, maximum, minimum, discontinuity, or outliers. Several rows or columns can also be compared (e.g., all cells in one row are larger than the corresponding cells in another row).
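A minimal sketch of elementary patterns, assuming records are dictionaries: a group-by SUM builds a two-dimensional tabulation, and simple checks then test a row for monotonicity and compare two rows cell by cell. Field names such as `region`, `year`, and `sales` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping, Sequence, Tuple

Table = Dict[Tuple[Hashable, Hashable], float]


def tabulate(records: Iterable[Mapping[str, object]],
             row_key: str, col_key: str, value_key: str) -> Table:
    """Like SELECT row, col, SUM(value) ... GROUP BY row, col."""
    table: Table = defaultdict(float)
    for r in records:
        table[(r[row_key], r[col_key])] += r[value_key]
    return table


def row_values(table: Table, label: Hashable, cols: Sequence[Hashable]) -> List[float]:
    # Missing cells count as 0.0 (dict.get does not insert into the defaultdict).
    return [table.get((label, c), 0.0) for c in cols]


def is_monotone_increasing(values: Sequence[float]) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


def dominates(upper: Sequence[float], lower: Sequence[float]) -> bool:
    """True if every cell of `upper` is larger than the corresponding cell of `lower`."""
    return all(u > l for u, l in zip(upper, lower))


records = [
    {"region": "North", "year": 2021, "sales": 10},
    {"region": "North", "year": 2022, "sales": 12},
    {"region": "North", "year": 2023, "sales": 15},
    {"region": "South", "year": 2021, "sales": 8},
    {"region": "South", "year": 2022, "sales": 7},
    {"region": "South", "year": 2023, "sales": 9},
]
t = tabulate(records, "region", "year", "sales")
years = [2021, 2022, 2023]
north, south = row_values(t, "North", years), row_values(t, "South", years)
print(is_monotone_increasing(north), is_monotone_increasing(south), dominates(north, south))
```

No search is involved: every check is a single scan over a row or a pair of rows of the aggregated table.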
COMPLEX PATTERNS are patterns of neither the elementary nor the logic-numerical pattern type. They are relevant for discovery in application domains with data of complex structure.
TREE is a tree-like partition of a universe or sample set into a hierarchically ordered set of concepts. Each concept on a hierarchical level is recursively divided into subconcepts on the next lower hierarchical level. Typically, the concepts on each hierarchical level are disjoint and collectively exhaustive, and the description of the subconcepts on the next level (concept language) adds a further conjunctive term built with one further attribute. The main subtypes of this pattern type are classification trees and regression trees.
TREE EXTRACTION METHOD uses criteria to select a (next) attribute for each concept on a hierarchical level, to divide the domain of this attribute into (disjoint) subsets which correspond to the subconcepts on the next level, to terminate further partitioning of a concept, and to prune the tree. The criteria used by the extraction method depend on the extraction goals.
CLASSIFICATION TREE is a tree representing a set of classification rules for concept classes. Each leaf of a classification tree is associated with a concept class, where the description of the leaf constitutes a sufficient condition for the concept class. Classification trees can be used to classify objects by following the concept descriptions from the root to the leaves.
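A minimal sketch of classifying an object with a classification tree, assuming each inner node tests one nominal attribute and has one child per attribute value; the `Leaf`/`Node` classes and the weather example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, Mapping, Union


@dataclass
class Leaf:
    concept_class: str


@dataclass
class Node:
    attribute: str                      # attribute tested at this concept
    children: Dict[Hashable, "Tree"]    # attribute value -> subconcept


Tree = Union[Leaf, Node]


def classify(tree: Tree, obj: Mapping[str, Hashable]) -> str:
    """Follow the concept descriptions from the root down to a leaf."""
    while isinstance(tree, Node):
        # Raises KeyError if the object has a value the tree never saw.
        tree = tree.children[obj[tree.attribute]]
    return tree.concept_class


weather = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rain": Leaf("no"),
})
print(classify(weather, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Each root-to-leaf path is one conjunctive rule, e.g. "if outlook = sunny and humidity = normal then yes".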
REGRESSION TREE is a tree representing a set of homogeneous concepts. A concept (node) in this tree is homogeneous with respect to a continuous attribute, i.e., the variance of this attribute within the concept is minimal.
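A minimal sketch of one regression-tree step, assuming a single numeric splitting attribute x and a continuous target y: the threshold is chosen to minimize the size-weighted variance of y in the two resulting subconcepts, a criterion commonly used for regression trees. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def variance(ys: Sequence[float]) -> float:
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)


def best_split(xs: Sequence[float], ys: Sequence[float]) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted variance) of the best binary split on x, or None."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = (len(left) * variance(left) + len(right) * variance(right)) / n
        if best is None or cost < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, cost)
    return best


print(best_split([1, 2, 3, 10, 11, 12], [1, 1, 1, 5, 5, 5]))  # (6.5, 0.0)
```

Applying the same step recursively to each subconcept, with a stopping rule, yields the full tree.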
RULE correlates two concepts. Typically, the left-hand side concept LHS is a sufficient condition for the right-hand side concept RHS. A rule can be presented as: If LHS then RHS. There are exact, strong, and probabilistic rules, as well as attributive and first order rules.
EXACT RULE allows no exceptions. Each object of the LHS concept of a rule must also be an element of the RHS concept.
STRONG RULE allows some exceptions. The number of exceptions usually may not exceed a given limit, expressed as a percentage or as an absolute number.
PROBABILISTIC RULE relates the conditional probability P(RHS | LHS) to the probability P(RHS).
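A minimal sketch that sorts a rule into the three kinds above, assuming LHS and RHS are given as sets of object identifiers drawn from a finite universe. The 10% exception limit for strong rules and the use of lift = P(RHS | LHS) / P(RHS) > 1 for probabilistic rules are illustrative assumptions.

```python
def classify_rule(lhs: set, rhs: set, universe: set,
                  max_exception_rate: float = 0.1) -> tuple[str, float]:
    """Return (rule kind, lift) for 'if LHS then RHS'; assumes lhs, rhs are subsets of universe."""
    if not lhs or not universe:
        raise ValueError("LHS and universe must be non-empty")
    exceptions = len(lhs - rhs)                  # LHS objects outside RHS
    p_rhs_given_lhs = 1 - exceptions / len(lhs)
    p_rhs = len(rhs) / len(universe)
    lift = p_rhs_given_lhs / p_rhs if p_rhs > 0 else 0.0
    if exceptions == 0:
        kind = "exact"
    elif exceptions / len(lhs) <= max_exception_rate:
        kind = "strong"
    elif lift > 1:
        kind = "probabilistic"                   # LHS raises the probability of RHS
    else:
        kind = "none"
    return kind, lift


universe = set(range(20))
rhs = set(range(10))                             # P(RHS) = 0.5
print(classify_rule({0, 1, 2}, rhs, universe)[0])            # exact
print(classify_rule(set(range(10)) | {15}, rhs, universe)[0])  # strong
print(classify_rule({0, 1, 2, 15, 16}, rhs, universe)[0])    # probabilistic
```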
ATTRIBUTIVE or PROPOSITIONAL RULE is based on a propositional concept language.
FIRST ORDER RULE is based on a first order concept language. In this case, LHS and RHS concepts are sets of object tuples (including several object classes).
CLASSIFICATION RULE belongs to a system of classification rules that has to be discovered for given right-hand side concept classes.
CHARACTERISTIC RULE belongs to a system of characteristic rules that has to be discovered for a given left-hand side concept.
RULE EXTRACTION METHOD discovers a system of rules. The extraction goals include the desired subtypes of rules, (partly) fixing the LHS or RHS concepts, and other goals such as predictivity (maximal classification accuracy) or the uncovering of structure.
FUNCTIONAL RELATION is a pattern relating a dependent attribute to one or several independent attributes. Functional dependency and equation are subtypes.
FUNCTIONAL DEPENDENCY exists between a dependent attribute and some independent attributes if, for each pair of objects with equal values of the independent attributes, the values of the dependent attribute are equal too. An APPROXIMATE FUNCTIONAL DEPENDENCY allows some exceptions (e.g., due to noise).
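A minimal sketch for testing exact and approximate functional dependencies, assuming rows are dictionaries. Exceptions are counted as the minimum number of rows that would have to be removed for the dependency to hold exactly (per group of equal independent values, all rows except those with the most frequent dependent value); `max_error` is a hypothetical tolerance.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Mapping, Sequence


def fd_exceptions(rows: Iterable[Mapping[str, Hashable]],
                  independent: Sequence[str], dependent: str) -> int:
    """Minimum number of rows to drop so that independent -> dependent holds exactly."""
    groups: Dict[tuple, Counter] = defaultdict(Counter)
    for row in rows:
        key = tuple(row[a] for a in independent)
        groups[key][row[dependent]] += 1
    # Keep the most frequent dependent value per group; the rest are exceptions.
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def holds(rows: Iterable[Mapping[str, Hashable]], independent: Sequence[str],
          dependent: str, max_error: float = 0.0) -> bool:
    """Exact FD for max_error=0; approximate FD if the exception share is within max_error."""
    rows_list: List[Mapping[str, Hashable]] = list(rows)
    return fd_exceptions(rows_list, independent, dependent) <= max_error * len(rows_list)


rows = [
    {"zip": "10115", "city": "Berlin"},
    {"zip": "10115", "city": "Berlin"},
    {"zip": "80331", "city": "Munich"},
    {"zip": "80331", "city": "Muenchen"},  # noisy spelling
]
print(holds(rows, ["zip"], "city"), holds(rows, ["zip"], "city", max_error=0.25))  # False True
```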
EQUATION is a pattern which relates a dependent attribute to independent attributes in the form of a mathematical functional equation.
FUNCTIONAL RELATIONSHIP EXTRACTION METHOD discovers the existence and/or equation of a functional relationship. Typically, search spaces of terms connected by mathematical operations are generated and processed to identify equations. In some application domains, it is useful to construct the space of terms from predefined component terms.
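A minimal sketch of equation search over a small space of predefined component terms, assuming one positive independent attribute x. Each candidate term t(x) is fitted as y = a·t(x) + b by ordinary least squares, and the term with the smallest squared error wins. The term list `CANDIDATE_TERMS` is a hypothetical choice.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical predefined component terms; all require x > 0.
CANDIDATE_TERMS: Dict[str, Callable[[float], float]] = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "sqrt(x)": math.sqrt,
    "log(x)": math.log,
    "1/x": lambda x: 1 / x,
}


def fit_line(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = a*t + b; returns (a, b, sum of squared errors)."""
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    stt = sum((t - mt) ** 2 for t in ts)
    sty = sum((t - mt) * (y - my) for t, y in zip(ts, ys))
    a = sty / stt
    b = my - a * mt
    sse = sum((y - (a * t + b)) ** 2 for t, y in zip(ts, ys))
    return a, b, sse


def search_equation(xs: Sequence[float], ys: Sequence[float]) -> Tuple[str, float, float, float]:
    best = None
    for name, term in CANDIDATE_TERMS.items():
        a, b, sse = fit_line([term(x) for x in xs], ys)
        if best is None or sse < best[3]:
            best = (name, a, b, sse)
    return best


xs = [1, 2, 3, 4, 5]
ys = [3 * x * x + 1 for x in xs]
print(search_equation(xs, ys)[:3])  # ('x^2', 3.0, 1.0)
```

Real systems grow the term space by combining terms with operations; this fixed list only illustrates the generate-and-evaluate loop.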
STATISTICAL PATTERN describes significant concepts. The significance of a concept is verified by a statistical test based on a hypothesis about the concept. Statistical dependency patterns are a main subtype.
STATISTICAL DEPENDENCY PATTERNS are statistical patterns for which the hypothesis about a concept relates to the distribution of a dependent attribute in the concept. Subtypes of this pattern are distinguished by the type of the dependent attribute and by the distribution parameters considered. The concept (or its distribution) can be related to the whole range. Concepts can also be compared, e.g., for different time points.
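A minimal sketch of one subtype, assuming a continuous dependent attribute whose mean in the concept is compared with its mean in the whole range via z = (mean_concept - mean_all)·√n / sd_all. The function name and the choice of the population standard deviation are illustrative assumptions.

```python
import math
import statistics
from typing import Sequence


def mean_shift_z(concept_values: Sequence[float], universe_values: Sequence[float]) -> float:
    """z-statistic of the concept mean against the mean over the whole range."""
    n = len(concept_values)
    sd = statistics.pstdev(universe_values)
    if n == 0 or sd == 0:
        raise ValueError("need a non-empty concept and a non-constant attribute")
    diff = statistics.mean(concept_values) - statistics.mean(universe_values)
    return diff * math.sqrt(n) / sd


universe_values = [6, 14] * 50      # mean 10, standard deviation 4
concept_values = [14, 14, 14, 14]   # mean 14 in a concept of 4 objects
print(mean_shift_z(concept_values, universe_values))  # 2.0
```

For a binary dependent attribute, the same idea uses the target share instead of the mean.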
STATISTICAL PATTERN EXTRACTION METHOD operates on a search space of concepts.
7. Search
DISCOVERY FOCUSSING is a major discovery task that aims to fix an individual discovery problem. The main specifications resulting from this task relate to selecting the subset of data to be analysed and to the extraction goals. Discovery focussing is primarily done by the user of a discovery system. However, a system can transform more global and nontechnical specifications of the user into its technical constructs.
SEARCH is the central approach for pattern extraction. Search is performed in a search space, usually by exploiting some structure in this space. Pattern extraction methods can apply different search strategies to generate and process the search space. Search can be arranged in several search phases or iterations. Search refinement can be provided to refine the results of a preliminary search.
SEARCH SPACE is a one- or multi-dimensional space with a partial ordering. The elements (nodes) of a search space can correspond to pattern instances (e.g., rules) or to components of pattern instances (e.g., a conjunct within a rule, a single rule within a system of rules, a node of a tree, or a term of an equation). Search spaces have to be constructed by a pattern extraction method according to the discovery focussing. Search spaces can be constructed statically before search or dynamically during search.
SEARCH STRATEGIES are general approaches to construct and process search spaces. The main strategies are heuristic search, exhaustive search, data driven search, and concept driven search.
HEURISTIC SEARCH is a search strategy that generates and/or processes only a part of the total search space, which includes all possible pattern instances or components of pattern instances. Heuristic criteria determine which parts are included in the search. Typically, heuristic search generates a satisfying but not an optimal solution. Search spaces are often so large that only heuristic search can produce a solution in reasonable time. One step optimal search and beam search are the main heuristic search approaches.
ONE STEP OPTIMAL SEARCH (or stepwise search) is a heuristic search strategy that is performed in several recursive steps. At each step, the successor node(s) of a node that optimize a given local criterion are determined.
BEAM SEARCH is a heuristic search strategy similar to one step optimal search. At each step, the best n partial solutions are determined according to the local optimizing criterion and processed further.
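A minimal beam search sketch, assuming the caller supplies a `refine` function that generates successor nodes and a `score` function serving as the local criterion. With width 1 it reduces to one step optimal search. The toy search space (increasing tuples of digits 1..9, scored by closeness of their sum to 15) is hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

N = TypeVar("N")


def beam_search(start: N, refine: Callable[[N], Iterable[N]],
                score: Callable[[N], float], width: int, depth: int) -> N:
    """Keep the `width` best partial solutions per step; width=1 is one step optimal search."""
    beam: List[N] = [start]
    best = start
    for _ in range(depth):
        candidates = [child for node in beam for child in refine(node)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)  # stable: ties keep generation order
        beam = candidates[:width]
        if score(beam[0]) > score(best):
            best = beam[0]
    return best


def refine(node: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    last = node[-1] if node else 0
    return [node + (d,) for d in range(last + 1, 10)]


def score(node: Tuple[int, ...]) -> float:
    return -abs(15 - sum(node))


print(beam_search((), refine, score, width=1, depth=3))   # (9,)
print(beam_search((), refine, score, width=2, depth=3))   # (8, 9)
print(beam_search((), refine, score, width=10, depth=3))  # (7, 8)
```

The narrow beams return satisfying but non-optimal nodes (sums 9 and 17), while a wider beam reaches a sum of exactly 15, which illustrates the trade-off described under heuristic search.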
EXHAUSTIVE SEARCH processes and evaluates all nodes of a search space, possibly omitting those nodes which can be excluded as not interesting. Exhaustive search ensures an optimal solution but is often not realistic because of time constraints.
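A minimal sketch of exhaustive depth-first search that omits uninteresting parts of the space, assuming an `expandable` predicate that is true only when some descendant could still score better. It uses the same hypothetical toy space as a beam search example would (increasing digit tuples scored by closeness of their sum to 15). Pruning is safe here because adding digits only increases the sum.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

N = TypeVar("N")


def exhaustive_search(start: N, refine: Callable[[N], Iterable[N]],
                      score: Callable[[N], float],
                      expandable: Callable[[N], bool]) -> Tuple[N, int]:
    """Evaluate every reachable node; return (best node, number of nodes evaluated)."""
    best, visited = start, 0
    stack: List[N] = [start]
    while stack:
        node = stack.pop()
        visited += 1
        if score(node) > score(best):
            best = node
        if expandable(node):          # skip children that cannot improve
            stack.extend(refine(node))
    return best, visited


def refine(node: Tuple[int, ...]) -> List[Tuple[int, ...]]:
    last = node[-1] if node else 0
    return [node + (d,) for d in range(last + 1, 10)]


def score(node: Tuple[int, ...]) -> float:
    return -abs(15 - sum(node))


best, visited = exhaustive_search((), refine, score, lambda n: sum(n) < 15)
_, full = exhaustive_search((), refine, score, lambda n: True)
print(best, sum(best), visited, full)
```

Without pruning, all 512 subsets of {1, ..., 9} are evaluated. With pruning, fewer nodes are visited, and the result still has the optimal sum of 15.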
DATA DRIVEN SEARCH organizes the main loop of the search over the data records. Each record is accessed sequentially and associated with a node in the search space, and this node is updated according to the record. Several passes over the data may be made. After each pass, a filter operation selects the best nodes in the search space according to some criterion and elaborates these nodes in the next pass. Data driven search minimizes data accesses and can result in time-efficient discovery.
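A minimal sketch of data driven search in the level-wise, Apriori-style counting form (one well-known instance of the idea, swapped in here for illustration). Each pass reads every record once and updates the counters of the conjunctions of attribute-value conditions it matches. A support filter then selects the nodes that are elaborated into larger conjunctions for the next pass. `min_support` and `max_passes` are hypothetical parameters.

```python
from collections import Counter
from typing import Dict, FrozenSet, Hashable, List, Mapping, Set, Tuple

Item = Tuple[str, Hashable]


def count_pass(records: List[Mapping[str, Hashable]],
               candidates: Set[FrozenSet[Item]]) -> Counter:
    """One sequential pass over the data: each record updates every node it matches."""
    counts: Counter = Counter()
    for rec in records:
        items = set(rec.items())
        for cand in candidates:
            if cand <= items:
                counts[cand] += 1
    return counts


def data_driven_search(records: List[Mapping[str, Hashable]], min_support: int,
                       max_passes: int = 3) -> Dict[FrozenSet[Item], int]:
    """Pass k counts conjunctions of k attribute-value conditions."""
    candidates = {frozenset([item]) for rec in records for item in rec.items()}
    frequent: Dict[FrozenSet[Item], int] = {}
    for size in range(1, max_passes + 1):
        counts = count_pass(records, candidates)
        # Filter: keep only nodes that reach the support threshold.
        survivors = {c for c in candidates if counts[c] >= min_support}
        frequent.update((c, counts[c]) for c in survivors)
        # Elaborate survivors into next-pass nodes, one condition larger.
        candidates = {a | b for a in survivors for b in survivors if len(a | b) == size + 1}
        if not candidates:
            break
    return frequent


records = [{"a": 1, "b": 1}, {"a": 1, "b": 1}, {"a": 1, "b": 0}]
for conj, n in sorted(data_driven_search(records, 2).items(), key=lambda kv: -kv[1]):
    print(sorted(conj), n)
```

The number of data passes is bounded by the length of the longest conjunction, not by the number of nodes, which is where the time efficiency comes from.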
CONCEPT DRIVEN SEARCH organizes the main loop of the search around the structure (e.g., the partial ordering) of the search space. When a node of the search space is processed, the associated subset of data is accessed. If these accesses are performed randomly on external data, the time efficiency of discovery may be a problem.
SEARCH REFINEMENT refines the results of a previous search phase. For example, search granularity can be increased to search in the neighbourhood of a previously identified node. Pruning is another refinement technique.
PRUNING cuts search spaces. This can be done after search (postpruning) or during search. For example, a tree can be cut to eliminate overspecializations.
DOMAIN EXPLORATION is a major discovery task. ...

--
※ Source: Nanjing University Lily BBS, bbs.nju.edu.cn [FROM: 202.118.237.14]