e number of records times the number of attributes and in practical applicat
ions ranges from bytes to terabytes.
DATA FORMAT is a data structure to represent a particular piece of data. Dat
a formats may be different for different applications. For instance, data ab
out a particular object can be arranged into a record, and many records can 
be arranged into a data matrix. The attribute type describes the set of valu
es of a given attribute and the meaningful operations on those values. Disco
very methods may be limited to special data types.
INCONCLUSIVE DATA. Especially in Knowledge Discovery in Databases (KDD)
applications, the available databases were set up for specific purposes
which may differ from the KDD purposes. Therefore, some attributes which
may be relevant for a discovery process are often missing from the data.
Those hidden variables may be important, and their absence may make it
impossible to discover significant knowledge about a given domain.
NOISY DATA. Data often contain errors due to the nature of the collection,
measuring, or sensing procedures. Statistical methods can treat problems
of noisy data.
MISSING DATA. Attribute values for some objects may be missing because
they were not measured, not answered, or simply lost. Discovery methods
can treat missing data by omitting the corresponding records, inferring
values for the missing values, or treating missing data as a special value
to be included additionally in the attribute domain.
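
The three treatments named in the MISSING DATA entry can be sketched in
Python as follows. This is only an illustration, not part of the original
glossary: the records, the attribute name `age`, the use of None to mark a
missing value, and the choice of the mean as the inferred value are all
assumptions.

```python
from statistics import mean

# Hypothetical records; None marks a missing attribute value.
records = [
    {"id": 1, "age": 30},
    {"id": 2, "age": None},
    {"id": 3, "age": 50},
]

def omit_missing(rows, attr):
    """Treatment 1: drop every record whose value for attr is missing."""
    return [r for r in rows if r[attr] is not None]

def infer_missing(rows, attr):
    """Treatment 2: replace missing values by an inferred one (here: the mean)."""
    fill = mean(r[attr] for r in rows if r[attr] is not None)
    return [{**r, attr: fill if r[attr] is None else r[attr]} for r in rows]

def mark_missing(rows, attr, marker="MISSING"):
    """Treatment 3: add a special value to the attribute domain."""
    return [{**r, attr: marker if r[attr] is None else r[attr]} for r in rows]
```

Each function returns new records and leaves the input untouched, so the
three treatments can be compared on the same data.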
NOT APPLICABLE DATA. Sometimes attribute values are missing, because they ar
e logically impossible for some objects, like the value "pregnant" for "male
" objects. Information about this special kind of missing data can be includ
ed in the domain knowledge and can be treated in a special way by discovery 
methods.
SPARSE DATA. The events actually represented in given databases or sample
sets typically form only a very small (sparse) subset of the event space.
The event space is orders of magnitude larger, due to the abundance of
combinations in building the product set of the attribute domains. This
holds especially for sample sets.
EXTERNAL DATA refers to the permanently stored data and data structure. Exte
rnal data are often stored in a database management system. A discovery syst
em can transform data available in a database system into its own special ex
ternal data organization to speed up access and processing of data.
INTERNAL DATA refers to the data and its structure that is processed by a di
scovery method in main memory. Internal data are typically organized in data
 matrices. Discovery methods may process data incrementally. In this case, a
 loop over the input data can be organized, where at each step of the loop, 
only a small part of the input data is used by the method when processing th
e input data. A special incremental technique is data driven search.
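
The incremental loop described in the INTERNAL DATA entry might look like
the sketch below. The function names, the chunk size, and the running mean
as the quantity being computed are assumptions chosen for illustration; the
glossary does not prescribe any of them.

```python
def chunks(rows, size):
    """Yield successive parts of the input, size rows at a time."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def incremental_mean(rows, size=2):
    """Update a running count and sum using one small part per loop step."""
    count, total = 0, 0.0
    for part in chunks(rows, size):
        count += len(part)
        total += sum(part)
    return total / count if count else None

values = [2.0, 4.0, 6.0, 8.0, 10.0]  # hypothetical column of a data matrix
```

Only one chunk is looked at per step, so the same pattern applies when the
chunks are read from external storage instead of a list in main memory.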
RECORD is the collection of data belonging to one object. In the relational 
model, it is also called a tuple.
VIRTUAL ATTRIBUTE derives a value for each object of an object class by some
 user defined specification (transformation, method, etc.). Often, this spec
ification refers to other attributes.
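
A virtual attribute can be illustrated with a small Python sketch. The
object class, the stored attributes `weight_kg` and `height_m`, and the
derived attribute `bmi` are hypothetical examples, not taken from the
glossary.

```python
# Hypothetical object class with two stored attributes.
persons = [
    {"name": "A", "weight_kg": 70.0, "height_m": 1.75},
    {"name": "B", "weight_kg": 90.0, "height_m": 1.80},
]

def bmi(obj):
    """User-defined specification that refers to two other attributes."""
    return obj["weight_kg"] / obj["height_m"] ** 2

def with_virtual(objects, name, spec):
    """Return copies of the objects extended by the derived attribute."""
    return [{**o, name: spec(o)} for o in objects]

extended = with_virtual(persons, "bmi", bmi)
```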
ATTRIBUTE TYPE of an attribute can be nominal, ordinal, continuous, or
*complex*.
ATTRIBUTE DOMAIN is the set of possible values of an attribute.
NOMINAL is an attribute type characterizing an attribute with an attribute d
omain for which no ordering is given.
ORDINAL is an attribute type characterizing an attribute with an attribute d
omain with an ordering.
CONTINUOUS is an attribute type characterizing an attribute with an attribut
e domain of a (dense) subset of real numbers. RATIO is a subtype with an int
erpretation of arithmetic operations (addition, multiplication) on domain va
lues.
TAXONOMY is a hierarchical system of subsets of an attribute domain, mostly
arranged as a tree.
DATA MATRIX is a subset of data systematically organized into a matrix in wh
ich each row represents the values of all attributes (of a subset of attribu
tes) for one object and each column represents values of one attribute for e
ach object (of a subset of objects).
DATA TYPE characterizes (a subset of) data by the number of object classes a
nd the attribute types of the attributes (that exist in this subset). Typica
l data types are rectangular and multi relational.
RECTANGULAR is a simple data type characterizing data with one class of obje
cts and non *complex* attribute types. In the relational model, this assumes
 a single table.
MULTI RELATIONAL is a data type characterizing data for several classes of o
bjects with non *complex* attribute types. Relations are available connectin
g the object classes.
TIME-SERIES is a data type for time series as logical data units. Relational
, object oriented, or special time series databases can be used to store tim
e series. One attribute represents different moments of time; the values of 
this attribute are ordered. Other attributes store information about co-inci
ding properties of objects.
COMPLEX-STRUCTURE is a data type characterizing data that do not belong to
the rectangular, multi relational, or time-series type. Examples of
complex-structured data are chemical, genetic, and physical structures,
image data, text, and multimedia data.
DATA DICTIONARY includes information about the attribute types and values.
5. Sets of objects
EVENT SPACE refers to a selection of attributes of one object class. The
event space is the product set of the attribute domains. Each event is
associated with a set of objects. If a probability measure is defined for
the universe, each event has a probability.
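
As an illustration of the EVENT SPACE entry, the product set can be built
with itertools.product. The attribute domains and objects below are
hypothetical, and using the relative frequency among the given objects as
the probability measure is an assumption made for this sketch.

```python
from itertools import product

# Hypothetical attribute domains of one object class.
domains = {
    "color": ["red", "green", "blue"],
    "size": ["small", "large"],
    "shape": ["round", "square"],
}

# Hypothetical objects; each one falls into exactly one event.
objects = [
    {"color": "red", "size": "small", "shape": "round"},
    {"color": "red", "size": "small", "shape": "round"},
    {"color": "blue", "size": "large", "shape": "square"},
]

def event_space(domains):
    """All value combinations: the product set of the attribute domains."""
    return list(product(*domains.values()))

def event_of(obj, domains):
    """The event (value tuple) that a given object belongs to."""
    return tuple(obj[attr] for attr in domains)

def objects_of(event, objects, domains):
    """The objects associated with an event, as a list."""
    return [o for o in objects if event_of(o, domains) == event]

def probability(event, objects, domains):
    """Relative frequency of the event among the objects (assumed measure)."""
    return len(objects_of(event, objects, domains)) / len(objects)

space = event_space(domains)
observed = {event_of(o, domains) for o in objects}
```

In this toy setting the event space has 3 * 2 * 2 = 12 events, of which the
three objects occupy only 2, which is the situation the SPARSE DATA entry
describes.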
CONCEPT is a subset of objects which may have some relevance in the applicat
ion domain. Often a concept is defined by an event belonging to some event s
pace, if a concept refers to one object class. In case of several object cla
sses, a concept is a product set of subsets of object classes, defined by pr
edicates. A concept language determines the concepts that can be defined.
CONCEPT LANGUAGE is used to construct concepts. Typical languages are first 
order concept languages and propositional concept languages.
CONCEPT SPACE is the set of all concepts which may be built within a concept
 language. The number of elements of this space depends on the type of the c
oncept language. For languages of strictly conjunctive form of order n with 
no internal disjunctions, this number is mostly limited enough to prevent se
vere combinatorial problems. The problem of combinatorial explosion, however
, is usually present for disjunctive normal forms without any order limitati
ons. A concept lattice is given by the partially ordered space of concept ex
tensions (subsets of objects with set inclusion as partial ordering) and the
 partially ordered space of concept descriptions (terms in the concept langu
age partially ordered by generality).
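
The difference in size between the two kinds of concept space can be made
concrete with a small count. The sketch rests on assumptions that the
glossary does not state: selectors are single attribute = value conditions,
order n is read as "at most n selectors per term", the empty term (all
objects) is counted, and for unrestricted disjunctive normal form the count
is the number of distinct subsets of the event space that can be described.

```python
from itertools import combinations
from math import prod

def count_conjunctive(domain_sizes, n):
    """Number of conjunctive terms using at most n attributes, one value each."""
    total = 0
    for k in range(n + 1):
        for subset in combinations(domain_sizes, k):
            total += prod(subset)  # one value choice per selected attribute
    return total

def count_dnf_extensions(domain_sizes):
    """Number of subsets of the event space (all expressible without order limit)."""
    return 2 ** prod(domain_sizes)

sizes = [3, 2, 2]  # hypothetical domain sizes of three attributes
```

For these sizes, count_conjunctive(sizes, 2) gives 24, while
count_dnf_extensions(sizes) gives 2 ** 12 = 4096, and the gap widens
quickly as attributes are added.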
FIRST ORDER CONCEPT LANGUAGES use some subset of predicate logic, mostly fun
ction-free Horn clauses, to represent concepts and rule patterns.
PROPOSITIONAL CONCEPT LANGUAGES (attributive languages) refer to conditions 
on attributes and their values. The main subtypes of these languages are str
ictly conjunctive form and disjunctive normal forms.
STRICTLY CONJUNCTIVE FORM is a (propositional) concept language with terms
built by conjunctions of selectors. A main subtype is the strictly
conjunctive form of order n, allowing at most n conjunctions. Other
subtypes are defined by restrictions on the construction of selectors.
SELECTOR defines a selection condition with an attribute and one or several 
values of the attribute domain. In case of an ordinal attribute type, one or
 several intervals may appear in a selector. An internal disjunction include
s several values or intervals.
INTERNAL DISJUNCTION is a disjunctive selection built with several values
of one attribute domain. In case of an ordinal attribute type, a
disjunction of several intervals is also possible. To restrict the number
of internal disjunctions, taxonomies can be defined.
DISJUNCTIVE NORMAL FORM is a (propositional) concept language with terms bui
lt by one or several disjunctions of conjunctions of selectors. The number o
f disjunctions is limited by n for disjunctive normal form of order n.
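
The propositional building blocks defined above (selector, internal
disjunction, conjunction, disjunctive normal form) can be sketched as
predicates on objects. All function names, attributes, and values here are
hypothetical; this is one possible encoding, not the representation used by
any particular discovery system.

```python
def selector(attr, values):
    """attr must take one of the values; several values form an internal disjunction."""
    allowed = set(values)
    return lambda obj: obj[attr] in allowed

def interval(attr, low, high):
    """Selector for an ordinal or continuous attribute: low <= value <= high."""
    return lambda obj: low <= obj[attr] <= high

def conjunction(*selectors):
    """Term in strictly conjunctive form: every selector must hold."""
    return lambda obj: all(s(obj) for s in selectors)

def disjunction(*terms):
    """Disjunctive normal form: at least one conjunctive term must hold."""
    return lambda obj: any(t(obj) for t in terms)

# Hypothetical objects with one nominal and one continuous attribute.
objects = [
    {"color": "red", "age": 25},
    {"color": "blue", "age": 40},
    {"color": "green", "age": 70},
]

term = conjunction(selector("color", ["red", "blue"]), interval("age", 18, 65))
dnf = disjunction(term, conjunction(interval("age", 66, 120)))
```

Applying a term to the objects, e.g. [o for o in objects if term(o)], yields
the concept, that is, the subset of objects the term describes.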
CONCEPT CLASSES are a set of concepts. Rule patterns refer to concept
classes in their conditional or conclusion parts. Typically, concept
classes are disjoint. They form a partition if they also cover all
objects.
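
The two conditions for a partition, disjointness and coverage, can be
checked directly when concepts are given as sets of object identifiers. The
representation and the names below are assumptions made for illustration.

```python
from itertools import combinations

def are_disjoint(classes):
    """True if no object belongs to two different concept classes."""
    return all(a.isdisjoint(b) for a, b in combinations(classes, 2))

def is_partition(classes, universe):
    """True if the classes are disjoint and together cover all objects."""
    covered = set().union(*classes)
    return are_disjoint(classes) and covered == set(universe)

universe = {1, 2, 3, 4, 5}          # hypothetical object identifiers
young, middle, old = {1, 2}, {3}, {4, 5}
```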
6. Patterns
PATTERN is a statement class which can also be regarded as a generic
statement with free variables. Instantiated patterns (pattern instances)
are candidates (hypotheses) to capture new knowledge on an application
domain. An evaluation of a candidate exploits data and domain knowledge. A
pattern is defined by a pattern representation. Various pattern types are
applied in Machine Discovery.
PATTERN INSTANCE is a member of a pattern class. It can capture an elemental
 part of new knowledge like a single rule or a composite part like a system 
of rules or a tree. A pattern instance is fixed by an instantiation of the f
ree variables in the generic statement belonging to the pattern class. A pat
tern instance is a statement S in a pattern language describing relationship
s among a subset DS of data of the application domain with interestingness i
. S is simpler than an enumeration of all records in DS.
PATTERN LANGUAGE is a formalism to communicate new knowledge on an applicati
on domain. The kind of statements constructed in such a language depends on 
the pattern type and varies from natural-language-like sentences like rules 
to more abstract statements like trees or even graphical statements of a gra
phical language. An important component of a pattern language is the concept
 language used to build concepts within patterns.
PATTERN REPRESENTATION refers to the representation of a pattern in a discov
ery system. The main representation components refer to pattern extraction, 
evaluation, presentation specifications, and pattern arguments.
PATTERN EXTRACTION is a major discovery task. For the various patterns and p
attern types, special pattern extraction methods (e.g. tree extraction metho
d, rule extraction method, functional dependency extraction method, or stati
stical pattern extraction method) discover new knowledge in the form of patt
ern instances. Pattern extraction methods rely on search and evaluation.
EVALUATION checks a pattern instance by measuring its interestingness. An
application test can verify some preconditions for the interestingness of
an instance. In case of a composite instance like a tree, a system of
rules, or an equation, the components of the instance (node, single rule,
term of an equation) can also be evaluated.
PRESENTATION SPECIFICATIONS determine how the contents of a pattern
instance are presented to the user, e.g. in natural language, tabular,
graphical, or audio-visual form. Presentation templates are typical simple
presentation specifications.
PRESENTATION TEMPLATE is a schema for a textual or graphical presentation of
 a statement (pattern instance). Typically such a schema has some parameters
. The values of the parameters are fixed by the pattern instance.
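
A textual presentation template can be sketched with Python's standard
string.Template class. The wording of the template, the parameter names,
and the example rule are hypothetical.

```python
from string import Template

# Schema with three parameters; the pattern instance fixes their values.
RULE_TEMPLATE = Template("If $condition then $conclusion (confidence $confidence%).")

def present(instance):
    """Fill the template parameters from a pattern instance given as a dict."""
    return RULE_TEMPLATE.substitute(instance)

# Hypothetical pattern instance: a single rule.
rule = {"condition": "age > 60", "conclusion": "risk = high", "confidence": 85}
```

Calling present(rule) produces the sentence
"If age > 60 then risk = high (confidence 85%)."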
PATTERN ARGUMENTS correspond to free variables in the generic statement of a
 pattern. This component includes specifications on the admissible instantia
tions of the free variables and their properties, e.g. extraction properties
 exploited by a pattern extraction method. Range is an argument which is ava
ilable in most patterns.
RANGE is a subset of objects. Typically it is defined by a logical condition
 on some attributes and their values (concept language). It is used to restr
ict the scope of a pattern to a subset of objects. If the statement refers t
o several object classes, a range is a product of subsets of objects of thes
e classes.
EXTRACTION PROPERTIES of pattern arguments are exploited by a pattern
extraction method to construct a search space and operate on it. E.g.,
extraction properties can determine conditions that exclude all subnodes
of a node from further search.
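
One way such a pruning condition can work is shown in the sketch below. It
assumes a search space of conjunctions of selectors and a minimum number of
covered objects as the condition; both are illustrative choices, not
something the glossary specifies. Because adding a selector can never
enlarge the set of covered objects, a node that fails the condition makes
all of its subnodes fail too.

```python
def search(objects, selectors, min_support):
    """Depth-first search over conjunctions of selectors with pruning."""
    results = []

    def expand(term, covered, start):
        for i in range(start, len(selectors)):
            name, test = selectors[i]
            subset = [o for o in covered if test(o)]
            if len(subset) < min_support:
                continue  # prune: every subnode covers at most this subset
            new_term = term + [name]
            results.append((new_term, len(subset)))
            expand(new_term, subset, i + 1)

    expand([], objects, 0)
    return results

# Hypothetical objects and named selectors.
objects = [
    {"a": 1, "b": 1},
    {"a": 1, "b": 0},
    {"a": 0, "b": 1},
    {"a": 1, "b": 1},
]
selectors = [
    ("a=1", lambda o: o["a"] == 1),
    ("b=1", lambda o: o["b"] == 1),
]
```

With min_support=2 the search visits "a=1", "a=1 and b=1", and "b=1"; with
min_support=3 the conjunction of both selectors is skipped.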
EXTRACTION GOALS are general directives for pattern extraction specified by
the user of a discovery system during discovery focussing. They relate to t