Posted by: mining (key), Board: DataMining
Subject: Machine Discovery Terminology
Posted from: Nanjing University Lily BBS (Tue Apr 29 13:39:03 2003)

Machine Discovery Terminology
Willi Kloesgen (kloesgen@gmd.de)
Jan Zytkow (zytkow@wise.cs.twsu.edu)
We compiled this preliminary list of terms relevant for Machine Discovery, their definitions, and their most characteristic contents. The final goal is to describe the role of Machine Discovery and to simplify discussions within the Machine Discovery community. However, our current definitions are neither complete nor adequate, and the sequencing and grouping of terms may not be satisfactory. We invite you to participate in the elaboration and refinement process. Comments on and revisions to the definitions and their groups, as well as suggested additional terms, are therefore most welcome. Finally, any remarks on the implementation of this discussion process, which will be based on the WWW, are very welcome.
Knowledge Discovery in Databases (KDD) uses concepts and techniques developed in many areas. Artificial intelligence subfields of machine discovery, machine learning, heuristic search, knowledge representation, and statistics are among the major contributors. Resources have also been acquired from fields such as databases, various sciences, philosophy of science, logic and rough sets. High performance computing methods such as parallel techniques for data management and search are used for discovery in very large databases.
To develop mutual understanding between disciplines and attract interest in KDD from other research communities, we frequently reference technical terms that can be recognized by researchers in related disciplines. This may also give KDD researchers pointers to relevant work in other domains.
KDD is closely related to Machine Discovery, a domain about 10 years older. While the main emphasis of Machine Discovery has been on expanding the autonomy of artificial discoverers by automating new skills, KDD has been oriented towards practical results, combining human intervention with automated techniques. Since both fields share many discovery techniques, evaluation methods, and knowledge representation problems, we will include elements of machine discovery terminology.
Ancient Greek philosophers realized that chains of definitions cannot go on indefinitely, and they recognized the need for primitive, undefined terms. We hesitate to use the term "definition" to characterize our work on terminology, but we must leave some terms undefined, hence:
We do not define terms which are technically defined in other disciplines; those definitions can be easily found elsewhere.
We do not define common sense terms; defining them is asking for trouble.
We rely on common understanding of key terms, such as "knowledge", "theory" and "model", so that we can treat them briefly. These terms have indefinitely many shades of meaning, and they have also been technically defined in disciplines such as logic and philosophy of science.
Our explanations rely on increasingly less abstract concepts. We frequently resort to enumeration of examples.
Complex structures are explained by elements and their interrelations.
1. Discovery Systems
MACHINE DISCOVERY: develops discovery methods and discovery systems to support knowledge discovery processes. Although discovery methods and processes share basic commonalities, sufficient differences exist to distinguish Knowledge Discovery in Databases, Automated Scientific Discovery, automated discovery in mathematics, and discovery by autonomous intelligent robots.
KNOWLEDGE DISCOVERY PROCESS: seeks new knowledge about an application domain. It consists of many discovery steps, each attempting to complete a particular discovery task and accomplished by the application of a discovery method. The discovery process interacts repeatedly with a given domain, using search in various search spaces. New knowledge is inferred from data and/or from old knowledge. New knowledge is recognized by a discovery system via the autonomous use of evaluation criteria.
DISCOVERY STEP: a part of a discovery process. The main discovery steps include domain exploration, data collection, pattern extraction from data, *inductive generalizations*, *knowledge verification*, and *knowledge transformation*. A knowledge discovery process may use steps which enable further discoveries, but do not directly lead to new knowledge, such as knowledge presentation, management of data, management of domain knowledge, and selection of new goals. A concrete discovery step is an application of a concrete discovery method.
DISCOVERY METHOD: an algorithm designed to accomplish a discovery task. A discovery method can be a reconstruction of a human activity used to acquire new knowledge, can combine human methods in a novel way, but can also be a new method. Machine Discovery adapts methods from Machine Learning (defining new concepts, taxonomy formation, conceptual clustering, learning from examples), Statistics (pattern fitting, pattern evaluation, classification and regression, cross-validation), Intelligent Database Management (parallel database servers, query optimization), and Visualization and Geographical Information Systems (interactive graphics, knowledge presentation).
DISCOVERY TASK: a request for a specific component of new knowledge. "Find regularity", "generalize a regularity", "combine regularities into theory" are examples of tasks. Each discovery task can be best characterized by the search space explored to accomplish that task, because we do not know in advance the concrete form of new knowledge or even whether any knowledge will be discovered in a given input.
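As a minimal sketch of how a discovery task can be characterized by its search space, the following Python fragment treats "find regularity" as a search over candidate hypotheses. The toy data, the hypothesis form y = a*x + b, the coefficient range, the squared-error criterion, and all function names are illustrative assumptions, not part of the terminology itself.

```python
from itertools import product

# Hypothetical toy input: (x, y) observations of some unknown regularity.
DATA = [(1, 3), (2, 5), (3, 7), (4, 9)]


def search_space():
    """Enumerate candidate hypotheses y = a*x + b with small integer coefficients."""
    return product(range(-3, 4), range(-3, 4))


def error(hypothesis, data):
    """Evaluation criterion (assumed): sum of squared residuals on the data."""
    a, b = hypothesis
    return sum((y - (a * x + b)) ** 2 for x, y in data)


def find_regularity(data, tolerance=0):
    """Search the space; return the best hypothesis, or None if nothing fits."""
    best = min(search_space(), key=lambda h: error(h, data))
    return best if error(best, data) <= tolerance else None


print(find_regularity(DATA))
```

The function may return None, which mirrors the point that it is not known in advance whether any knowledge will be discovered in a given input.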
DISCOVERY SYSTEM: a software system (and possibly also a hardware system) that autonomously performs, or supports a user in performing, knowledge discovery processes. Typically, a discovery system integrates various discovery methods, the majority of which are based on search. Discovery systems can be used in interactive or automated mode and can be compared by evaluating their accuracy, autonomy, efficiency, and versatility.
ACCURACY: the degree of fit between discovered *theories* and data. Accuracy applies to existing data and to predictions about new data.
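One possible way to make "degree of fit" concrete is the fraction of cases a theory reproduces, sketched below in Python. The rule, the two small data sets, and the function names are hypothetical; other fit measures (for example residual error for numeric theories) would serve equally well.

```python
def accuracy(theory, data):
    """Fraction of (input, observed) pairs for which the theory's output matches."""
    hits = sum(1 for x, observed in data if theory(x) == observed)
    return hits / len(data)


def is_adult(age):
    """Hypothetical discovered theory: a person is an adult from age 18."""
    return age >= 18


# Data the theory was discovered from, and data collected afterwards (both invented).
existing_data = [(20, True), (15, False), (40, True), (17, False)]
new_data = [(18, True), (16, True), (30, True), (5, False)]

fit_on_existing = accuracy(is_adult, existing_data)
fit_on_new = accuracy(is_adult, new_data)
```

Here the theory fits the existing data perfectly but misses one of the four new cases, which illustrates why accuracy is assessed on both.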
AUTONOMY: the extent to which a discovery system evaluates its decisions and produces new knowledge automatically, without external intervention. The degree of autonomy ranges from "apprentice systems" with low autonomy, through "assistant systems", to "associate" and "master systems", which are almost automatic discoverers.
EFFICIENCY: computational effort to accomplish a given discovery task. Expressed as a function of the complexity of inputs and size of the search space.
VERSATILITY: the variety of application domains to which a discovery system can be applied, and the variety of alternative discovery methods which it can use.
KNOWLEDGE DISCOVERY IN DATABASES (KDD): concerns knowledge discovery processes applied to databases. KDD deals with the ready data available in all domains of science and in applied domains of marketing, planning, control, etc. Typically, KDD has to deal with inconclusive data, noisy data, and sparse data.
AUTOMATED SCIENTIFIC DISCOVERY (ASD): deals with knowledge discovery processes analogous to those used by scientists. In contrast to KDD, a discovery process in ASD may seek additional data to improve the quality and expand the scope of the new knowledge generated, conduct experiments, and improve experiment design. ASD applies mainly to the natural sciences (Astronomy, Biology, Chemistry, Physics, etc.).
2. World
APPLICATION DOMAIN is a real or abstract system existing independently of the discovery system. An application domain consists of objects, which can belong to one or several classes, and of object attributes and relationships between objects. Rather than to the whole world, discovery systems apply to limited application domains, with the intent to discover useful domain models and domain theory. In empirical discovery, the application domain becomes known through data, from which a discovery process attempts to generate new knowledge.
OBJECT (entity, unit, case) is a member or a part of an application domain (universe). Objects can belong to different classes of similar objects, such as persons, transactions, locations, events, and processes. Objects possess attributes and relationships to other objects.
ATTRIBUTE (field, variable) characterizes a single aspect of objects of an object class. An attribute has a value for each object in that class. This value is typically a number or a label. The value may also be a complex structure like a time series or even a picture that represents a person or a location in a multi-media application.
RELATIONSHIP (relation) combines objects from several object classes. A relation can be seen as a subset of a product set of several object classes. The relation holds for elements of this subset.
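The set-theoretic reading of a relation can be sketched in a few lines of Python. The object classes, their members, and the relation name lives_in are invented for illustration only.

```python
from itertools import product

# Two hypothetical object classes.
persons = {"ann", "bob"}
locations = {"bonn", "wichita"}

# The product set contains every possible (person, location) pair.
product_set = set(product(persons, locations))

# A relation is a subset of the product set: the pairs for which it holds.
lives_in = {("ann", "bonn"), ("bob", "wichita")}


def holds(relation, *objects):
    """The relation holds for exactly those tuples that are elements of it."""
    return objects in relation


ann_in_bonn = holds(lives_in, "ann", "bonn")
ann_in_wichita = holds(lives_in, "ann", "wichita")
```

The same representation extends to relations over more than two object classes by using longer tuples.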
3. Knowledge
DOMAIN MODEL is a representation of one or several classes of objects of the application domain and some of their relations. The set of all objects forms the universe of the model. A domain model represents the perspective that a discovery process has on the application domain. A domain model can include data and domain knowledge. The formalisms used to build a domain model range from simple data files with an added data dictionary to knowledge representation paradigms of Artificial Intelligence. An initially defined domain model is gradually elaborated in the course of knowledge discovery processes to achieve a domain theory.
UNIVERSE. For a class of objects represented in a domain model, a universe is the total set of all possible objects of this class. Often a probability measure is given or assumed for the universe. The subset of objects represented in the domain model constitutes the sample set.
SAMPLE SET is the subset of objects of a universe which are represented in the domain model and for which data are available. Often some probabilistic properties of the sample set are given or assumed.
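The relation between a universe and a sample set can be sketched as follows in Python. The finite universe of 1000 integer-labelled objects, the uniform probability measure, the sample size, and the fixed random seed are all assumptions chosen only to keep the example small.

```python
import random

# Hypothetical universe: all possible objects of one class, labelled 0..999.
universe = list(range(1000))

# Assumed probability measure on the universe: uniform.
measure = {obj: 1 / len(universe) for obj in universe}

# The sample set: those objects of the universe for which data are available.
# Drawing without replacement under the uniform measure is an assumption.
rng = random.Random(0)
sample_set = set(rng.sample(universe, 50))

# Fraction of the universe that the domain model actually represents.
coverage = len(sample_set) / len(universe)
```

A probabilistic property of the sample set that is "given or assumed" here is that each object of the universe was equally likely to be drawn.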
DOMAIN THEORY is a comprehensive, consistent, and valid model of the application domain. Depending on the knowledge representation used to build the model, these characteristics measuring the quality of a domain theory are more formally defined.
DOMAIN KNOWLEDGE holds all information specific to the application domain that does not belong to data. The main categories of domain knowledge are: data dictionary knowledge, taxonomies, global statistical characteristics of the data, and specifications for interestingness. Some examples of domain knowledge ...
NEW KNOWLEDGE augments or refines the contents of the current domain model. New knowledge can be presented to the user and can extend the user's mental model of the application domain. The "user" of a discovery system can also be the same or another system, and new knowledge can augment the performance of this system, and may be used in making further discoveries. Typically, the user supervises the refinement of the domain model by new knowledge. New knowledge is described within a knowledge representation formalism.
KNOWLEDGE REPRESENTATION deals with data structures representing knowledge about many application domains. Artificial Intelligence offers knowledge representation paradigms like frames, production rules, semantic networks, and first-order logic. Typical knowledge representation structures used in discovery systems are patterns like trees, rules, and functional relations.
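As one illustration of a rule pattern, the Python sketch below stores a rule as a set of attribute conditions plus a conclusion. The Rule class, its fields, and the credit example are hypothetical; real discovery systems use a variety of richer rule formalisms.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """A simple if-then pattern over attribute values (illustrative only)."""
    conditions: dict   # attribute name -> required value
    conclusion: tuple  # (attribute name, concluded value)

    def matches(self, obj):
        """True if the object satisfies every condition of the rule."""
        return all(obj.get(attr) == value for attr, value in self.conditions.items())

    def apply(self, obj):
        """Return the concluded (attribute, value) pair, or None if the rule does not fire."""
        return self.conclusion if self.matches(obj) else None


# Hypothetical rule and objects.
rule = Rule({"income": "high", "owns_home": True}, ("credit_risk", "low"))
customer_a = {"income": "high", "owns_home": True}
customer_b = {"income": "low", "owns_home": True}

result_a = rule.apply(customer_a)
result_b = rule.apply(customer_b)
```

Trees and functional relations could be represented in a similar spirit, for example as nested conditions or as callables from attribute values to predicted values.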
4. Data
DATA consist of the collected (measured, sensed, polled, observed, etc.) attribute values for objects and relationships between objects in the application domain. Data coming from experiments include the results of manipulations and the subsequent readings of sensors. For the sake of completeness, the special values Missing Data or Not Applicable can be used. Data can be arranged in various data formats. The meaning of data in databases is represented by a data dictionary. In Automated Scientific Discovery, the meaning of data is represented by manipulators and sensors and the operating procedures through which they acquire data. The volume of data may be measured in bytes or th
