Posted by: mining (key), Board: DataMining
Subject: UCI data set description(7)
Posting site: Nanjing University Lily BBS (Tue Apr 29 13:53:10 2003)
Low Resolution Spectrometer Database
From IRAS data -- NASA Ames Research Center
Documentation: no statistics nor class distribution given
LARGE database...and this is only 531 of the instances
98 attributes per instance (all numeric)
Contact NASA-Ames Research Center for more information
Ftp Access
Spambase Database
Donated by George Forman (gforman at nospam hpl.hp.com) 650-857-7835, Mark
Hopkins, Erik Reeber and Jaap Suermondt.
Number of Instances: 4601 (1813 Spam = 39.4%)
Number of Attributes: 58 (57 continuous, 1 nominal class label)
The "spam" concept is diverse: advertisements for products/web sites, make
money fast schemes, chain letters, pornography... Our collection of spam
e-mails came from our postmaster and individuals who had filed spam. Our
collection of non-spam e-mails came from filed work and personal e-mails, and
hence the word 'george' and the area code '650' are indicators of non-spam.
These are useful when constructing a personalized spam filter. One would
either have to blind such non-spam indicators or get a very wide collection
of non-spam to generate a general purpose spam filter.
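As a rough illustration of the "blinding" idea above, the sketch below drops personalized indicator columns before a filter is trained. The column names word_freq_george and word_freq_650 follow the naming I recall from the Spambase documentation; the helper blind_features, the is_spam label name and the toy rows are hypothetical.

```python
def blind_features(names, rows, blocked):
    """Return (names, rows) with every column listed in `blocked` removed."""
    keep = [i for i, name in enumerate(names) if name not in blocked]
    kept_names = [names[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_names, kept_rows


# Toy data with made-up values, only to show the shape of the operation.
names = ["word_freq_free", "word_freq_george", "word_freq_650", "is_spam"]
rows = [
    [0.32, 0.00, 0.00, 1],
    [0.00, 1.25, 0.60, 0],
]
blind_names, blind_rows = blind_features(
    names, rows, blocked={"word_freq_george", "word_freq_650"}
)

# Class balance quoted in the entry: 1813 spam out of 4601 messages.
spam_fraction = 1813 / 4601
print(blind_names, f"{spam_fraction:.1%}")
```

Blinding keeps the data set unchanged otherwise; the alternative mentioned in the entry (collecting a much wider sample of non-spam) cannot be done in code alone.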
Ftp Access
SPECT and SPECTF heart databases
Donated by Krzysztof J. Cios & Lukasz A. Kurgan (Krys.Cios@cudenver.edu)
Documentation: Describes diagnosis of cardiac Single Photon Emission Computed
Tomography (SPECT) images. Each patient is classified into one of two
categories: normal or abnormal.
267 image sets (patients) in each dataset
23 attributes per instance (22 binary, 1 binary class) in SPECT
45 attributes per instance (44 continuous, 1 binary class) in SPECTF
Ftp Access
Sponge Database
Donated by Javier Bejar and Ulises Cortes
Classification of Atlantic-Mediterranean marine sponges
76 instances
45 nominal and numeric attributes (some missing values)
Ftp Access
Statlog Project Databases
Donated by Ross King
Vehicle Silhouettes: recognizing 3D objects within a 2D image by applying an
ensemble of shape feature extractors to the 2D silhouettes of the objects
Landsat Satellite: multi-spectral values of pixels in 3x3 neighbourhoods in a
satellite image, and the classification associated with the central pixel in
each neighbourhood
Shuttle: The shuttle dataset contains 9 attributes, all of which are
numerical. Approximately 80% of the data belongs to class 1
Australian Credit Approval: This file concerns credit card applications. This
database exists elsewhere in the repository (Credit Screening Database) in a
slightly different form
Heart Disease: This dataset is a heart disease database similar to a database
already present in the repository (Heart Disease databases) but in a slightly
different form
Image Segmentation: This dataset is an image segmentation database similar to
a database already present in the repository (Image segmentation database)
but in a slightly different form
German Credit Database: This dataset classifies people described by a set of
attributes as good or bad credit risks. Comes in two formats (one all
numeric). Also comes with a cost matrix
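A minimal sketch of how a cost matrix like the German Credit one can be used to score a classifier. The values below (accepting a bad risk costs 5, rejecting a good risk costs 1) are what I recall from the dataset documentation and should be treated as an assumption; total_cost and the sample labels are hypothetical.

```python
# COST[actual][predicted]; values assumed, check the dataset documentation.
COST = {
    "good": {"good": 0, "bad": 1},
    "bad": {"good": 5, "bad": 0},
}


def total_cost(actual, predicted, cost=COST):
    """Sum the misclassification cost over paired actual/predicted labels."""
    return sum(cost[a][p] for a, p in zip(actual, predicted))


actual = ["good", "good", "bad", "bad"]
predicted = ["good", "bad", "good", "bad"]

# One good->bad error (cost 1) plus one bad->good error (cost 5).
print(total_cost(actual, predicted))  # 6
```

With an asymmetric matrix like this, two classifiers with the same error rate can have very different total cost, which is the reason to report cost rather than plain accuracy.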
Ftp Access
Student Loan Relational Database
Donated by Michael Pazzani
Target concept: no_payment_due by person for student loan
1000 instances of target concept
Includes domain theory
10+ extensionally and intensionally defined relations
Ftp Access
Teaching Assistant Evaluation
Collected by Wei-Yin Loh (Department of Statistics, UW-Madison)
Donated by Tjen-Sien Lim (limt@stat.wisc.edu)
151 instances, 6 attributes, 3 classes
The data consist of evaluations of teaching performance over three regular
semesters and two summer semesters of 151 teaching assistant (TA) assignments
at the Statistics Department of the University of Wisconsin-Madison. The
scores were divided into 3 roughly equal-sized categories ("low", "medium",
and "high") to form the class variable.
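One way to form three roughly equal-sized categories from raw scores is a rank-based split, sketched below. This is not necessarily how the donors binned the data; three_way_split is a hypothetical helper, and ties are broken by input order.

```python
def three_way_split(scores):
    """Label each score low/medium/high so the groups are roughly equal in size."""
    n = len(scores)
    # Indices of the scores in ascending order of value.
    order = sorted(range(n), key=lambda i: scores[i])
    labels = [None] * n
    for rank, i in enumerate(order):
        # Ranks in the bottom third map to 0, middle third to 1, top third to 2.
        labels[i] = ("low", "medium", "high")[min(rank * 3 // n, 2)]
    return labels


print(three_way_split([5, 1, 9, 3, 7, 2]))
# ['medium', 'low', 'high', 'medium', 'high', 'low']
```

For 151 scores the split cannot be exact; this rule yields groups of 51, 50 and 50.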
Ftp Access
Tic-Tac-Toe Endgame Database
Donated by David W. Aha, Turing Institute
Documentation complete as of Summer 1991
958 instances, all attributes can take on 1 of 3 possible values
Binary classification task (i.e., "win for x")
A paradigmatic domain for constructive induction studies
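The instance count can be sanity-checked by enumerating every board at which a game can end, assuming x moves first and play stops at the first three-in-a-row or when the board is full. The 'x'/'o'/'b' (blank) cell encoding and the function names are assumptions made for this sketch.

```python
# Index triples for the 3 rows, 3 columns and 2 diagonals of the board.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'x' or 'o' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "b" and board[a] == board[b] == board[c]:
            return board[a]
    return None


def endgame_boards():
    """Collect every distinct final board reachable when x moves first."""
    seen = set()

    def play(board, player):
        if winner(board) is not None or "b" not in board:
            seen.add(board)  # game over: a win, or a full board
            return
        for i in range(9):
            if board[i] == "b":
                nxt = board[:i] + (player,) + board[i + 1:]
                play(nxt, "o" if player == "x" else "x")

    play(("b",) * 9, "x")
    return seen


boards = endgame_boards()
x_wins = sum(1 for b in boards if winner(b) == "x")
print(len(boards), x_wins)  # 958 626
```

Each of the 9 cells takes one of 3 values, which matches the attribute description in the entry; the binary class is whether x has won.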
Ftp Access
Thyroid Disease Database
From Garavan Institute
Documentation: as given by Ross Quinlan
6 databases from the Garavan Institute in Sydney, Australia
Approximately the following for each database:
2800 training (data) instances and 972 test instances
Plenty of missing data
29 or so attributes, either Boolean or continuously-valued
2 additional databases, also from Ross Quinlan, are also here
Hypothyroid.data and sick-euthyroid.data
Quinlan believes that these databases have been corrupted
Their format is highly similar to the other databases
1 more database of 9172 instances that cover 20 classes, and a related domain
theory
Another thyroid database from Stefan Aeberhard
3 classes, 215 instances, 5 attributes
No missing values
A Thyroid database suited for training ANNs
3 classes
3772 training instances, 3428 testing instances
Includes cost data (donated by Peter Turney)
Ftp Access
Trains Database
Donated by David Aha & Eric Bloedorn
Original owners: R. Michalski & R. Stepp
10 instances
10 attributes + class (direction: east or west)
2 data formats (structured, one-instance-per-line)
Includes "East-West" competition data and results (donated by Peter Turney)
Ftp Access
University Database
Donated by Steve Souders
Documentation: scant; we've left it in its original (LISP-readable) form
285 instances, including some duplicates
At least one attribute, academic-emphasis, can have multiple values per
instance
The user is encouraged to pursue the Lebowitz reference for more information
on the database
Ftp Access
Congressional Voting Records Database
1984 United States Congressional Voting Records
Classification: Republican or Democrat
Documentation: completed
All attributes are Boolean valued; plenty of missing values; 2 classes
Ftp Access
Water Treatment Plant Database
Donated by Javier Bejar and Ulises Cortes
38 numeric attributes; 527 instances; missing values
Multiple classes predict plant state
Ill-Structured Domain
Ftp Access
Waveform Data Generator
From Classification and Regression Trees book
Documentation: no statistics
CART book's waveform domains
21 and 40 continuous attributes respectively
difficult concepts to learn, but known Bayes optimal classification rate of
86% accuracy
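A minimal sketch of the waveform generator as I understand it from the CART book: three shifted triangular base waves, each class a random convex combination of two of them, plus unit-variance Gaussian noise on all 21 attributes. The 40-attribute variant appends 19 pure-noise attributes. The peak positions and the class-to-wave pairing used here are assumptions and should be checked against the book or the repository's own generator; the names base_wave, WAVES, PAIRS and sample are hypothetical.

```python
import random


def base_wave(center):
    """Triangular wave of height 6 peaking at `center`, sampled at positions 1..21."""
    return [max(6 - abs(i - center), 0) for i in range(1, 22)]


# Assumed peaks: h1 at 11, h2 at 15, h3 at 7.
WAVES = [base_wave(11), base_wave(15), base_wave(7)]
# Assumed pairing: class 0 mixes h1 and h2, class 1 h1 and h3, class 2 h2 and h3.
PAIRS = [(0, 1), (0, 2), (1, 2)]


def sample(rng, with_noise_attrs=False):
    """Draw one (attributes, label) pair; 21 attributes, or 40 with noise columns."""
    label = rng.randrange(3)
    a, b = PAIRS[label]
    u = rng.random()  # mixing weight, uniform on [0, 1)
    x = [u * WAVES[a][i] + (1 - u) * WAVES[b][i] + rng.gauss(0, 1)
         for i in range(21)]
    if with_noise_attrs:
        x += [rng.gauss(0, 1) for _ in range(19)]
    return x, label


rng = random.Random(0)
x21, y21 = sample(rng)
x40, y40 = sample(rng, with_noise_attrs=True)
print(len(x21), len(x40))  # 21 40
```

Because the classes overlap heavily under the added noise, even the optimal classifier cannot separate them perfectly, which is why the entry quotes a Bayes rate rather than 100%.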
Ftp Access
Wine Recognition Database
Donated by Stefan Aeberhard
Using chemical analysis determine the origin of wines
13 attributes (all continuous), 3 classes, no missing values
178 instances
Ftp Access
Yeast Database
Donated by Paul Horton (see also: Ecoli database)
Predicting the Cellular Localization Sites of Proteins
Documentation: On everything
1484 instances, 8 attributes (one nominal)
No missing attribute values
Ftp Access
Zoo Database
From Richard Forsyth
Artificial
7 classes of animals
17 attributes (besides name), 15 Boolean and 2 numeric-valued
No missing attribute values
Ftp Access
Undocumented Databases
Mike Pazzani's economic sanctions database
Philippe Collard's database on cloud cover images
Vince Sigillito's database on dna secondary structure
Nettalk data (see connectionist-bench)
Sonar data (see connectionist-bench)
Vowel data (see connectionist-bench)
Ftp Access
--
※ Source: Nanjing University Lily BBS bbs.nju.edu.cn [FROM: 202.118.237.14]