<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Adult Data Set</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../R.css">
</head><body>
<table width="100%" summary="page for Adult {arules}"><tr><td>Adult {arules}</td><td align="right">R Documentation</td></tr></table>
<h2>Adult Data Set</h2>
<h3>Description</h3>
<p>
The <code>AdultUCI</code> data set contains the questionnaire data of the
“Adult” database (originally called the “Census Income”
Database) formatted as a data.frame. The <code>Adult</code> data set contains the
data already prepared and coerced to <code><a href="transactions-class.html">transactions</a></code> for
use with <span class="pkg">arules</span>.
</p>
<h3>Usage</h3>
<pre>
data("Adult")
data("AdultUCI")
</pre>
<h3>Format</h3>
<p>
The <code>AdultUCI</code> data set contains a data frame with 48842
observations on the following 15 variables.
<dl>
<dt>age</dt><dd>a numeric vector.</dd>
<dt>workclass</dt><dd>a factor with levels <code>Federal-gov</code>,
<code>Local-gov</code>, <code>Never-worked</code>, <code>Private</code>,
<code>Self-emp-inc</code>, <code>Self-emp-not-inc</code>, <code>State-gov</code>,
and <code>Without-pay</code>.</dd>
<dt>education</dt><dd>an ordered factor with levels <code>Preschool</code> &lt;
<code>1st-4th</code> &lt; <code>5th-6th</code> &lt; <code>7th-8th</code> &lt; <code>9th</code> &lt;
<code>10th</code> &lt; <code>11th</code> &lt; <code>12th</code> &lt; <code>HS-grad</code> &lt;
<code>Prof-school</code> &lt; <code>Assoc-acdm</code> &lt; <code>Assoc-voc</code> &lt;
<code>Some-college</code> &lt; <code>Bachelors</code> &lt; <code>Masters</code> &lt;
<code>Doctorate</code>.</dd>
<dt>education-num</dt><dd>a numeric vector.</dd>
<dt>marital-status</dt><dd>a factor with levels <code>Divorced</code>,
<code>Married-AF-spouse</code>, <code>Married-civ-spouse</code>,
<code>Married-spouse-absent</code>, <code>Never-married</code>,
<code>Separated</code>, and <code>Widowed</code>.</dd>
<dt>occupation</dt><dd>a factor with levels <code>Adm-clerical</code>,
<code>Armed-Forces</code>, <code>Craft-repair</code>, <code>Exec-managerial</code>,
<code>Farming-fishing</code>, <code>Handlers-cleaners</code>,
<code>Machine-op-inspct</code>, <code>Other-service</code>,
<code>Priv-house-serv</code>, <code>Prof-specialty</code>,
<code>Protective-serv</code>, <code>Sales</code>, <code>Tech-support</code>, and
<code>Transport-moving</code>.</dd>
<dt>relationship</dt><dd>a factor with levels <code>Husband</code>,
<code>Not-in-family</code>, <code>Other-relative</code>, <code>Own-child</code>,
<code>Unmarried</code>, and <code>Wife</code>.</dd>
<dt>race</dt><dd>a factor with levels <code>Amer-Indian-Eskimo</code>,
<code>Asian-Pac-Islander</code>, <code>Black</code>, <code>Other</code>, and
<code>White</code>.</dd>
<dt>sex</dt><dd>a factor with levels <code>Female</code> and <code>Male</code>.</dd>
<dt>capital-gain</dt><dd>a numeric vector.</dd>
<dt>capital-loss</dt><dd>a numeric vector.</dd>
<dt>fnlwgt</dt><dd>a numeric vector.</dd>
<dt>hours-per-week</dt><dd>a numeric vector.</dd>
<dt>native-country</dt><dd>a factor with levels <code>Cambodia</code>,
<code>Canada</code>, <code>China</code>, <code>Columbia</code>, <code>Cuba</code>,
<code>Dominican-Republic</code>, <code>Ecuador</code>, <code>El-Salvador</code>,
<code>England</code>, <code>France</code>, <code>Germany</code>, <code>Greece</code>,
<code>Guatemala</code>, <code>Haiti</code>, <code>Holand-Netherlands</code>,
<code>Honduras</code>, <code>Hong</code>, <code>Hungary</code>, <code>India</code>,
<code>Iran</code>, <code>Ireland</code>, <code>Italy</code>, <code>Jamaica</code>,
<code>Japan</code>, <code>Laos</code>, <code>Mexico</code>, <code>Nicaragua</code>,
<code>Outlying-US(Guam-USVI-etc)</code>, <code>Peru</code>,
<code>Philippines</code>, <code>Poland</code>, <code>Portugal</code>,
<code>Puerto-Rico</code>, <code>Scotland</code>, <code>South</code>, <code>Taiwan</code>,
<code>Thailand</code>, <code>Trinadad&amp;Tobago</code>, <code>United-States</code>,
<code>Vietnam</code>, and <code>Yugoslavia</code>.</dd>
<dt>income</dt><dd>an ordered factor with levels <code>small</code> &lt;
<code>large</code>.</dd>
</dl>
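<p>
Several of the column names above contain hyphens, so they are not syntactic R
names, and <code>income</code> is an ordered factor. The following is a minimal
sketch of what that means in practice. It uses a small hypothetical data frame
(<code>toy</code>, with made-up values) rather than <code>AdultUCI</code> itself,
so it runs in base R without <span class="pkg">arules</span>.
</p>

```r
## Hypothetical stand-in for two AdultUCI columns; the values are made up.
## check.names = FALSE keeps the hyphenated name instead of converting it
## to hours.per.week.
toy <- data.frame(
  `hours-per-week` = c(40, 20, 60),
  income = factor(c("small", "small", "large"),
                  levels = c("small", "large"), ordered = TRUE),
  check.names = FALSE
)

## Hyphenated names need [[ ]] or backticks; toy$hours-per-week would be
## parsed as a subtraction.
toy[["hours-per-week"]]
toy$`hours-per-week`

## Ordered factors support comparisons that respect the level order.
toy$income > "small"
max(toy$income)
```

<p>
The same access pattern (<code>[[ ]]</code> with a quoted name) is what the code
in the section Examples uses for <code>education-num</code> and the other
hyphenated columns.
</p>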
<h3>Details</h3>
<p>
The “Adult” database was extracted from the Census Bureau database
found at <a href="http://www.census.gov/ftp/pub/DES/www/welcome.html">http://www.census.gov/ftp/pub/DES/www/welcome.html</a> in 1994 by
Ronny Kohavi and Barry Becker, Data Mining and Visualization, Silicon
Graphics. It was originally used to predict whether income exceeds USD 50K/yr
based on census data. We added the attribute <code>income</code> with levels
<code>small</code> (&lt;=50K) and <code>large</code> (&gt;50K).
</p>
<p>
We prepared the data set for association mining as shown in the
section Examples. We removed the
continuous attribute <code>fnlwgt</code> (final weight).
We also eliminated <code>education-num</code> because it is just a
numeric representation of the attribute <code>education</code>.
We mapped the remaining four continuous attributes to ordinal attributes as
follows:
<dl>
<dt>age</dt><dd>cut into levels
<code>Young</code> (0-25),
<code>Middle-aged</code> (26-45),
<code>Senior</code> (46-65) and
<code>Old</code> (66+).</dd>
<dt>hours-per-week</dt><dd>cut into levels
<code>Part-time</code> (up to 25),
<code>Full-time</code> (more than 25, up to 40),
<code>Over-time</code> (more than 40, up to 60) and
<code>Workaholic</code> (more than 60).</dd>
<dt>capital-gain and capital-loss</dt><dd>each cut into levels
<code>None</code> (0),
<code>Low</code> (greater than 0, up to the median of the values greater than zero) and
<code>High</code> (above that median).</dd>
</dl>
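<p>
The interval boundaries above follow from how <code>cut</code> treats its break
points: by default each interval is open on the left and closed on the right.
The following sketch illustrates this with small hypothetical vectors
(<code>hours</code> and <code>gain</code>, made-up values, not taken from
<code>AdultUCI</code>), using the same break points as the code in the section
Examples. It passes the labels directly to <code>cut</code>, which is a
simplification of the <code>ordered(cut(...), labels = ...)</code> form used there.
</p>

```r
## Hypothetical hours-per-week values chosen to sit on the break points.
hours <- c(25, 40, 41, 60, 61)
hours_level <- cut(hours,
                   breaks = c(0, 25, 40, 60, 168),
                   labels = c("Part-time", "Full-time", "Over-time", "Workaholic"),
                   ordered_result = TRUE)
## A value equal to a break point falls into the lower interval,
## so 40 is Full-time and 60 is Over-time.
print(data.frame(hours, hours_level))

## Hypothetical capital-gain values; most entries are zero.
gain <- c(0, 0, 100, 500, 2000, 0, 99999)

## The Low/High boundary is the median of the strictly positive values.
pos_median <- median(gain[gain > 0])

## Intervals: (-Inf, 0] -> None, (0, median] -> Low, (median, Inf] -> High.
gain_level <- cut(gain,
                  breaks = c(-Inf, 0, pos_median, Inf),
                  labels = c("None", "Low", "High"),
                  ordered_result = TRUE)
print(data.frame(gain, gain_level))
```

<p>
Taking the median over the positive values only matters here: with mostly zero
entries, the median of the whole vector would be 0 and could not serve as a
boundary between <code>Low</code> and <code>High</code>.
</p>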
<h3>Source</h3>
<p>
<a href="http://www.ics.uci.edu/~mlearn/MLRepository.html">http://www.ics.uci.edu/~mlearn/MLRepository.html</a>
</p>
<h3>References</h3>
<p>
A. Asuncion &amp; D. J. Newman (2007):
UCI Repository of Machine Learning Databases.
Irvine, CA: University of California, Department of Information and
Computer Science.
</p>
<p>
The data set was first cited in Kohavi, R. (1996): Scaling Up the Accuracy
of Naive-Bayes Classifiers: a Decision-Tree Hybrid. <EM>Proceedings of the
Second International Conference on Knowledge Discovery and Data Mining</EM>.
</p>
<h3>Examples</h3>
<pre>
library("arules")
data("AdultUCI")
dim(AdultUCI)
AdultUCI[1:2,]

## remove attributes
AdultUCI[["fnlwgt"]] &lt;- NULL
AdultUCI[["education-num"]] &lt;- NULL

## map metric attributes to ordered factors
AdultUCI[["age"]] &lt;- ordered(cut(AdultUCI[["age"]], c(15,25,45,65,100)),
    labels = c("Young", "Middle-aged", "Senior", "Old"))

AdultUCI[["hours-per-week"]] &lt;- ordered(cut(AdultUCI[["hours-per-week"]],
    c(0,25,40,60,168)),
    labels = c("Part-time", "Full-time", "Over-time", "Workaholic"))

## capital-gain and capital-loss: split at 0 and at the median of the positive values
AdultUCI[["capital-gain"]] &lt;- ordered(cut(AdultUCI[["capital-gain"]],
    c(-Inf, 0, median(AdultUCI[["capital-gain"]][AdultUCI[["capital-gain"]] &gt; 0]),
    Inf)), labels = c("None", "Low", "High"))

AdultUCI[["capital-loss"]] &lt;- ordered(cut(AdultUCI[["capital-loss"]],
    c(-Inf, 0, median(AdultUCI[["capital-loss"]][AdultUCI[["capital-loss"]] &gt; 0]),
    Inf)), labels = c("None", "Low", "High"))

## create transactions
Adult &lt;- as(AdultUCI, "transactions")
Adult
</pre>
<hr><div align="center">[Package <em>arules</em> version 0.6-6 <a href="00Index.html">Index]</a></div>
</body></html>