<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Calculating various additional interest measures</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../R.css">
</head><body>

<table width="100%" summary="page for interestMeasure {arules}"><tr><td>interestMeasure {arules}</td><td align="right">R Documentation</td></tr></table>
<h2>Calculating various additional interest measures</h2>


<h3>Description</h3>

<p>
Provides the generic function <code>interestMeasure</code> and the needed S4 method 
to calculate various additional interest measures for existing sets of
itemsets or rules.
</p>


<h3>Usage</h3>

<pre>
interestMeasure(x, method, transactions = NULL, reuse = TRUE, ...)
</pre>


<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>x</code></td>
<td>
a set of itemsets or rules. </td></tr>
<tr valign="top"><td><code>method</code></td>
<td>
name or vector of names of the desired interest measures 
(see details for available measures).</td></tr>
<tr valign="top"><td><code>transactions</code></td>
<td>
the transaction data set used to mine 
the associations. </td></tr>
<tr valign="top"><td><code>reuse</code></td>
<td>
logical indicating if information in the quality slot should
be reused for calculating the measures. This speeds up the process
significantly since only very little (or no) transaction counting 
is necessary if support, confidence and lift are already available.
Use <code>reuse=FALSE</code> to force counting (might be very slow).</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
further arguments for the measure calculation. </td></tr>
</table>

<h3>Details</h3>

<p>
For itemsets the following measures are implemented:  
<dl>
<dt>"allConfidence"</dt><dd>(see Omiecinski, 2003) is defined on itemsets as the
minimum confidence of all possible rules generated from the itemset.</dd>
<dt>"crossSupportRatio"</dt><dd>(see, Xiong et al., 2003) is defined on itemsets as
the ratio of the support of the least frequent item to the support of the most
frequent item.  Cross-support patterns have a ratio smaller than a set
threshold. Normally many found patterns are cross-support patterns which
contain frequent as well as rare items. Such patterns often tend to be
spurious.</dd>
<dt>"support"</dt><dd>calculate itemset support.</dd>
</dl>
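<p>
As a rough illustration of the two itemset measures above, the following base R
sketch computes them from support values. The supports are made-up numbers, and
the helper names <code>all_confidence</code> and <code>cross_support_ratio</code>
are hypothetical (they are not part of arules). The sketch relies on the fact that
the minimum confidence over all rules generated from an itemset equals the itemset
support divided by the largest single-item support.
</p>
<pre>
## Hypothetical supports (fractions of transactions) of the items in {a, b, c}
item_support &lt;- c(a = 0.60, b = 0.25, c = 0.10)
## Hypothetical support of the whole itemset {a, b, c}
itemset_support &lt;- 0.05

## Minimum confidence over all rules that can be generated from the itemset
all_confidence &lt;- function(itemset_support, item_support) {
  itemset_support / max(item_support)
}

## Support of the least frequent item divided by that of the most frequent item
cross_support_ratio &lt;- function(item_support) {
  min(item_support) / max(item_support)
}

all_confidence(itemset_support, item_support)
cross_support_ratio(item_support)
</pre>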

<p>
For rules the following measures are implemented:  
<dl>
<dt>"chiSquare"</dt><dd>(see Liu et al. 1999). The chi-square statistic 
to test for independence between the lhs and rhs of the rule.
The critical value of the chi-square distribution with <i>1</i> degree of 
freedom (2x2 contingency table) at <i>alpha=0.05</i> 
is <i>3.84</i>; higher chi-square
values indicate that the lhs and the rhs are not independent.  
</dd>
<dt>"confidence"</dt><dd>calculate rule confidence. Range <i>0...1</i>.</dd>
<dt>"conviction"</dt><dd>(see Brin et al. 1997) defined as 
<i>P(X)P(not Y)/P(X and not Y)</i>. 
Range: <i>0...1... Inf</i> (<i>1</i> indicates unrelated items).</dd>
<dt>"cosine"</dt><dd>(see Tan et al. 2004) equivalent to the IS measure. 
Range: <i>0...1</i>. 
</dd>
<dt>"coverage"</dt><dd>calculate rule coverage (support of LHS). 
Range: <i>0...1</i>.</dd>
<dt>"doc"</dt><dd>calculate difference of confidence, which is defined
by Hofmann and Wilhelm (2001) as 
<i>conf(X -&gt; Y)-conf(!X -&gt; Y)</i>.
Range: <i>-1...1</i>.</dd>
<dt>"gini"</dt><dd>gini index (see Tan et al. 2004). Range: <i>0...1</i>.</dd>
<dt>"hyperLift"</dt><dd>(see, Hahsler and Hornik, 2007) is an adaptation of the lift
measure which is more robust for low counts. It is based on the idea that under
independence the count <i>c_{XY}</i> of the transactions which contain all items
in a rule <i>X -&gt; Y</i> follows a hypergeometric distribution 
(represented by the random variable <i>C_{XY}</i>) with
the parameters given by the counts  <i>c_X</i> and  <i>c_Y</i>.
</p>
<p>
Lift is defined for the rule <i>X -&gt; Y</i> as:
</p><p align="center"><i>lift(X -&gt; Y) = P(X+Y)/(P(X)*P(Y)) = c_XY / E[C_XY],</i></p>
<p>
where <i>E[C_{XY}] = c_X c_Y / m</i> with <i>m</i> being the number
of transactions in the database.
</p>
<p>
Hyper-lift is defined as:
</p><p align="center"><i>hyperlift(X -&gt; Y) = c_XY / Q_d[C_XY],</i></p>
<p>
where  <i>Q_d[C_XY]</i> is the
quantile of the hypergeometric distribution given by <i>d</i>.
The quantile can be given
as parameter <code>d</code> (default: <code>d=0.99</code>).
Range: <i>0... Inf</i>.
</dd>


<dt>"hyperConfidence"</dt><dd>(Hahsler and Hornik, 2007)
calculates the confidence level that we observe too high/low counts 
for rules <i>X -&gt; Y</i> using the hypergeometric model.
Since the counts are drawn from a hypergeometric distribution 
(represented by the random variable <i>C_{XY}</i>) with
known parameters given by the counts  <i>c_X</i> and  <i>c_Y</i>,
we can calculate a confidence interval for the observed counts 
<i>c_{XY}</i> stemming from the distribution. Hyperconfidence
reports the confidence level 
(significance level if <code>significance=TRUE</code> is used) for
<dl>
<dt>complements -</dt><dd><i>1 - P[C_{XY} &gt;= c_{XY} | c_X, c_Y]</i>
</dd>
<dt>substitutes -</dt><dd><i>1 - P[C_{XY} &lt; c_{XY} | c_X, c_Y]</i>.
</dd>
</dl>
<p>
A confidence level of, e.g., <i>&gt; 0.95</i> indicates that
there is only a  5% chance that the count for the rule was generated
randomly.
</p>
<p>
By default, complementary effects are mined; substitutes can be found
by using the parameter <code>complements = FALSE</code>. 
Range: <i>0...1</i>.
</dd>
<dt>"improvement"</dt><dd>(see Bayardo et al. 2000)
the  improvement of a rule is 
the minimum difference between its confidence and the confidence of any
proper sub-rule with the same consequent. Range: <i>0...1</i>.</dd>
<dt>"leverage"</dt><dd>(see Piatetsky-Shapiro 1991)
defined as <i>P(X-&gt;Y) - (P(X)P(Y))</i>.
It measures the difference between how often X and Y appear together in the data set 
and what would be expected if X and Y were statistically independent. 
Range: <i>-1...1</i>.</dd>
<dt>"lift"</dt><dd>calculate rule lift. Range: <i>0... Inf</i>.</dd>
<dt>"oddsRatio"</dt><dd>(see Tan et al. 2004).
The odds of finding X in transactions which contain Y divided by
the odds of finding X in transactions which do not contain Y.
Range: <i>0...1... Inf</i> (
<i>1</i> indicates that Y is not associated to X).</dd> 
<dt>"phi"</dt><dd>the correlation coefficient <i>phi</i> 
(see Tan et al. 2004) Range: <i>-1</i> (perfect neg. correlation)
to <i>+1</i> (perfect pos. correlation).</dd>
<dt>"RLD"</dt><dd>(Relative Linkage Disequilibrium; see Kenett and Salini 2008).
RLD evaluates the deviation
of the support of the whole rule from the support expected under
independence given the supports of the LHS and the RHS. The code was
contributed by Silvia Salini. Range: <i>0...1</i>.</dd>
<dt>"support"</dt><dd>calculate rule support. Range: <i>0...1</i>.</dd>
</dl>
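<p>
Several of the rule measures above can be checked by hand with a small base R
sketch. The counts are invented for illustration, all variable names are
hypothetical, and the code only mirrors the definitions given in this section;
it is not the package's internal implementation. For the hypergeometric part it
is assumed that the <i>c_Y</i> transactions containing Y play the role of the
"white balls" in <code>qhyper</code> and <code>phyper</code>, with <i>c_X</i> draws.
</p>
<pre>
## Hypothetical counts for a rule X -&gt; Y
n_trans &lt;- 1000   # number of transactions in the database (m in the text)
c_X     &lt;- 200    # transactions containing X
c_Y     &lt;- 250    # transactions containing Y
c_XY    &lt;- 80     # transactions containing both X and Y

## Probabilities estimated from the counts
p_X  &lt;- c_X / n_trans
p_Y  &lt;- c_Y / n_trans
p_XY &lt;- c_XY / n_trans

confidence &lt;- p_XY / p_X
lift       &lt;- p_XY / (p_X * p_Y)              # same as c_XY / E[C_XY]
leverage   &lt;- p_XY - p_X * p_Y
conviction &lt;- p_X * (1 - p_Y) / (p_X - p_XY)  # P(X)P(not Y) / P(X and not Y)
## difference of confidence: conf(X -&gt; Y) - conf(!X -&gt; Y)
doc        &lt;- confidence - (c_Y - c_XY) / (n_trans - c_X)

## 2x2 contingency table of the rule (filled column by column)
tab &lt;- matrix(c(c_XY, c_X - c_XY, c_Y - c_XY, n_trans - c_X - c_Y + c_XY),
              nrow = 2,
              dimnames = list(c("Y", "not Y"), c("X", "not X")))

odds_ratio &lt;- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
## chi-square statistic without continuity correction (1 degree of freedom)
chi_square &lt;- unname(chisq.test(tab, correct = FALSE)$statistic)
phi        &lt;- leverage / sqrt(p_X * p_Y * (1 - p_X) * (1 - p_Y))

## Hypergeometric model: c_X draws from n_trans transactions, c_Y of which contain Y
d           &lt;- 0.99
Q_d         &lt;- qhyper(d, m = c_Y, n = n_trans - c_Y, k = c_X)
hyper_lift  &lt;- c_XY / Q_d
## complements: 1 - P[C_XY &gt;= c_XY | c_X, c_Y] = P[C_XY &lt;= c_XY - 1]
hyper_confidence &lt;- phyper(c_XY - 1, m = c_Y, n = n_trans - c_Y, k = c_X)

round(c(confidence = confidence, lift = lift, leverage = leverage,
        conviction = conviction, doc = doc, oddsRatio = odds_ratio,
        chiSquare = chi_square, phi = phi,
        hyperLift = hyper_lift, hyperConfidence = hyper_confidence), 4)
</pre>
<p>
Because the quantile <i>Q_d[C_XY]</i> for <i>d=0.99</i> lies above the expected
count <i>E[C_XY]</i>, hyper-lift comes out smaller than lift for the same rule,
which is what makes it more conservative for low counts.
</p>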

<h3>Value</h3>

<p>
If only one method is used, the function returns a numeric vector 
containing the values of the interest measure for each association
in the set of associations <code>x</code>. 
<br>
If more than one method is specified, the result is a data.frame 
containing the different measures for each association.</p>

<h3>References</h3>

<p>
R. Bayardo, R. Agrawal, and D. Gunopulos (2000). Constraint-based rule mining
in large, dense databases.  <EM>Data Mining and Knowledge Discovery</EM>,
4(2/3):217&ndash;240, 2000.
</p>
<p>
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur (1997). Dynamic
itemset counting and implication rules for market basket data. In <EM>SIGMOD
1997, Proceedings ACM SIGMOD International Conference on Management of Data</EM>,
pages 255&ndash;264, Tucson, Arizona, USA.
</p>
<p>
Michael Hahsler and Kurt Hornik (2007). New probabilistic interest measures for association rules. <EM>Intelligent Data Analysis</EM>, 11(5):437&ndash;455.
</p>
<p>
Heike Hofmann and Adalbert Wilhelm (2001). Visual comparison of association rules. 
<EM>Computational Statistics</EM>, 16(3):399&ndash;415.
</p>
<p>
Ron Kenett and Silvia Salini. Relative Linkage Disequilibrium: A New
measure for association rules. In <EM>8th Industrial Conference on 
Data Mining ICDM 2008 July 16&ndash;18, 2008, Leipzig/Germany,</EM> to appear, 2008.
</p>
<p>
Bing Liu, Wynne Hsu, and Yiming Ma (1999). Pruning and summarizing the
discovered associations. In <EM>KDD '99: Proceedings of the fifth ACM SIGKDD
international conference on Knowledge discovery and data mining</EM>, pages
125&ndash;134.  ACM Press, 1999.
</p>
<p>
Edward R. Omiecinski (2003). Alternative interest measures for mining
associations in databases. <EM>IEEE Transactions on Knowledge and Data
Engineering</EM>, 15(1):57&ndash;69, Jan/Feb 2003.
</p>
<p>
Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava (2004). Selecting the right
objective measure for association analysis. <EM>Information Systems</EM>,
29(4):293&ndash;313.
</p>
<p>
Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong
rules. In: <EM>Knowledge Discovery in Databases</EM>, pages 229&ndash;248.
</p>
<p>
Hui Xiong, Pang-Ning Tan, and Vipin Kumar (2003). Mining strong affinity
association patterns in data sets with skewed support distribution. In Bart
Goethals and Mohammed J. Zaki, editors, <EM>Proceedings of the IEEE
International Conference on Data Mining</EM>, November 19&ndash;22, 2003, Melbourne,
Florida, pages 387&ndash;394.
</p>


<h3>See Also</h3>

<p>
<code><a href="itemsets-class.html">itemsets-class</a></code>, <code><a href="rules-class.html">rules-class</a></code>
</p>


<h3>Examples</h3>

<pre>
data("Income")
rules &lt;- apriori(Income)

## calculate a single measure and add it to the quality slot
quality(rules) &lt;- cbind(quality(rules), 
        hyperConfidence = interestMeasure(rules, method = "hyperConfidence", 
        Income))

inspect(head(SORT(rules, by = "hyperConfidence")))

## calculate several measures
m &lt;- interestMeasure(rules, c("confidence", "oddsRatio", "leverage"), Income)
inspect(head(rules))
head(m)
</pre>



<hr><div align="center">[Package <em>arules</em> version 0.6-6 <a href="00Index.html">Index]</a></div>

</body></html>