<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Term Frequency Vector</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../R.css">
</head><body>
<table width="100%" summary="page for termFreq {tm}"><tr><td>termFreq {tm}</td><td align="right">R Documentation</td></tr></table>
<h2>Term Frequency Vector</h2>
<h3>Description</h3>
<p>
Generate a term frequency vector from a text document.
</p>
<h3>Usage</h3>
<pre>
termFreq(doc, control = list())
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>doc</code></td>
<td>
an object inheriting from <code>TextDocument</code>.</td></tr>
<tr valign="top"><td><code>control</code></td>
<td>
a list of control options. Possible settings are:
<ul>
<li><code>tolower</code>: a function converting characters to lower
case. Defaults to <code>base::tolower</code>.
<li><code>tokenize</code>: a function splitting a document into single
tokens. Defaults to <code>function(x) unlist(strsplit(gsub("[^[:alnum:]]+", " ", x), " ", fixed = TRUE))</code>.
<li><code>removeNumbers</code>: a Boolean value indicating whether
numbers should be removed from <code>doc</code>.
<li><code>stemming</code>: a Boolean value indicating whether tokens
should be stemmed. Defaults to <code>FALSE</code>.
<li><code>stopwords</code>: either a Boolean value indicating whether stopwords
should be removed using the default language-specific stopword lists shipped
with this package, or a character vector holding custom stopwords to remove.
<li><code>dictionary</code>: a character vector of terms to tabulate
against; terms not in this vector are omitted from the result. Defaults
to no dictionary (i.e., all terms are considered).
<li><code>minDocFreq</code>: an integer value. Words that appear less
often in <code>doc</code> than this number are discarded. Defaults to
<code>1</code> (i.e., every token will be used).
<li><code>minWordLength</code>: an integer value. Words with fewer
characters than this value are discarded. Defaults to <code>3</code>.
</ul>
</td></tr>
</table>
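<p>
The following is a minimal base-R sketch of how several of the control
options above could be applied in sequence. It is not the implementation
used by <code>termFreq</code>: the helper name <code>preprocess_tokens</code>,
its argument names, and the order of the steps are assumptions made for
illustration. It reuses the default tokenizer quoted above. Stemming and the
built-in stopword lists are omitted because they depend on resources outside
base R, so a custom stopword vector is used instead.
</p>
<pre>
# Hypothetical helper sketching how some termFreq() control options could be
# combined; this is NOT the tm package's own code.
preprocess_tokens = function(x,
                             min_word_length = 3,
                             remove_numbers = FALSE,
                             stopwords = character(0),
                             dictionary = NULL) {
  # Default tokenizer as documented: turn runs of non-alphanumeric
  # characters into a single space, then split on spaces.
  tokenize = function(x) unlist(strsplit(gsub("[^[:alnum:]]+", " ", x), " ", fixed = TRUE))
  x = tolower(x)                                        # 'tolower' step
  if (remove_numbers) x = gsub("[[:digit:]]+", "", x)   # 'removeNumbers' step
  tokens = tokenize(x)
  tokens = tokens[nchar(tokens) >= min_word_length]     # 'minWordLength' step
  tokens = tokens[!(tokens %in% stopwords)]             # custom stopword removal
  if (!is.null(dictionary)) tokens = tokens[tokens %in% dictionary]  # 'dictionary'
  tokens
}

txt = "Oil prices rose 5% in 2009, and OPEC said oil demand is rising."
preprocess_tokens(txt, remove_numbers = TRUE, stopwords = c("and", "said"))
preprocess_tokens(txt, dictionary = c("oil", "opec"))
</pre>
<p>
With <code>remove_numbers = TRUE</code> the digits are stripped before
tokenizing, and the short words <code>"in"</code> and <code>"is"</code> fall
below the default minimum length of 3.
</p>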
<h3>Value</h3>
<p>
A named integer vector with term frequencies as values and tokens as
names.</p>
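<p>
A minimal base-R sketch of this return value, assuming the tokens have
already been produced: <code>table()</code> counts each distinct token, and
converting those counts gives a named integer vector. The helper name
<code>count_terms</code> is hypothetical, and its final filter only mirrors
the documented behaviour of the <code>minDocFreq</code> option.
</p>
<pre>
# Hypothetical helper: tabulate tokens into a named integer vector.
count_terms = function(tokens, min_doc_freq = 1) {
  counts = table(tokens)                                 # one count per distinct token
  freqs = setNames(as.integer(counts), names(counts))    # plain named integer vector
  freqs[freqs >= min_doc_freq]                           # drop terms below the threshold
}

tokens = c("oil", "prices", "oil", "opec", "oil", "opec")
count_terms(tokens)                    # oil = 3, opec = 2, prices = 1
count_terms(tokens, min_doc_freq = 2)  # oil = 3, opec = 2
</pre>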
<h3>Examples</h3>
<pre>
data("crude")
termFreq(crude[[1]])
termFreq(crude[[1]], control = list(stemming = TRUE, minWordLength = 4))
</pre>
<hr><div align="center">[Package <em>tm</em> version 0.3 <a href="00Index.html">Index</a>]</div>
</body></html>