📄 termfreq.html

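📁 R-Project is open-source statistical software with its own language, R, which is similar to the S language. This archive contains a text mining package ("tm" for short) implemented in R, including code and sample data.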

💻 HTML


<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Term Frequency Vector</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../R.css">
</head><body>

<table width="100%" summary="page for termFreq {tm}"><tr><td>termFreq {tm}</td><td align="right">R Documentation</td></tr></table>
<h2>Term Frequency Vector</h2>


<h3>Description</h3>

<p>
Generate a term frequency vector from a text document.
</p>


<h3>Usage</h3>

<pre>
termFreq(doc, control = list())
</pre>


<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>doc</code></td>
<td>
an object inheriting from <code>TextDocument</code>.</td></tr>
<tr valign="top"><td><code>control</code></td>
<td>
a list of control options. Possible settings are
<ul>
<li><code>tolower</code>: a function converting characters to lower
case. Defaults to <code>base::tolower</code>.
<li><code>tokenize</code>: a function splitting a document into single
tokens. Defaults to <code>function(x) unlist(strsplit(gsub("[^[:alnum:]]+", " ", x), " ", fixed = TRUE))</code>.
<li><code>removeNumbers</code>: a Boolean value indicating whether
numbers should be removed from <code>doc</code>.
<li><code>stemming</code>: a Boolean value indicating whether tokens
should be stemmed. Defaults to <code>FALSE</code>.
<li><code>stopwords</code>: either a Boolean value indicating whether
stopwords should be removed using the default language-specific stopword
lists shipped with this package, or a character vector holding custom stopwords.
<li><code>dictionary</code>: a character vector to be tabulated
against. No other terms will be listed in the result. Defaults to
no action (i.e., all terms are considered).
<li><code>minDocFreq</code>: an integer value. Words that occur fewer
times in <code>doc</code> than this number are discarded. Defaults to
<code>1</code> (i.e., every token will be used).
<li><code>minWordLength</code>: an integer value. Words with fewer
characters than this number are discarded. Defaults to <code>3</code>.
</ul>
</td></tr>
</table>
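
<p>
The effect of the <code>tolower</code>, <code>tokenize</code> and
<code>minWordLength</code> settings can be sketched in base R without
loading the package. This is only an illustration under assumptions: the
sample sentence is made up, and the order of the steps is a guess rather
than the actual implementation of <code>termFreq</code>.
</p>

<pre>
# Base-R sketch only; termFreq() itself may be implemented differently.
doc = "Crude oil prices rose 3 percent; OPEC said oil output is stable."

# Default tokenizer quoted in the Arguments section.
tokenize = function(x)
    unlist(strsplit(gsub("[^[:alnum:]]+", " ", x), " ", fixed = TRUE))

tokens = tokenize(tolower(doc))        # lower-case, then split into tokens
tokens = tokens[nchar(tokens) >= 3]    # analogue of minWordLength = 3
tf = table(tokens)                     # count occurrences per token

# Convert the table into a named integer vector (tokens as names).
freq = structure(as.integer(tf), names = names(tf))
freq[["oil"]]                          # number of times "oil" occurs
</pre>

<p>
In this sketch the short tokens <code>"3"</code> and <code>"is"</code>
are dropped by the length filter, which is the behaviour described for
<code>minWordLength</code> above.
</p>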

<h3>Value</h3>

<p>
A named integer vector with term frequencies as values and tokens as
names.</p>
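
<p>
A named integer vector of this shape can be inspected with ordinary
vector operations. The sketch below uses made-up counts, not real output
from <code>termFreq</code>, and the <code>dict</code> vector is a
hypothetical stand-in for the <code>dictionary</code> setting.
</p>

<pre>
# Hypothetical term frequencies, for illustration only.
freq = c(crude = 2L, oil = 5L, prices = 3L)

names(freq)                            # the tokens
freq[["oil"]]                          # frequency of a single token
top = sort(freq, decreasing = TRUE)    # most frequent tokens first

# Keep only tokens listed in a dictionary-like character vector.
dict = c("oil", "opec")
kept = freq[intersect(names(freq), dict)]
</pre>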

<h3>Examples</h3>

<pre>
data("crude")
termFreq(crude[[1]])
termFreq(crude[[1]], control = list(stemming = TRUE, minWordLength = 4))
</pre>



<hr><div align="center">[Package <em>tm</em> version 0.3 <a href="00Index.html">Index</a>]</div>

</body></html>

