
📄 textdoccol.html

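The R Project is open-source statistical software with its own language, R, which is similar to the S language. This archive contains tm, a text mining package implemented in R, together with code and sample data.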
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Text document collection</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../R.css">
</head><body>

<table width="100%" summary="page for Corpus {tm}"><tr><td>Corpus {tm}</td><td align="right">R Documentation</td></tr></table>
<h2>Text document collection</h2>


<h3>Description</h3>

<p>
Constructs a text document collection (corpus).
</p>


<h3>Usage</h3>

<pre>
## S4 method for signature 'Source':
Corpus(object, readerControl = list(reader = object@DefaultReader,
language = "en_US", load = TRUE), dbControl = list(useDb = FALSE, dbName = "",
dbType = "DB1"), ...)
</pre>


<h3>Arguments</h3>

<table summary="R argblock">
<tr valign="top"><td><code>object</code></td>
<td>
a <code>Source</code> object.</td></tr>
<tr valign="top"><td><code>readerControl</code></td>
<td>
a list with the named components <code>reader</code>,
a reading function capable of handling the file format
found in <code>object</code>; <code>language</code>, the language of the texts; and
<code>load</code>, a logical value indicating whether the documents
should be loaded into memory immediately (<code>load = TRUE</code>) or only when
needed (<code>load = FALSE</code>). Loading on demand reduces memory
demands for large document collections. If <code>object</code> does not
support loading on demand, the corpus is loaded immediately,
i.e., this argument is overruled.</td></tr>
<tr valign="top"><td><code>dbControl</code></td>
<td>
a list with the named components <code>useDb</code>,
indicating whether database support should be activated; <code>dbName</code>,
the filename of the database holding the swapped-out objects;
and <code>dbType</code>, a valid database type
supported by <span class="pkg">filehash</span>. With database
support activated, the <code>tm</code> package uses the database to keep
as few resources in memory as possible.</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
optional arguments for the <code>reader</code>.</td></tr>
</table>

<h3>Value</h3>

<p>
An S4 object of class <code>Corpus</code>, which extends the class
<code>list</code> and contains a collection of text documents.</p>

<h3>Author(s)</h3>

<p>
Ingo Feinerer
</p>


<h3>Examples</h3>

<pre>
txt &lt;- system.file("texts", "txt", package = "tm")
## Not run: 
(Corpus(DirSource(txt),
        readerControl = list(reader = readPlain, language = "en_US", load = TRUE),
        dbControl = list(useDb = TRUE, dbName = "oviddb", dbType = "DB1")))
## End(Not run)
reut21578 &lt;- system.file("texts", "reut21578", package = "tm")
Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XML, language = "en_US", load = FALSE))
</pre>
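
<p>
The following is a minimal sketch, not part of the original <code>tm</code> examples, of assembling complete <code>readerControl</code> and <code>dbControl</code> lists before calling <code>Corpus</code>. This page does not state whether components omitted from a control list are filled in from the defaults, so the sketch supplies every component explicitly. The names <code>defaultReaderControl</code> and <code>defaultDbControl</code> are hypothetical; their values are copied from the Usage section, and <code>modifyList</code> comes from the <span class="pkg">utils</span> package shipped with R.
</p>

<pre>
## Hypothetical helper lists (not part of tm); values copied from Usage.
## The reader component is added at call time because it depends on the source.
defaultReaderControl &lt;- list(language = "en_US", load = TRUE)
defaultDbControl &lt;- list(useDb = FALSE, dbName = "", dbType = "DB1")

## Override only the components that differ from the documented defaults.
readerControl &lt;- modifyList(defaultReaderControl, list(load = FALSE))
dbControl &lt;- modifyList(defaultDbControl, list(useDb = TRUE, dbName = "oviddb"))

## Inspect the resulting lists.
str(readerControl)
str(dbControl)

## Not run: 
## Assumes tm is attached and txt is defined as in the first example.
Corpus(DirSource(txt),
       readerControl = c(list(reader = readPlain), readerControl),
       dbControl = dbControl)
## End(Not run)
</pre>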



<hr><div align="center">[Package <em>tm</em> version 0.3 <a href="00Index.html">Index</a>]</div>

</body></html>
