<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Text document collection</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../R.css">
</head><body>
<table width="100%" summary="page for Corpus {tm}"><tr><td>Corpus {tm}</td><td align="right">R Documentation</td></tr></table>
<h2>Text document collection</h2>
<h3>Description</h3>
<p>
Constructs a text document collection (corpus).
</p>
<h3>Usage</h3>
<pre>
## S4 method for signature 'Source':
Corpus(object,
       readerControl = list(reader = object@DefaultReader,
                            language = "en_US", load = TRUE),
       dbControl = list(useDb = FALSE, dbName = "", dbType = "DB1"),
       ...)
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>object</code></td>
<td>
a <code>Source</code> object.</td></tr>
<tr valign="top"><td><code>readerControl</code></td>
<td>
a list with the named components <code>reader</code>,
a reading function capable of handling the file format
found in <code>object</code>; <code>language</code>, giving the text's language; and
<code>load</code>, a logical value indicating whether the documents
should be loaded into memory immediately (<code>load = TRUE</code>) or only when
needed (<code>load = FALSE</code>). Loading on demand reduces memory
usage for large document collections. If <code>object</code> does not
support loading on demand, the corpus is loaded immediately and
this argument is overruled.</td></tr>
<tr valign="top"><td><code>dbControl</code></td>
<td>
a list with the named components <code>useDb</code>,
a logical value indicating whether database support should be activated; <code>dbName</code>,
giving the name of the file that holds the swapped-out objects (i.e., the
database); and <code>dbType</code>, a valid database type
supported by <span class="pkg">filehash</span>. With database support
activated, the <code>tm</code> package uses the database to keep as few
resources in memory as possible.</td></tr>
<tr valign="top"><td><code>...</code></td>
<td>
optional arguments for the <code>reader</code>.</td></tr>
</table>
<h3>Value</h3>
<p>
An S4 object of class <code>Corpus</code>, which extends the class
<code>list</code> and contains a collection of text documents.</p>
<h3>Author(s)</h3>
<p>
Ingo Feinerer
</p>
<h3>Examples</h3>
<pre>
txt <- system.file("texts", "txt", package = "tm")
## Not run:
(Corpus(DirSource(txt),
        readerControl = list(reader = readPlain, language = "en_US", load = TRUE),
        dbControl = list(useDb = TRUE, dbName = "oviddb", dbType = "DB1")))
## End(Not run)
reut21578 <- system.file("texts", "reut21578", package = "tm")
Corpus(DirSource(reut21578),
       readerControl = list(reader = readReut21578XML, language = "en_US", load = FALSE))
</pre>
<hr><div align="center">[Package <em>tm</em> version 0.3 <a href="00Index.html">Index</a>]</div>
</body></html>