<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>R: Read In A Simple HTML Document</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="stylesheet" type="text/css" href="../../R.css">
</head><body>
<table width="100%" summary="page for readHTML {tm}"><tr><td>readHTML {tm}</td><td align="right">R Documentation</td></tr></table>
<h2>Read In A Simple HTML Document</h2>
<h3>Description</h3>
<p>
Returns a function that reads in a simple HTML
document, extracting both its text and its metadata. The reader uses
<code>h1</code> headings as structural information, whereas text and tags
between headings are treated as textual content. Metadata is
extracted from <code>meta</code> tags in the HTML head.
</p>
<h3>Usage</h3>
<pre>
readHTML(...)
</pre>
<h3>Arguments</h3>
<table summary="R argblock">
<tr valign="top"><td><code>...</code></td>
<td>
Arguments for the generator function.</td></tr>
</table>
<h3>Details</h3>
<p>
Formally, this function is a function generator: it returns a
function (which reads in a text document) with a well-defined
signature, but which can still access the arguments passed to the
generator via lexical scoping. This is especially useful for reader
functions for complex data structures that need many configuration options.
</p>
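<p>
A minimal sketch of the function-generator pattern in base R, assuming
nothing about the internals of <code>tm</code>: the names
<code>makeReader</code> and <code>lowercase</code> are hypothetical and
used only for illustration. Options passed to the generator are stored in
its environment, and the returned function, which keeps the fixed
signature <code>elem, language, load, id</code>, reads them via lexical scoping.
</p>
<pre>
## Hypothetical generator, not the tm implementation.
makeReader <- function(...) {
  opts <- list(...)                 # captured in the generator's environment
  function(elem, language, load, id) {
    txt <- elem$content
    ## 'opts' is found by lexical scoping, not passed as an argument
    if (isTRUE(opts$lowercase)) txt <- tolower(txt)
    list(content = txt, language = language, id = id)
  }
}

reader <- makeReader(lowercase = TRUE)
doc <- reader(list(content = "Hello World", uri = NULL),
              language = "en", load = TRUE, id = "doc1")
doc$content   # "hello world"
</pre>
<p>
Because every reader produced this way has the same signature, the caller
never needs to know which configuration options a particular reader accepts.
</p>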
<h3>Value</h3>
<p>
A <code>function</code> with the signature <code>elem, language, load, id</code>:
</p>
<table summary="R argblock">
<tr valign="top"><td><code>elem</code></td>
<td>
A <code>list</code> with the two named elements <code>content</code>
and <code>uri</code>. The first element must hold the document corpus to
be read in; the second element must hold a call to the document
corpus. The call is evaluated when loading on demand is requested.</td></tr>
<tr valign="top"><td><code>language</code></td>
<td>
A <code>character</code> giving the text's language.</td></tr>
<tr valign="top"><td><code>load</code></td>
<td>
A <code>logical</code> value indicating whether the document
corpus should be immediately loaded into memory.</td></tr>
<tr valign="top"><td><code>id</code></td>
<td>
A <code>character</code> representing a unique identification
string for the returned text document.</td></tr>
</table>
<p>
<br>
The function returns a <code>StructuredTextDocument</code> representing
<code>content</code>.</p>
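<p>
A toy illustration in base R of how a stored call in <code>uri</code> can be
kept unevaluated and evaluated later on demand. The function
<code>toyRead</code> and the example call are hypothetical; a real
<code>tm</code> reader returns a <code>StructuredTextDocument</code>
rather than a plain value.
</p>
<pre>
## 'uri' holds an unevaluated call that reproduces the content.
elem <- list(content = "Some text",
             uri = quote(paste("Some", "text")))

## Hypothetical reader: with load = FALSE it keeps only the call.
toyRead <- function(elem, load) {
  if (load) elem$content else elem$uri
}

pending <- toyRead(elem, load = FALSE)
is.call(pending)   # TRUE
eval(pending)      # "Some text", evaluated only when requested
</pre>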
<h3>Author(s)</h3>
<p>
Ingo Feinerer
</p>
<h3>Examples</h3>
<pre>
html <- system.file("texts", "html", package = "tm")
## Not run: 
(Corpus(DirSource(html), readerControl = list(reader = readHTML, load = TRUE)))
## End(Not run)
</pre>
<hr><div align="center">[Package <em>tm</em> version 0.3 <a href="00Index.html">Index</a>]</div>
</body></html>