<div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>get_token</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>get_token( )</pre><p>Returns the next token found in the HTML document, or <tt class="literal">undef</tt> if no next token exists. Each token is returned as an array reference. The first item in the array is a label that identifies the token type; the remaining items depend on that type and hold, for example, the tag name, its attributes, and the original text. A token can be a start tag, an end tag, text, a comment, a declaration, or a processing instruction. <tt class="literal">get_token</tt> uses the following labels for the tokens:</p>
<dl><dt><b><tt class="literal">S</tt></b></dt><dd><p>Start tag</p></dd>
<dt><b><tt class="literal">E</tt></b></dt><dd><p>End tag</p></dd>
<dt><b><tt class="literal">T</tt></b></dt><dd><p>Text</p></dd>
<dt><b><tt class="literal">C</tt></b></dt><dd><p>Comment</p></dd>
<dt><b><tt class="literal">D</tt></b></dt><dd><p>Declaration</p></dd>
<dt><b><tt class="literal">PI</tt></b></dt><dd><p>Processing instruction</p></dd></dl>
<p>Consider the following code:</p>
<blockquote><pre class="code">#!/usr/local/bin/perl -w
require HTML::TokeParser;

my $html = '&lt;a href="http://blah"&gt;My name is Nate!&lt;/a&gt;&lt;/p&gt;';
my $p = HTML::TokeParser->new(\$html);

# Print every item of every token, one item per line
while (my $token = $p->get_token) {
    my $i = 0;
    foreach my $tk (@{$token}) {
        print "token[$i]: $tk\n";
        $i++;
    }
}</pre></blockquote>
<p>The script prints the items of each token as follows:</p>
<blockquote><pre class="code">token[0]: S
token[1]: a
token[2]: HASH(0x8146d3c)
token[3]: ARRAY(0x814a380)
token[4]: &lt;a href="http://blah"&gt;
token[0]: T
token[1]: My name is Nate!
token[2]:
token[0]: E
token[1]: a
token[2]: &lt;/a&gt;
token[0]: E
token[1]: p
token[2]: &lt;/p&gt;</pre></blockquote></div>
<div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>get_trimmed_text</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true"
align="left" color="black" /><pre>get_trimmed_text( )</pre><p>Works the same as <tt class="literal">get_text</tt>, but collapses each run of whitespace to a single space and removes leading and trailing whitespace.</p></div>
<div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>unget_token</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>unget_token( )</pre><p>Pushes tokens back to the parser so that they are returned again the next time you call <tt class="literal">get_token</tt>.</p></div></div></div>
<a name="perlnut2-CHP-20-SECT-4.3" /><div class="sect2"><h3 class="sect2">20.4.3. HTML::Element</h3><p><a name="INDEX-2585" /><a name="INDEX-2586" />The HTML::Element module provides methods for dealing with nodes in an HTML syntax tree. You can get or set the contents of each node, traverse the tree, and delete a node.</p>
<p>HTML::Element objects represent elements of HTML. An element comprises its start and end tags, its attributes, any plain text it contains, and any nested elements.</p>
<p>The constructor for this class requires the name of the tag for its first argument. You may optionally specify initial attributes and values as hash elements in the constructor.
For example<a name="INDEX-2587" />:</p><blockquote><pre class="code">$h = HTML::Element->new('a', 'href' => 'http://www.oreilly.com');</pre></blockquote><p>This creates a new element for the anchor tag, <tt class="literal">&lt;a&gt;</tt>, which links to the URL given in its <tt class="literal">href</tt> attribute.</p>
<p>The following methods are provided for objects of the HTML::Element class.</p>
<a name="INDEX-2588" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>as_HTML</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->as_HTML( )</pre><p><a name="INDEX-2588" />Returns the HTML string that represents the element and its children.</p></div>
<a name="INDEX-2589" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>attr</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->attr(<em class="replaceable">name</em> [,<em class="replaceable">value</em>])</pre><p><a name="INDEX-2589" />Sets or retrieves the value of attribute <em class="replaceable"><tt>name</tt></em> in the current element.</p></div>
<a name="INDEX-2590" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>content</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->content( )</pre><p><a name="INDEX-2590" />Returns the content of this element as a reference to an array that contains plain-text segments and references to nested element objects.</p></div>
<a name="INDEX-2591" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>delete</b></font></td><td
align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->delete( )</pre><p><a name="INDEX-2591" />Deletes the current element and all of its child elements.</p></div>
<a name="INDEX-2592" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>delete_content</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->delete_content( )</pre><p><a name="INDEX-2592" />Removes the content from the current element.</p></div>
<a name="INDEX-2593" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>dump</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->dump( )</pre><p><a name="INDEX-2593" />Prints the tag name of the element and all its children to STDOUT. Useful for debugging.
The structure of the document is shown by indentation.</p></div>
<a name="INDEX-2594" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>endtag</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->endtag( )</pre><p><a name="INDEX-2594" />Returns the original text of the end tag, including the <tt class="literal">&lt;/</tt> and <tt class="literal">&gt;</tt>.</p></div>
<a name="INDEX-2595" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>extract_links</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->extract_links([<em class="replaceable">types</em>])</pre><p><a name="INDEX-2595" />Retrieves the links contained within an element and all of its child elements. This method returns a reference to an array in which each element is a reference to an array with two values: the value of the link and a reference to the element in which it was found. You may specify the tags from which you want to extract links by providing their names in a list of <em class="replaceable"><tt>types</tt></em>.</p></div>
<a name="INDEX-2596" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>implicit</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->implicit([<em class="replaceable">boolean</em>])</pre><p><a name="INDEX-2596" />Indicates whether the element was contained in the original document (false) or whether it was assumed to be implicit (true) by the parser.
Implicit tags are elements that the parser included to conform to proper HTML structure, such as an ending paragraph tag (<tt class="literal">&lt;/p&gt;</tt>). You may also set this attribute by providing a <em class="replaceable"><tt>boolean</tt></em> argument.</p></div>
<a name="INDEX-2597" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>insert_element</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->insert_element($<em class="replaceable">element</em>, <em class="replaceable">implicit</em>)</pre><p><a name="INDEX-2597" />Inserts the object <tt class="literal">$</tt><em class="replaceable"><tt>element</tt></em> at the current position relative to the root object <tt class="literal">$</tt><em class="replaceable"><tt>h</tt></em> and updates the position (indicated by <tt class="literal">pos</tt>) to the inserted element. Returns the new <tt class="literal">$</tt><em class="replaceable"><tt>element</tt></em>.
The <em class="replaceable"><tt>implicit</tt></em> argument is a Boolean indicating whether the element is an implicit tag (true) or the original HTML (false).</p></div>
<a name="INDEX-2598" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>is_empty</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->is_empty( )</pre><p><a name="INDEX-2598" />Returns true if the current object has no content.</p></div>
<a name="INDEX-2599" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>is_inside</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->is_inside(<em class="replaceable">tag1</em> [,<em class="replaceable">tag2</em>, ...])</pre><p><a name="INDEX-2599" />Returns true if the tag for this element is contained inside one of the tags listed as arguments.</p></div>
<a name="INDEX-2600" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>parent</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->parent([$<em class="replaceable">new</em>])</pre><p><a name="INDEX-2600" />Without an argument, returns the parent object for this element.
If given a reference to another element object, that element is set as the new parent object and is returned.</p></div>
<a name="INDEX-2601" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>pos</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->pos([$<em class="replaceable">element</em>])</pre><p><a name="INDEX-2601" />Sets or retrieves the current position in the syntax tree of the current object. The returned value is a reference to the element object that holds the current position. The "position" object is an element contained within the tree that has the current object (<tt class="literal">$</tt><em class="replaceable"><tt>h</tt></em>) at its root.</p></div>
<a name="INDEX-2602" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>push_content</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->push_content(<em class="replaceable">content</em>)</pre><p><a name="INDEX-2602" />Appends the specified content to the current element. <em class="replaceable"><tt>content</tt></em> can be either a scalar containing plain text or a reference to another element. Multiple arguments can be supplied.</p></div>
<a name="INDEX-2603" /><div class="refentry"><table width="515" border="0" cellpadding="5"><tr><td align="left"><font size="+1"><b>starttag</b></font></td><td align="right"><i></i></td></tr></table><hr width="515" size="3" noshade="true" align="left" color="black" /><pre>$<em class="replaceable">h</em>->starttag( )</pre><p><a name="INDEX-2603" />Returns the original text of the start tag for the element. This includes the <tt class="literal">&lt;</tt> and <tt class="literal">&gt;</tt> and all attributes.