<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>13.1 sgmllib -- Simple SGML parser</title>
<META NAME="description" CONTENT="13.1 sgmllib -- Simple SGML parser">
<META NAME="keywords" CONTENT="lib">
<META NAME="resource-type" CONTENT="document">
<META NAME="distribution" CONTENT="global">
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<link rel="STYLESHEET" href="lib.css" tppabs="http://www.python.org/doc/current/lib/lib.css">
<LINK REL="next" href="module-htmllib.html" tppabs="http://www.python.org/doc/current/lib/module-htmllib.html">
<LINK REL="previous" href="markup.html" tppabs="http://www.python.org/doc/current/lib/markup.html">
<LINK REL="up" href="markup.html" tppabs="http://www.python.org/doc/current/lib/markup.html">
</head>
<body>
<DIV CLASS="navigation"><table align="center" width="100%" cellpadding="0" cellspacing="2">
<tr>
<td><A href="markup.html" tppabs="http://www.python.org/doc/current/lib/markup.html"><img src="previous.gif" tppabs="http://www.python.org/doc/current/icons/previous.gif" border="0" height="32"
  alt="Previous Page" width="32"></A></td>
<td><A href="markup.html" tppabs="http://www.python.org/doc/current/lib/markup.html"><img src="up.gif" tppabs="http://www.python.org/doc/current/icons/up.gif" border="0" height="32"
  alt="Up One Level" width="32"></A></td>
<td><A href="module-htmllib.html" tppabs="http://www.python.org/doc/current/lib/module-htmllib.html"><img src="next.gif" tppabs="http://www.python.org/doc/current/icons/next.gif" border="0" height="32"
  alt="Next Page" width="32"></A></td>
<td align="center" width="100%">Python Library Reference</td>
<td><A href="contents.html" tppabs="http://www.python.org/doc/current/lib/contents.html"><img src="contents.gif" tppabs="http://www.python.org/doc/current/icons/contents.gif" border="0" height="32"
  alt="Contents" width="32"></A></td>
<td><a href="modindex.html" tppabs="http://www.python.org/doc/current/lib/modindex.html" title="Module Index"><img src="modules.gif" tppabs="http://www.python.org/doc/current/icons/modules.gif" border="0" height="32"
  alt="Module Index" width="32"></a></td>
<td><A href="genindex.html" tppabs="http://www.python.org/doc/current/lib/genindex.html"><img src="index.gif" tppabs="http://www.python.org/doc/current/icons/index.gif" border="0" height="32"
  alt="Index" width="32"></A></td>
</tr></table>
<b class="navlabel">Previous:</b> <a class="sectref" href="markup.html" tppabs="http://www.python.org/doc/current/lib/markup.html">13. Structured Markup Processing</A>
<b class="navlabel">Up:</b> <a class="sectref" href="markup.html" tppabs="http://www.python.org/doc/current/lib/markup.html">13. Structured Markup Processing</A>
<b class="navlabel">Next:</b> <a class="sectref" href="module-htmllib.html" tppabs="http://www.python.org/doc/current/lib/module-htmllib.html">13.2 htmllib  </A>
<br><hr></DIV>
<!--End of Navigation Panel-->

<H1><A NAME="SECTION0015100000000000000000">
13.1 <tt class="module">sgmllib</tt> --
         Simple SGML parser</A>
</H1>

<P>
This module defines a class <tt class="class">SGMLParser</tt> which serves as the
basis for parsing text files formatted in SGML (Standard Generalized
Mark-up Language).  It is not a full SGML parser: it only parses SGML
insofar as it is used by HTML, and the module exists only as a base
for the <tt class='module'><a href="module-htmllib.html" tppabs="http://www.python.org/doc/current/lib/module-htmllib.html">htmllib</a></tt>
module.

<P>
<dl><dt><b><a name='l2h-2557'><tt class='class'>SGMLParser</tt></a></b> ()
<dd>
The <tt class="class">SGMLParser</tt> class is instantiated without arguments.
The parser is hardcoded to recognize the following
constructs:

<P>

<UL>
<LI>Opening and closing tags of the form
"<tt class="samp">&lt;<var>tag</var> <var>attr</var>="<var>value</var>" ...&gt;</tt>" and
"<tt class="samp">&lt;/<var>tag</var>&gt;</tt>", respectively.

<P>
</LI>
<LI>Numeric character references of the form "<tt class="samp">&amp;#<var>number</var>;</tt>", where <var>number</var> is a decimal character code.

<P>
</LI>
<LI>Entity references of the form "<tt class="samp">&amp;<var>name</var>;</tt>".

<P>
</LI>
<LI>SGML comments of the form "<tt class="samp">&lt;!--<var>text</var>--&gt;</tt>".  Note that
spaces, tabs, and newlines are allowed between the trailing
"<tt class="samp">&gt;</tt>" and the immediately preceding "<tt class="samp">--</tt>".

<P>
</LI>
</UL>
</dl>
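<P>
The four constructs listed above can be approximated with regular
expressions.  The following is a minimal sketch only: the pattern names
and the <code>classify()</code> helper are hypothetical, the patterns are
not the ones <tt class="module">sgmllib</tt> uses internally, and the code
is written for a current Python 3 interpreter without importing
<tt class="module">sgmllib</tt>.

```python
import re

# Illustrative patterns only; these are NOT sgmllib's internal expressions.
START_TAG = re.compile(r'<([A-Za-z][-.A-Za-z0-9]*)((?:\s+[^<>]*)?)>')
END_TAG = re.compile(r'</([A-Za-z][-.A-Za-z0-9]*)\s*>')
CHAR_REF = re.compile(r'&#([0-9]+);')
ENTITY_REF = re.compile(r'&([A-Za-z][-.A-Za-z0-9]*);')
# Whitespace is allowed between the closing "--" and the final ">".
COMMENT = re.compile(r'<!--(.*?)--\s*>', re.DOTALL)


def classify(token):
    """Return a (kind, payload) pair for one markup token, or None."""
    for kind, pattern in (
        ('comment', COMMENT),
        ('endtag', END_TAG),
        ('starttag', START_TAG),
        ('charref', CHAR_REF),
        ('entityref', ENTITY_REF),
    ):
        match = pattern.fullmatch(token)
        if match:
            return kind, match.group(1)
    return None
```

For example, <code>classify('&amp;amp;')</code> reports an entity reference
named <code>amp</code>, while ordinary text such as
<code>classify('plain')</code> yields <code>None</code>.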

<P>
<tt class="class">SGMLParser</tt> instances have the following interface methods:

<P>
<dl><dt><b><a name='l2h-2558'><tt class='method'>reset</tt></a></b> ()
<dd>
Reset the instance, discarding all unprocessed data.  This method is
called implicitly at instantiation time.
</dl>

<P>
<dl><dt><b><a name='l2h-2559'><tt class='method'>setnomoretags</tt></a></b> ()
<dd>
Stop processing tags.  Treat all following input as literal input
(CDATA).  (This is only provided so the HTML tag
<code>&lt;PLAINTEXT&gt;</code> can be implemented.)
</dl>

<P>
<dl><dt><b><a name='l2h-2560'><tt class='method'>setliteral</tt></a></b> ()
<dd>
Enter literal mode (CDATA mode).
</dl>

<P>
<dl><dt><b><a name='l2h-2561'><tt class='method'>feed</tt></a></b> (<var>data</var>)
<dd>
Feed some text to the parser.  It is processed insofar as it consists
of complete elements; incomplete data is buffered until more data is
fed or <tt class="method">close()</tt> is called.
</dl>

<P>
<dl><dt><b><a name='l2h-2562'><tt class='method'>close</tt></a></b> ()
<dd>
Force processing of all buffered data as if it were followed by an
end-of-file mark.  A derived class may redefine this method to
perform additional processing at the end of the input, but the
redefined version should always call the base class <tt class="method">close()</tt>.
</dl>
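<P>
The buffering contract of <tt class="method">feed()</tt> and
<tt class="method">close()</tt>, and the advice to call the base class
<tt class="method">close()</tt> from an override, can be sketched with a
small stand-in class.  <code>BufferingParser</code>,
<code>CountingParser</code> and the <code>events</code> list are
hypothetical names, and the tokenizing is deliberately simplified; this
is not the <tt class="module">sgmllib</tt> implementation.

```python
class BufferingParser:
    """Hypothetical stand-in that mimics the feed()/close() contract."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Discard any unprocessed data and recorded events.
        self.rawdata = ''
        self.events = []

    def feed(self, data):
        self.rawdata += data
        self._process(end=False)

    def close(self):
        # Treat whatever is still buffered as if end-of-file followed it.
        self._process(end=True)

    def _process(self, end):
        data = self.rawdata
        pos = 0
        while pos < len(data):
            if data[pos] == '<':
                stop = data.find('>', pos)
                if stop < 0:
                    if not end:
                        break  # incomplete tag: keep it buffered
                    # Assumption of this sketch: leftovers become plain data.
                    self.events.append(('data', data[pos:]))
                    pos = len(data)
                else:
                    self.events.append(('tag', data[pos:stop + 1]))
                    pos = stop + 1
            else:
                stop = data.find('<', pos)
                if stop < 0:
                    stop = len(data)
                self.events.append(('data', data[pos:stop]))
                pos = stop
        self.rawdata = data[pos:]


class CountingParser(BufferingParser):
    """Derived class that adds end-of-input processing in close()."""

    def close(self):
        # Delegate first so buffered data is flushed before summarizing.
        BufferingParser.close(self)
        self.events.append(('summary', len(self.events)))
```

Feeding <code>'Hello &lt;b'</code> records only the text, because the tag
is still incomplete; the tag is recorded once a later call supplies the
closing <code>&gt;</code>.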

<P>
<dl><dt><b><a name='l2h-2563'><tt class='method'>get_starttag_text</tt></a></b> ()
<dd>
Return the text of the most recently opened start tag.  This should
not normally be needed for structured processing, but may be useful in
dealing with HTML "as deployed" or for re-generating input with
minimal changes (whitespace between attributes can be preserved,
etc.).
</dl>

<P>
<dl><dt><b><a name='l2h-2564'><tt class='method'>handle_starttag</tt></a></b> (<var>tag, method, attributes</var>)
<dd>
This method is called to handle start tags for which either a
<tt class="method">start_<var>tag</var>()</tt> or <tt class="method">do_<var>tag</var>()</tt> method has been
defined.  The <var>tag</var> argument is the name of the tag converted to
lower case, and the <var>method</var> argument is the bound method which
should be used to support semantic interpretation of the start tag.
The <var>attributes</var> argument is a list of <code>(<var>name</var>,
<var>value</var>)</code> pairs containing the attributes found inside the tag's
<code>&lt;&gt;</code> brackets.  The <var>name</var> has been translated to lower case
and double quotes and backslashes in the <var>value</var> have been interpreted.
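For instance, for the tag <code>&lt;A HREF="http://www.cwi.nl/"&gt;</code>, if a
<tt class="method">start_a()</tt> method is defined, this method would be called as
"<tt class="samp">handle_starttag('a', <var>method</var>, [('href',
'http://www.cwi.nl/')])</tt>", where <var>method</var> is the bound
<tt class="method">start_a()</tt> method.  The base implementation simply calls
<var>method</var> with <var>attributes</var> as the only argument.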
</dl>
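<P>
The dispatch described for <tt class="method">handle_starttag()</tt> can
be sketched as follows.  The classes <code>DispatchSketch</code> and
<code>LinkCollector</code>, the driver <code>dispatch_starttag()</code>,
the <code>log</code> list, and the fallback to an
<code>unknown_starttag()</code> hook for tags without a handler are
assumptions of this sketch, not a copy of the
<tt class="module">sgmllib</tt> source.

```python
class DispatchSketch:
    """Hypothetical stand-in for the start-tag dispatch described above."""

    def __init__(self):
        self.log = []

    def dispatch_starttag(self, tag, attributes):
        tag = tag.lower()
        # Prefer start_<tag>(), then do_<tag>(); otherwise report as unknown.
        method = (getattr(self, 'start_' + tag, None)
                  or getattr(self, 'do_' + tag, None))
        if method is None:
            self.unknown_starttag(tag, attributes)
        else:
            self.handle_starttag(tag, method, attributes)

    def handle_starttag(self, tag, method, attributes):
        # Base behaviour: call method with attributes as the only argument.
        method(attributes)

    def unknown_starttag(self, tag, attributes):
        self.log.append(('unknown', tag, attributes))


class LinkCollector(DispatchSketch):
    def start_a(self, attributes):
        # attributes is a list of (name, value) pairs.
        self.log.append(('a', dict(attributes).get('href')))

    def handle_starttag(self, tag, method, attributes):
        # Example override: note the dispatch, then defer to the base class.
        self.log.append(('dispatch', tag, method.__name__))
        DispatchSketch.handle_starttag(self, tag, method, attributes)
```

Overriding <tt class="method">handle_starttag()</tt> in this way gives a
single place to observe or veto every tag that has a handler, without
touching the individual <code>start_</code> or <code>do_</code> methods.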

<P>
<dl><dt><b><a name='l2h-2565'><tt class='method'>handle_endtag</tt></a></b> (<var>tag, method</var>)
<dd>
This method is called to handle end tags for which an
<tt class="method">end_<var>tag</var>()</tt> method has been defined.  The
