
<p><a name="builtin_encodings"></a>There are four built-in encodings
in Expat:</p>
<ul>
<li>UTF-8</li>
<li>UTF-16</li>
<li>ISO-8859-1</li>
<li>US-ASCII</li>
</ul>

<p>Any other encoding name, whether found in an encoding declaration
or given as the protocol encoding in the parser constructor, triggers
a call to the <code>UnknownEncodingHandler</code>. This handler gets passed
the encoding name and a pointer to an <code>XML_Encoding</code> data
structure. Your handler must fill in this structure and return
<code>XML_STATUS_OK</code> if it knows how to deal with the
encoding. Otherwise the handler should return
<code>XML_STATUS_ERROR</code>.  The handler also gets passed a pointer
to an optional application data structure that you may specify when
you set the handler.</p>

<p>Expat places the following restrictions on the character encodings
that it can support through a filled-in <code>XML_Encoding</code>
structure:</p>
<ol>
<li>Every ASCII character that can appear in a well-formed XML document
must be represented by a single byte, and that byte must correspond to
its ASCII encoding (except for the characters $@\^'{}~).</li>
<li>Characters must be encoded in 4 bytes or fewer.</li>
<li>All characters encoded must have Unicode scalar values less than or
equal to 65535 (0xFFFF). <em>This does not apply to the built-in support
for UTF-16 and UTF-8.</em></li>
<li>No character may be encoded by more than one distinct sequence of
bytes.</li>
</ol>

<p><code>XML_Encoding</code> contains an array of integers that
correspond to the 1st byte of an encoding sequence. If the value in
the array for a byte is zero or positive, then the byte is a single
byte encoding that encodes the Unicode scalar value contained in the
array. A -1 in this array indicates a malformed byte. If the value is
-2, -3, or -4, then the byte is the beginning of a 2, 3, or 4 byte
sequence respectively. Multi-byte sequences are sent to the convert
function pointed at in the <code>XML_Encoding</code> structure. This
function should return the Unicode scalar value for the sequence or -1
if the sequence is malformed.</p>
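<p>As a minimal sketch of filling such a table, the function below
builds the first-byte array for ISO-8859-15, a single-byte encoding.
The name <code>fill_latin9_map</code> is hypothetical and not part of
Expat; in a real <code>UnknownEncodingHandler</code> the array would be
the one inside the <code>XML_Encoding</code> structure passed to the
handler (the <code>map</code> member in <code>expat.h</code>). Because
no entry is -2, -3, or -4, no multi-byte sequences arise for this
encoding.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Fill a 256-entry first-byte table for ISO-8859-15 (Latin-9).
   Every byte is a complete one-byte sequence, so each entry holds the
   non-negative Unicode scalar value encoded by that byte.
   (Hypothetical helper; not an Expat function.) */
void
fill_latin9_map(int map[256])
{
  /* The eight positions where Latin-9 differs from ISO-8859-1. */
  static const struct { unsigned char byte; int scalar; } diffs[] = {
    { 0xA4, 0x20AC }, /* euro sign */
    { 0xA6, 0x0160 }, { 0xA8, 0x0161 },
    { 0xB4, 0x017D }, { 0xB8, 0x017E },
    { 0xBC, 0x0152 }, { 0xBD, 0x0153 }, { 0xBE, 0x0178 }
  };
  size_t i;

  assert(map != NULL);

  /* Start from ISO-8859-1, where byte value equals scalar value. */
  for (i = 0; i < 256; i++)
    map[i] = (int)i;

  /* Patch the positions that Latin-9 redefines. */
  for (i = 0; i < sizeof(diffs) / sizeof(diffs[0]); i++)
    map[diffs[i].byte] = diffs[i].scalar;
}
```

<p>All values stay at or below 0xFFFF and every ASCII byte maps to
itself, so the table respects the restrictions listed above.</p>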

<p>One pitfall that novice Expat users are likely to fall into is that
although Expat may accept input in various encodings, the strings that
it passes to the handlers are always encoded in UTF-8 or UTF-16
(depending on how Expat was compiled). Your application is responsible
for any translation of these strings into other encodings.</p>
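<p>The following sketch shows one such translation, assuming Expat was
compiled to pass UTF-8 strings (so that <code>XML_Char</code> is
<code>char</code>). The function <code>utf8_to_latin1</code> is a
hypothetical helper, not part of Expat; it converts a handler string to
ISO-8859-1 and substitutes <code>?</code> for characters that encoding
cannot represent.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Convert the NUL-terminated UTF-8 string 'in' to ISO-8859-1 in 'out'.
   Code points above U+00FF are replaced by '?'.  Returns the number of
   bytes written (not counting the terminating NUL), or (size_t)-1 if
   'out' is too small or the input is not valid UTF-8.
   (Hypothetical helper; not an Expat function.) */
size_t
utf8_to_latin1(const char *in, char *out, size_t out_size)
{
  const unsigned char *s = (const unsigned char *)in;
  size_t n = 0;

  assert(in != NULL && out != NULL);
  if (out_size == 0)
    return (size_t)-1;

  while (*s != '\0') {
    unsigned long cp;
    int extra; /* number of continuation bytes still expected */

    if (*s < 0x80)      { cp = *s;        extra = 0; }
    else if (*s < 0xC0) return (size_t)-1; /* stray continuation byte */
    else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }
    else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }
    else if (*s < 0xF8) { cp = *s & 0x07; extra = 3; }
    else                return (size_t)-1;
    s++;

    while (extra-- > 0) {
      if ((*s & 0xC0) != 0x80)
        return (size_t)-1; /* truncated or malformed sequence */
      cp = (cp << 6) | (unsigned long)(*s & 0x3F);
      s++;
    }

    if (n + 1 >= out_size)
      return (size_t)-1; /* no room for this byte plus the NUL */
    out[n++] = (char)(unsigned char)(cp <= 0xFF ? cp : '?');
  }
  out[n] = '\0';
  return n;
}
```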

<h3>Handling External Entity References</h3>

<p>Expat does not read or parse external entities directly. Note that
any external DTD is a special case of an external entity.  If you've
set no <code>ExternalEntityRefHandler</code>, then external entity
references are silently ignored. Otherwise, it calls your handler with
the information needed to read and parse the external entity.</p>

<p>Your handler isn't actually responsible for parsing the entity, but
it is responsible for creating a subsidiary parser with <code><a href=
"#XML_ExternalEntityParserCreate"
>XML_ExternalEntityParserCreate</a></code> that will do the job. This
returns an instance of <code>XML_Parser</code> that has handlers and
other data structures initialized from the parent parser. You may then
use <code><a href= "#XML_Parse" >XML_Parse</a></code> or <code><a
href= "#XML_ParseBuffer">XML_ParseBuffer</a></code> calls against this
parser.  Since external entities may refer to other external entities,
your handler should be prepared to be called recursively.</p>

<h3>Parsing DTDs</h3>

<p>In order to parse parameter entities, before starting the parse,
you must call <code><a href= "#XML_SetParamEntityParsing"
>XML_SetParamEntityParsing</a></code> with one of the following
arguments:</p>
<dl>
<dt><code>XML_PARAM_ENTITY_PARSING_NEVER</code></dt>
<dd>Don't parse parameter entities or the external subset</dd>
<dt><code>XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE</code></dt>
<dd>Parse parameter entities and the external subset unless
<code>standalone</code> was set to "yes" in the XML declaration.</dd>
<dt><code>XML_PARAM_ENTITY_PARSING_ALWAYS</code></dt>
<dd>Always parse parameter entities and the external subset</dd>
</dl>

<p>In order to read an external DTD, you also have to set an external
entity reference handler as described above.</p>

<h3 id="stop-resume">Temporarily Stopping Parsing</h3>

<p>Expat 1.95.8 introduces a new feature: it's now possible to stop
parsing temporarily from within a handler function, even if more data
has already been passed into the parser.  Applications for this
include</p>

<ul>
  <li>Supporting the <a href= "http://www.w3.org/TR/xinclude/"
  >XInclude</a> specification.</li>

  <li>Delaying further processing until additional information is
  available from some other source.</li>

  <li>Adjusting processor load as task priorities shift within an
  application.</li>

  <li>Stopping parsing completely (simply free or reset the parser
  instead of resuming in the outer parsing loop).  This can be useful
  if an application-domain error is found in the XML being parsed or if
  the result of the parse is determined not to be useful after
  all.</li>
</ul>

<p>To take advantage of this feature, the main parsing loop of an
application needs to support this specifically.  It cannot be
supported with a parsing loop compatible with Expat 1.95.7 or
earlier (though existing loops will continue to work without
supporting the stop/resume feature).</p>

<p>An application that uses this feature for a single parser will have
the rough structure (in pseudo-code):</p>

<pre class="pseudocode">
fd = open_input()
p = create_parser()

if parse_xml(p, fd) {
  /* suspended */

  int suspended = 1;

  while (suspended) {
    do_something_else()
    if ready_to_resume() {
      suspended = continue_parsing(p, fd);
    }
  }
}
</pre>

<p>An application that may resume any of several parsers based on
input (either from the XML being parsed or some other source) will
certainly have more interesting control structures.</p>

<p>This C function could be used for the <code>parse_xml</code>
function mentioned in the pseudo-code above:</p>

<pre class="eg">
#define BUFF_SIZE 10240

/* Parse a document from the open file descriptor 'fd' until the parse
   is complete (the document has been completely parsed, or there's
   been an error), or the parse is stopped.  Return non-zero when
   the parse is merely suspended.
*/
int
parse_xml(XML_Parser p, int fd)
{
  for (;;) {
    int last_chunk;
    int bytes_read;
    enum XML_Status status;

    void *buff = XML_GetBuffer(p, BUFF_SIZE);
    if (buff == NULL) {
      /* handle error... */
      return 0;
    }
    bytes_read = read(fd, buff, BUFF_SIZE);
    if (bytes_read &lt; 0) {
      /* handle error... */
      return 0;
    }
    status = XML_ParseBuffer(p, bytes_read, bytes_read == 0);
    switch (status) {
      case XML_STATUS_ERROR:
        /* handle error... */
        return 0;
      case XML_STATUS_SUSPENDED:
        return 1;
    }
    if (bytes_read == 0)
      return 0;
  }
}
</pre>

<p>The corresponding <code>continue_parsing</code> function is
somewhat simpler, since it only needs to deal with the return code from
<code><a href= "#XML_ResumeParser">XML_ResumeParser</a></code>; it can
delegate the input handling to the <code>parse_xml</code>
function:</p>

<pre class="eg">
/* Continue parsing a document which had been suspended.  The 'p' and
   'fd' arguments are the same as passed to parse_xml().  Return
   non-zero when the parse is suspended.
*/
int
continue_parsing(XML_Parser p, int fd)
{
  enum XML_Status status = XML_ResumeParser(p);
  switch (status) {
    case XML_STATUS_ERROR:
      /* XML_GetErrorCode(p) reports XML_ERROR_NOT_SUSPENDED if the
         parser was not suspended; handle error... */
      return 0;
    case XML_STATUS_SUSPENDED:
      return 1;
  }
  return parse_xml(p, fd);
}
</pre>

<p>Now that we've seen what a mess the top-level parsing loop can
become, what have we gained?  Very simply, we can now use the <code><a
href= "#XML_StopParser" >XML_StopParser</a></code> function to stop
parsing, without having to go to great lengths to avoid additional
processing that we're expecting to ignore.  As a bonus, we get to stop
parsing <em>temporarily</em>, and come back to it when we're
ready.</p>

<p>To stop parsing from a handler function, use the <code><a href=
"#XML_StopParser" >XML_StopParser</a></code> function.  This function
takes two arguments: the parser being stopped and a flag indicating
whether the parse can be resumed in the future.</p>

<!-- XXX really need more here -->


<hr />
<!-- ================================================================ -->

<h2><a name="reference">Expat Reference</a></h2>

<h3><a name="creation">Parser Creation</a></h3>

<pre class="fcndec" id="XML_ParserCreate">
XML_Parser XMLCALL
XML_ParserCreate(const XML_Char *encoding);
</pre>
<div class="fcndef">
Construct a new parser. If encoding is non-null, it specifies a
character encoding to use for the document. This overrides the document
encoding declaration. There are four built-in encodings:
<ul>
<li>US-ASCII</li>
<li>UTF-8</li>
<li>UTF-16</li>
<li>ISO-8859-1</li>
</ul>
Any other value triggers a call to the <code>UnknownEncodingHandler</code>.
</div>

<pre class="fcndec" id="XML_ParserCreateNS">
XML_Parser XMLCALL
XML_ParserCreateNS(const XML_Char *encoding,
                   XML_Char sep);
</pre>
<div class="fcndef">
Constructs a new parser that has namespace processing in effect. Namespace
expanded element names and attribute names are returned as a concatenation
of the namespace URI, <em>sep</em>, and the local part of the name. This
means that you should pick a character for <em>sep</em> that can't be
part of a legal URI.</div>
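<p>A handler that receives such a concatenated name usually has to
split it again. The sketch below assumes a UTF-8 build of Expat (so
names are plain <code>char</code> strings), a non-NUL <em>sep</em> that
cannot occur in the URI, and a name consisting only of the URI,
<em>sep</em>, and the local part. The function
<code>split_expanded_name</code> is hypothetical and not part of
Expat.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Split a namespace-expanded name of the form URI + sep + local part.
   Stores a pointer to the local part (inside 'name') in *local and
   returns the length of the URI.  A name without a separator has no
   namespace: the result is 0 and *local == name.
   (Hypothetical helper; not an Expat function.) */
size_t
split_expanded_name(const char *name, char sep, const char **local)
{
  const char *p;

  assert(name != NULL && local != NULL);
  assert(sep != '\0'); /* a NUL separator cannot be searched for */

  p = strchr(name, sep);
  if (p == NULL) {
    *local = name;
    return 0;
  }
  *local = p + 1;
  return (size_t)(p - name);
}
```

<p>For example, with <code>'|'</code> as the separator, the name
<code>http://www.w3.org/1999/xhtml|body</code> splits into the URI
<code>http://www.w3.org/1999/xhtml</code> and the local part
<code>body</code>.</p>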

<pre class="fcndec" id="XML_ParserCreate_MM">
XML_Parser XMLCALL
XML_ParserCreate_MM(const XML_Char *encoding,
                    const XML_Memory_Handling_Suite *ms,
                    const XML_Char *sep);
</pre>
<pre class="signature">
typedef struct {
  void *(XMLCALL *malloc_fcn)(size_t size);
  void *(XMLCALL *realloc_fcn)(void *ptr, size_t size);
  void (XMLCALL *free_fcn)(void *ptr);
} XML_Memory_Handling_Suite;
</pre>
<div class="fcndef">
<p>Construct a new parser using the suite of memory handling functions
specified in <code>ms</code>. If <code>ms</code> is NULL, then use the
standard set of memory management functions. If <code>sep</code> is
non-NULL, then namespace processing is enabled in the created parser
and the character pointed at by <code>sep</code> is used as the separator between
the namespace URI and the local part of the name.</p>
</div>
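<p>One possible use of a custom suite is to track allocations. The
sketch below defines three wrappers around the standard allocator that
count live blocks; all names in it are hypothetical. It is written
without <code>expat.h</code> so that it stands alone: in an actual
program the functions would be declared with <code>XMLCALL</code>,
stored in an <code>XML_Memory_Handling_Suite</code> in the order
<code>malloc_fcn</code>, <code>realloc_fcn</code>,
<code>free_fcn</code>, and passed as <code>ms</code>.</p>

```c
#include <assert.h>
#include <stdlib.h>

/* Number of blocks currently allocated through the wrappers below.
   (Hypothetical bookkeeping; not part of Expat.) */
long live_blocks = 0;

/* Same contract as malloc(), plus counting. */
void *
counting_malloc(size_t size)
{
  void *p = malloc(size);
  if (p != NULL)
    live_blocks++;
  return p;
}

/* Same contract as realloc(), plus counting.  A size of 0 is not
   supported by this sketch. */
void *
counting_realloc(void *ptr, size_t size)
{
  void *p;

  assert(size > 0);
  p = realloc(ptr, size);
  /* realloc(NULL, n) acts like malloc(n), so it creates a new block;
     resizing an existing block leaves the count unchanged. */
  if (ptr == NULL && p != NULL)
    live_blocks++;
  return p;
}

/* Same contract as free(), plus counting. */
void
counting_free(void *ptr)
{
  if (ptr != NULL)
    live_blocks--;
  free(ptr);
}
```

<p>If <code>live_blocks</code> is not back at zero after the parser has
been freed, some memory obtained through the suite was never
released.</p>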

<pre class="fcndec" id="XML_ExternalEntityParserCreate">
XML_Parser XMLCALL
XML_ExternalEntityParserCreate(XML_Parser p,
                               const XML_Char *context,
                               const XML_Char *encoding);
</pre>
<div class="fcndef">
Construct a new <code>XML_Parser</code> object for parsing an external
general entity. Context is the context argument passed in a call to an
<code>ExternalEntityRefHandler</code>. Other state information, such as
handlers, user data, and namespace processing, is inherited from the parser
passed as the first argument. So you shouldn't need to call any of the behavior
changing functions on this parser (unless you want it to act
differently than the parent parser).
</div>

<pre class="fcndec" id="XML_ParserFree">
void XMLCALL
XML_ParserFree(XML_Parser p);
</pre>
<div class="fcndef">
Free memory used by the parser. Your application is responsible for
freeing any memory associated with <a href="#userdata">user data</a>.
</div>

<pre class="fcndec" id="XML_ParserReset">
XML_Bool XMLCALL
XML_ParserReset(XML_Parser p,
                const XML_Char *encoding);