<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><!-- InstanceBegin template="/Templates/standardPage.dwt" codeOutsideHTMLIsLocked="true" -->
<head>
<!-- InstanceBeginEditable name="doctitle" -->
<title>Small, simple, cross-platform, free and fast C++ XML Parser</title>
<!-- InstanceEndEditable -->
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<!-- InstanceBeginEditable name="head" -->
<!-- InstanceEndEditable -->
</head>
<BODY LEFTMARGIN=15 MARGINWIDTH=15 >
<H1>
<div align="center"><!-- InstanceBeginEditable name="titre" -->Small, simple,
cross-platform, free and<em><font face="Arial, Helvetica, sans-serif"> fast</font></em>
C++ XML Parser<!-- InstanceEndEditable -->
</div>
</H1>
<!-- InstanceBeginEditable name="content" -->
<p>This project started from my frustration that I could not find any simple,
portable XML parser to use inside my tools (see <a href="http://www.applied-mathematics.net/CONDORManual/CONDORManual.html">CONDOR</a>
for example). Let's look at the well-known Xerces C++ library: the complete
library is 53 MB! (12.1 MB compressed in a zipfile). I am currently developing
many small tools, and I use XML as the standard format for all their input/output
configuration and data files. The source code of each of my small tools is usually
around 600 KB. Under these conditions, don't you think that 53 MB just to be able
to read an XML file is a little bit "too much"? So I created my own XML parser. My XML parser
"library" is composed of only 2 files: a .cpp file and a .h file.
The total size is 104 KB.<br>
<br>
Here is how it works: the XML parser loads a complete XML file into memory, parses
it and generates a tree structure representing the XML file. Of course,
you can also parse XML data that you have already stored yourself in a memory
buffer. You can then easily "explore" the tree to get your
data. You can also modify the tree using "add" and "delete"
functions and regenerate a formatted XML string from a subtree. Memory management
is totally transparent thanks to smart pointers (in other words, you
will never have to call new, delete, malloc or free). "Smart pointers"
are a primitive form of the garbage collection found in Java.<br>
<br>
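<p>To make the description above more concrete, here is a minimal, self-contained C++ sketch of the tree side of such a parser (it does not parse XML text). The class 'Node' and its functions are hypothetical and are NOT the API of this library's 'XMLNode' class. The sketch assumes that nodes are owned through std::shared_ptr (one way to obtain transparent smart-pointer memory management), that a missing child is represented by an empty handle whose text reads as NULL, and that rendering follows the two-pass "measure, then write" scheme described in the efficiency section of the list below. Attributes, escaping and indentation are deliberately left out.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified element node (NOT this library's XMLNode API).
// Nodes are owned through std::shared_ptr, so user code never calls
// new/delete: a subtree is freed when the last handle to it disappears.
class Node {
public:
    using Handle = std::shared_ptr<Node>;  // an empty Handle means "missing"

    static Handle create(std::string name, std::string text = "") {
        Handle n = std::make_shared<Node>();
        n->name_ = std::move(name);
        n->text_ = std::move(text);
        return n;
    }

    // Appends a child element; adding to a missing node does nothing.
    static Handle addChild(const Handle& parent, std::string name,
                           std::string text = "") {
        if (!parent) return nullptr;
        Handle child = create(std::move(name), std::move(text));
        parent->children_.push_back(child);
        return child;
    }

    // First child called `name`, or an empty Handle. Because a missing
    // parent also yields an empty Handle, calls can be chained safely.
    static Handle getChild(const Handle& parent, const std::string& name) {
        if (!parent) return nullptr;
        for (const Handle& c : parent->children_)
            if (c->name_ == name) return c;
        return nullptr;
    }

    // Removes the first child called `name`; true if one was removed.
    static bool deleteChild(const Handle& parent, const std::string& name) {
        if (!parent) return false;
        std::vector<Handle>& v = parent->children_;
        for (auto it = v.begin(); it != v.end(); ++it) {
            if ((*it)->name_ == name) { v.erase(it); return true; }
        }
        return false;
    }

    // Text of a node, or NULL for a missing node.
    static const char* getText(const Handle& n) {
        return n ? n->text_.c_str() : nullptr;
    }

    // Renders a subtree in two passes: compute the exact length, create
    // the output string once at its final size, then write into it.
    static std::string render(const Handle& n) {
        if (!n) return std::string();
        const std::size_t total = measure(*n);    // pass 1: size only
        std::string out(total, '\0');             // sized once, never grown
        char* end = write(*n, &out[0]);           // pass 2: fill in place
        assert(static_cast<std::size_t>(end - &out[0]) == total);
        (void)end;  // avoid an "unused variable" warning under NDEBUG
        return out;
    }

private:
    // Length of "<name>" + text + children + "</name>".
    static std::size_t measure(const Node& n) {
        std::size_t len = 2 * n.name_.size() + 5 + n.text_.size();
        for (const Handle& c : n.children_) len += measure(*c);
        return len;
    }

    // Copies `s` to `p` and returns the position just past it.
    static char* put(const std::string& s, char* p) {
        return std::copy(s.begin(), s.end(), p);
    }

    static char* write(const Node& n, char* p) {
        *p++ = '<'; p = put(n.name_, p); *p++ = '>';
        p = put(n.text_, p);
        for (const Handle& c : n.children_) p = write(*c, p);
        *p++ = '<'; *p++ = '/'; p = put(n.name_, p); *p++ = '>';
        return p;
    }

    std::string name_;
    std::string text_;
    std::vector<Handle> children_;
};
```

<p>For example, asking for the text of a 'host' child below a non-existent 'missing' child returns NULL instead of crashing, so a single NULL check at the end of a chain of getChild calls replaces a check at every level. Rendering a child handle instead of the root produces the XML string of just that subtree.</p>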
Here are the characteristics of the XMLparser library:
<ul>
<li>Non-validating XML parser written in standard C++ (information from DTDs
or XSDs is ignored).</li>
<li>Cross-platform: the library is currently used every day on Solaris, Linux
(32-bit and 64-bit) and Windows to manipulate "small" <a href="http://www.dmg.org/pmml-v3-0.html" target="_top">PMML
documents</a> (10 MB).<br>
The library has been tested and works flawlessly with the following
compilers: gcc (under Linux, Mac OS X Tiger and many Unix flavours),
Visual Studio 6.0, Visual Studio .NET (under Windows 9x, NT, 2000, XP, Vista, CE, Mobile),
the Intel C/C++ compiler, the Sun CC compiler and the Borland C++ compiler. The library is
also used under QNX.</li>
<li>The parser builds a tree structure that you can "explore" easily
(DOM-type parser).</li>
<li>The parser can be used to generate XML strings from subtrees (this is called
rendering). You can also save subtrees directly to files (with automatic "Byte
Order Mark" (BOM) support).</li>
<li>Modification or "from scratch" creation of large XML tree structures
in memory using functions like <font face="Courier New, Courier, mono">addChild</font>,
<font face="Courier New, Courier, mono">addAttribute</font>, <font face="Courier New, Courier, mono">updateAttribute</font>, <font face="Courier New, Courier, mono">deleteAttribute</font>, ...</li>
<li>It's <strong>SIMPLE</strong>: there is no need to learn how to use dozens of classes,
because there is only one simple class, the 'XMLNode' class, which represents one node
of the XML tree.</li>
<li>Very efficient (efficiency is required to handle <strong>BIG</strong>
files):
<ul>
<li><font size="-1">The string parser is very efficient: it makes only one
pass over the XML string to create the tree, and it performs the minimum number
of memory allocations. For example, it does NOT use the slow STL std::string class
but plain, simple and fast C malloc calls. It also allocates large chunks
of memory instead of many small ones. Inside Visual C++, the "debug
versions" of the memory allocation functions are very slow: do not
forget to compile in "release mode" to get maximum speed.</font></li>
<li><font size="-1">The "tree exploration" is very efficient because
all operations on the 'XMLNode' class are handled through references:
there is never any memory copy or memory allocation.</font></li>
<li><font size="-1">The XML string rendering is very efficient: it makes
one pass to compute the total memory size of the XML string and a second
pass to actually create the string. There is thus only one memory allocation
and no extra memory copy. Other libraries are slower because they use
the string concatenation operator, which requires many memory (re-)allocations
and memory copies.</font></li>
</ul>
</li>
<li>In-memory parsing</li>
<li>Supports XML namespaces</li>
<li>Very small and totally stand-alone (not built on top of anything else).
Uses only the standard &lt;stdio.h&gt; library (and only its 'fopen' and
'fread' functions, to load the XML file).</li>
<li>Easy to integrate into your own projects: it's only 2 files! The .h file
does not contain any implementation code, so compilation is very fast.</li>
<li>Robust (I have used it every day at work since 2004). <br>
Optionally, if you define the C++ preprocessor directives STRICT_PARSING and/or
APPROXIMATE_PARSING, the library can be "forgiving" in case of errors
inside the XML. <br>
I have tried to respect the XML specification given at: <a href="http://www.w3.org/TR/REC-xml/" target="_top">http://www.w3.org/TR/REC-xml/</a>
<li>Fully integrated error handling:
<ul>
<li><font size="-1">The string parser gives you the precise position and
type of the error inside the XML string (if an error is detected).</font></li>
<li><font size="-1">The library allows you to "explore" parts
of the tree that are missing: data extracted from such "missing
subtrees" is simply NULL. This makes it really easy to write "error
handling" code.</font></li>
</ul>
<li>Thread-safe (however, the global parameters "guessUnicodeChar"
and "strictUTF8Parsing" are shared by all threads, so they cannot be
set to different values in different threads).</li>
<li>Full support for a wide range of character sets &amp; encodings: ANSI /
UTF-8 / Shift-JIS / Unicode 16-bit / Unicode 32-bit characters (Windows,
Linux, Linux 64-bit &amp; Solaris versions only)
<ul>
<li><font size="-1">For the Unicode version of the library: automatic conversion
to Unicode before parsing (if the input XML file contains standard ANSI 8-bit
characters).</font></li>
<li><font size="-1">For the ASCII version of the library: automatic conversion
to ASCII before parsing (if the input XML file contains Unicode 16-bit or 32-bit
wide characters).</font></li>
</ul>
The library is now able to successfully handle Chinese, Cyrillic and other extended
characters thanks to extended UTF-8 support (see this <a href="http://www.applied-mathematics.net/tools/UTF-8-demo.txt">UTF-8-demo</a>
for the characters available). If you are still experiencing character
encoding problems, I suggest converting your XML files to UTF-8 using
a tool like <a href="http://www.gnu.org/software/libiconv/" target="_top">iconv</a>
(precompiled <a href="http://www.applied-mathematics.net/tools/iconv.zip">
win32 binary</a>).</li>
<li>Transparent memory management through the use of smart pointers.</li>
<li> Limited Support for character entities. The current known character entities