📄 libxmlparser.pas

📁 用AIML语言开发的一款中文自动聊天软件
💻 PAS
📖 第 1 页 / 共 2 页
字号:
12 下一页
(**
===============================================================================================
Name    : LibXmlParser
===============================================================================================
Project : All Projects
===============================================================================================
Subject : Progressive XML Parser for all types of XML Files
===============================================================================================
Author  : Stefan Heymann
          Eschenweg 3
          72076 T黚ingen
          GERMANY

E-Mail:   stefan@destructor.de
URL:      www.destructor.de
===============================================================================================
Source, Legals ("Licence")
--------------------------
The official site to get this parser is http://www.destructor.de/

Usage and Distribution of this Source Code is ruled by the
"Destructor.de Source code Licence" (DSL) which comes with this file or
can be downloaded at http://www.destructor.de/

IN SHORT: Usage and distribution of this source code is free.
          You use it completely on your own risk.

Postcardware
------------
If you like this code, please send a postcard of your city to my above address.
===============================================================================================
!!!  All parts of this code which are not finished or not conforming exactly to
     the XmlSpec are marked with three exclamation marks

-!-  Parts where the parser may be able to detect errors in the document's syntax are
     marked with the dash-exlamation mark-dash sequence.
===============================================================================================
Terminology:
------------
- Start:   Start of a buffer part
- Final:   End (last character) of a buffer part
- DTD:     Document Type Definition
- DTDc:    Document Type Declaration
- XMLSpec: The current W3C XML Recommendation (version 1.0 as of 1998-02-10), Chapter No.
- Cur*:    Fields concerning the "Current" part passed back by the "Scan" method
===============================================================================================
Scanning the XML document
-------------------------
- Create TXmlParser Instance                     MyXml := TXmlParser.Create;
- Load XML Document                              MyXml.LoadFromFile (Filename);
- Start Scanning                                 MyXml.StartScan;
- Scan Loop                                      WHILE MyXml.Scan DO
- Test for Part Type                               CASE MyXml.CurPartType OF
- Handle Parts                                       ... : ;;;
- Handle Parts                                       ... : ;;;
- Handle Parts                                       ... : ;;;
                                                     END;
- Destroy                                        MyXml.Free;
===============================================================================================
Loading the XML document
------------------------
You can load the XML document from a file with the "LoadFromFile" method.
It is beyond the scope of this parser to perform HTTP or FTP accesses. If you want your
application to handle such requests (URLs), you can load the XML via HTTP or FTP or whatever
protocol and hand over the data buffer using the "LoadFromBuffer" or "SetBuffer" method.
"LoadFromBuffer" loads the internal buffer of TXmlParser with the given null-terminated
string, thereby creating a copy of that buffer.
"SetBuffer" just takes the pointer to another buffer, which means that the given
buffer pointer must be valid while the document is accessed via TXmlParser.
===============================================================================================
Encodings:
----------
This XML parser kind of "understands" the following encodings:
- UTF-8
- ISO-8859-1
- Windows-1252

Any flavor of multi-byte characters (and this includes UTF-16) is not supported. Sorry.

Every string which has to be passed to the application passes the virtual method
"TranslateEncoding" which translates the string from the current encoding (stored in
"CurEncoding") into the encoding the application wishes to receive.
The "TranslateEncoding" method that is built into TXmlParser assumes that the application
wants to receive Windows ANSI (Windows-1252, about the same as ISO-8859-1) and is able
to convert UTF-8 and ISO-8859-1 encodings.
For other source and target encodings, you will have to override "TranslateEncoding".
===============================================================================================
Buffer Handling
---------------
- The document must be loaded completely into a piece of RAM
- All character positions are referenced by PChar pointers
- The TXmlParser instance can either "own" the buffer itself (then, FBufferSize is > 0)
  or reference the buffer of another instance or object (then, FBuffersize is 0 and
  FBuffer is not NIL)
- The Property DocBuffer passes back a pointer to the first byte of the document. If there
  is no document stored (FBuffer is NIL), the DocBuffer returns a pointer to a NULL character.
===============================================================================================
Whitespace Handling
-------------------
The TXmlParser property "PackSpaces" determines how Whitespace is returned in Text Content:
While PackSpaces is true, all leading and trailing whitespace characters are trimmed of, all
Whitespace is converted to Space #x20 characters and contiguous Whitespace characters are
compressed to one.
If the "Scan" method reports a ptContent part, the application can get the original text
with all whitespace characters by extracting the characters from "CurStart" to "CurFinal".
If the application detects an xml:space attribute, it can set "PackSpaces" accordingly or
use CurStart/CurFinal.
Please note that TXmlParser does _not_ normalize Line Breaks to single LineFeed characters
as the XmlSpec requires (XmlSpec 2.11).
The xml:space attribute is not handled by TXmlParser. This is on behalf of the application.
===============================================================================================
Non-XML-Conforming
------------------
TXmlParser does not conform 100 % exactly to the XmlSpec:
- UTF-16 is not supported (XmlSpec 2.2)
  (Workaround: Convert UTF-16 to UTF-8 and hand the buffer over to TXmlParser)
- As the parser only works with single byte strings, all Unicode characters > 255
  can currently not be handled correctly.
- Line breaks are not normalized to single Linefeed #x0A characters (XmlSpec 2.11)
  (Workaround: The Application can access the text contents on its own [CurStart, CurFinal],
  thereby applying every normalization it wishes to)
- The attribute value normalization does not work exactly as defined in the
  Second Edition of the XML 1.0 specification.
- See also the code parts marked with three consecutive exclamation marks. These are
  parts which are not finished in the current code release.

This list may be incomplete, so it may grow if I get to know any other points.
As work on the parser proceeds, this list may also shrink.
===============================================================================================
Things Todo
-----------
- Introduce a new event/callback which is called when there is an unresolvable
  entity or character reference
- Support Unicode
- Use Streams instead of reading the whole XML into memory
===============================================================================================
Change History, Version numbers
-------------------------------
The Date is given in ISO Year-Month-Day (YYYY-MM-DD) order.
Versions are counted from 1.0.0 beginning with the version from 2000-03-16.
Unreleased versions don't get a version number.

Date        Author Version Changes
-----------------------------------------------------------------------------------------------
2000-03-16  HeySt  1.0.0   Start
2000-03-28  HeySt  1.0.1   Initial Publishing of TXmlParser on the destructor.de Web Site
2000-03-30  HeySt  1.0.2   TXmlParser.AnalyzeCData: Call "TranslateEncoding" for CurContent
2000-03-31  HeySt  1.0.3   Deleted the StrPosE function (was not needed anyway)
2000-04-04  HeySt  1.0.4   TDtdElementRec modified: Start/Final for all Elements;
                           Should be backwards compatible.
                           AnalyzeDtdc: Set CurPartType to ptDtdc
2000-04-23  HeySt  1.0.5   New class TObjectList. Eliminated reference to the Delphi 5
                           "Contnrs" unit so LibXmlParser is Delphi 4 compatible.
2000-07-03  HeySt  1.0.6   TNvpNode: Added Constructor
2000-07-11  HeySt  1.0.7   Removed "Windows" from USES clause
                           Added three-exclamation-mark comments for Utf8ToAnsi/AnsiToUtf8
                           Added three-exclamation-mark comments for CHR function calls
2000-07-23  HeySt  1.0.8   TXmlParser.Clear: CurAttr.Clear; EntityStack.Clear;
                           (This was not a bug; just defensive programming)
2000-07-29  HeySt  1.0.9   TNvpList: Added methods: Node(Index), Value(Index), Name(Index);
2000-10-07  HeySt          Introduced Conditional Defines
                           Uses Contnrs unit and its TObjectList class again for
                           Delphi 5 and newer versions
2001-01-30  HeySt          Introduced Version Numbering
                           Made LoadFromFile and LoadFromBuffer BOOLEAN functions
                           Introduced FileMode parameter for LoadFromFile
                           BugFix: TAttrList.Analyze: Must add CWhitespace to ExtractName call
                           Comments worked over
2001-02-28  HeySt  1.0.10  Completely worked over and tested the UTF-8 functions
                           Fixed a bug in TXmlParser.Scan which caused it to start over when it
                           was called after the end of scanning, resulting in an endless loop
                           TEntityStack is now a TObjectList instead of TList
2001-07-03  HeySt  1.0.11  Updated Compiler Version IFDEFs for Kylix
2001-07-11  HeySt  1.0.12  New TCustomXmlScanner component (taken over from LibXmlComps.pas)
2001-07-14  HeySt  1.0.13  Bugfix TCustomXmlScanner.FOnTranslateEncoding
2001-10-22  HeySt          Don't clear CurName anymore when the parser finds a CDATA section.
2001-12-03  HeySt  1.0.14  TObjectList.Clear: Make call to INHERITED method (fixes a memory leak)
2001-12-05  HeySt  1.0.15  TObjectList.Clear: removed call to INHERITED method
                           TObjectList.Destroy: Inserted SetCapacity call.
                           Reduces need for frequent re-allocation of pointer buffer
                           Dedicated to my father, Theodor Heymann
2002-06-26  HeySt  1.0.16  TXmlParser.Scan: Fixed a bug with PIs whose name is beginning
                           with 'xml'. Thanks to Uwe Kamm for submitting this bug.
                           The CurEncoding property is now always in uppercase letters (the XML
                           spec wants it to be treated case independently so when it's uppercase
                           comparisons are faster)
2002-03-04  HeySt  1.0.17  Included an IFDEF for Delphi 7 (VER150) and Kylix
                           There is a new symbol HAS_CONTNRS_UNIT which is used now to
                           distinguish between IDEs which come with the Contnrs unit and
                           those that don't.
*)


// --- Delphi/Kylix Version Numbers
//     As this is no code, this does not blow up your object or executable code at all
       (*$IFDEF LINUX *)
         (*$DEFINE K1_OR_NEWER *)
       (*$ENDIF *)

       (*$IFDEF MSWINDOWS *)
         (*$DEFINE D1_OR_NEWER *)
         (*$IFNDEF VER80 *)
           (*$DEFINE D2_OR_NEWER *)
           (*$IFNDEF VER90 *)
             (*$DEFINE D3_OR_NEWER *)
             (*$IFNDEF VER100 *)
               (*$DEFINE D4_OR_NEWER *)
               (*$IFNDEF VER120 *)
                 (*$DEFINE D5_OR_NEWER *)
                 (*$IFNDEF VER130 *)
                   (*$IFNDEF VER140 *)
                     (*$IFNDEF VER150 *)
                       If the compiler gets stuck here,
                       you are using a compiler version unknown to this code.
                       You will probably have to change this code accordingly.
                       At first, try to comment out these lines and see what will happen.
                     (*$ENDIF *)
                   (*$ENDIF *)
                 (*$ENDIF *)
               (*$ENDIF *)
             (*$ENDIF *)
           (*$ENDIF *)
         (*$ENDIF *)
       (*$ENDIF *)

       (*$IFDEF D5_OR_NEWER *)
         (*$DEFINE HAS_CONTNRS_UNIT *)
       (*$ENDIF *)

       (*$IFDEF K1_OR_NEWER *)
         (*$DEFINE HAS_CONTNRS_UNIT *)
       (*$ENDIF *)


UNIT LibXmlParser;

INTERFACE

USES
  SysUtils, Classes,
  (*$IFDEF HAS_CONTNRS_UNIT *)  // The Contnrs Unit was introduced in Delphi 5
  Contnrs,
  (*$ENDIF*)
  Math;

CONST
  CVersion = '1.0.17';  // This variable will be updated for every release
                        // (I hope, I won't forget to do it everytime ...)

TYPE
  TPartType    = // --- Document Part Types
                 (ptNone,            // Nothing
                  ptXmlProlog,       // XML Prolog                  XmlSpec 2.8 / 4.3.1
                  ptComment,         // Comment                     XmlSpec 2.5
                  ptPI,              // Processing Instruction      XmlSpec 2.6
                  ptDtdc,            // Document Type Declaration   XmlSpec 2.8
                  ptStartTag,        // Start Tag                   XmlSpec 3.1
                  ptEmptyTag,        // Empty-Element Tag           XmlSpec 3.1
12 下一页
💿 文件大小 1224 K
👤 上传用户 tswccyt
📂 所属分类人工智能/神经网络
🏷️ 相关标签

#AIML #语言 #自动 #软件
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -