⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 gptextfile.pas

📁 Advanced Data Import Component Suite for Borland Delphi and C++ Builder allows you to import your da
💻 PAS
📖 第 1 页 / 共 5 页
字号:
{$B-,H+,J+,Q-,T-,X+}

unit GpTextFile;

{$I QImport3VerCtrl.Inc}

(*:Interface to 8/16-bit text files and streams. Uses GpHugeF unit for file
   access.
   @author Primoz Gabrijelcic
   @desc <pre>

This software is distributed under the BSD license.

Copyright (c) 2006, Primoz Gabrijelcic
All rights reserved.

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this
  list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice,
  this list of conditions and the following disclaimer in the documentation
  and/or other materials provided with the distribution.
- The name of the Primoz Gabrijelcic may not be used to endorse or promote
  products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

   Author           : Primoz Gabrijelcic
   Creation date    : 1999-11-01
   Last modification: 2006-08-16
   Version          : 4.0
   Requires         : GpHugeF 4.0, GpTextStream 1.04
   </pre>
*)(*
   History:
     4.0: 2006-08-14
       - TGpTextFileStream
         - Added new constructor CreateW that uses Unicode encoding for the file name.
         - FileName property changed to WideString.
     3.07a: 2006-03-30
       - Release 3.07 introduced an 'interesting' bug - when file size modulo 2048 was in
         range 1..5, some garbage was generated after last line of file (when reading the
         file with Readln).
     3.07: 2006-02-06
       - Added support for UCS-4 encoding in a very primitive form - all high-word values
         are stripped away on read and set to 0 on write.
     3.06: 2005-12-22
       - File flags exposed through the readonly FileFlags property.
       - Fixed minor memory leak in Rewrite[Safe].
     3.05b: 2005-10-26
       - Opening existing file in UTF8 was completely broken.
     3.05a: 2004-07-16
       - Bug fixed: If Append was called with the [cfUnicode] flag and file did not exist
         before the call, Unicode marker was not written to the newly created file.
     3.05: 2003-05-16
       - Made Delphi 7 compatible.
     3.04c: 2002-11-26
       - Fixed reading of LF-delimited files (broken since 3.04).
     3.04b: 2002-10-16
       - Fixed lots of problems with TGPTextFile.Readln.
     3.04a: 2002-10-15
       - Fixed TGpTextFile.Eof (broken in 3.04).
     3.04: 2002-10-11
       - Faster (two to three times) TGpTextFile.Readln.
       - Faster (10% to 50%) TGpTextFile.Writeln.
     3.03: 2002-06-30
       - Added ofNo8BitCPConversion to TOpenFlags. If set Readln will not perform codepage
         conversion to Unicode on 8 bit text files.
       - Added cfNo8BitCPConversion to TCreateFlags. If set Writeln will not perform
         codepage conversion from Unicode on 8 bit text files.
       - Added public functions StringToWideStringNoCP and WideStringToStringNoCP.
     3.02: 2002-04-24
       - Added TCreateFlag cfCompressed.
     3.01a: 2001-12-15
       - Updated to compile with Delphi 6.
     3.01: 2001-07-17
       - TGpTextStream class moved into a separate unit (GpTextStream).
     3.0b: 2001-06-22
       - WriteString was not always returning a value. Fixed.
     3.0a: 2001-06-18
       - Bug fixed: function WriteString crashed if its argument was empty string.
     3.0: 2001-05-15
       - TGpTextFileStream class split into two classes. Basic functionality (Unicode
         encoding/decoding) was moved into TGpTextStream that can work on any TStream
         descendant. TGpTextFileStream now only opens the file and forwards it to the
         TGpTextStream.
     2.06: 2001-03-22
       - Added overloaded TGpTextFile.Write(s: string), available on D4 and better.
     2.05: 2001-02-27
       - Added property AcceptedDelimiters to TGpTextFile and TGpTextFileStream. You can
         use it do specify what CR-LF combinations should be treated as a line delimiter.
         If not set (or set to []), classes will behave as before - TGpTextFile will use
         CR, LF, CRLF, /2028/, and /000D/000A/ for line delimiters and TGpTextFileStream
         will use LF, CRLF, /2028/, and /000D/000A/ for line delimiters.
     2.04: 2001-01-31
       - All raised exceptions now have HelpContext set. All possible HelpContext values
         are enumerated in 'const' section at the very beginning of the unit. Thanks to
         Peter Evans for the suggestion.
       - TGpTextFileStream.Win32Check made protected (was public by mistake).
     2.03a: 2000-12-11
       - Fixed Append and AppendSafe to create a file if it does not exist.
     2.03: 2000-10-25
       - Removed one FreeAndNil call preventing unit to compile under Delphi 2, 3, and 4.
     2.02: 2000-10-21
       - Added support for Unicode pseudo-codepage 1200 (CP_UNICODE).
     2.01: 2000-10-12
       - Added UTF-8 support (CP_UTF8 codepage) to TGpTextFile and TGpTextFileStream. To
         work with Unicode file in UTF-8 encoding, set cfUnicode flag and CP_UTF8 code
         page at the same time. When working in UTF-8 mode, TGpTextFile doesn't support
         lines longer than 2.147.483.647 bytes.
         - When opening files in unknown format, it is probably best to first Reset it
           with flags=[cfUnicode] and codepage=CP_UTF8. Then you can check if file is
           16-bit Unicode (Is16bit). If it is not, you can change the Codepage to some
           other value (1252, for example) or just leave it at CP_UTF8 (reasonable default
           for processing XML files).
       - UTF-8 support will read and skip Byte Order Mark (EF BB BF) but will only write
         BOM if cfWriteUTF8BOM is specified in flags.
       - New functions TGpTextFile.Is16bit and TGpTextFileStream.Is16bit return true when
         file is Unicode with 16-bit (UCS-2) encoding. IsUnicode returns true when file is
         Unicode with 16-bit (UCS-2) or 8-bit (UTF-8) encoding.
       - Added symbolic constants for ISO code pages not defined in Windows.pas:
         ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6,
         ISO-8859-7, ISO-8859-8 (symbols are named ISO_8859_n).
     2.0: 2000-10-06
       - Created TGpTextFileStream class, descendant of TStream, which offers similar
         functionality as TGpTextFile - automatic detection of Unicode files, automatic
         code page remapping etc.
       - TGpTextFile.SetCodepage made public so it is possible to change code page on the
         fly.
       - Added parameters 'flags', 'waitObject', and 'codePage' to TGpTextFile.AppendSafe.
       - Order of 'flags' and 'bufferSize' parameters reversed in most methods.
       - Position of 'diskLockTimeout' and 'diskRetryDelay' parameters changed in all
         methods.
       - Fully documented.
       - All language-dependant string constants moved to resourcestring section.
     1.08: 2000-10-05
       - Added codePage parameter to all Reset and Rewrite functions. Default value of 0
         specifies conversion according to current codepage and any other number specifies
         codepage that should be used for conversion.
     1.07: 2000-08-01
       - All exceptions generated in this unit were converted to EGpTextFile exceptions
         (descendant of EGpHugeFile).
       - All Windows-generated exceptions not caught in TGpHugeFile are now converted to
         EGpTextFile exception.
       - Added parameter bufferSize to Append and AppendSafe.
       - Append* opened file in non-buffered mode. Fixed.
     1.06: 2000-06-22
       - Added overloaded version of Writeln.
     1.05: 2000-05-15
       - Rewrite now opens file in buffered mode.
     1.04: 2000-04-19
       - Added new ResetSafe/RewriteSafe parameter - waitObject. It is forwarded to
         TGpHugeFile.ResetEx/RewriteEx.
     1.03: 2000-03-03
       - Added OpenFlags parameter to Reset and ResetSafe. In D4 or higher its value
         defaults to [] so no source code changes to old applications will be required.
         Currently only supported OpenFlag is opCloseOnEOF, which enables hfoCloseOnEOF
         flag in TGpHugeFile (see GpHugeF.pas for more details).
     1.02: 1999-11-24
       - Added bufferSize parameter to Reset* and Rewrite*. By default (bufferSize = 0),
         64 KB buffer is allocated.
     1.01a: 1999-11-03
       - Append fixed.
     1.01: 1999-11-02
       - Added ResetSafe and RewriteSafe methods.
     1.0: 1999-11-01
       - First published version.
*)

{$IFDEF VER100}{$DEFINE D3PLUS}{$ENDIF}
{$IFDEF VER120}{$DEFINE D3PLUS}{$DEFINE D4PLUS}{$ENDIF}
{$IFDEF VER130}{$DEFINE D3PLUS}{$DEFINE D4PLUS}{$ENDIF}
{$IFDEF CONDITIONALEXPRESSIONS}
  {$DEFINE D3PLUS}
  {$DEFINE D4PLUS}
{$ENDIF}

interface

uses
  Windows,
  Classes,
  GpHugeF,
  GpTextStream;

// HelpContext values for all raised exceptions. All EGpHugeFile exception are
// re-raised without modifying HelpContext (which was already assigned in
// GpHugeF unit).
const
  //:Exception was handled and converted to EGpTextFile but was not expected and is not categorised.
  hcTFUnexpected              = 2000;
  //:Failed to append file.
  hcTFFailedToAppend          = 2003;
  //:Failed to reset file.
  hcTFFailedToReset           = 2004;
  //:Failed to rewrite file.
  hcTFFailedToRewrite         = 2005;
  //:Cannot append reversed Unicode file - not supported.
  hcTFCannotAppendReversed    = 2006;
  //:Cannot write to reversed Unicode file - not supported.
  hcTFCannotWriteReversed     = 2007;
  //:Parameter to Write method is invalid.
  hcTFInvalidParameter        = 2008;

type
  {:Base exception class for exceptions created in TGpTextFile and descendants.
  }
  EGpTextFile       = class(EGpHugeFile);

  {:Base exception class for exceptions created in TGpTextFileStream.
  }
  EGpTextFileStream = class(EGpHugeFileStream);

  {:Text file creation flags.
    @enum cfUnicode            Create Unicode file.
    @enum cfReverseByteOrder   Create unicode file with reversed byte order
                               (Motorola format). Set only on Reset, readonly.
                               Currently ignored in Rewrite.
    @enum cfUse2028            Use standard /2028/ instead of /000D/000A/ for
                               line delimiter (MS Notepad and MS Word do not
                               understand $2028 delimiter). Applies to Unicode
                               files only.
    @enum cfUseLF              Use /LF/ instead of /CR/LF/ for line delimiter.
                               Applies to 8-bit files only.
    @enum cfWriteUTF8BOM       Write UTF-8 Byte Order Mark to the beginning of
                               file.
    @enum cfCompressed         Will try to set the "compressed" attribute (when
                               running on NT and file is on NTFS drive).
    @enum cfNo8BitCPConversion Disable 8-bit-to-Unicode conversion on Read and
                               Write.
  }
  TCreateFlag = (cfUnicode, cfReverseByteOrder, cfUse2028, cfUseLF,
    cfWriteUTF8BOM, cfCompressed, cfNo8BitCPConversion);

  {:Set of all creation flags.
  }
  TCreateFlags = set of TCreateFlag;

  {:Text file open (reset) flags.
    @enum ofCloseOnEOF         Remaps to TGpHugeFile hfCloseOnEOF.
    @enum ofNo8BitCPConversion Disable 8-bit-to-Unicode conversion on Read and
                               Write.
  }
  TOpenFlag = (ofCloseOnEOF, ofNo8BitCPConversion);

  {:Set of all open flags.
  }
  TOpenFlags = set of TOpenFlag;

  {:Line delimiters.
    @enum ldCR       Carriage return (Mac style).
    @enum ldLF       Line feed (Unix style).
    @enum ldCRLF     Carriage return + Line feed (DOS style).
    @enum ldLFCR     Line feed + Carriage return (very unusual combination).
    @enum ld2028     /2028/ Unicode delimiter.
    @enum ld000D000A /000D/000A/ Windows-style Unicode delimiter.
  }
  TLineDelimiter = (ldCR, ldLF, ldCRLF, ldLFCR, ld2028, ld000D000A);

  {:Set of all line delimiters.
  }
  TLineDelimiters = set of TLineDelimiter;

  {:Unified 8/16-bit text file access. All strings passed as Unicode, conversion
    to/from 8-bit is done automatically according to specified code page.
    Access is buffered but direct-access functions (FilePos, Seek) are supported
    nevertheless.
  }
  TGpTextFile = class(TGpHugeFile)
  private
    tfCFlags            : TCreateFlags;
    tfCodePage          : word;
    tfLeof              : boolean;
    tfLineDelimiter     : array [0..7] of byte;
    tfLineDelimiterSize : integer;
    tfLineDelims        : TLineDelimiters;
    tfNo8BitCPConversion: boolean;
    tfOverRead          : integer;
    tfReadlnBuf         : array [1..2048+6] of byte; // size must be even for Unicode; 6 sentinel bytes are used to simplify EOL delimiter detection
    tfReadlnBufPos      : cardinal;
    tfReadlnBufSize     : cardinal;
    tfSmallBuf          : pointer;
  protected
    function  AllocTmpBuffer(size: integer): pointer; virtual;
    procedure ConvertCodepage(delimPos, delimLen: cardinal; var utf8ln: AnsiString;
      var wideLn: WideString);
    procedure FetchBlock(out endOfFile: boolean); virtual;
    procedure FreeTmpBuffer(var buffer: pointer); virtual;
    function  IsAfterEndOfBlock: boolean; virtual;
    function  IsUnicodeCodepage(codepage: word): boolean;
    procedure LocateDelimiter(var delimPos, delimLen: cardinal); virtual;
    procedure PrepareBuffer; virtual;
    procedure RebuildNewline; virtual;
    procedure ReverseBlock; virtual;
    procedure SetCodepage(cp: word); virtual;
    procedure WriteString(ws: WideString); virtual;
  public
    destructor Destroy; override;
    procedure Append(
      flags: TCreateFlags {$IFDEF D4plus}= []{$ENDIF};
      bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
      codePage: word      {$IFDEF D4plus}= 0{$ENDIF});
    function  AppendSafe(
      flags: TCreateFlags      {$IFDEF D4plus}= []{$ENDIF};
      bufferSize: integer      {$IFDEF D4plus}= 0{$ENDIF};
      diskLockTimeout: integer {$IFDEF D4plus}= 0{$ENDIF};
      diskRetryDelay: integer  {$IFDEF D4plus}= 0{$ENDIF};
      waitObject: THandle      {$IFDEF D4plus}= 0{$ENDIF};
      codePage: word           {$IFDEF D4plus}= 0{$ENDIF}): THFError;
    function  EOF: boolean;
    function  Is16bit: boolean;
    function  IsUnicode: boolean;
    function  Readln: WideString;
    procedure Reset(
      flags: TOpenFlags   {$IFDEF D4plus}= []{$ENDIF};
      bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
      codePage: word      {$IFDEF D4plus}= 0{$ENDIF});
    function  ResetSafe(
      flags: TOpenFlags        {$IFDEF D4plus}= []{$ENDIF};
      bufferSize: integer      {$IFDEF D4plus}= 0{$ENDIF};
      diskLockTimeout: integer {$IFDEF D4plus}= 0{$ENDIF};
      diskRetryDelay: integer  {$IFDEF D4plus}= 0{$ENDIF};
      waitObject: THandle      {$IFDEF D4plus}= 0{$ENDIF};
      codePage: word           {$IFDEF D4plus}= 0{$ENDIF}): THFError;
    procedure Rewrite(
      flags: TCreateFlags {$IFDEF D4plus}= []{$ENDIF};
      bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
      codePage: word      {$IFDEF D4plus}= 0{$ENDIF});
    function  RewriteSafe(
      flags: TCreateFlags      {$IFDEF D4plus}= []{$ENDIF};
      bufferSize: integer      {$IFDEF D4plus}= 0{$ENDIF};

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -