📄 gptextfile.pas
字号:
{$B-,H+,J+,Q-,T-,X+}
unit GpTextFile;
{$I QImport3VerCtrl.Inc}
(*:Interface to 8/16-bit text files and streams. Uses GpHugeF unit for file
access.
@author Primoz Gabrijelcic
@desc <pre>
This software is distributed under the BSD license.
Copyright (c) 2006, Primoz Gabrijelcic
All rights reserved.
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
- The name of the Primoz Gabrijelcic may not be used to endorse or promote
products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Author : Primoz Gabrijelcic
Creation date : 1999-11-01
Last modification: 2006-08-16
Version : 4.0
Requires : GpHugeF 4.0, GpTextStream 1.04
</pre>
*)(*
History:
4.0: 2006-08-14
- TGpTextFileStream
- Added new constructor CreateW that uses Unicode encoding for the file name.
- FileName property changed to WideString.
3.07a: 2006-03-30
- Release 3.07 introduced an 'interesting' bug - when file size modulo 2048 was in
range 1..5, some garbage was generated after last line of file (when reading the
file with Readln).
3.07: 2006-02-06
- Added support for UCS-4 encoding in a very primitive form - all high-word values
are stripped away on read and set to 0 on write.
3.06: 2005-12-22
- File flags exposed through the readonly FileFlags property.
- Fixed minor memory leak in Rewrite[Safe].
3.05b: 2005-10-26
- Opening existing file in UTF8 was completely broken.
3.05a: 2004-07-16
- Bug fixed: If Append was called with the [cfUnicode] flag and file did not exist
before the call, Unicode marker was not written to the newly created file.
3.05: 2003-05-16
- Made Delphi 7 compatible.
3.04c: 2002-11-26
- Fixed reading of LF-delimited files (broken since 3.04).
3.04b: 2002-10-16
- Fixed lots of problems with TGPTextFile.Readln.
3.04a: 2002-10-15
- Fixed TGpTextFile.Eof (broken in 3.04).
3.04: 2002-10-11
- Faster (two to three times) TGpTextFile.Readln.
- Faster (10% to 50%) TGpTextFile.Writeln.
3.03: 2002-06-30
- Added ofNo8BitCPConversion to TOpenFlags. If set Readln will not perform codepage
conversion to Unicode on 8 bit text files.
- Added cfNo8BitCPConversion to TCreateFlags. If set Writeln will not perform
codepage conversion from Unicode on 8 bit text files.
- Added public functions StringToWideStringNoCP and WideStringToStringNoCP.
3.02: 2002-04-24
- Added TCreateFlag cfCompressed.
3.01a: 2001-12-15
- Updated to compile with Delphi 6.
3.01: 2001-07-17
- TGpTextStream class moved into a separate unit (GpTextStream).
3.0b: 2001-06-22
- WriteString was not always returning a value. Fixed.
3.0a: 2001-06-18
- Bug fixed: function WriteString crashed if its argument was empty string.
3.0: 2001-05-15
- TGpTextFileStream class split into two classes. Basic functionality (Unicode
encoding/decoding) was moved into TGpTextStream that can work on any TStream
descendant. TGpTextFileStream now only opens the file and forwards it to the
TGpTextStream.
2.06: 2001-03-22
- Added overloaded TGpTextFile.Write(s: string), available on D4 and better.
2.05: 2001-02-27
- Added property AcceptedDelimiters to TGpTextFile and TGpTextFileStream. You can
use it do specify what CR-LF combinations should be treated as a line delimiter.
If not set (or set to []), classes will behave as before - TGpTextFile will use
CR, LF, CRLF, /2028/, and /000D/000A/ for line delimiters and TGpTextFileStream
will use LF, CRLF, /2028/, and /000D/000A/ for line delimiters.
2.04: 2001-01-31
- All raised exceptions now have HelpContext set. All possible HelpContext values
are enumerated in 'const' section at the very beginning of the unit. Thanks to
Peter Evans for the suggestion.
- TGpTextFileStream.Win32Check made protected (was public by mistake).
2.03a: 2000-12-11
- Fixed Append and AppendSafe to create a file if it does not exist.
2.03: 2000-10-25
- Removed one FreeAndNil call preventing unit to compile under Delphi 2, 3, and 4.
2.02: 2000-10-21
- Added support for Unicode pseudo-codepage 1200 (CP_UNICODE).
2.01: 2000-10-12
- Added UTF-8 support (CP_UTF8 codepage) to TGpTextFile and TGpTextFileStream. To
work with Unicode file in UTF-8 encoding, set cfUnicode flag and CP_UTF8 code
page at the same time. When working in UTF-8 mode, TGpTextFile doesn't support
lines longer than 2.147.483.647 bytes.
- When opening files in unknown format, it is probably best to first Reset it
with flags=[cfUnicode] and codepage=CP_UTF8. Then you can check if file is
16-bit Unicode (Is16bit). If it is not, you can change the Codepage to some
other value (1252, for example) or just leave it at CP_UTF8 (reasonable default
for processing XML files).
- UTF-8 support will read and skip Byte Order Mark (EF BB BF) but will only write
BOM if cfWriteUTF8BOM is specified in flags.
- New functions TGpTextFile.Is16bit and TGpTextFileStream.Is16bit return true when
file is Unicode with 16-bit (UCS-2) encoding. IsUnicode returns true when file is
Unicode with 16-bit (UCS-2) or 8-bit (UTF-8) encoding.
- Added symbolic constants for ISO code pages not defined in Windows.pas:
ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6,
ISO-8859-7, ISO-8859-8 (symbols are named ISO_8859_n).
2.0: 2000-10-06
- Created TGpTextFileStream class, descendant of TStream, which offers similar
functionality as TGpTextFile - automatic detection of Unicode files, automatic
code page remapping etc.
- TGpTextFile.SetCodepage made public so it is possible to change code page on the
fly.
- Added parameters 'flags', 'waitObject', and 'codePage' to TGpTextFile.AppendSafe.
- Order of 'flags' and 'bufferSize' parameters reversed in most methods.
- Position of 'diskLockTimeout' and 'diskRetryDelay' parameters changed in all
methods.
- Fully documented.
- All language-dependant string constants moved to resourcestring section.
1.08: 2000-10-05
- Added codePage parameter to all Reset and Rewrite functions. Default value of 0
specifies conversion according to current codepage and any other number specifies
codepage that should be used for conversion.
1.07: 2000-08-01
- All exceptions generated in this unit were converted to EGpTextFile exceptions
(descendant of EGpHugeFile).
- All Windows-generated exceptions not caught in TGpHugeFile are now converted to
EGpTextFile exception.
- Added parameter bufferSize to Append and AppendSafe.
- Append* opened file in non-buffered mode. Fixed.
1.06: 2000-06-22
- Added overloaded version of Writeln.
1.05: 2000-05-15
- Rewrite now opens file in buffered mode.
1.04: 2000-04-19
- Added new ResetSafe/RewriteSafe parameter - waitObject. It is forwarded to
TGpHugeFile.ResetEx/RewriteEx.
1.03: 2000-03-03
- Added OpenFlags parameter to Reset and ResetSafe. In D4 or higher its value
defaults to [] so no source code changes to old applications will be required.
Currently only supported OpenFlag is opCloseOnEOF, which enables hfoCloseOnEOF
flag in TGpHugeFile (see GpHugeF.pas for more details).
1.02: 1999-11-24
- Added bufferSize parameter to Reset* and Rewrite*. By default (bufferSize = 0),
64 KB buffer is allocated.
1.01a: 1999-11-03
- Append fixed.
1.01: 1999-11-02
- Added ResetSafe and RewriteSafe methods.
1.0: 1999-11-01
- First published version.
*)
{$IFDEF VER100}{$DEFINE D3PLUS}{$ENDIF}
{$IFDEF VER120}{$DEFINE D3PLUS}{$DEFINE D4PLUS}{$ENDIF}
{$IFDEF VER130}{$DEFINE D3PLUS}{$DEFINE D4PLUS}{$ENDIF}
{$IFDEF CONDITIONALEXPRESSIONS}
{$DEFINE D3PLUS}
{$DEFINE D4PLUS}
{$ENDIF}
interface
uses
Windows,
Classes,
GpHugeF,
GpTextStream;
// HelpContext values for all raised exceptions. All EGpHugeFile exception are
// re-raised without modifying HelpContext (which was already assigned in
// GpHugeF unit).
const
//:Exception was handled and converted to EGpTextFile but was not expected and is not categorised.
hcTFUnexpected = 2000;
//:Failed to append file.
hcTFFailedToAppend = 2003;
//:Failed to reset file.
hcTFFailedToReset = 2004;
//:Failed to rewrite file.
hcTFFailedToRewrite = 2005;
//:Cannot append reversed Unicode file - not supported.
hcTFCannotAppendReversed = 2006;
//:Cannot write to reversed Unicode file - not supported.
hcTFCannotWriteReversed = 2007;
//:Parameter to Write method is invalid.
hcTFInvalidParameter = 2008;
type
{:Base exception class for exceptions created in TGpTextFile and descendants.
}
EGpTextFile = class(EGpHugeFile);
{:Base exception class for exceptions created in TGpTextFileStream.
}
EGpTextFileStream = class(EGpHugeFileStream);
{:Text file creation flags.
@enum cfUnicode Create Unicode file.
@enum cfReverseByteOrder Create unicode file with reversed byte order
(Motorola format). Set only on Reset, readonly.
Currently ignored in Rewrite.
@enum cfUse2028 Use standard /2028/ instead of /000D/000A/ for
line delimiter (MS Notepad and MS Word do not
understand $2028 delimiter). Applies to Unicode
files only.
@enum cfUseLF Use /LF/ instead of /CR/LF/ for line delimiter.
Applies to 8-bit files only.
@enum cfWriteUTF8BOM Write UTF-8 Byte Order Mark to the beginning of
file.
@enum cfCompressed Will try to set the "compressed" attribute (when
running on NT and file is on NTFS drive).
@enum cfNo8BitCPConversion Disable 8-bit-to-Unicode conversion on Read and
Write.
}
TCreateFlag = (cfUnicode, cfReverseByteOrder, cfUse2028, cfUseLF,
cfWriteUTF8BOM, cfCompressed, cfNo8BitCPConversion);
{:Set of all creation flags.
}
TCreateFlags = set of TCreateFlag;
{:Text file open (reset) flags.
@enum ofCloseOnEOF Remaps to TGpHugeFile hfCloseOnEOF.
@enum ofNo8BitCPConversion Disable 8-bit-to-Unicode conversion on Read and
Write.
}
TOpenFlag = (ofCloseOnEOF, ofNo8BitCPConversion);
{:Set of all open flags.
}
TOpenFlags = set of TOpenFlag;
{:Line delimiters.
@enum ldCR Carriage return (Mac style).
@enum ldLF Line feed (Unix style).
@enum ldCRLF Carriage return + Line feed (DOS style).
@enum ldLFCR Line feed + Carriage return (very unusual combination).
@enum ld2028 /2028/ Unicode delimiter.
@enum ld000D000A /000D/000A/ Windows-style Unicode delimiter.
}
TLineDelimiter = (ldCR, ldLF, ldCRLF, ldLFCR, ld2028, ld000D000A);
{:Set of all line delimiters.
}
TLineDelimiters = set of TLineDelimiter;
{:Unified 8/16-bit text file access. All strings passed as Unicode, conversion
to/from 8-bit is done automatically according to specified code page.
Access is buffered but direct-access functions (FilePos, Seek) are supported
nevertheless.
}
TGpTextFile = class(TGpHugeFile)
private
tfCFlags : TCreateFlags;
tfCodePage : word;
tfLeof : boolean;
tfLineDelimiter : array [0..7] of byte;
tfLineDelimiterSize : integer;
tfLineDelims : TLineDelimiters;
tfNo8BitCPConversion: boolean;
tfOverRead : integer;
tfReadlnBuf : array [1..2048+6] of byte; // size must be even for Unicode; 6 sentinel bytes are used to simplify EOL delimiter detection
tfReadlnBufPos : cardinal;
tfReadlnBufSize : cardinal;
tfSmallBuf : pointer;
protected
function AllocTmpBuffer(size: integer): pointer; virtual;
procedure ConvertCodepage(delimPos, delimLen: cardinal; var utf8ln: AnsiString;
var wideLn: WideString);
procedure FetchBlock(out endOfFile: boolean); virtual;
procedure FreeTmpBuffer(var buffer: pointer); virtual;
function IsAfterEndOfBlock: boolean; virtual;
function IsUnicodeCodepage(codepage: word): boolean;
procedure LocateDelimiter(var delimPos, delimLen: cardinal); virtual;
procedure PrepareBuffer; virtual;
procedure RebuildNewline; virtual;
procedure ReverseBlock; virtual;
procedure SetCodepage(cp: word); virtual;
procedure WriteString(ws: WideString); virtual;
public
destructor Destroy; override;
procedure Append(
flags: TCreateFlags {$IFDEF D4plus}= []{$ENDIF};
bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
codePage: word {$IFDEF D4plus}= 0{$ENDIF});
function AppendSafe(
flags: TCreateFlags {$IFDEF D4plus}= []{$ENDIF};
bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
diskLockTimeout: integer {$IFDEF D4plus}= 0{$ENDIF};
diskRetryDelay: integer {$IFDEF D4plus}= 0{$ENDIF};
waitObject: THandle {$IFDEF D4plus}= 0{$ENDIF};
codePage: word {$IFDEF D4plus}= 0{$ENDIF}): THFError;
function EOF: boolean;
function Is16bit: boolean;
function IsUnicode: boolean;
function Readln: WideString;
procedure Reset(
flags: TOpenFlags {$IFDEF D4plus}= []{$ENDIF};
bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
codePage: word {$IFDEF D4plus}= 0{$ENDIF});
function ResetSafe(
flags: TOpenFlags {$IFDEF D4plus}= []{$ENDIF};
bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
diskLockTimeout: integer {$IFDEF D4plus}= 0{$ENDIF};
diskRetryDelay: integer {$IFDEF D4plus}= 0{$ENDIF};
waitObject: THandle {$IFDEF D4plus}= 0{$ENDIF};
codePage: word {$IFDEF D4plus}= 0{$ENDIF}): THFError;
procedure Rewrite(
flags: TCreateFlags {$IFDEF D4plus}= []{$ENDIF};
bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
codePage: word {$IFDEF D4plus}= 0{$ENDIF});
function RewriteSafe(
flags: TCreateFlags {$IFDEF D4plus}= []{$ENDIF};
bufferSize: integer {$IFDEF D4plus}= 0{$ENDIF};
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -