.PP
\fBTcl_GetEncodingName\fR is roughly the inverse of \fBTcl_GetEncoding\fR.
Given an \fIencoding\fR, the return value is the \fIname\fR argument that
was used to create the encoding.  The string returned by
\fBTcl_GetEncodingName\fR is only guaranteed to persist until the
\fIencoding\fR is deleted.  The caller must not modify this string.
.PP
\fBTcl_SetSystemEncoding\fR sets the default encoding that should be used
whenever the user passes a NULL value for the \fIencoding\fR argument to
any of the other encoding functions.  If \fIname\fR is NULL, the system
encoding is reset to the default system encoding, \fBbinary\fR.  If the
name did not refer to any known or loadable encoding, TCL_ERROR is
returned and an error message is left in \fIinterp\fR.  Otherwise, this
procedure increments the reference count of the new system encoding,
decrements the reference count of the old system encoding, and returns
TCL_OK.
.PP
\fBTcl_GetEncodingNames\fR sets the \fIinterp\fR result to a list
consisting of the names of all the encodings that are currently defined
or can be dynamically loaded, searching the encoding path specified by
\fBTcl_SetDefaultEncodingDir\fR.  This procedure does not ensure that the
dynamically-loadable encoding files contain valid data, but merely that they
exist.
.PP
\fBTcl_CreateEncoding\fR defines a new encoding and registers the C
procedures that are called back to convert between the encoding and
UTF-8.  Encodings created by \fBTcl_CreateEncoding\fR are thereafter
visible in the database used by \fBTcl_GetEncoding\fR.  Just as with the
\fBTcl_GetEncoding\fR procedure, the return value is a token that
represents the encoding and can be used in subsequent calls to other
encoding functions.  \fBTcl_CreateEncoding\fR returns an encoding with a
reference count of 1.
If an encoding with the specified \fIname\fR
already exists, then its entry in the database is replaced with the new
encoding; the token for the old encoding will remain valid and continue
to behave as before, but users of the new token will now call the new
encoding procedures.
.PP
The \fItypePtr\fR argument to \fBTcl_CreateEncoding\fR contains information
about the name of the encoding and the procedures that will be called to
convert between this encoding and UTF-8.  It is defined as follows:
.PP
.CS
typedef struct Tcl_EncodingType {
    CONST char *\fIencodingName\fR;
    Tcl_EncodingConvertProc *\fItoUtfProc\fR;
    Tcl_EncodingConvertProc *\fIfromUtfProc\fR;
    Tcl_EncodingFreeProc *\fIfreeProc\fR;
    ClientData \fIclientData\fR;
    int \fInullSize\fR;
} Tcl_EncodingType;
.CE
.PP
The \fIencodingName\fR provides a string name for the encoding, by
which it can be referred to in other procedures such as
\fBTcl_GetEncoding\fR.  The \fItoUtfProc\fR refers to a callback
procedure to invoke to convert text from this encoding into UTF-8.
The \fIfromUtfProc\fR refers to a callback procedure to invoke to
convert text from UTF-8 into this encoding.  The \fIfreeProc\fR refers
to a callback procedure to invoke when this encoding is deleted.  The
\fIfreeProc\fR field may be NULL.  The \fIclientData\fR contains an
arbitrary one-word value passed to \fItoUtfProc\fR, \fIfromUtfProc\fR,
and \fIfreeProc\fR whenever they are called.  Typically, this is a
pointer to a data structure containing encoding-specific information
that can be used by the callback procedures.  For instance, two very
similar encodings such as \fBascii\fR and \fBmacRoman\fR may use the
same callback procedure, but use different values of \fIclientData\fR
to control its behavior.  The \fInullSize\fR specifies the number of
zero bytes that signify end-of-string in this encoding.
It must be
\fB1\fR (for single-byte or multi-byte encodings like ASCII or
Shift-JIS) or \fB2\fR (for double-byte encodings like Unicode).
Constant-sized encodings with 3 or more bytes per character (such as
CNS11643) are not accepted.
.PP
The callback procedures \fItoUtfProc\fR and \fIfromUtfProc\fR should match the
type \fBTcl_EncodingConvertProc\fR:
.PP
.CS
typedef int Tcl_EncodingConvertProc(
    ClientData \fIclientData\fR,
    CONST char *\fIsrc\fR,
    int \fIsrcLen\fR,
    int \fIflags\fR,
    Tcl_EncodingState *\fIstatePtr\fR,
    char *\fIdst\fR,
    int \fIdstLen\fR,
    int *\fIsrcReadPtr\fR,
    int *\fIdstWrotePtr\fR,
    int *\fIdstCharsPtr\fR);
.CE
.PP
The \fItoUtfProc\fR and \fIfromUtfProc\fR procedures are called by the
\fBTcl_ExternalToUtf\fR or \fBTcl_UtfToExternal\fR family of functions to
perform the actual conversion.  The \fIclientData\fR parameter to these
procedures is the same as the \fIclientData\fR field specified to
\fBTcl_CreateEncoding\fR when the encoding was created.  The remaining
arguments to the callback procedures are the same as the arguments,
documented at the top, to \fBTcl_ExternalToUtf\fR or
\fBTcl_UtfToExternal\fR, with the following exceptions.  If the
\fIsrcLen\fR argument to one of those high-level functions is negative,
the value passed to the callback procedure will be the appropriate
encoding-specific string length of \fIsrc\fR.  If any of the \fIsrcReadPtr\fR,
\fIdstWrotePtr\fR, or \fIdstCharsPtr\fR arguments to one of the high-level
functions is NULL, the corresponding value passed to the callback
procedure will be a non-NULL location.
.PP
The callback procedure \fIfreeProc\fR, if non-NULL, should match the type
\fBTcl_EncodingFreeProc\fR:
.CS
typedef void Tcl_EncodingFreeProc(
    ClientData \fIclientData\fR);
.CE
.PP
This \fIfreeProc\fR function is called when the encoding is deleted.  The
\fIclientData\fR parameter is the same as the \fIclientData\fR field
specified to \fBTcl_CreateEncoding\fR when the encoding was created.
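.PP
As an illustration of the callback shape described above, the following
minimal sketch converts ISO 8859-1 bytes to UTF-8 and fills in the three
output counters.  It is not taken from the Tcl sources.  The
\fBDemo\fR-prefixed types, the \fBDEMO_\fR constants and their values, and
the function name are hypothetical stand-ins so that the sketch compiles
without \fBtcl.h\fR; real code would use \fBClientData\fR,
\fBTcl_EncodingState\fR and the status codes defined by Tcl, and would need
to follow whatever additional conventions Tcl's own converters observe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for declarations that normally come from tcl.h. */
typedef void *DemoClientData;
typedef void *DemoEncodingState;
#define DEMO_OK              0   /* everything in src was converted */
#define DEMO_CONVERT_NOSPACE 1   /* placeholder value: dst filled up first */

/*
 * Convert srcLen bytes of ISO 8859-1 text to UTF-8, in the shape of a
 * toUtfProc.  The three pointer arguments are assumed to be non-NULL, as
 * the text above says they are when a callback is invoked.
 */
static int
DemoLatin1ToUtf(DemoClientData clientData, const char *src, int srcLen,
        int flags, DemoEncodingState *statePtr, char *dst, int dstLen,
        int *srcReadPtr, int *dstWrotePtr, int *dstCharsPtr)
{
    int read = 0, wrote = 0, chars = 0, result = DEMO_OK;

    (void) clientData;          /* this stateless sketch needs none of these */
    (void) flags;
    (void) statePtr;
    assert(srcReadPtr != NULL && dstWrotePtr != NULL && dstCharsPtr != NULL);

    while (read < srcLen) {
        unsigned char ch = (unsigned char) src[read];
        int need = (ch < 0x80) ? 1 : 2;   /* UTF-8 bytes for this character */

        if (wrote + need > dstLen) {
            result = DEMO_CONVERT_NOSPACE;
            break;
        }
        if (need == 1) {
            dst[wrote++] = (char) ch;
        } else {
            dst[wrote++] = (char) (0xC0 | (ch >> 6));
            dst[wrote++] = (char) (0x80 | (ch & 0x3F));
        }
        read++;
        chars++;
    }

    /* Report how far the conversion got, even when it stopped early. */
    *srcReadPtr = read;
    *dstWrotePtr = wrote;
    *dstCharsPtr = chars;
    return result;
}
```

.PP
Because the counters are written on every return path, a caller that runs
out of destination space can resume from \fIsrc\fR plus the reported number
of bytes read.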
.PP
\fBTcl_GetDefaultEncodingDir\fR and \fBTcl_SetDefaultEncodingDir\fR
access and set the directory to use when locating the default encoding
files.  If this value is not NULL, the \fBTclpInitLibraryPath\fR routine
adds the path to the head of the search path, and uses this path as
the first place to look when trying to locate the encoding file.
.SH "ENCODING FILES"
Space would prohibit precompiling into Tcl every possible encoding
algorithm, so many encodings are stored on disk as dynamically-loadable
encoding files.  This behavior also allows the user to create additional
encoding files that can be loaded using the same mechanism.  These
encoding files contain information about the tables and/or escape
sequences used to map between an external encoding and Unicode.  The
external encoding may consist of single-byte, multi-byte, or double-byte
characters.
.PP
Each dynamically-loadable encoding is represented as a text file.  The
initial line of the file, beginning with a ``#'' symbol, is a comment
that provides a human-readable description of the file.  The next line
identifies the type of encoding file.  It can be one of the following
letters:
.IP "[1] \fBS\fR"
A single-byte encoding, where one character is always one byte long in the
encoding.  An example is \fBiso8859-1\fR, used by many European languages.
.IP "[2] \fBD\fR"
A double-byte encoding, where one character is always two bytes long in the
encoding.  An example is \fBbig5\fR, used for Chinese text.
.IP "[3] \fBM\fR"
A multi-byte encoding, where one character may be either one or two bytes long.
Certain bytes are lead bytes, indicating that another byte must follow
and that together the two bytes represent one character.  Other bytes are not
lead bytes and represent themselves.  An example is \fBshiftjis\fR, used by
many Japanese computers.
.IP "[4] \fBE\fR"
An escape-sequence encoding, specifying that certain sequences of bytes
do not represent characters, but commands that describe how following bytes
should be interpreted.
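.PP
The lead-byte rule for type \fBM\fR encodings can be sketched as a small
character counter.  This is an illustration only, not Tcl's implementation:
the function names are hypothetical, and the lead-byte ranges are hard-coded
assumptions chosen to resemble Shift-JIS rather than values taken from an
encoding file.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Lead-byte test for a Shift-JIS-like encoding.  The ranges below are an
 * assumption made for this sketch.
 */
static int
DemoIsLeadByte(unsigned char byte)
{
    return (byte >= 0x81 && byte <= 0x9F) || (byte >= 0xE0 && byte <= 0xFC);
}

/*
 * Count the characters in len bytes of multi-byte text.  A lead byte and
 * the byte after it form one character; any other byte stands alone.
 * Returns -1 if the text ends with a lead byte that has no following byte.
 */
static int
DemoCountChars(const unsigned char *bytes, size_t len)
{
    size_t i = 0;
    int count = 0;

    assert(bytes != NULL || len == 0);
    while (i < len) {
        if (DemoIsLeadByte(bytes[i])) {
            if (i + 1 >= len) {
                return -1;      /* truncated two-byte character */
            }
            i += 2;
        } else {
            i += 1;
        }
        count++;
    }
    return count;
}
```

.PP
With these assumed ranges, the four bytes 0x41 0x81 0x63 0x7E count as three
characters, because 0x81 is a lead byte and consumes the 0x63 that follows it.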
.PP
The rest of the lines in the file depend on the type.
.PP
Cases [1], [2], and [3] are collectively referred to as table-based encoding
files.  The lines in a table-based encoding file are in the same
format as this example taken from the \fBshiftjis\fR encoding (this is not
the complete file):
.CS
# Encoding file: shiftjis, multi-byte
M
003F 0 40
00
0000000100020003000400050006000700080009000A000B000C000D000E000F
0010001100120013001400150016001700180019001A001B001C001D001E001F
0020002100220023002400250026002700280029002A002B002C002D002E002F
0030003100320033003400350036003700380039003A003B003C003D003E003F
0040004100420043004400450046004700480049004A004B004C004D004E004F
0050005100520053005400550056005700580059005A005B005C005D005E005F
0060006100620063006400650066006700680069006A006B006C006D006E006F
0070007100720073007400750076007700780079007A007B007C007D203E007F
0080000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000FF61FF62FF63FF64FF65FF66FF67FF68FF69FF6AFF6BFF6CFF6DFF6EFF6F
FF70FF71FF72FF73FF74FF75FF76FF77FF78FF79FF7AFF7BFF7CFF7DFF7EFF7F
FF80FF81FF82FF83FF84FF85FF86FF87FF88FF89FF8AFF8BFF8CFF8DFF8EFF8F
FF90FF91FF92FF93FF94FF95FF96FF97FF98FF99FF9AFF9BFF9CFF9DFF9EFF9F
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
81
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000
300030013002FF0CFF0E30FBFF1AFF1BFF1FFF01309B309C00B4FF4000A8FF3E
FFE3FF3F30FD30FE309D309E30034EDD30053006300730FC20152010FF0F005C
301C2016FF5C2026202520182019201C201DFF08FF0930143015FF3BFF3DFF5B
FF5D30083009300A300B300C300D300E300F30103011FF0B221200B100D70000
00F7FF1D2260FF1CFF1E22662267221E22342642264000B0203220332103FFE5
FF0400A200A3FF05FF03FF06FF0AFF2000A72606260525CB25CF25CE25C725C6
25A125A025B325B225BD25BC203B301221922190219121933013000000000000
000000000000000000000000000000002208220B2286228722822283222A2229
000000000000000000000000000000002227222800AC21D221D4220022030000
0000000000000000000000000000000000000000222022A52312220222072261
2252226A226B221A223D221D2235222B222C0000000000000000000000000000
212B2030266F266D266A2020202100B6000000000000000025EF000000000000
.CE
.PP
The third line of the file is three numbers.  The first number is the
fallback character (in base 16) to use when converting from UTF-8 to this
encoding.  The second number is a \fB1\fR if this file represents the
encoding for a symbol font, or \fB0\fR otherwise.  The last number (in base
10) is how many pages of data follow.
.PP
Subsequent lines in the example above are pages that describe how to map
from the encoding into 2-byte Unicode.  The first line in a page identifies
the page number.  Following it are 256 double-byte numbers, arranged as 16
rows of 16 numbers.  Given a character in the encoding, the high byte of
that character is used to select which page, and the low byte of that
character is used as an index to select one of the double-byte numbers in
that page \- the value obtained being the corresponding Unicode character.
By examination of the example above, one can see that the characters 0x7E
and 0x8163 in \fBshiftjis\fR map to 203E and 2026 in Unicode, respectively.
.PP
Following the first page will be all the other pages, each in the same
format as the first: one number identifying the page followed by 256
double-byte Unicode characters.  If a character in the encoding maps to the
Unicode character 0000, it means that the character doesn't actually exist.
If all characters on a page would map to 0000, that page can be omitted.
.PP
Case [4] is the escape-sequence encoding file.
The lines in this type of
file are in the same format as this example taken from the \fBiso2022-jp\fR
encoding:
.CS
.ta 1.5i
# Encoding file: iso2022-jp, escape-driven
E
init	{}
final	{}
iso8859-1	\\x1b(B
jis0201	\\x1b(J
jis0208	\\x1b$@
jis0208	\\x1b$B
jis0212	\\x1b$(D
gb2312	\\x1b$A
ksc5601	\\x1b$(C
.CE
.PP
In the file, the first column represents an option and the second column
is the associated value.  \fBinit\fR is a string to emit or expect before
the first character is converted, while \fBfinal\fR is a string to emit
or expect after the last character.  All other options are names of
table-based encodings; the associated value is the escape-sequence that
marks that encoding.  Tcl syntax is used for the values; in the above
example, for instance, ``\fB{}\fR'' represents the empty string and
``\fB\\x1b\fR'' represents character 27.
.PP
When \fBTcl_GetEncoding\fR encounters an encoding \fIname\fR that has not
been loaded, it attempts to load an encoding file called \fIname\fB.enc\fR
from the \fBencoding\fR subdirectory of each directory specified in the
library path \fB$tcl_libPath\fR.  If the encoding file exists, but is
malformed, an error message will be left in \fIinterp\fR.
.SH KEYWORDS
utf, encoding, convert