section of this document. Also see the "Sourcing Scripts in Different Encodings" section of this document for special instructions for sourcing files in formats other than the system encoding.

General String Manipulation

Beginning in Tcl 8.1, all Tcl string manipulation functions expect and return Unicode strings encoded in UTF-8 format. Because the use of Unicode/UTF-8 encoding is internal to Tcl, you should see no difference between Tcl 8.0 and 8.1 string handling in your scripts.

The Tcl string functions properly handle multi-byte UTF-8 characters as single characters. For example, in the following commands, Tcl treats the string "Café" as a four-character string, even though its internal UTF-8 representation requires five bytes. (As with previous versions of Tcl, string indexes start with 0; that is, the first character is index 0, the second character is index 1, and so on.)

% set unistr "Café"
Café
% string length $unistr
4
% string index $unistr 3
é

Furthermore, the new regular expression implementation introduced in Tcl 8.1 handles the full range of Unicode characters.

The "\uxxxx" escape sequence allows you to specify a Unicode character by its four-digit hexadecimal Unicode code value. For example, the following assigns two ideographic characters to a variable. Together they form the Chinese transliteration of "Tcl" (TAI-KU):

set tclstr "\u592a\u9177"

Channel Input/Output

When reading and writing data on a channel, you need to ensure that Tcl uses the proper character encoding for that channel. The default encoding for newly opened channels (both files and sockets) is the platform- and locale-dependent system encoding that Tcl uses for interfacing with the operating system. (See the "Character Encodings and the Operating System" section of this document for more information.)
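The short sketch below checks two claims made above. First, "Café" counts as four characters but occupies five bytes once encoded as UTF-8. Second, a newly opened file channel starts out in the system encoding.

It assumes Tcl 8.6 or later, because it uses `file tempfile` only to obtain a scratch file. The variable names are illustrative. "Café" is written as "Caf\u00e9" so the sketch itself stays 7-bit ASCII.

```tcl
# Characters versus bytes: [string length] counts characters.
set unistr "Caf\u00e9"                          ;# "Café", written in 7-bit ASCII
set nchars [string length $unistr]              ;# 4 characters
set utf8   [encoding convertto utf-8 $unistr]   ;# explicit UTF-8 byte string
set nbytes [string length $utf8]                ;# 5 bytes: "é" needs two

# Each "\uxxxx" escape yields exactly one character.
set tclstr "\u592a\u9177"
set nideo  [string length $tclstr]              ;# 2 characters

# A freshly opened file channel uses the system encoding by default.
# (file tempfile is Tcl 8.6+; it only supplies a scratch file here.)
close [file tempfile scratch]
set fd [open $scratch r]
set chanEnc [fconfigure $fd -encoding]
close $fd
file delete $scratch

puts "chars=$nchars bytes=$nbytes ideographs=$nideo"
puts "system=[encoding system] channel=$chanEnc"
```

The exact encoding name printed on the last line depends on the platform and locale. The point is that the two names match.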
In most cases, you don't need to do anything special to read or write data, because most text files are created in the system encoding. You need to take special steps only when accessing files in an encoding other than the system encoding (for example, reading a file encoded in Shift-JIS format when your system encoding is ISO 8859-1).

The fconfigure -encoding option allows you to specify the encoding for a channel. Thus, to read from a file encoded in Shift-JIS format, you should execute the following commands:

set fd [open $file r]
fconfigure $fd -encoding shiftjis

Tcl then automatically converts any text you read from the file into standard UTF-8 format.

Similarly, if you are writing to a channel, you can use fconfigure -encoding to specify the target character encoding. Tcl then automatically converts strings from UTF-8 to that encoding on output.

Note: The Tcl source command always reads files using the system encoding. For a tip on sourcing files in different encodings, see the "Sourcing Scripts in Different Encodings" section of this document.

Sourcing Scripts in Different Encodings

The Tcl source command always reads files using the system encoding. Therefore, Ajuba Solutions recommends that, whenever possible, you author scripts in the native system encoding.

A difficulty arises when distributing scripts internationally, because you don't necessarily know what the system encoding will be. Fortunately, most common character encodings include the standard 7-bit ASCII characters as a subset. Therefore, you are usually safe if your script contains only 7-bit ASCII characters.

If you need to use an extended character set in scripts that you distribute, you can provide a small "bootstrap" script written in 7-bit ASCII. The bootstrap script can then load and execute scripts in any encoding that you choose.
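One way to realize the bootstrap idea is sketched below. It assumes Tcl 8.6 or later for `file tempfile`. The helper name `sourceEncoded` is hypothetical. It packages the open, fconfigure -encoding, read and evaluate sequence, and runs the script with `uplevel 1` so that it executes in the caller's context, as source does.

The demo also exercises the output direction. It writes a one-line script through a channel configured with -encoding shiftjis, so Tcl converts the text to Shift-JIS as it writes. Tcl 8.5 and later also offer `source -encoding`, which makes such a helper unnecessary on those versions.

```tcl
# bootstrap.tcl: keep this file pure 7-bit ASCII so that it reads
# correctly under any ASCII-compatible system encoding.

# sourceEncoded (hypothetical name): read a script file using an explicit
# channel encoding, then evaluate it in the caller's context like [source].
proc sourceEncoded {path encoding} {
    set fd [open $path r]
    fconfigure $fd -encoding $encoding
    set script [read $fd]
    close $fd
    return [uplevel 1 $script]
}

# Demo: write a one-line script in Shift-JIS. The \u306F escape keeps this
# file ASCII; Tcl converts the character to Shift-JIS bytes on output.
set fd [file tempfile sjisPath]
fconfigure $fd -encoding shiftjis
puts $fd "set ha \"\u306F\""    ;# HIRAGANA LETTER HA
close $fd

# Load it back with the matching encoding; this sets the variable ha.
sourceEncoded $sjisPath shiftjis
file delete $sjisPath
```

In a real bootstrap, the demo section would be replaced by calls such as `sourceEncoded app.tcl euc-jp`.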
You can execute a script written in an encoding other than the system encoding in four steps. Open the file, set the proper encoding with the fconfigure -encoding option, read the file into a variable, and then evaluate the string with the eval command. For example, the following reads and executes a Tcl script encoded in EUC-JP:

set fd [open "app.tcl" r]
fconfigure $fd -encoding euc-jp
set jpscript [read $fd]
close $fd
eval $jpscript

Note: This technique works only if the file contains actual EUC-JP encoded characters (for example, you created the file with an EUC-JP text editor). It doesn't work if you build the EUC-JP encoded characters using the "\x" or octal digit escape sequences. Tcl 8.1 interprets each "\x" or octal digit escape sequence as a single Unicode character with the upper bits set to 0. For example, suppose the script app.tcl above contained the line:

set ha "\xA4\xCF"

The variable ha would then contain two characters, "¤Ï" (Unicode characters "CURRENCY SIGN" and "LATIN CAPITAL LETTER I WITH DIAERESIS"), not the Unicode character HIRAGANA LETTER HA.

Converting Strings to Different Encodings

You can convert a string to a different encoding using the encoding convertfrom and encoding convertto commands. The encoding convertfrom command converts a string from a specified encoding into UTF-8 Unicode characters. The encoding convertto command converts a string from UTF-8 Unicode into a specified encoding. In either case, if you omit the encoding argument, the command uses the current system encoding.
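This sketch shows the "\x" pitfall and the two conversion commands side by side. The variable names are illustrative. It decodes the two characters U+00A4 and U+00CF as EUC-JP bytes with encoding convertfrom. It then encodes the result back with encoding convertto and prints the resulting bytes in hex using binary scan.

```tcl
# "\xA4\xCF" is two characters (U+00A4 CURRENCY SIGN and U+00CF LATIN
# CAPITAL LETTER I WITH DIAERESIS), not one Japanese character.
set raw "\xA4\xCF"
set rawChars [string length $raw]                              ;# 2
set rawUtf8  [string length [encoding convertto utf-8 $raw]]   ;# 4 bytes

# Interpret those characters as EUC-JP bytes: the result is the single
# character U+306F HIRAGANA LETTER HA.
set ha [encoding convertfrom euc-jp $raw]
set haChars [string length $ha]                                ;# 1

# Convert back from Unicode to EUC-JP and show the bytes in hex.
set eucBytes [encoding convertto euc-jp $ha]
binary scan $eucBytes H* eucHex                                ;# a4cf

puts "raw: $rawChars chars, $rawUtf8 UTF-8 bytes; ha: $haChars char, EUC-JP bytes $eucHex"
```

The round trip shows that convertfrom and convertto are inverses for text the target encoding can represent.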
As an example, the following command converts a string representing the Hiragana letter HA from EUC-JP encoding into a Unicode string:

set ha [encoding convertfrom euc-jp "\xA4\xCF"]

(In Tcl 8.1, the "\x" and octal digit escape sequences specify the lower 8 bits of a Unicode character with the upper 8 bits set to 0. Thus the string "\xA4\xCF" still specifies two characters in Tcl 8.1, just as it did in Tcl 8.0. However, Tcl 8.1 stores those characters in four bytes (two UTF-8 bytes each), whereas Tcl 8.0 stored them in two bytes.)

Fonts, Encodings, and Tk Widgets

Tk widgets that display text now require text strings in Unicode/UTF-8 encoding. Tk automatically handles any encoding conversion necessary to display the characters in a particular font.