X Window System, Version 11
Input Method Specifications

Public Review Draft - November 1990
(Send comments to i18n@expo.lcs.mit.edu)

Vania Joloboff
Open Software Foundation

Bill McMahon
Hewlett Packard Company

ABSTRACT

This chapter addresses the portability and interoperability of programs in
different countries. It describes specifications that provide clients of
the X Window System, Version 11, with an interface for input handling of
characters in various languages. The specifications make it possible to
develop portable applications independent of a particular language or a
particular encoding of characters. The specifications are consistent with
related specifications from X/Open Portability Guide, Release 3, and
ANSI-C. The reader is assumed to be familiar with those, particularly with
the notion of locale in the C language; they are therefore not detailed
here.

Copyright © 1990 by the Massachusetts Institute of Technology.

Permission to use, copy, modify, and distribute this documentation for any
purpose and without fee is hereby granted, provided that the above
copyright notice and this permission notice appear in all copies. MIT
makes no representations about the suitability for any purpose of the
information in this document. It is provided "as is" without express or
implied warranty. This document is only a draft standard of the MIT X
Consortium and is therefore subject to change.

XIM Public Review Draft

X Window System is a trademark of the Massachusetts Institute of
Technology.

1. Input Method Overview

The next paragraphs provide definitions for terms and concepts used in the
specification, and a brief overview of the intended use of the
abstractions developed for Xlib internationalization.

1.1. What are Input Methods?

A large number of languages in the world rely on an alphabet, a small set
of symbols (letters) used to form words.
To enter text into a computer in an alphabetic language, a user usually
has a keyboard on which there exist key symbols corresponding to the
alphabet. Sometimes a few characters of an alphabetic language are missing
on the keyboard. Many computer users who speak a language based on the
Latin alphabet have only an English-based keyboard. They need to hit a
combination of keystrokes in order to enter a character that does not
exist directly on the keyboard. A number of algorithms have been developed
for entering such characters, known as European input methods, compose
input methods, or dead-key input methods.

In some alphabetic languages, the rendering of character strings is
context sensitive. When entering characters in those languages, a
keystroke does not systematically mean appending a new symbol at the end
of the string. It may modify the existing string. Both input and output
methods may be used in such languages.

With an ideographic writing system, rather than taking a small set of
symbols and combining them in different ways to create words, each word
consists of one unique symbol (or, occasionally, several symbols). The
number of symbols may be very large: 150 000 have been identified in
Hanzi, the Chinese ideographic system.

There are two major aspects of ideographic systems for their computer
usage. First, the standard computer character sets in Japan, China, and
Korea include roughly 8 000 characters, while sets in Taiwan have between
15 000 and 30 000 characters, which makes it necessary to use more than
one byte to represent a character. Second, it obviously is impractical to
have a keyboard that includes all of a given language's ideographic
symbols. Therefore a specific mechanism is required for entering
characters so that a keyboard with a reasonable number of keys can be
used.
Those input methods are usually based on the language's phonetics, but
there also exist methods based on the graphic properties of characters.

In addition to the ideographic characters, a number of languages often
also include a phonetic (alphabetic-based) writing system. The phonetic
signs are then engraved on the keyboard and the keystrokes are transformed
to their appropriate ideographic counterparts. Here's a brief description
of the Japanese and Korean phonetic systems:

o Japanese: There are two phonetic symbol sets: katakana and hiragana. In
  general, you use katakana for words that are of foreign origin, and
  hiragana for writing native Japanese words. Collectively, the two
  systems are called kana. Each set consists of approximately 50
  characters.

  You type either kana or English characters and define the region that
  you want to convert to kanji. Several kanji characters may have the same
  phonetic representation. If that's the case with your string, you get a
  menu of characters and choose the appropriate one. If no choice is
  necessary, the input method does the substitution directly. When Latin
  characters are converted to kana or kanji, it is called a romaji
  conversion.

o Korean: Hangul is a writing system that actually straddles the line
  between phonetic and ideographic. It's phonetic in the sense that each
  of the roughly 25 characters represents a specific sound. But between
  two and five of the characters are combined to form syllables, and these
  syllables are the basic units on which text processing is done. For
  example, a delete operation works on a syllable rather than the
  individual characters within it. And Korean code sets include several
  thousand of these syllables.

  You type the hangul characters that make up the syllables of the words
  you're entering. The display changes as you enter each hangul letter.
  That is, when you enter the first letter, it fills the entire space that
  the final syllable will take up.
  When you enter the second, the first shrinks to about half its size to
  make room for the second. When you enter the third, the first two shrink
  again. And so on, up to the maximum of five letters in a syllable.

  It's usually acceptable to keep Korean text in hangul form, but some
  words are more commonly written in hanja. If you want to change hangul
  to hanja, you define the region to be converted, and follow the same
  basic method as described for Japanese.

Probably because there are well-accepted phonetic writing systems for
Japanese and Korean, computer input methods for those languages are fairly
standard. Keyboard keys have both English characters and the local
language's phonetic symbols engraved on them. You can then switch the
keyboard from English to local mode and vice versa.

The situation is different for Chinese. While there is a phonetic system
called Pinyin promoted by authorities, there is no consensus for entering
Chinese text. Some vendors use a phonetic decomposition (Pinyin or
another), others use ideographic decomposition of Chinese words, with
various implementations and keyboard layouts. There are about 16 known
methods, none of which is a clear standard.

Also, there are actually two ideographic sets used: Traditional Chinese
(the original written Chinese) and Simplified Chinese. Several years back,
the People's Republic of China launched a campaign to simplify some
ideographic characters and eliminate redundancies altogether. Under the
plan, characters would be streamlined every five years. Characters have
been revised several times now, resulting in the smaller, simpler set that
makes up Simplified Chinese.

1.1.1. Input Method Architecture

As shown in the previous paragraphs, there are many different input
methods in use today, varying with language, culture, and history. A
common feature of many input methods is that the user may type multiple
keystrokes in order to compose a single character (or set of characters).
The process of composing characters from keystrokes is called pre-editing.
It may require complex algorithms and large dictionaries involving
substantial computer resources.

Input methods may require one or more areas in which to show the feedback
of the actual keystrokes, to propose disambiguation to the user, to list
dictionaries, and so on. The input method areas with which we are
concerned in this specification are as follows.

     The Status area is intended to be a logical extension of the LEDs
     that exist on the physical keyboard. It is an output-only window
     which is intended to present the internal state of the input method
     that is critical to the user. The status area may consist of text
     data and bitmaps or some combination.

     The PreEdit area is intended to display the intermediate text for
     those languages that are composing prior to the client handling the
     data.

     The Auxiliary area is used for pop-up menus and customizing dialogs
     that may be required for an input method. There may be multiple
     Auxiliary areas for any input method. Auxiliary areas are managed by
     the input method independent of the client. Auxiliary areas are
     assumed to be a separate dialog which is maintained by the input
     method.

There are various user interaction styles used for pre-editing. The ones
that this specification addresses are as follows.

     For on-the-spot input methods, pre-editing data will be displayed in
     the application window. Application data is moved to allow pre-edit
     data to be displayed at the point of insertion.

     Over-the-spot pre-editing means that the data is displayed in an
     input method window that is placed over the point of insertion.

     Off-the-spot pre-editing means that the pre-edit window is inside the
     client window, but not at the point of insertion. Often this type of
     window is placed at the bottom of the client window.

     Root-window pre-editing refers to input methods that use a pre-edit
     window that is the child of RootWindow.

It would require a lot of computing resources if portable applications had
to include input methods for all the languages in the world. To avoid
this, the goal of these specifications is to allow an application to
communicate with an input method placed in a separate process. Such a
process is called an input server. The server to which the application
should connect is dependent upon the environment when the application is
started up: what is the user language, the actual encoding to be used for
it. We will say that input method connection is locale dependent. It is
also user dependent: for a given language, the user can choose to some
extent the user interface style of input method (if choice is possible
among several).

Using an input server implies communication overhead, but applications can
be migrated without relinking. Specifications in this document have been
designed so input methods can be implemented either as a stub
communicating to an input server or as a local library.

An input method may be based on a front-end or a back-end architecture. In
front-end, there are two separate connections to the X server: keystrokes
go directly from X server to the input method on one connection, other
events to the regular client connection. The input method is then acting
as a filter, and sends composed strings to the client. Front-end requires
synchronization