*multibyte.txt* For Vim version 5.8.  Last change: 2000 Jun 07


                  VIM REFERENCE MANUAL    by Bram Moolenaar et al.


Multi-byte support                              *multibyte* *multi-byte*
                                                *Chinese* *Japanese* *Korean*

There are languages which have many characters that cannot be represented
using one byte (one octet).  These are Chinese (simplified or traditional),
Japanese and Korean.  These languages use more than one byte to represent a
character.

This is limited information on the support in Vim to edit files that use more
than one byte per character.  Actually, only two-byte codes are currently
supported.

Also see |+multi_byte| and |'fileencoding'|.

1. Introduction                         |multibyte-intro|
2. Compiling                            |multibyte-compiling|
3. Display (X fontset support)          |multibyte-display|
4. Input (XIM support)                  |multibyte-input|
5. UTF-8 in XFree86 xterm               |UTF8-xterm|

==============================================================================
1. Introduction                                         *multibyte-intro*

LOCALE                                                  *locale-multibyte*

There are a number of languages in the world, and at least as many different
cultures and environments.  A linguistic environment corresponding to an area
is called a "|locale|".  The POSIX standard defines a concept of |locale|,
which includes a lot of information about |charset|, collating order for
sorting, date format, currency format and so on.

Your system needs to support the |locale| system and the language |locale| of
your choice.  Some systems have only a few language |locale|s, so the
|locale| of the language which you want to use may not be on your system.  If
so, you have to add the language |locale|.  But on some systems, it is not
possible to add other |locale|s.  In this case, install X |locale|s by
installing X compiled with X_LOCALE.  Add "-DX_LOCALE" to the CFLAGS if your
X lib supports X_LOCALE.

For example, when you are using a Linux system and you want to use Japanese,
set up your system in one of the following ways.
    - libc5     + X compiled with X_LOCALE
    - glibc-2.0 + libwcsmbs + X compiled without X_LOCALE
    - glibc-2.1 + locale-ja + X compiled without X_LOCALE

The location in which the |locale|s are installed varies from system to
system.  For example, "/usr/share/locale", "/usr/lib/locale", etc.  See your
system's setlocale() man page.

                                        *locale-name* *$LANG-multibyte*
The format of a |locale| name is:
    language[_territory[.codeset]]
Territory means the country, codeset means the |charset|.  For example, the
|locale| name "ja_JP.eucJP" means the language is Japanese, the country is
Japan, the codeset is EUC-JP.  But it also could be "ja", "ja_JP.EUC",
"ja_JP.ujis", etc.  And unfortunately, the |locale| name for a specific
language, territory and codeset is not unified and depends on your system.

This name is used for the LANG environment value.  When you want to use Korean
and the |locale| name is "ko", do this:
    sh:  export LANG=ko
    csh: setenv LANG ko

Examples of locale names:
    |charset|       language              |locale-name|
    GB2312          Chinese (simplified)  zh_CN.EUC, zh_CN.GB2312
    Big5            Chinese (traditional) zh_TW.BIG5, zh_TW.Big5
    CNS-11643       Chinese (traditional) zh_TW
    EUC-JP          Japanese              ja, ja_JP.EUC, ja_JP.ujis, ja_JP.eucJP
    Shift_JIS       Japanese              ja_JP.SJIS, ja_JP.Shift_JIS
    EUC-KR          Korean                ko, ko_KR.EUC

Even if your system does not have the multibyte language |locale| of your
choice, or does not have a sufficient implementation of the locale, Vim can
somehow handle the multibyte languages.  Add the "--enable-broken-locale" flag
at compile time.

CODED CHARACTER SET (CCS)                       *coded-character-set* *CCS*

|CCS| is a mapping from a set of characters to a set of integers.  For
example, ((65, A), (66, B), (67, C)) is a |CCS| and ((0x41, A), (0x42, B),
(0x43, C)) is also a |CCS|.
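
The two example tables above can be sketched as Python dictionaries.  This is
only an illustration of the "character to integer" idea; the names
ccs_decimal and ccs_hex are made up for this sketch and are not part of Vim.

```python
# A CCS pairs each character with an integer (its code point).
# The two tables from the text, written as plain dictionaries
# (illustrative names, not anything defined by Vim).
ccs_decimal = {"A": 65, "B": 66, "C": 67}
ccs_hex = {"A": 0x41, "B": 0x42, "C": 0x43}

# 65 and 0x41 are the same integer, so both notations describe one mapping.
same_mapping = ccs_decimal == ccs_hex

# For these three characters the table agrees with US-ASCII, which Python
# exposes through ord().
matches_ascii = all(ord(ch) == code for ch, code in ccs_decimal.items())

print(same_mapping, matches_ascii)
```

Note that the dictionaries say nothing about how the integers are stored as
octets; that is the job of a |CES|.
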
Examples of |CCS| are ISO 10646, US-ASCII, ISO-8859 series, JIS X 0208,
JIS X 0201, KS C 5601 (KS X 1001) and KS C 5636 (KS X 1003).

The term "integer" means code point or character number and is different from
octets or bit combinations.

Typically, a |CCS| is a character table.  The column/line position of a
character, written as a hexadecimal number, is its code point.  For example,
the US-ASCII CCS has an 8x16 character table; the column numbers start with 0
and end with 7, the line numbers start with 0 and end with F.  The code point
of the character at 4/1 is 0x41.

CHARACTER ENCODING SCHEME (CES)                 *character-encode-scheme* *CES*

|CES| is a mapping from a sequence of elements in one or more |CCS|es to a
sequence of octets.  Examples of |CES| are EUC-JP, EUC-KR, EUC-CN (GB 2312),
EUC-TW (CNS-11643), ISO-2022-JP, ISO-2022-KR, ISO-2022-CN, UTF-8, etc.

CHARSET                                                 *charset*

|charset| is a method of converting a sequence of octets into a sequence of
characters, the combination of one or more |CCS|es and a |CES|.  For example,
the ISO-2022-JP |charset| is the combination of the ASCII, JIS X 0201 and
JIS X 0208 |CCS|es and the ISO-2022-JP |CES|.  Examples of |charset| are
US-ASCII, ISO-8859 series, GB2312, EUC-JP, EUC-KR, Shift_JIS, Big5, UTF-8,
etc.

Note that this is not a term used by other standards bodies, such as ISO, but
a term defined in RFC 2130.  The term "codeset" in POSIX has the same meaning
as |charset| here.  |charset| does not mean character set (a set of
characters); the term "character repertoire" means a collection of distinct
characters.  There are historical reasons, see RFC 2130.

                                                *charset-conversion*
One language could have several |charset|s.  For example, Japanese has the
ISO-2022-JP, EUC-JP and Shift_JIS |charset|s.  The ISO-2022-JP |charset| is
used mainly for internet messages, because it is encoded in a 7-bit scheme.
EUC-JP is mainly used on Unix, Shift_JIS is mainly used on Windows and MacOS.

Vim does not convert automatically to the locale's |charset| at display time.
So, if a file's |charset| differs from your locale's |charset|, the file is
not displayed correctly.  You must find out the file's |charset| in some way
(guessing, using some utilities, etc.) and convert it to the locale's
|charset| manually.

Useful utilities for converting the |charset|:
    Japanese:       nkf
        Nkf is "Network Kanji code conversion Filter".  One of the most
        distinctive features of nkf is that it guesses the input Kanji code.
        So, you don't need to know what the input file's |charset| is.  To
        convert to EUC-JP from ISO-2022-JP or Shift_JIS, simply do the
        following command in Vim:
            :%!nkf -e
        Nkf can be found at:
        http://www.sfc.wide.ad.jp/~max/FreeBSD/ports/distfiles/nkf-1.62.tar.gz
    Chinese:        hc
        Hc is "Hanzi Converter".  Hc converts a GB file to a Big5 file, or a
        Big5 file to a GB file.  Hc can be found at:
        ftp://ftp.cuhk.hk/pub/chinese/ifcss/software/unix/convert/hc-30.tar.gz
    Korean:         hmconv
        Hmconv is a Korean code conversion utility especially for E-mail.  It
        can convert between EUC-KR and ISO-2022-KR.  Hmconv can be found at:
        ftp://ftp.kaist.ac.kr/pub/hangul/code/hmconv/hmconv1.0pl3
    Multilingual:   lv
        Lv is a Powerful Multilingual File Viewer.  It can also work as a
        |charset| converter.  Supported |charset|s: ISO-2022-CN, ISO-2022-JP,
        ISO-2022-KR, EUC-CN, EUC-JP, EUC-KR, EUC-TW, UTF-7, UTF-8, ISO-8859
        series, Shift_JIS, Big5 and HZ.  Lv can be found at:
        http://www.ff.iij4u.or.jp/~nrt/freeware/lv4493.tar.gz


X LOGICAL FONT DESCRIPTION (XLFD)
                                                        *XLFD*
XLFD is the X font name and contains the information about the font size,
|CCS|, etc.  The name is in this format:

FOUNDRY-FAMILY-WEIGHT-SLANT-WIDTH-STYLE-PIXEL-POINT-X-Y-SPACE-AVE-CR-CE

Each field means:

- FOUNDRY:  FOUNDRY field.  The company that created the font.
- FAMILY:   FAMILY_NAME field.  Basic font family name.
            (helvetica, gothic, times, etc)
- WEIGHT:   WEIGHT_NAME field.  How thick the letters are.  (light, medium,
            bold, etc)
- SLANT:    SLANT field.
                r:  Roman
                i:  Italic
                o:  Oblique
                ri: Reverse Italic
                ro: Reverse Oblique
                ot: Other
                number: Scaled font
- WIDTH:    SETWIDTH_NAME field.  Width of characters.  (normal, condensed,
            narrow, double wide)
- STYLE:    ADD_STYLE_NAME field.  Extra info to describe font.  (Serif, Sans
            Serif, Informal, Decorated, etc)
- PIXEL:    PIXEL_SIZE field.  Height, in pixels, of characters.
- POINT:    POINT_SIZE field.  Ten times height of characters in points.
- X:        RESOLUTION_X field.  X resolution (dots per inch).
- Y:        RESOLUTION_Y field.  Y resolution (dots per inch).
- SPACE:    SPACING field.
                p:  Proportional
                m:  Monospaced
                c:  CharCell
- AVE:      AVERAGE_WIDTH field.  Ten times average width in pixels.
- CR:       CHARSET_REGISTRY field.  Indicates the name of the font's |CCS|.
- CE:       CHARSET_ENCODING field.  In some CCSes, such as ISO-8859 series,
            this field is part of the |CCS| name.  In other CCSes, such as JIS
            X 0208, if this field is 0, code points have the same value as GL,
            and as GR if it is 1.

For example, a 16-dot font corresponding to JIS X 0208 is written like:
    -misc-fixed-medium-r-normal--16-110-100-100-c-160-jisx0208.1990-0


X FONTSET                                               *fontset* *xfontset*

A |CCS| is typically associated with one font.  Languages which must manage
multiple |CCS|es need to manage multiple fonts.  In X11R5, FontSet was
introduced for the internationalization of the output API.  By using this,
Xlib takes care of switching of fonts and the display.  Until X11R4, the
applications themselves had to manage this.

The |locale| database has the information about the |charset| of the
|locale|, which |CCS|(es) are needed and which |CES| the locale uses.
When you use a locale which must manage multiple |CCS|es, you have to specify
each |CCS|'s font in the 'guifontset' option.

Example:
    |charset| language              |CCS|es
    GB2312    Chinese (simplified)  ISO-8859-1 and GB 2312
    Big5      Chinese (traditional) ISO-8859-1 and Big5
    CNS-11643 Chinese (traditional) ISO-8859-1, CNS 11643-1 and CNS 11643-2
    EUC-JP    Japanese              JIS X 0201 and JIS X 0208
    EUC-KR    Korean                ISO-8859-1 and KS C 5601 (KS X 1001)

The |XLFD| contains the information of |CCS|.  So, by searching in fonts.dir,
you can find the |CCS|'s font.  The fonts.dir is in the fonts directory (e.g.
/usr/X11R6/lib/X11/fonts/*), the format of the file is:
    First line:  the number of fonts which are contained in this fonts.dir
    other lines: FILENAME  |XLFD|

Or, you can search fonts using the xlsfonts command.  For example, when you're
searching for the font for KS C 5601:
>   xlsfonts | grep ksc5601
will show you the list of them.

                                                *base_font_name_list*
In the 'guifontset' option and ~/.Xdefaults, you specify the
|base_font_name_list|, which is a list of |XLFD| font names that Xlib uses to
load the fonts needed for the |locale|.  The base font names are a
comma-separated list.

For example, suppose you use the ja_JP.eucJP |locale|, which requires the
JIS X 0201 and JIS X 0208 |CCS|es.  You could supply a |base_font_name_list|
that explicitly specifies the charsets, like:

guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140-jisx0208.1983-0,
    \-misc-fixed-medium-r-normal--14-130-75-75-c-70-jisx0201.1976-0

Alternatively, the user could supply a base font name list that omits the
|CCS| name, letting Xlib select font characters required for the locale.  For
example:

guifontset=-misc-fixed-medium-r-normal--14-130-75-75-c-140,
    \-misc-fixed-medium-r-normal--14-130-75-75-c-70

Alternatively, the user could supply a single base font name that allows Xlib
to select from all available fonts.
For example:

guifontset=-misc-fixed-medium-r-normal--14-*

Alternatively, the user could specify the alias name.  See fonts.alias in
the fonts directory.

guifontset=k14,r14

Note that in East Asian fonts, the standard character cell is square.  When
mixing a Latin font and an East Asian font, the East Asian font width should
be twice the Latin font width.  And GVIM needs a fixed-width font.

X INPUT METHOD (XIM)                            *XIM* *xim* *x-input-method*

XIM (X Input Method) is an international input module for X.  There are two
kinds of structures: the Xlib unit type and the |IM-server| (Input-Method
server) type.  The |IM-server| type is suitable for complex input, like CJK
input.

- IM-server                                                     *IM-server*
  In |IM-server| type input structures, the input event is handled in either
  of two ways: the FrontEnd system and the BackEnd system.  In the FrontEnd
  system, input events are snatched by the |IM-server| first, then the
  |IM-server| gives the application the result of the input.  The BackEnd
  system, on the other hand, works in the reverse order.  MS Windows adopts
  the BackEnd system.  In X, most |IM-server|s adopt the FrontEnd system.  The
  demerit of the BackEnd system is the large overhead in communication, but it
  provides safe synchronization with no restrictions on applications.

  For example, there are the xwnmo and kinput2 Japanese |IM-server|s; both are
  FrontEnd systems.  Xwnmo is distributed with Wnn (see below), kinput2 can be