<HTML><HEAD><TITLE>1.2 Internationalization and Localization</TITLE></HEAD><BODY><A HREF="ug2.htm"><IMG SRC="images/banner.gif"></A><BR><A HREF="how_1948.htm"><IMG SRC="images/prev.gif"></A><A HREF="booktoc2.htm"><IMG SRC="images/toc.gif"></A><A HREF="sta_9169.htm"><IMG SRC="images/next.gif"></A><BR><STRONG>Click on the banner to return to the user guide home page.</STRONG><H2>1.2 Internationalization and Localization</H2><P>Computer users all over the world prefer to interact with their systems in their own languages and according to their own cultural conventions.  As a developer aiming for broad international acceptance of your products, you need to give users the flexibility to adapt output conventions to local requirements, such as different currency and numeric representations.  You must also make it possible to translate interfaces and messages without having to produce many different language versions of your software.</P><P>Two processes that prepare software for worldwide use are <I>internationalization</I> and <I>localization</I>.  <I>Internationalization</I> is the process of building into software the potential for worldwide use.  It is the result of work done by programmers and software designers during software development.</P><P>Internationalization requires that developers consciously design and implement software so that it can be adapted to various languages and cultural conventions, and that they avoid hard-coding elements that can be localized, such as screen positions and file names.  For example, developers should never embed messages, prompts, or any other kind of displayed text in their code; they should store such text externally, so it can be translated and exchanged.  
A developer of internationalized software should never assume specific conventions for formatting numeric or monetary values, or for displaying dates and times.</P><P><I>Localization</I> is the process of actually adapting internationalized software to the needs of users in a particular geographical or cultural area.  It includes the translation of messages by software translators.  It also requires the creation and availability of appropriate tables containing the relevant local data for use on a given system.  This is typically the job of system administrators, who build facilities for these tasks into their operating systems.  Users of internationalized software take part in localization by selecting the local conventions they prefer.</P><P>The <I>Standard C++ Library</I> offers a number of classes that support the internationalization of your programs.  We describe them in detail in this chapter.  Before we do, however, we would like to define some of the cultural conventions that affect software internationalization and that are supported by the programming languages C and C++ and their respective standard libraries.  Of course, there are many issues beyond our list that also need to be addressed, such as the orientation, sizing, and positioning of screen displays, vertical writing and printing, the selection of font tables, the handling of international keyboards, and so on.  But let us begin here.</P><A NAME="1.2.1"><H3>1.2.1 Localizing Cultural Conventions</H3></A><P>The need for localizing software arises from differences in cultural conventions.  These differences involve language itself, the representation of numbers and currency, the display of time and date, and the ordering or sorting of characters and strings.</P><A NAME="1.2.1.1"><H4>1.2.1.1 Language</H4></A><P>Of course, <I>language</I> itself varies from country to country, and even within a country.  
Your program may require output messages in English, German, French, Italian, or any number of other languages commonly used in the world today.</P><P>Languages may also differ in the <I>alphabet</I> they use.  Examples of different languages with their respective alphabets are given below:</P><CENTER><IMG SRC="images/table1.gif"></CENTER><A NAME="1.2.1.2"><H4>1.2.1.2 Numbers</H4></A><P>The representation of numbers depends on local customs, which vary from country to country.  For example, consider the <I>radix character</I>, the symbol used to separate the integer portion of a number from the fractional portion.  In American English, this character is a period; in much of Europe, it is a comma.  Conversely, the thousands separator, which separates groups of digits in numbers with more than three digits, is a comma in American English and a period in much of Europe.</P><P>The convention for grouping digits also varies.  In American English, digits are grouped by threes, but there are other possibilities; in Nepal, for example, the three digits nearest the radix character form one group and the remaining digits are grouped in pairs.  In the example below, the same number is written as it would be locally in three different countries:</P><CENTER><TABLE CELLSPACING=3 CELLPADDING=3><TR VALIGN=top><TD>1,000,000.55<BR></TD><TD>US<BR></TD></TR><TR VALIGN=top><TD>1.000.000,55<BR></TD><TD>Germany<BR></TD></TR><TR VALIGN=top><TD>10,00,000.55<BR></TD><TD>Nepal<BR></TD></TR></TABLE></CENTER><A NAME="1.2.1.3"><H4>1.2.1.3 Currency</H4></A><P>We are all aware that countries use different currencies.  However, not everyone realizes how many different ways there are to represent units of currency.  For example, the symbol for a currency can vary.  
Here are two different ways of representing the same amount in US dollars:</P><CENTER><TABLE CELLSPACING=3 CELLPADDING=3><TR VALIGN=top><TD>$24.99<BR></TD><TD>US<BR></TD></TR><TR VALIGN=top><TD>USD 24.99<BR></TD><TD>International currency symbol for the US<BR></TD></TR></TABLE></CENTER><P>The placement of the currency symbol also varies from currency to currency; it may appear before, after, or even within the numeric value:</P><CENTER><IMG SRC="images/table2.gif"></CENTER><P>The format of negative currency values differs as well:</P><CENTER><IMG SRC="images/table3.gif"></CENTER><A NAME="1.2.1.4"><H4>1.2.1.4 Time and Date</H4></A><P>Local conventions also determine how time and date are displayed.  Some countries use a 24-hour clock; others use a 12-hour clock.  Names and abbreviations for the days of the week and the months of the year vary by language.</P><P>Customs dictate the order of year, month, and day, as well as the delimiters that separate them in numeric representations.  To designate years, some regions use seasonal, astronomical, or historical criteria instead of the Western Gregorian calendar.  For example, the official Japanese calendar counts years by the reign of the current Emperor.</P><P>The following example shows short and long representations of the same date in different countries:</P><CENTER><IMG SRC="images/table4.gif"></CENTER><P>The following example shows different representations of the same time:</P><CENTER><TABLE CELLSPACING=3 CELLPADDING=3><TR VALIGN=top><TD>4:55 pm<BR></TD><TD>US time<BR></TD></TR><TR VALIGN=top><TD>16:55 Uhr<BR></TD><TD>German time<BR></TD></TR></TABLE></CENTER><P>And the next example shows additional representations of the same time:</P><CENTER><IMG SRC="images/table5.gif"></CENTER><A NAME="1.2.1.5"><H4>1.2.1.5 Ordering</H4></A><P>Languages may differ in their <I>collating sequence</I>, that is, their rules for ordering or sorting characters or strings.  
The following example shows the same list of words ordered alphabetically by different collating sequences:</P><CENTER><IMG SRC="images/table6.gif"></CENTER><A HREF="endnote2.htm#fn1">[1]</A><P>ASCII collation orders strings according to the numeric values of their bytes, which does not meet the requirements of English dictionary sorting.  Dictionary order sorts <SAMP>a</SAMP> after <SAMP>A</SAMP> and before <SAMP>B</SAMP>, whereas ASCII-based order sorts <SAMP>a</SAMP> after the entire set of uppercase letters.</P><P>The German alphabet sorts <IMG SRC="images/inline1.gif"> before <SAMP>b</SAMP>, whereas an order based on ASCII-style character codes sorts a letter with an umlaut after all the unaccented letters.</P><P>In addition to specifying the ordering of individual characters, some languages specify that certain groups of characters should be clustered and treated as a single character.  The following example shows the difference this can make in an ordering:</P><CENTER><IMG SRC="images/table7.gif"></CENTER><P>The word <SAMP>llava</SAMP> is sorted after <SAMP>loro</SAMP> and before <IMG SRC="images/inline2.gif">, because in Spanish <SAMP>ll</SAMP> is a digraph<A HREF="endnote2.htm#fn2">[2]</A>, that is, a pair of letters treated as a single character that sorts after <SAMP>l</SAMP> and before <SAMP>m</SAMP>.  Similarly, the Spanish digraph <SAMP>ch</SAMP> is treated as a single character that sorts after <SAMP>c</SAMP> but before <SAMP>d</SAMP>.  Two characters that are paired and treated as a single character are referred to as a two-to-one <I>character code pair</I>.</P><P>In other cases, one character is treated as if it were actually two characters.  The German single character <IMG SRC="images/inline3.gif">, called the <I>sharp s</I>, is treated as <SAMP>ss</SAMP>.  
This treatment affects the ordering, as shown in the example below:</P><CENTER><IMG SRC="images/table8.gif"></CENTER><A NAME="1.2.2"><H3>1.2.2 Character Encodings for Localizing Alphabets</H3></A><P>We know that different languages can have different alphabets.  The first step in localizing an alphabet is to find a way to represent, or <I>encode</I>, all its characters.  In general, different alphabets may require different <I>character encodings</I>.</P><P>The 7-bit ASCII codeset is the traditional code on UNIX systems.</P><P>8-bit codesets permit the processing of many Eastern and Western European, Middle Eastern, and Asian languages.  Some are strict extensions of the 7-bit ASCII codeset: they include all the 7-bit ASCII codes and add 128 more character codes beyond them.  Such extensions meet the needs of Western European users.  To support languages with completely different alphabets, such as Arabic and Greek, other 8-bit codesets have been designed.</P><P>Multibyte character codes are required for writing systems with more than 256 characters, such as kanji, the Japanese ideographs based on Chinese characters.  Kanji comprises tens of thousands of characters, each of which is represented by two bytes.  To ensure backward compatibility with ASCII, such a multibyte codeset is a superset of the ASCII codeset and consists of a mixture of one- and two-byte characters.</P><P>For such languages, several encoding schemes have been defined.  These encoding schemes provide a set of rules for parsing a byte stream into a sequence of coded characters.</P><A NAME="1.2.2.1"><H4>1.2.2.1 Multibyte Encodings</H4></A><P>Handling multibyte character encodings is a challenging task.  It involves parsing multibyte character sequences and, in many cases, converting between multibyte characters and wide characters.</P><P>Multibyte encoding schemes are easiest to understand by means of a typical example.  
Japan is one of the earliest and probably the largest markets for multibyte character support.  Therefore, the following examples are based on encoding schemes for Japanese text processing.</P><P>In Japan, a single text message can be composed of characters from four different writing systems.  <I>Kanji</I> has tens of thousands of characters, which are ideographs.  <I>Hiragana</I> and <I>katakana</I> are syllabaries, each containing about 80 characters that represent sounds rather than ideas.  The <I>Roman</I> characters include some 95 letters, digits, and punctuation marks.</P><P>Figure 1 gives an example of an encoded Japanese sentence composed of these four writing systems:</P><H4>Figure 1.  A Japanese sentence mixing four writing systems</H4><BR><IMG SRC="images/image1.gif"><P>The sentence means:  "Encoding methods such as JIS can support texts that mix Japanese and English."</P><P>A number of Japanese character sets are in common use:</P><CENTER><TABLE CELLSPACING=3 CELLPADDING=3><TR VALIGN=top><TD><UL><P>JIS C 6226-1978<BR></TD><TD><UL><P>JIS X 0208-1983<BR></TD></TR><TR VALIGN=top>