# This document contains text in Perl "POD" format.
# Use a POD viewer like perldoc or perlman to render it.

=head1 NAME

Locale::Maketext::TPJ13 -- article about software localization

=head1 SYNOPSIS

  # This is an article, not a module.

=head1 DESCRIPTION

The following article by Sean M. Burke and Jordan Lachler
first appeared in I<The Perl Journal> #13
and is copyright 1999 The Perl Journal. It appears
courtesy of Jon Orwant and The Perl Journal.  This document may be
distributed under the same terms as Perl itself.

=head1 Localization and Perl: gettext breaks, Maketext fixes

by Sean M. Burke and Jordan Lachler

This article points out cases where gettext (a common system for
localizing software interfaces -- i.e., making them work in the user's
language of choice) fails because of basic differences between human
languages.  This article then describes Maketext, a new system capable
of correctly treating these differences.

=head2 A Localization Horror Story: It Could Happen To You

=over

"There are a number of languages spoken by human beings in this
world."

-- Harald Tveit Alvestrand, in RFC 1766, "Tags for the
Identification of Languages"

=back

Imagine that your task for the day is to localize a piece of software
-- and luckily for you, the only output the program emits is two
messages, like this:

  I scanned 12 directories.

  Your query matched 10 files in 4 directories.

So how hard could that be?  You look at the code that
produces the first item, and it reads:

  printf("I scanned %g directories.",
         $directory_count);

You think about that, and realize that it doesn't even work right for
English, as it can produce this output:

  I scanned 1 directories.

So you rewrite it to read:

  printf("I scanned %g %s.",
         $directory_count,
         $directory_count == 1 ?
           "directory" : "directories",
  );

...which does the Right Thing.
(In case you don't recall, "%g" interpolates a number in general
floating-point notation, which can be locale-sensitive, and "%s"
interpolates a string.)

But you still have to localize it for all the languages you're
producing this software for, so you pull Locale::gettext off of CPAN
so you can access the C<gettext> C functions you've heard are standard
for localization tasks.

And you write:

  printf(gettext("I scanned %g %s."),
         $dir_scan_count,
         $dir_scan_count == 1 ?
           gettext("directory") : gettext("directories"),
  );

But you then read in the gettext manual (Drepper, Miller, and Pinard 1995)
that this is not a good idea, since how a single word like "directory"
or "directories" is translated may depend on context -- and this is
true, since in a case language like German or Russian, you may need
these words with a different case ending in the first instance (where the
word is the object of a verb) than in the second instance, which you
haven't even gotten to yet (where the word is the object of a
preposition, "in %g directories") -- assuming these keep the same syntax
when translated into those languages.

So, on the advice of the gettext manual, you rewrite:

  printf( $dir_scan_count == 1 ?
           gettext("I scanned %g directory.") :
           gettext("I scanned %g directories."),
         $dir_scan_count );

So, you email your various translators (the boss decides that the
languages du jour are Chinese, Arabic, Russian, and Italian, so you
have one translator for each), asking for translations for "I scanned
%g directory." and "I scanned %g directories.".
When they reply,
you'll put them in the lexicons for gettext to use when it localizes
your software, so that when the user is running under the "zh"
(Chinese) locale, gettext("I scanned %g directory.") will return the
appropriate Chinese text, with a "%g" in there where printf can then
interpolate $dir_scan_count.

Your Chinese translator emails right back -- he says both of these
phrases translate to the same thing in Chinese, because, in linguistic
jargon, Chinese "doesn't have number as a grammatical category" --
whereas English does.  That is, English has grammatical rules that
refer to "number", i.e., whether something is grammatically singular
or plural; and one of these rules is the one that forces nouns to take
a plural suffix (generally "s") when in a plural context, as they are when
they follow a number other than "one" (including, oddly enough, "zero").
Chinese has no such rules, and so has just the one phrase where English
has two.  But, no problem, you can have this one Chinese phrase appear
as the translation for the two English phrases in the "zh" gettext
lexicon for your program.

Emboldened by this, you dive into the second phrase that your software
needs to output: "Your query matched 10 files in 4 directories.".  You
notice that if you want to treat phrases as indivisible, as the gettext
manual wisely advises, you need four cases now, instead of two, to
cover the permutations of singular and plural on the two items,
$directory_count and $file_count.  So you try this:

  printf( $file_count == 1 ?
    ( $directory_count == 1 ?
     gettext("Your query matched %g file in %g directory.") :
     gettext("Your query matched %g file in %g directories.") ) :
    ( $directory_count == 1 ?
     gettext("Your query matched %g files in %g directory.") :
     gettext("Your query matched %g files in %g directories.") ),
   $file_count, $directory_count,
  );

(The case of "1 file in 2 [or more] directories" could, I suppose,
occur in the case of symlinking or something of the sort.)

It occurs to you that this is not the prettiest code you've ever
written, but this seems the way to go.  You mail off to the
translators asking for translations for these four cases.  The
Chinese guy replies with the one phrase that these all translate to in
Chinese, and that phrase has two "%g"s in it, as it should -- but
there's a problem.  He translates it word-for-word back: "In %g
directories contains %g files match your query."  The %g
slots are in the reverse of their English order.  You wonder
how you'll get gettext to handle that.

But you put it aside for the moment, and optimistically hope that the
other translators won't have this problem, and that their languages
will be better behaved -- i.e., that they will be just like English.

But the Arabic translator is the next to write back.  First off, your
code for "I scanned %g directory." or "I scanned %g directories."
assumes there's only singular or plural.  But, to use linguistic
jargon again, Arabic has grammatical number, like English (but unlike
Chinese), but it's a three-term category: singular, dual, and plural.
In other words, the way you say "directory" depends on whether there's
one directory, or I<two> of them, or I<more than two> of them.  Your
test of C<($directory_count == 1)> no longer does the job.  And it means
that where English's grammatical category of number necessitates
only the two permutations of the first sentence based on "directory
[singular]" and "directories [plural]", Arabic has three -- and,
worse, in the second sentence ("Your query matched %g file in %g
directory."), where English has four, Arabic has nine.
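
To see where those counts come from, here is a minimal sketch. It
assumes only the three-way split the Arabic translator describes, and
the two helper names (C<arabic_number_category> and C<phrase_variants>)
are hypothetical; they are not part of gettext or any real module.
Each numeric slot in a phrase can fall into any number category
independently of the others, so the number of whole-phrase lexicon
entries is the category count raised to the number of slots.

```perl
use strict;
use warnings;

# Hypothetical helper: the three-term number category as the translator
# describes it (one, two, more than two). This is an illustration of the
# article's description, not a complete account of Arabic grammar.
sub arabic_number_category {
    my ($n) = @_;
    return $n == 1 ? 'singular'
         : $n == 2 ? 'dual'
         :           'plural';
}

# Hypothetical helper: how many indivisible phrases a translator must
# supply when each of $slots numbers can fall into any of $categories
# number categories.
sub phrase_variants {
    my ($categories, $slots) = @_;
    return $categories ** $slots;
}

for my $case ( [ 'English', 2 ], [ 'Arabic', 3 ] ) {
    my ($language, $categories) = @$case;
    for my $slots (1, 2) {
        printf "%-8s %d slot(s): %d phrase(s)\n",
            $language, $slots, phrase_variants($categories, $slots);
    }
}
```

A third number in the phrase would push the Arabic total to 27, which
is why testing every combination by hand does not scale.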
You sense an unwelcome, exponential trend taking shape.

Your Italian translator emails you back and says that "I scanned 0
directories" (a possible English output of your program) is stilted,
and if you think that's fine English, that's your problem, but that
I<just will not do> in the language of Dante.  He insists that where
$directory_count is 0, your program should produce the Italian text
for "I I<didn't> scan I<any> directories.".  And ditto for "I didn't
match any files in any directories", although he says the last part
about "in any directories" should probably just be left off.

You wonder how you'll get gettext to handle this; to accommodate the
ways Arabic, Chinese, and Italian deal with numbers in just these few
very simple phrases, you need to write code that will ask gettext for
different queries depending on whether the numerical values in
question are 1, 2, more than 2, or in some cases 0, and you still haven't
figured out the problem with the different word order in Chinese.

Then your Russian translator calls on the phone, to I<personally> tell
you the bad news about how really unpleasant your life is about to
become:

Russian, like German or Latin, is an inflectional language; that is, nouns
and adjectives have to take endings that depend on their case
(i.e., nominative, accusative, genitive, etc...) -- which is roughly a
matter of what role they have in the syntax of the sentence --
as well as on the grammatical gender (i.e., masculine, feminine, neuter)
and number (i.e., singular or plural) of the noun, as well as on the
declension class of the noun.
But unlike with most other inflected languages,
putting a number-phrase (like "ten" or "forty-three", or their Arabic
numeral equivalents) in front of a noun in Russian can change the case and
number of that noun, and therefore the endings you have to put on it.

He elaborates:  In "I scanned %g directories", you'd I<expect>
"directories" to be in the accusative case (since it is the direct
object in the sentence) and the plural number,
except where $directory_count is 1, then you'd expect the singular, of
course.  Just like Latin or German.  I<But!>  Where $directory_count %
10 is 1 ("%" for modulo, remember), assuming $directory_count is an
integer, and except where $directory_count % 100 is 11, "directories"
is forced to become grammatically singular, which means it gets the
ending for the accusative singular...  You begin to visualize the code
it'd take to test for the problem so far, I<and still work for Chinese
and Arabic and Italian>, and how many gettext items that'd take, but
he keeps going...  But where $directory_count % 10 is 2, 3, or 4
(except where $directory_count % 100 is 12, 13, or 14), the word for
"directories" is forced to be genitive singular -- which means another
ending... The room begins to spin around you, slowly at first...  But
with I<all other> integer values, since "directory" is an inanimate
noun, when preceded by a number and in the nominative or accusative
cases (as it is here, just your luck!), it does stay plural, but it is
forced into the genitive case -- yet another ending...  And
you never hear him get to the part about how you're going to run into
similar (but maybe subtly different) problems with other Slavic
languages like Polish, because the floor comes up to meet you, and you
fade into unconsciousness.

The above cautionary tale relates how an attempt at localization can
lead from programmer consternation, to program obfuscation, to a need
for sedation.
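
Written down as code, the Russian translator's rules are shorter than
they sound on the phone.  The following is only a sketch of the rules
as he states them, for a non-negative integer count and an inanimate
noun in the accusative; the function name C<russian_noun_form> and the
strings it returns are hypothetical labels, and a real lexicon would
map each label to the actual inflected word.

```perl
use strict;
use warnings;

# Hypothetical helper: given a non-negative integer count, return a label
# for the noun form the translator says must follow that number.
sub russian_noun_form {
    my ($n) = @_;
    my $mod10  = $n % 10;
    my $mod100 = $n % 100;

    # 1, 21, 31, ... but not 11, 111, ...
    return 'accusative singular'
        if $mod10 == 1 && $mod100 != 11;

    # 2-4, 22-24, ... but not 12-14, 112-114, ...
    return 'genitive singular'
        if $mod10 >= 2 && $mod10 <= 4
        && !( $mod100 >= 12 && $mod100 <= 14 );

    # Every other integer value, including 0, 5-20, 25-30, ...
    return 'genitive plural';
}

for my $count (0, 1, 2, 5, 11, 12, 21, 22, 101, 112) {
    printf "%3d -> %s\n", $count, russian_noun_form($count);
}
```

The point of the sketch is that the rule is easy to state as a function
of the number; what is hard is expressing it as a fixed set of
singular-or-plural lookups chosen in the calling code.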
But careful evaluation shows that your choice of tools
merely needed further consideration.

=head2 The Linguistic View

=over

"It is more complicated than you think."

-- The Eighth Networking Truth, from RFC 1925

=back

The field of Linguistics has expended a great deal of effort over the
past century trying to find grammatical patterns which hold across
languages; it's been a constant process
of people making generalizations that should apply to all languages,
only to find out that, all too often, these generalizations fail --
sometimes failing for just a few languages, sometimes whole classes of
languages, and sometimes nearly every language in the world except
English.  Broad statistical trends are evident in what the "average
language" is like as far as what its rules can look like, must look
like, and cannot look like.  But the "average language" is just as
unreal a concept as the "average person" -- it runs up against the
fact that no language (or person) is, in fact, average.  The wisdom of past
experience leads us to believe that any given language can do whatever
it wants, in any order, with appeal to any kind of grammatical