📄 tpj13.pod

categories wants -- case, number, tense, real or metaphoric
characteristics of the things that words refer to, arbitrary or
predictable classifications of words based on what endings or prefixes
they can take, degree or means of certainty about the truth of
statements expressed, and so on, ad infinitum.

Mercifully, most localization tasks are a matter of finding ways to
translate whole phrases, generally sentences, where the context is
relatively set, and where the only variation in content is I<usually>
in a number being expressed -- as in the example sentences above.
Translating specific, fully-formed sentences is, in practice, fairly
foolproof -- which is good, because that's what's in the phrasebooks
that so many tourists rely on.  Now, a given phrase (whether in a
phrasebook or in a gettext lexicon) in one language I<might> have a
greater or lesser applicability than that phrase's translation into
another language -- for example, strictly speaking, in Arabic, the
"your" in "Your query matched..." would take a different form
depending on whether the user is male or female; so the Arabic
translation "your[feminine] query" is applicable in fewer cases than
the corresponding English phrase, which doesn't distinguish the user's
gender.  (In practice, it's not feasible to have a program know the
user's gender, so the masculine "you" in Arabic is usually used, by
default.)

But in general, such surprises are rare when entire sentences are
being translated, especially when the functional context is restricted
to that of a computer interacting with a user either to convey a fact
or to prompt for a piece of information.  So, for purposes of
localization, translation by phrase (generally by sentence) is both the
simplest and the least problematic.

=head2 Breaking gettext

=over

"It Has To Work."

-- First Networking Truth, RFC 1925

=back

Consider that sentences in a tourist phrasebook are of two types: ones
like "How do I get to the marketplace?"
that don't have any blanks to
fill in, and ones like "How much do these ___ cost?", where there's
one or more blanks to fill in (and these are usually linked to a
list of words that you can put in that blank: "fish", "potatoes",
"tomatoes", etc.)  The ones with no blanks are no problem, but the
fill-in-the-blank ones may not be really straightforward. If it's a
Swahili phrasebook, for example, the authors probably didn't bother to
tell you the complicated ways that the verb "cost" changes its
inflectional prefix depending on the noun you're putting in the blank.
The trader in the marketplace will still understand what you're saying if
you say "how much do these potatoes cost?" with the wrong
inflectional prefix on "cost".  After all, I<you> can't speak proper Swahili,
I<you're> just a tourist.  But while tourists can be stupid, computers
are supposed to be smart; the computer should be able to fill in the
blank, and still have the results be grammatical.

In other words, a phrasebook entry takes some values as parameters
(the things that you fill in the blank or blanks), and provides a value
based on these parameters, where the way you get that final value from
the given values can, properly speaking, involve an arbitrarily
complex series of operations.  (In the case of Chinese, it'd be not at
all complex, at least in cases like the examples at the beginning of
this article; whereas in the case of Russian it'd be a rather complex
series of operations.  And in some languages, the
complexity could be spread around differently: while the act of
putting a number-expression in front of a noun phrase might not be
complex by itself, it may change how you have to, for example, inflect
a verb elsewhere in the sentence.  This is what in syntax is called
"long-distance dependencies".)

This talk of parameters and arbitrary complexity is just another way
to say that an entry in a phrasebook is what in a programming language
would be called a "function".
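To make the two phrasebook cases concrete, here is a minimal sketch in
plain Perl.  The C<%phrasebook> hash, its keys, and the English wording
are all hypothetical, made up for illustration; this is not code from
any real module.  An entry with no blanks is a function of no
arguments, and a fill-in-the-blank entry is a function whose wording
depends on what goes in the blank:

```perl
use strict;
use warnings;

# Hypothetical English phrasebook: every entry is a function, whether
# or not it has blanks to fill in.
my %phrasebook = (
    # No blanks: the function takes no parameters.
    marketplace => sub { return "How do I get to the marketplace?" },

    # One blank plus a count: the words around the blank change with
    # the parameters, which a single fixed string cannot express.
    cost => sub {
        my ($count, $singular, $plural) = @_;
        return $count == 1
            ? "How much does this $singular cost?"
            : "How much do these $plural cost?";
    },
);

print $phrasebook{marketplace}->(), "\n";
print $phrasebook{cost}->(1, "potato", "potatoes"), "\n";
print $phrasebook{cost}->(3, "potato", "potatoes"), "\n";
```

For English the logic is one comparison; the same slot could hold
arbitrarily complex code for a language that needs it.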
Just so you don't miss it, this is the
crux of this article: I<A phrase is a function; a phrasebook is a
bunch of functions.>

The reason that using gettext runs into walls (as in the above
second-person horror story) is that you're trying to use a string (or
worse, a choice among a bunch of strings) to do what you really need a
function for -- which is futile.  Performing (s)printf interpolation
on the strings which you get back from gettext does allow you to do I<some>
common things passably well... sometimes... sort of; but, to paraphrase
what some people say about C<csh> script programming, "it fools you
into thinking you can use it for real things, but you can't, and you
don't discover this until you've already spent too much time trying,
and by then it's too late."

=head2 Replacing gettext

So, what needs to replace gettext is a system that supports lexicons
of functions instead of lexicons of strings.  An entry in a lexicon
from such a system should I<not> look like this:

  "J'ai trouv\xE9 %g fichiers dans %g r\xE9pertoires"

[\xE9 is e-acute in Latin-1.  Some pod renderers would
scream if I used the actual character here. -- SB]

but instead like this, bearing in mind that this is just a first stab:

  sub I_found_X1_files_in_X2_directories {
    my( $files, $dirs ) = @_[0,1];
    $files = sprintf("%g %s", $files,
      $files == 1 ? 'fichier' : 'fichiers');
    $dirs = sprintf("%g %s", $dirs,
      $dirs == 1 ? "r\xE9pertoire" : "r\xE9pertoires");
    return "J'ai trouv\xE9 $files dans $dirs.";
  }

Now, there's no particularly obvious way to store anything but strings
in a gettext lexicon; so it looks like we just have to start over and
make something better, from scratch.  I call my shot at a
gettext-replacement system "Maketext", or, in CPAN terms,
Locale::Maketext.

When designing Maketext, I chose to plan its main features in terms of
"buzzword compliance".
And here are the buzzwords:

=head2 Buzzwords: Abstraction and Encapsulation

The complexity of the language you're trying to output a phrase in is
entirely abstracted inside (and encapsulated within) the Maketext module
for that interface.  When you call:

  print $lang->maketext("You have [quant,_1,piece] of new mail.",
                       scalar(@messages));

you don't know (and in fact can't easily find out) whether this will
involve lots of figuring, as in Russian (if $lang is a handle to the
Russian module), or relatively little, as in Chinese.  That kind of
abstraction and encapsulation may encourage other pleasant buzzwords
like modularization and stratification, depending on what design
decisions you make.

=head2 Buzzword: Isomorphism

"Isomorphism" means "having the same structure or form"; in discussions
of program design, the word takes on the special, specific meaning that
your implementation of a solution to a problem I<has the same
structure> as, say, an informal verbal description of the solution, or
maybe of the problem itself.  Isomorphism is, all things considered,
a good thing -- it's what problem-solving (and solution-implementing)
should look like.

What's wrong with gettext-using code like this...

  printf( $file_count == 1 ?
    ( $directory_count == 1 ?
     "Your query matched %g file in %g directory." :
     "Your query matched %g file in %g directories." ) :
    ( $directory_count == 1 ?
     "Your query matched %g files in %g directory." :
     "Your query matched %g files in %g directories."
    ),
   $file_count, $directory_count,
  );

is first off that it's not well abstracted -- these ways of testing
for grammatical number (as in the expressions like C<foo == 1 ?
singular_form : plural_form>) should be abstracted to each language
module, since how you get grammatical number is language-specific.

But second off, it's not isomorphic -- the "solution" (i.e., the
phrasebook entries) for Chinese maps from these four English phrases to
the one Chinese phrase that fits for all of them.  In other words, the
informal solution would be "The way to say what you want in Chinese is
with the one phrase 'For your question, in Y directories you would
find X files'" -- and so the implemented solution should be,
isomorphically, just a straightforward way to spit out that one
phrase, with numerals properly interpolated.  It shouldn't have to map
from the complexity of other languages to the simplicity of this one.

=head2 Buzzword: Inheritance

There's a great deal of reuse possible for sharing of phrases between
modules for related dialects, or for sharing of auxiliary functions
between related languages.  (By "auxiliary functions", I mean
functions that don't produce phrase-text, but which, say, return an
answer to "does this number require a plural noun after it?".  Such
auxiliary functions would be used in the internal logic of functions
that actually do produce phrase-text.)

In the case of sharing phrases, consider that you have an interface
already localized for American English (probably by having been
written with that as the native locale, but that's incidental).
Localizing it for UK English should, in practical terms, be just a
matter of running it past a British person with the instructions to
indicate what few phrases would benefit from a change in spelling or
possibly minor rewording.  In that case, you should be able to put in
the UK English localization module I<only> those phrases that are
UK-specific, and for all the rest, I<inherit> from the American
English module.
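As a rough sketch of that arrangement, assuming each phrase is simply a
method on a phrasebook class: the package names C<Phrasebook::en_us>
and C<Phrasebook::en_gb> and both phrases below are hypothetical,
chosen only for illustration, and this uses nothing beyond ordinary
Perl method inheritance via C<@ISA>:

```perl
use strict;
use warnings;

# Hypothetical American English phrasebook; each phrase is a method.
package Phrasebook::en_us;

sub new { my $class = shift; return bless {}, $class }

sub pick_a_color { return "Pick a color." }

sub files_matched {
    my ($self, $count) = @_;
    return sprintf "Your query matched %g %s.",
        $count, $count == 1 ? "file" : "files";
}

# Hypothetical UK English phrasebook.
package Phrasebook::en_gb;

# Inherit every phrase from the American English phrasebook...
our @ISA = ('Phrasebook::en_us');

# ...and override only the one phrase whose spelling differs.
sub pick_a_color { return "Pick a colour." }

package main;

my $lang = Phrasebook::en_gb->new;
print $lang->pick_a_color, "\n";       # overridden in en_gb
print $lang->files_matched(3), "\n";   # inherited from en_us
```

The UK class holds one phrase; every other lookup falls through to the
American English class.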
(And I expect this same situation would apply with
Brazilian and Continental Portuguese, possibly with some I<very>
closely related languages like Czech and Slovak, and possibly with the
slightly different "versions" of written Mandarin Chinese, as I hear exist in
Taiwan and mainland China.)

As to sharing of auxiliary functions, consider the problem of Russian
numbers from the beginning of this article; obviously, you'd want to
write only once the hairy code that, given a numeric value, would
return some specification of which case and number a given quantified
noun should use.  But suppose that you discover, while localizing an
interface for, say, Ukrainian (a Slavic language related to Russian,
spoken by several million people, many of whom would be relieved to
find that your Web site's or software's interface is available in
their language), that the rules in Ukrainian are the same as in Russian
for quantification, and probably for many other grammatical functions.
While there may well be no phrases in common between Russian and
Ukrainian, you could still choose to have the Ukrainian module inherit
from the Russian module, just for the sake of inheriting all the
various grammatical methods.  Or, probably better organizationally,
you could move those functions to a module called C<_E_Slavic> or
something, which Russian and Ukrainian could inherit useful functions
from, but which would (presumably) provide no lexicon.

=head2 Buzzword: Concision

Okay, concision isn't a buzzword.
But it should be, so I decree that
as a new buzzword, "concision" means that simple common things should
be expressible in very few lines (or maybe even just a few characters)
of code -- call it a special case of "making simple things easy and
hard things possible", and see also the role it played in the
MIDI::Simple language, discussed elsewhere in this issue [TPJ#13].

Consider our first stab at an entry in our "phrasebook of functions":

  sub I_found_X1_files_in_X2_directories {
    my( $files, $dirs ) = @_[0,1];
    $files = sprintf("%g %s", $files,
      $files == 1 ? 'fichier' : 'fichiers');
    $dirs = sprintf("%g %s", $dirs,
      $dirs == 1 ? "r\xE9pertoire" : "r\xE9pertoires");
    return "J'ai trouv\xE9 $files dans $dirs.";
  }

You may sense that a lexicon (to use a non-committal catch-all term for a
collection of things you know how to say, regardless of whether they're
phrases or words) consisting of functions I<expressed> as above would
make for rather long-winded and repetitive code -- even if you wisely
rewrote this to have quantification (as we call adding a number
expression to a noun phrase) be a function called like:

  sub I_found_X1_files_in_X2_directories {
    my( $files, $dirs ) = @_[0,1];
    $files = quant($files, "fichier");
    $dirs =  quant($dirs,  "r\xE9pertoire");
    return "J'ai trouv\xE9 $files dans $dirs.";
  }

And you may also sense that you do not want to bother your translators
with having to write Perl code -- you'd much rather that they spend
their I<very costly time> on just translation.  And this is to say
nothing of the near impossibility of finding a commercial translator
who would know even simple Perl.

In a first-hack implementation of Maketext, each language-module's
lexicon looked like this:
