<?xml version="1.0" ?>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<!-- saved from url=(0017)http://localhost/ -->
<script language="JavaScript" src="../../displayToc.js"></script>
<script language="JavaScript" src="../../tocParas.js"></script>
<script language="JavaScript" src="../../tocTab.js"></script>
<link rel="stylesheet" type="text/css" href="../../scineplex.css">
<title>I18N::LangTags - functions for dealing with RFC3066-style language tags</title>
<link rel="stylesheet" href="../../Active.css" type="text/css" />
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<link rev="made" href="mailto:" />
</head>
<body>
<script>writelinks('__top__',2);</script>
<h1><a>I18N::LangTags - functions for dealing with RFC3066-style language tags</a></h1>
<p><a name="__index__"></a></p>
<!-- INDEX BEGIN -->
<ul>
<li><a href="#name">NAME</a></li>
<li><a href="#synopsis">SYNOPSIS</a></li>
<li><a href="#description">DESCRIPTION</a></li>
<li><a href="#about_lowercasing">ABOUT LOWERCASING</a></li>
<li><a href="#about_unicode_plaintext_language_tags">ABOUT UNICODE PLAINTEXT LANGUAGE TAGS</a></li>
<li><a href="#see_also">SEE ALSO</a></li>
<li><a href="#copyright">COPYRIGHT</a></li>
<li><a href="#author">AUTHOR</a></li>
</ul>
<!-- INDEX END -->
<hr />
<p>
</p>
<h1><a name="name">NAME</a></h1>
<p>I18N::LangTags - functions for dealing with RFC3066-style language tags</p>
<p>
</p>
<hr />
<h1><a name="synopsis">SYNOPSIS</a></h1>
<pre>
<span class="keyword">use</span> <span class="variable">I18N::LangTags</span><span class="operator">();</span>
</pre>
<p>...or specify whichever of those functions you want to import, like so:</p>
<pre>
<span class="keyword">use</span> <span class="variable">I18N::LangTags</span> <span class="string">qw(implicate_supers similarity_language_tag)</span><span class="operator">;</span>
</pre>
<p>All the exportable functions are listed below -- you're free to import
only some, or none at all. By default, none are imported. If you
say:</p>
<pre>
use I18N::LangTags qw(:ALL);</pre>
<p>...then all are exported. (This saves you from having to use
something less obvious like <code>use I18N::LangTags qw(/./)</code>.)</p>
<p>If you don't import any of these functions, assume a <code>&amp;I18N::LangTags::</code>
in front of all the function names in the following examples.</p>
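<p>As a minimal sketch of the fully qualified form (the tag <code>en-US</code> is only
an illustrative input), a script that imports nothing might look like this:</p>
<pre>
use strict;
use warnings;
use I18N::LangTags ();   # import nothing

# Nothing was imported, so spell out the package name on each call.
if (I18N::LangTags::is_language_tag('en-US')) {
    print "en-US is a formally valid tag\n";
}</pre>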
<p>
</p>
<hr />
<h1><a name="description">DESCRIPTION</a></h1>
<p>Language tags are a formalism, described in RFC 3066 (which obsoletes
RFC 1766), for declaring what language form (language and possibly
dialect) a given chunk of information is in.</p>
<p>This library provides functions for common tasks involving language
tags as they are needed in a variety of protocols and applications.</p>
<p>Please see the "See Also" references for a thorough explanation
of how to correctly use language tags.</p>
<ul>
<li><strong><a name="item_is_language_tag">the function <code>is_language_tag($lang1)</code></a></strong>
<p>Returns true iff $lang1 is a formally valid language tag.</p>
<pre>
is_language_tag("fr") is TRUE
is_language_tag("x-jicarilla") is FALSE
(Subtags can be 8 chars long at most -- 'jicarilla' is 9)</pre>
<pre>
is_language_tag("sgn-US") is TRUE
(That's American Sign Language)</pre>
<pre>
is_language_tag("i-Klikitat") is TRUE
(True without regard to the fact that no one has actually
registered Klikitat -- it's a formally valid tag)</pre>
<pre>
is_language_tag("fr-patois") is TRUE
(Formally valid -- although descriptively weak!)</pre>
<pre>
is_language_tag("Spanish") is FALSE
is_language_tag("french-patois") is FALSE
(No good -- first subtag has to match
/^([xXiI]|[a-zA-Z]{2,3})$/ -- see RFC3066)
</pre>
<pre>
is_language_tag("x-borg-prot2532") is TRUE
(Yes, subtags can contain digits, as of RFC3066)</pre>
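<p>One plausible use is filtering untrusted input down to formally valid tags.
The sketch below assumes a hypothetical list <code>@candidates</code>; only
<code>is_language_tag</code> comes from this module.</p>
<pre>
use strict;
use warnings;
use I18N::LangTags qw(is_language_tag);

# Hypothetical input, e.g. values read from a configuration file.
my @candidates = ('fr', 'x-jicarilla', 'sgn-US', 'Spanish');

# Keep only the formally valid tags.
my @valid = grep { is_language_tag($_) } @candidates;

print join(', ', @valid), "\n";   # prints "fr, sgn-US"</pre>
<p>Remember that this only checks form: a tag that passes may still be unregistered
or meaningless.</p>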
</li>
<li><strong><a name="item_extract_language_tags">the function <code>extract_language_tags($whatever)</code></a></strong>
<p>Returns a list of whatever looks like formally valid language tags
in $whatever. Not very smart, so don't get too creative with
what you want to feed it.</p>
<pre>
extract_language_tags("fr, fr-ca, i-mingo")
returns: ('fr', 'fr-ca', 'i-mingo')</pre>
<pre>
extract_language_tags("It's like this: I'm in fr -- French!")
returns: ('It', 'in', 'fr')
(So don't just feed it any old thing.)</pre>
<p>The output is untainted. If you don't know what tainting is,
don't worry about it.</p>
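<p>The function behaves best on input that contains tags and separators only.
A small sketch, assuming a hypothetical comma-separated setting held in
<code>$setting</code>:</p>
<pre>
use strict;
use warnings;
use I18N::LangTags qw(extract_language_tags);

# Hypothetical setting: a comma-separated list of tags and nothing else.
my $setting = 'fr, fr-ca, i-mingo';

my @tags = extract_language_tags($setting);
print scalar(@tags), " tags: @tags\n";   # prints "3 tags: fr fr-ca i-mingo"</pre>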
</li>
<li><strong><a name="item_same_language_tag">the function <code>same_language_tag($lang1, $lang2)</code></a></strong>
<p>Returns true iff $lang1 and $lang2 are acceptable variant tags
representing the same language-form.</p>
<pre>
same_language_tag('x-kadara', 'i-kadara') is TRUE
(The x/i- alternation doesn't matter)
same_language_tag('X-KADARA', 'i-kadara') is TRUE
(...and neither does case)
same_language_tag('en', 'en-US') is FALSE
(all-English is not the SAME as US English)
same_language_tag('x-kadara', 'x-kadar') is FALSE
(these are totally unrelated tags)
same_language_tag('no-bok', 'nb') is TRUE
(no-bok is a legacy tag for nb (Norwegian Bokmal))</pre>
<p><a href="#item_same_language_tag"><code>same_language_tag</code></a> works by just seeing whether
<a href="#item_encode_language_tag"><code>encode_language_tag($lang1)</code></a> is the same as
<a href="#item_encode_language_tag"><code>encode_language_tag($lang2)</code></a>.</p>
<p>(Yes, I know this function is named a bit oddly. Call it historic
reasons.)</p>
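<p>As an illustration, equivalent tags can be collapsed so that each language-form
appears once. This is only a sketch; the list <code>@tags</code> and the result
array <code>@unique</code> are hypothetical names.</p>
<pre>
use strict;
use warnings;
use I18N::LangTags qw(same_language_tag);

my @tags = ('x-kadara', 'i-kadara', 'en', 'EN');   # hypothetical input

# Keep a tag only if no equivalent tag has been kept already.
my @unique;
for my $tag (@tags) {
    push @unique, $tag unless grep { same_language_tag($tag, $_) } @unique;
}

print "@unique\n";   # prints "x-kadara en"</pre>
<p>The first spelling encountered is the one that survives, so order the input
accordingly if a particular form is preferred.</p>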
</li>
<li><strong><a name="item_similarity_language_tag">the function <code>similarity_language_tag($lang1, $lang2)</code></a></strong>
<p>Returns an integer representing the degree of similarity between
tags $lang1 and $lang2 (the order of which does not matter), where
similarity is the number of common elements on the left,
without regard to case and to x/i- alternation.</p>
<pre>
similarity_language_tag('fr', 'fr-ca') is 1
(one element in common)
similarity_language_tag('fr-ca', 'fr-FR') is 1
(one element in common)</pre>
<pre>
similarity_language_tag('fr-CA-joual',
'fr-CA-PEI') is 2
similarity_language_tag('fr-CA-joual', 'fr-CA') is 2
(two elements in common)</pre>
<pre>
similarity_language_tag('x-kadara', 'i-kadara') is 1
(x/i- doesn't matter)</pre>
<pre>
similarity_language_tag('en', 'x-kadar') is 0
similarity_language_tag('x-kadara', 'x-kadar') is 0
(unrelated tags -- no similarity)</pre>
<pre>
similarity_language_tag('i-cree-syllabic',
'i-cherokee-syllabic') is 0
(no <strong>leftmost</strong> elements in common!)</pre>
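<p>The score can be used to pick the closest of several available tags. The
following is a rough sketch under assumed inputs (<code>$wanted</code> and
<code>@available</code> are hypothetical), not a complete negotiation scheme:</p>
<pre>
use strict;
use warnings;
use I18N::LangTags qw(similarity_language_tag);

my $wanted    = 'fr-CA-joual';          # hypothetical request
my @available = ('en', 'fr', 'fr-CA');  # hypothetical catalogue

# Remember the tag with the highest similarity score seen so far.
my ($best, $best_score) = (undef, 0);
for my $tag (@available) {
    my $score = similarity_language_tag($wanted, $tag);
    ($best, $best_score) = ($tag, $score) if $score > $best_score;
}

print defined $best ? "best match: $best\n" : "no match\n";   # prints "best match: fr-CA"</pre>
<p>A nonzero score does not mean one tag is a dialect of the other: 'fr-ca' and
'fr-FR' score 1 although neither is a subform of the other.</p>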
</li>
<li><strong><a name="item_is_dialect_of">the function <code>is_dialect_of($lang1, $lang2)</code></a></strong>
<p>Returns true iff language tag $lang1 represents a subform of
language tag $lang2.</p>
<p><strong>Get the order right! It doesn't work the other way around!</strong></p>
<pre>
is_dialect_of('en-US', 'en') is TRUE
(American English IS a dialect of all-English)</pre>
<pre>
is_dialect_of('fr-CA-joual', 'fr-CA') is TRUE
is_dialect_of('fr-CA-joual', 'fr') is TRUE
(Joual is a dialect of (a dialect of) French)</pre>
<pre>
is_dialect_of('en', 'en-US') is FALSE
(all-English is NOT a dialect of American English)</pre>
<pre>
is_dialect_of('fr', 'en-CA') is FALSE</pre>
<pre>
is_dialect_of('en', 'en' ) is TRUE
is_dialect_of('en-US', 'en-US') is TRUE
(<strong>Note:</strong> these are degenerate cases)</pre>
<pre>
is_dialect_of('i-mingo-tom', 'x-Mingo') is TRUE
(the x/i thing doesn't matter, nor does case)</pre>
<pre>
is_dialect_of('nn', 'no') is TRUE
(because 'nn' (New Norse) is aliased to 'no-nyn',
as a special legacy case, and 'no-nyn' is a
subform of 'no' (Norwegian))</pre>
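<p>Because argument order matters, it may help to see the function in a typical
position. In this sketch, <code>@supported</code> and <code>$request</code> are
hypothetical; the more specific tag is passed first.</p>
<pre>
use strict;
use warnings;
use I18N::LangTags qw(is_dialect_of);

my @supported = ('en', 'fr-CA');   # hypothetical list of offered languages
my $request   = 'en-US';           # hypothetical requested tag

# The requested (more specific) tag goes first, the candidate second.
my @usable = grep { is_dialect_of($request, $_) } @supported;

print "usable for $request: @usable\n";   # prints "usable for en-US: en"</pre>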
</li>
<li><strong><a name="item_super_languages">the function <code>super_languages($lang1)</code></a></strong>
<p>Returns a list of language tags that are superordinate tags to $lang1
-- it gets this by removing subtags from the end of $lang1 until
nothing (or just "i" or "x") is left.</p>
<pre>
super_languages("fr-CA-joual") is ("fr-CA", "fr")</pre>
<pre>
super_languages("en-AU") is ("en")</pre>
<pre>
super_languages("en") is empty-list, ()</pre>
<pre>
super_languages("i-cherokee") is empty-list, ()
...not ("i"), which would be illegal as well as pointless.</pre>
<p>If $lang1 is not a valid language tag, returns empty-list in
a list context, undef in a scalar context.</p>
<p>A notable and rather unavoidable problem with this method:
"x-mingo-tom" has an "x" because the whole tag isn't an
IANA-registered tag -- but <a href="#item_super_languages"><code>super_languages('x-mingo-tom')</code></a> is
('x-mingo') -- which isn't really right, since 'i-mingo' is
registered. But this module has no way of knowing that. (But note
that same_language_tag('x-mingo', 'i-mingo') is TRUE.)</p>
<p>More importantly, you assume <em>at your peril</em> that superordinates of
$lang1 are mutually intelligible with $lang1. Consider this
carefully.</p>
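<p>A common use is building a fallback chain from most to least specific. This
sketch assumes a hypothetical starting tag in <code>$tag</code>, and it inherits the
intelligibility caveat above.</p>
<pre>
use strict;
use warnings;
use I18N::LangTags qw(super_languages);

my $tag = 'fr-CA-joual';   # hypothetical starting tag

# Most specific first: the tag itself, then each superordinate tag.
my @chain = ($tag, super_languages($tag));

print join(' > ', @chain), "\n";   # prints "fr-CA-joual > fr-CA > fr"</pre>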
</li>
<li><strong><a name="item_locale2language_tag">the function <code>locale2language_tag($locale_identifier)</code></a></strong>
<p>This takes a locale name (like "en", "en_US", or "en_US.ISO8859-1")
and maps it to a language tag. If it's not mappable (as with,
notably, "C" and "POSIX"), this returns empty-list in a list context,
or undef in a scalar context.</p>
<pre>
locale2language_tag("en") is "en"</pre>
<pre>
locale2language_tag("en_US") is "en-US"</pre>
<pre>
locale2language_tag("en_US.ISO8859-1") is "en-US"</pre>
<pre>
locale2language_tag("C") is undef or ()</pre>
<pre>
locale2language_tag("POSIX") is undef or ()</pre>
<p>I'm not totally sure that locale names map satisfactorily to language
tags. Think REAL hard about how you use this. YOU HAVE BEEN WARNED.</p>
<p>The output is untainted. If you don't know what tainting is,
don't worry about it.</p>
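<p>Since unmappable locales yield undef in scalar context, callers usually need a
fallback. A minimal sketch, where the default tag <code>$default</code> is an
assumption of this example rather than anything the module provides:</p>
<pre>
use strict;
use warnings;
use I18N::LangTags qw(locale2language_tag);

# Hypothetical fallback for locales that have no language tag.
my $default = 'en';

for my $locale ('en_US.ISO8859-1', 'C') {
    my $tag = locale2language_tag($locale);   # scalar context: undef if unmappable
    $tag = $default unless defined $tag;
    print "$locale => $tag\n";
}</pre>
<p>With these inputs the first locale maps to "en-US" and the second falls back to
the default.</p>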
</li>
<li><strong><a name="item_encode_language_tag">the function <code>encode_language_tag($lang1)</code></a></strong>