   not actually be present).  The "collation-auri" form is an abstract
   name for an ordering, a collation pattern or a vendor private
   collator.

     collation-uri   =  "http://www.iana.org/assignments/collation/"
                        collation-id ".xml"

     collation-auri  =  ( "http://www.iana.org/assignments/collation/"
                        collation-order ".xml" ) / other-uri

     other-uri       =  <absoluteURI>
                     ;  excluding the IANA collation namespace.

3.5.  Naming Guidelines

   While this specification makes no absolute requirements on the
   structure of collation identifiers, naming consistency is important,
   so the following initial guidelines are provided.

   Collation identifiers with an international audience typically begin
   with "i;".  Collation identifiers intended for a particular language
   or locale typically begin with a language tag [5] followed by a ";".
   After the first ";" is normally the name of the general collation
   algorithm, followed by a series of algorithm modifications separated
   by the ";" delimiter.  Parameterized modifications will use "=" to
   delimit the parameter from the value.  The version numbers of any
   lookup tables used by the algorithm SHOULD be present as
   parameterized modifications.

Newman, et al.              Standards Track                     [Page 7]

RFC 4790                   Collation Registry                 March 2007

   Collation identifiers of the form *;vnd-hostname;* are reserved for
   vendor-specific collations created by the owner of the hostname
   following the "vnd-" prefix (e.g., vnd-example.com for the vendor
   example.com).  Registration of such collations (or the name space as
   a whole), with intended use of the "Vendor", is encouraged when a
   public specification or open-source implementation is available, but
   is not required.

4.  Collation Specification Requirements

4.1.  Collation/Server Interface

   The collation itself defines what it operates on.
   Most collations are expected to operate on character strings.  The
   i;octet (Section 9.3) collation operates on octet strings.  The
   i;ascii-numeric (Section 9.1) collation operates on numbers.

   This specification defines the collation interface in terms of octet
   strings.  However, implementations may choose to use character
   strings instead.  Such implementations may not be able to implement
   collations such as i;octet.  Since i;octet is not currently
   mandatory to implement for any protocol, this should not be a
   problem.

4.2.  Operations Supported

   A collation specification MUST state which of the three basic
   operations are supported (equality, substring, ordering) and how to
   perform each of the supported operations on any two input character
   strings, including empty strings.  Collations must be deterministic,
   i.e., given a collation with a specific identifier, and any two fixed
   input strings, the result MUST be the same for the same operation.

   In general, collation operations should behave as their names
   suggest.  While a collation may be new, the operations are not, so
   the new collation's operations should be similar to those of older
   collations.  For example, a date/time collation should not provide a
   "substring" operation that would morph IMAP substring SEARCH into,
   e.g., a date-range search.

   A non-obvious consequence of the rules for each collation operation
   is that, for any single collation, either none or all of the
   operations can return "undefined".  For example, it is not possible
   to have an equality operation that never returns "undefined", and a
   substring operation that occasionally does.

4.2.1.  Validity

   The validity test takes one string as argument.  It returns valid if
   its input string is a valid input to the collation's other
   operations, and invalid if not.
   (In other words, a string is valid if it is equal to itself
   according to the collation's equality operation.)

   The validity test is provided by all collations.  It MUST NOT be
   listed separately in the collation registration.

4.2.2.  Equality

   The equality test always returns "match" or "no-match" when it is
   supplied valid input, and MAY return "undefined" if one or both input
   strings are not valid.

   The equality test MUST be reflexive and symmetric.  For valid input,
   it MUST be transitive.

   If a collation provides either a substring or an ordering test, it
   MUST also provide an equality test.  The substring and/or ordering
   tests MUST be consistent with the equality test.

   The return values of the equality test are called "match",
   "no-match", and "undefined" in this document.

4.2.3.  Substring

   The substring matching operation determines if the first string is a
   substring of the second string, i.e., if one or more substrings of
   the second string are equal to the first, as defined by the
   collation's equality operation.

   A collation that supports substring matching will automatically
   support two special cases of substring matching: prefix and suffix
   matching, if those special cases are supported by the application
   protocol.  It returns "match" or "no-match" when it is supplied valid
   input and returns "undefined" when supplied invalid input.

   Application protocols MAY return position information for substring
   matches.  If this is done, the position information SHOULD include
   both the starting offset and the ending offset for each match.  This
   is important because more sophisticated collations can match strings
   of unequal length (for example, a pre-composed accented character can
   match a decomposed accented character).  In general, overlapping
   matches SHOULD be reported (as when "ana" occurs twice within
   "banana"), although there are cases where a collation may decide not
   to.  For example, in a collation which treats all whitespace
   sequences as identical, the substring operation could be defined such
   that " 1 " (SP "1" SP) is reported just once within "  1  " (SP SP
   "1" SP SP), not four times (SP SP "1" SP, SP "1" SP, SP "1" SP SP and
   SP SP "1" SP SP), since the four matches are, in a sense, the same
   match.

   A string is a substring of itself.  The empty string is a substring
   of all strings.

   Note that the substring operation of some collations can match
   strings of unequal length.  For example, a pre-composed accented
   character can match a decomposed accented character.  The Unicode
   Collation Algorithm [7] discusses this in more detail.

   The return values of the substring operation are called "match",
   "no-match", and "undefined" in this document.

4.2.4.  Ordering

   The ordering operation determines how two strings are ordered.  It
   MUST be reflexive.  For valid input, it MUST be transitive and
   trichotomous.

   Ordering returns "less" if the first string is listed before the
   second string, according to the collation; "greater", if the second
   string is listed before the first string; and "equal", if the two
   strings are equal, as defined by the collation's equality operation.
   If one or both strings are invalid, the result of ordering is
   "undefined".

   When the collation is used with a "+" prefix, the behavior is the
   same as when used with no prefix.  When the collation is used with a
   "-" prefix, the result of the ordering operation of the collation
   MUST be reversed.

   The return values of the ordering operation are called "less",
   "equal", "greater", and "undefined" in this document.

4.3.  Sort Keys

   A collation specification SHOULD describe the internal transformation
   algorithm to generate sort keys.
   This algorithm can be applied to individual strings, and the result
   can be stored to potentially optimize future comparison operations.
   A collation MAY specify that the sort key is generated by the
   identity function.  The sort key may have no meaning to a human.
   The sort key may not be valid input to the collation.

4.4.  Use of Lookup Tables

   Some collations use customizable lookup tables, e.g., because the
   tables depend on locale, and may be modified after shipping the
   software.  Collations that use more than one customizable lookup
   table in a documented format MUST assign numbers to the tables they
   use.  This permits an application protocol command to access the
   tables used by a server collation, so that clients and servers use
   the same tables.

5.  Application Protocol Requirements

   This section describes the requirements and issues that an
   application protocol needs to consider if it offers searching,
   substring matching and/or sorting, and permits the use of characters
   outside the US-ASCII charset.

5.1.  Character Encoding

   The protocol specification has to make clear which characters
   (rather than just octets) the collations operate on.  This can be
   done by specifying the protocol itself in terms of characters (e.g.,
   in the case of a query language), by specifying a single character
   encoding for the protocol (e.g., UTF-8 [3]), or by carefully
   describing the relevant issues of character encoding labeling and
   conversion.  In the latter case, details to consider include how to
   handle unknown charsets, any charsets that are
   mandatory-to-implement, any issues with byte-order that might apply,
   and any transfer encodings that need to be supported.

5.2.  Operations

   The protocol must specify which of the operations defined in this
   specification (equality matching, substring matching, and ordering)
   can be invoked in the protocol, and how they are invoked.  There may
   be more than one way to invoke an operation.

   The protocol MUST provide a mechanism for the client to select the
   collation to use with equality matching, substring matching, and
   ordering.

   If a protocol needs a total ordering and the collation chosen does
   not provide it because the ordering operation returns "undefined" at
   least once, the recommended fallback is to sort all invalid strings
   after the valid ones, and use i;octet to order the invalid strings.

   Although the collation's substring function provides a list of
   matches, a protocol need not provide all that to the client.  It may
   provide only the first matching substring, or even just the
   information that the substring search matched.  In this way,
   collations can be used with protocols that are defined such that "x
   is a substring of y" returns true or false.

   If the protocol provides positional information for the results of a
   substring match, that positional information SHOULD fully specify the
   substring(s) in the result that match, independent of the length of
   the search string.  For example, returning both the starting and
   ending offset of the match would suffice, as would the starting
   offset and a length.  Returning just the starting offset is not
   acceptable.  This rule is necessary because advanced collations can
   treat strings of different lengths as equal (for example,
   pre-composed and decomposed accented characters).

5.3.  Wildcards

   The protocol MUST specify whether it allows the use of wildcards in
   collation identifiers.
   If the protocol allows wildcards, then:

      The protocol MUST specify how comparisons behave in the absence of
      explicit collation negotiation, or when a collation of "default"
      is requested.  The protocol MAY specify that the default collation
      used in such circumstances is sensitive to server configuration.

      The protocol SHOULD provide a way to list available collations
      matching a given wildcard pattern, or patterns.

5.4.  String Comparison

   If a protocol compares strings in any nontrivial way, using a
   collation may be appropriate.  As an example, many protocols use
   case-independent strings.  In many cases, a simple ASCII mapping to
   upper/lower case works well.  In other cases, it may be better to use
   a specifiable collation; for example, so that a server can treat "i"
   and "I" as equivalent in Italy, and different in Turkey (Turkish also
   has a dotted upper-case "İ" and a dotless lower-case "ı").

   Protocol designers should consider, in each case, whether to use a
   specifiable collation.  Keywords often have different needs from
   user variables, and search arguments may be different again.

5.5.  Disconnected Clients

   If the protocol supports disconnected clients, and a collation is
   used that can use configurable tables (e.g., to support
   locale-specific extensions), then the client may not be able to
   reproduce the server's collation operations while offline.

   A mechanism to download such tables has been discussed.  Such a
   mechanism is not included in the present specification, since the
   problem is not yet well understood.

5.6.  Error Codes

   The protocol specification should consider assigning protocol error
   codes for the following circumstances:

   o  The client requests the use of a collation by identifier or
      pattern, but no implemented collation matches that pattern.

   o  The client attempts to use a collation for an operation that is
      not supported by that collation -- for example, attempting to use
      the "i;ascii-numeric" collation for substring matching.

   o  The client uses an equality or substring matching collation, and
      the result is an error.  It may be appropriate to distinguish
      between the two input strings, particularly when one is supplied
      by the client and the other is stored by the server.  It might
      also be appropriate to distinguish the specific case of an invalid
      UTF-8 string.

5.7.  Octet Collation

   The i;octet (Section 9.3) collation is only usable with protocols
   based on octet-strings.  Clients and servers MUST NOT use i;octet
   with other protocols.

   If the protocol permits the use of collations with data structures
   other than strings, the protocol MUST describe the default behavior
   for a collation with those data structures.

6.  Use by Existing Protocols

   This section is informative.

   Both ACAP [11] and Sieve [14] are standards track specifications that
   used collations prior to the creation of this specification and
   registry.  Those standards do not meet all the application protocol
   requirements described in Section 5.

   These protocols allow the use of the i;octet (Section 9.3) collation
   working directly on UTF-8 data, as used in these protocols.
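   The naming guidelines in Section 3.5 are not absolute requirements.
   A parser that follows them can nonetheless be sketched as below.
   This is a minimal sketch under stated assumptions.  The function
   parse_collation_id, the CollationName fields, and the identifier
   "de;unicode-sort;phonebook;table=4.1" are all hypothetical and are
   chosen only for illustration.  Because the guidelines are not
   binding, a registered identifier need not follow this structure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CollationName:
    """Hypothetical container for the parts described in Section 3.5."""
    prefix: str                    # "i" or a language tag such as "en"
    algorithm: str                 # name of the general collation algorithm
    flags: List[str] = field(default_factory=list)        # plain modifications
    params: Dict[str, str] = field(default_factory=dict)  # "name=value" ones
    vendor: Optional[str] = None   # hostname from a "vnd-hostname" component


def parse_collation_id(ident: str) -> CollationName:
    """Split an identifier on ";" following the Section 3.5 guidelines."""
    parts = ident.split(";")
    if len(parts) < 2 or not all(parts):
        raise ValueError(f"expected prefix;algorithm[;modification...]: {ident!r}")
    # *;vnd-hostname;* names are reserved for the owner of that hostname.
    vendor = next((p[len("vnd-"):] for p in parts[1:] if p.startswith("vnd-")), None)
    name = CollationName(prefix=parts[0], algorithm=parts[1], vendor=vendor)
    for mod in parts[2:]:
        key, sep, value = mod.partition("=")
        if sep:
            name.params[key] = value  # e.g. a lookup-table version number
        else:
            name.flags.append(mod)
    return name


if __name__ == "__main__":
    print(parse_collation_id("i;ascii-casemap"))
    print(parse_collation_id("de;unicode-sort;phonebook;table=4.1"))
```

   Parameterized modifications are kept apart from plain ones.  This
   lets a client read, for example, a lookup-table version number
   without knowing anything else about the algorithm.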
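   The operation rules in Section 4.2, the sort keys of Section 4.3,
   and the total-ordering fallback of Section 5.2 can be illustrated
   with a small sketch.  The collation below is hypothetical and is not
   a registered collation.  It compares ASCII letters
   case-insensitively.  As an assumption made only so that "undefined"
   results can be shown, it treats any string containing a non-ASCII
   character as invalid.  Python str values stand in for the
   collation's input strings, and UTF-8 byte comparison stands in for
   i;octet.  All function names are illustrative.

```python
from typing import List, Tuple

# Hypothetical collation used only to illustrate Section 4.2; it is not
# a registered collation.  ASCII letters compare case-insensitively, and
# (by assumption) any string containing a non-ASCII character is invalid.

MATCH, NO_MATCH, UNDEFINED = "match", "no-match", "undefined"
LESS, EQUAL, GREATER = "less", "equal", "greater"


def is_valid(s: str) -> bool:
    """Validity test (Section 4.2.1)."""
    return s.isascii()


def sort_key(s: str) -> str:
    """Sort key (Section 4.3): folds A-Z to a-z; callers check validity."""
    return s.lower()


def equality(a: str, b: str) -> str:
    """Equality test (Section 4.2.2): reflexive, symmetric, transitive."""
    if not (is_valid(a) and is_valid(b)):
        return UNDEFINED
    return MATCH if sort_key(a) == sort_key(b) else NO_MATCH


def substring(needle: str, haystack: str) -> Tuple[str, List[Tuple[int, int]]]:
    """Substring test (Section 4.2.3) with (start, end) offsets.

    Every match is reported, overlapping ones included.  The empty string
    matches at every offset, so it is a substring of every valid string.
    """
    if not (is_valid(needle) and is_valid(haystack)):
        return UNDEFINED, []
    n, h = sort_key(needle), sort_key(haystack)
    spans = [(i, i + len(n)) for i in range(len(h) - len(n) + 1)
             if h[i:i + len(n)] == n]
    return (MATCH if spans else NO_MATCH), spans


def ordering(a: str, b: str, prefix: str = "") -> str:
    """Ordering (Section 4.2.4); a "-" prefix reverses, "+" changes nothing."""
    if not (is_valid(a) and is_valid(b)):
        return UNDEFINED
    ka, kb = sort_key(a), sort_key(b)
    result = LESS if ka < kb else GREATER if ka > kb else EQUAL
    if prefix == "-":
        result = {LESS: GREATER, GREATER: LESS}.get(result, result)
    return result


def total_order_key(s: str):
    """Fallback total ordering (Section 5.2): valid strings first, in
    collation order; invalid strings after them, compared octet by octet."""
    if is_valid(s):
        return (0, sort_key(s), b"")
    return (1, "", s.encode("utf-8"))


if __name__ == "__main__":
    print(substring("ana", "bAnana"))   # ('match', [(1, 4), (3, 6)])
    print(sorted(["b", "é", "A", "ä"], key=total_order_key))
```

   Every operation here derives from the same sort key.  As a result,
   the substring and ordering results stay consistent with the equality
   test, as Section 4.2.2 requires.  The total_order_key shortcut works
   only because this collation's ordering is a plain comparison of sort
   keys.  A collation whose ordering is defined differently would need
   a comparison function instead.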