rfc1439.txt

来自「RFC 的详细文档!」· 文本 代码 · 共 619 行 · 第 1/2 页

TXT
619
字号






Network Working Group                                         C. Finseth
Request for Comments: 1439                       University of Minnesota
                                                              March 1993


                  The Uniqueness of Unique Identifiers

Status of this Memo

   This memo provides information for the Internet community.  It does
   not specify an Internet standard.  Distribution of this memo is
   unlimited.

Abstract

   This RFC provides information that may be useful when selecting a
   method to use for assigning unique identifiers to people.

1. The Issue

   Computer systems require a way to identify the people associated with
   them.  These identifiers have been called "user names" or "account
   names."  The identifers are typically short, alphanumeric strings.
   In general, these identifiers must be unique.

   The uniqueness is usually achieved in one of three ways:

   1) The identifiers are assigned in a unique manner without using
   information associated with the individual.  Example identifiers are:

           ax54tv
           cs00034

   This method was often used by large timesharing systems.  While it
   achieved the uniqueness property, there was no way of guessing the
   identifier without knowing it through other means.

   2) The identifiers are assigned in a unique manner where the bulk of
   the identifier is algorithmically derived from the individual's name.
   Example identifers are:

           Craig.A.Finseth-1
           Finseth1
           caf-1
           fins0001

   3) The identifiers are in general not assigned in a unique manner:
   the identifier is algorithmically derived from the individual's name



Finseth                                                         [Page 1]

RFC 1439            Uniqueness of Unique Identifiers          March 1993


   and duplicates are handled in an ad-hoc manner.  Example identifiers
   are:

           Craig.Finseth
           caf

   Now that we have widespread electronic mail, an important feature of
   an identifier system is the ability to predict the identifier based
   on other information associated with the individual.  This other
   information is typically the person's name.

   Methods two and three make such predictions possible, especially if
   you have one example mapping from a person's name to the identifier.
   Method two relies on using some or all of the name and
   algorithmically varying it to ensure uniqueness (for example, by
   appending an integer).  Method three relies on using some or all of
   the name and selects an alternate identifier in the case of a
   duplication.

   For both methods, it is important to minimize the need for making the
   adjustments required to ensure uniqueness (i.e., an integer that is
   not 1 or an alternate identifier).  The probability that an
   adjustment will be required depends on the format of the identifer
   and the size of the organization.

2. Identifier Formats

   There are a number of popular identifier formats.  This section will
   list some of them and supply both typical and maximum values for the
   number of possible identifiers.  A "typical" value is the number that
   you are likely to run into in real life.  A "maximum" value is the
   largest number of possible (without getting extreme about it) values.
   All ranges are expressed as a number of bits.

2.1 Initials

   There are three popular formats based on initials: those with one,
   two, or three letters.  (The number of people with more than three
   initials is assumed to be small.)  Values:

           format                  typical         maximum

           I                       4               5
           II                      8               10
           III                     12              15






Finseth                                                         [Page 2]

RFC 1439            Uniqueness of Unique Identifiers          March 1993


   You can also think of these as first, middle, and last initials:

           I                       4               5
           F L                     8               10
           F M L                   12              15

2.2 Names

   Again, there are three popular formats based on using names: those
   with the first name, last name, and both first and last names.
   Values:

           format                  typical         maximum

           First                   8               14
           Last                    9               13
           First Last              17              27

2.3 Combinations

   I have seen these combinations in use ("F" is first initial, "M" is
   middle initial, and "L" is last initial):

           format                  typical         maximum

           F Last                  13              18
           F M Last                17              23
           First L                 12              19
           First M Last            21              32

2.4 Complete List

   Here are all possible combinations of nothing, initial, and full name
   for first, middle, and last.  The number of Middle names is assumed
   to be the same as the number of First names.  Values:

           format                  typical         maximum

           _ _ _                   0               0
           _ _ L                   4               5
           _ _ Last                9               13

           _ M _                   4               5
           _ M L                   5               10
           _ M Last                13              18

           _ Middle _              8               14
           _ Middle L              12              19



Finseth                                                         [Page 3]

RFC 1439            Uniqueness of Unique Identifiers          March 1993


           _ Middle Last           17              27

           F _ _                   4               5
           F _ L                   5               10
           F _ Last                13              18

           F M _                   5               10
           F M L                   12              15
           F M Last                17              23

           F Middle _              12              19
           F Middle L              16              24
           F Middle Last           21              32

           First _ _               8               14
           First _ L               12              19
           First _ Last            17              27

           First M _               12              19
           First M L               16              24
           First M Last            21              32

           First Middle _          16              28
           First Middle L          20              33
           First Middle Last       26              40

3. Probabilities of Duplicates

   As can be seen, the information content in these identifiers in no
   case exceeds 40 bits and the typical information content never
   exceeds 26 bits.  The content of most of them is in the 8 to 20 bit
   range.  Duplicates are thus not only possible but likely.

   The method used to compute the probability of duplicates is the same
   as that of the well-known "birthday" problem.  For a universe of N
   items, the probability of duplicates in X members is expressed by:

           N   N-1   N-2         N-(X-1)
           - x --- x --- x ... x -------
           N    N     N             N

   A program to compute this function for selected values of N is given
   in the appendix, as is its complete output.

   The "1%" column is the number of items (people) before an
   organization of that (universe) size has a 1% chance of a duplicate.
   Similarly for 2%, 5%, 10%, and 20%.




Finseth                                                         [Page 4]

RFC 1439            Uniqueness of Unique Identifiers          March 1993


           bits       universe     1%      2%      5%      10%     20%

            6                 64   2       3       4       5       6
            7                128   3       3       5       6       8
            8                256   3       4       6       8       12
            9                512   4       6       8       11      16
           10              1,024   6       7       11      16      22
           11              2,048   7       10      15      22      31
           12              4,096   10      14      21      30      44
           13              8,192   14      19      30      43      61
           14             16,384   19      27      42      60      86
           15             32,768   27      37      59      84      122
           16             65,536   37      52      83      118     172
           17            131,072   52      74      117     167     243
           18            262,144   74      104     165     236     343
           19            524,288   104     147     233     333     485
           20          1,048,576   146     207     329     471     685
           21          2,097,152   206     292     465     666     968
           22          4,194,304   291     413     657     941     1369
           23          8,388,608   412     583     929     1330    1936
           24         16,777,216   582     824     1313    1881    2737
           25         33,554,432   822     1165    1856    2660    3871
           26         67,108,864   1162    1648    2625    3761    5474
           27        134,217,728   1644    2330    3712    5319    7740
           28        268,435,456   2324    3294    5249    7522    10946
           29        536,870,912   3286    4659    7422    10637   15480
           30      1,073,741,824   4647    6588    10496   15043   21891
           31      2,147,483,648   6571    9316    14844   21273   30959

   For example, assume an organization were to select the "First Last"
   form.  This form has 17 bits (typical) and 27 bits (maximum) of
   information.  The relevant line is:

           17            131,072   52      74      117     167     243

   For an organization with 100 people, the probability of a duplicate
   would be between 2% and 5% (probably around 4%).  If the organization
   had 1,000 people, the probability of a duplicate would be much
   greater than 20%.

Appendix: Reuse of Identifiers and Privacy Issues

   Let's say that an organization were to select the format:

           First.M.Last-#

   as my own organization has.  Is the -# required, or can one simply
   do:



Finseth                                                         [Page 5]

RFC 1439            Uniqueness of Unique Identifiers          March 1993


           Craig.A.Finseth

   for the first one and

           Craig.A.Finseth-2

   (or -1) for the second?  The answer is "no," although for non-obvious
   reasons.

   Assume that the organization has made this selection and a third
   party wants to send e-mail to Craig.A.Finseth.  Because of the
   Electronic Communications Privacy Act of 1987, an organization must
   treat electronic mail with care.  In this case, there is no way for
   the third party user to reliably know that sending to Craig.A.Finseth
   is (may be) the wrong party.  On the other hand, if the -# suffix is
   always present and attempts to send mail to the non-suffix form are
   rejected, the third party user will realize that they must have the
   suffix in order to have a unique identifier.

   For similar reasons, identifiers in this form should not be re-used
   in the life of the mail system.

Appendix: Perl Program to Compute Probabilities

⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?