rfc1439.txt
来自「RFC 的详细文档!」· 文本 代码 · 共 619 行 · 第 1/2 页
TXT
619 行
Network Working Group C. Finseth
Request for Comments: 1439 University of Minnesota
March 1993
The Uniqueness of Unique Identifiers
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard. Distribution of this memo is
unlimited.
Abstract
This RFC provides information that may be useful when selecting a
method to use for assigning unique identifiers to people.
1. The Issue
Computer systems require a way to identify the people associated with
them. These identifiers have been called "user names" or "account
names." The identifers are typically short, alphanumeric strings.
In general, these identifiers must be unique.
The uniqueness is usually achieved in one of three ways:
1) The identifiers are assigned in a unique manner without using
information associated with the individual. Example identifiers are:
ax54tv
cs00034
This method was often used by large timesharing systems. While it
achieved the uniqueness property, there was no way of guessing the
identifier without knowing it through other means.
2) The identifiers are assigned in a unique manner where the bulk of
the identifier is algorithmically derived from the individual's name.
Example identifers are:
Craig.A.Finseth-1
Finseth1
caf-1
fins0001
3) The identifiers are in general not assigned in a unique manner:
the identifier is algorithmically derived from the individual's name
Finseth [Page 1]
RFC 1439 Uniqueness of Unique Identifiers March 1993
and duplicates are handled in an ad-hoc manner. Example identifiers
are:
Craig.Finseth
caf
Now that we have widespread electronic mail, an important feature of
an identifier system is the ability to predict the identifier based
on other information associated with the individual. This other
information is typically the person's name.
Methods two and three make such predictions possible, especially if
you have one example mapping from a person's name to the identifier.
Method two relies on using some or all of the name and
algorithmically varying it to ensure uniqueness (for example, by
appending an integer). Method three relies on using some or all of
the name and selects an alternate identifier in the case of a
duplication.
For both methods, it is important to minimize the need for making the
adjustments required to ensure uniqueness (i.e., an integer that is
not 1 or an alternate identifier). The probability that an
adjustment will be required depends on the format of the identifer
and the size of the organization.
2. Identifier Formats
There are a number of popular identifier formats. This section will
list some of them and supply both typical and maximum values for the
number of possible identifiers. A "typical" value is the number that
you are likely to run into in real life. A "maximum" value is the
largest number of possible (without getting extreme about it) values.
All ranges are expressed as a number of bits.
2.1 Initials
There are three popular formats based on initials: those with one,
two, or three letters. (The number of people with more than three
initials is assumed to be small.) Values:
format typical maximum
I 4 5
II 8 10
III 12 15
Finseth [Page 2]
RFC 1439 Uniqueness of Unique Identifiers March 1993
You can also think of these as first, middle, and last initials:
I 4 5
F L 8 10
F M L 12 15
2.2 Names
Again, there are three popular formats based on using names: those
with the first name, last name, and both first and last names.
Values:
format typical maximum
First 8 14
Last 9 13
First Last 17 27
2.3 Combinations
I have seen these combinations in use ("F" is first initial, "M" is
middle initial, and "L" is last initial):
format typical maximum
F Last 13 18
F M Last 17 23
First L 12 19
First M Last 21 32
2.4 Complete List
Here are all possible combinations of nothing, initial, and full name
for first, middle, and last. The number of Middle names is assumed
to be the same as the number of First names. Values:
format typical maximum
_ _ _ 0 0
_ _ L 4 5
_ _ Last 9 13
_ M _ 4 5
_ M L 5 10
_ M Last 13 18
_ Middle _ 8 14
_ Middle L 12 19
Finseth [Page 3]
RFC 1439 Uniqueness of Unique Identifiers March 1993
_ Middle Last 17 27
F _ _ 4 5
F _ L 5 10
F _ Last 13 18
F M _ 5 10
F M L 12 15
F M Last 17 23
F Middle _ 12 19
F Middle L 16 24
F Middle Last 21 32
First _ _ 8 14
First _ L 12 19
First _ Last 17 27
First M _ 12 19
First M L 16 24
First M Last 21 32
First Middle _ 16 28
First Middle L 20 33
First Middle Last 26 40
3. Probabilities of Duplicates
As can be seen, the information content in these identifiers in no
case exceeds 40 bits and the typical information content never
exceeds 26 bits. The content of most of them is in the 8 to 20 bit
range. Duplicates are thus not only possible but likely.
The method used to compute the probability of duplicates is the same
as that of the well-known "birthday" problem. For a universe of N
items, the probability of duplicates in X members is expressed by:
N N-1 N-2 N-(X-1)
- x --- x --- x ... x -------
N N N N
A program to compute this function for selected values of N is given
in the appendix, as is its complete output.
The "1%" column is the number of items (people) before an
organization of that (universe) size has a 1% chance of a duplicate.
Similarly for 2%, 5%, 10%, and 20%.
Finseth [Page 4]
RFC 1439 Uniqueness of Unique Identifiers March 1993
bits universe 1% 2% 5% 10% 20%
6 64 2 3 4 5 6
7 128 3 3 5 6 8
8 256 3 4 6 8 12
9 512 4 6 8 11 16
10 1,024 6 7 11 16 22
11 2,048 7 10 15 22 31
12 4,096 10 14 21 30 44
13 8,192 14 19 30 43 61
14 16,384 19 27 42 60 86
15 32,768 27 37 59 84 122
16 65,536 37 52 83 118 172
17 131,072 52 74 117 167 243
18 262,144 74 104 165 236 343
19 524,288 104 147 233 333 485
20 1,048,576 146 207 329 471 685
21 2,097,152 206 292 465 666 968
22 4,194,304 291 413 657 941 1369
23 8,388,608 412 583 929 1330 1936
24 16,777,216 582 824 1313 1881 2737
25 33,554,432 822 1165 1856 2660 3871
26 67,108,864 1162 1648 2625 3761 5474
27 134,217,728 1644 2330 3712 5319 7740
28 268,435,456 2324 3294 5249 7522 10946
29 536,870,912 3286 4659 7422 10637 15480
30 1,073,741,824 4647 6588 10496 15043 21891
31 2,147,483,648 6571 9316 14844 21273 30959
For example, assume an organization were to select the "First Last"
form. This form has 17 bits (typical) and 27 bits (maximum) of
information. The relevant line is:
17 131,072 52 74 117 167 243
For an organization with 100 people, the probability of a duplicate
would be between 2% and 5% (probably around 4%). If the organization
had 1,000 people, the probability of a duplicate would be much
greater than 20%.
Appendix: Reuse of Identifiers and Privacy Issues
Let's say that an organization were to select the format:
First.M.Last-#
as my own organization has. Is the -# required, or can one simply
do:
Finseth [Page 5]
RFC 1439 Uniqueness of Unique Identifiers March 1993
Craig.A.Finseth
for the first one and
Craig.A.Finseth-2
(or -1) for the second? The answer is "no," although for non-obvious
reasons.
Assume that the organization has made this selection and a third
party wants to send e-mail to Craig.A.Finseth. Because of the
Electronic Communications Privacy Act of 1987, an organization must
treat electronic mail with care. In this case, there is no way for
the third party user to reliably know that sending to Craig.A.Finseth
is (may be) the wrong party. On the other hand, if the -# suffix is
always present and attempts to send mail to the non-suffix form are
rejected, the third party user will realize that they must have the
suffix in order to have a unique identifier.
For similar reasons, identifiers in this form should not be re-used
in the life of the mail system.
Appendix: Perl Program to Compute Probabilities
⌨️ 快捷键说明
复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?