taking advantage of as many of the following as are available to the
cryptanalyst: (1) the redundancy in any natural language, which can
be used to determine the validity of assumptions, (2) clues gained
from corresponding plain and cipher text, (3) information that might
be known about the algorithm(s) used, (4) the general expected content
of the cryptograms, (5) all of the cipher text that is available in
the same system & key, (6) compromised keys, (7) as much computational
and analytical power as can be obtained, and (8) mistakes made by
the users of the cryptographic system. The cryptographer can make
life as difficult as possible for the cryptanalyst by depriving him
of some of these things: (1) using redundancy reduction before encryption,
(2) using an algorithm that is resistant to the known plain text attack,
(3) & (4) using a strong enough algorithm that these clues aren't
really useful, (5) changing keys often and selecting them properly,
(6) guarding keys as closely as the data they protect justifies, (7)
ensuring that there aren't enough computers in the world to do a brute
force attack on the algorithm, and (8) making sure users of the system
understand how to properly use it.
This thesis offers one response to the above challenge by proposing
(1) an approach to linguistically based redundancy reduction, and
(2) a new data encryption algorithm (MPJ) that can be used
where DES is in use now, but is more secure.
A. Motivation
There has been a great deal of discussion of the security of DES in
the open literature. Most of it has been favorable to DES [MEY],
but there are a few indications that it would be wise to supplement
the aging DES and eventually replace it.
DES has been in use since 1977, and has been used in a large number
of applications where people have had many possible motivations for
trying to break this cipher. During this time, it is possible that
someone has discovered a computationally feasible method for doing
so [KRU]. Under such circumstances, it is highly unlikely that such
a discovery would be made known. There is no way to really know this
for sure, unless you are one of those who have broken DES. The
closest thing that I have found to an open admission of breaking DES
is a story of the FBI successfully decrypting a file of drug transaction
records that were encrypted on a PC using a DES board. [MCL] The DES
board that the criminal used has an algorithm to generate a key from
a word or phrase. By an exhaustive search, using a supercomputer, of
an English dictionary and the names of the criminal's family & friends,
the file was solved. This indicates some weakness in DES, but even
more weakness in the way the key was chosen. The key was not chosen
randomly, so the FBI's job was much easier.
DES is subject to attacks that require precomputation that could tie
up a supercomputer for a few years, after which it would take only
a few days to solve a DES cryptogram. This is becoming less of a
barrier as the price of computers drops and the speed and storage
capacity of computers increase.
The U. S. National Security Agency (NSA) is acting to release some of its
own algorithms for ``sensitive but unclassified'' applications, such
as communications between government contractors. The NSA also releases
some classified algorithms to chip manufacturers for use in classified
devices, but under strict controls [ULT]. This will take some of
the load off of the DES algorithm, but there is a catch. They intend
to keep the algorithms secret and control the keys. Since the security
of the algorithm is dependent on the key in a way that only the NSA knows,
this gives the NSA the exclusive ability to read everybody's communications. That
is not all bad, as it discourages the use of one of those systems
by someone engaged in spying or other illegal activity. Unfortunately,
these systems are not an option for protection of a corporation's proprietary
data that may have a great deal to do with profits and losses, but
are not really in the domain of the NSA.
There are other alternatives to DES now, but none of them in the public
domain are even as good, let alone better, for general cryptography. For
example, RSA encryption has great advantages in the authentication
of digital signatures, but the complications of selecting good keys,
together with the fact that the security of RSA relies heavily on the
state of the art in mathematics failing to progress, make it at least
inconvenient to use and at most insecure.
Because of the above considerations, it is my intention to suggest
a better algorithm (MPJ) for general use in the private sector. Although
the MPJ algorithm might be useful for government applications, too,
the design requirements for the two areas of application vary [CHA]
and the latter is kept shrouded in secrecy, so we shall leave it to
the NSA.
B. Approach
First, the problem is defined. As explained in the above section,
the basic problem is to create an algorithm to supplement DES. Design
criteria are then conceived such that an algorithm that meets those
criteria will solve the problem as defined. The design criteria chosen
for the MPJ algorithm are discussed in chapter VI. To avoid repetition
of one or more of the many mistakes that have been made throughout
history with cryptography, it is, of course, necessary to research
what was done in the past, both distant and recent. From this, many
good ideas can be gleaned that apply to the problem at hand. These
ideas, along with a knowledge of current technology, are then applied
to the design criteria with a bit of creativity and a lot of hard work
to define the new algorithm.
In the process of doing all of the above, it became apparent that
the security of encrypted data was vastly improved by first removing
as much redundancy from the data as possible. Therefore, as sort
of a bonus, a linguistic approach to data compression is also presented
that can be used in conjunction with the MPJ encryption algorithm or
another encryption method, or just by itself for the savings that
it gains in communications channel capacity and/or data storage space.
II. HISTORY OF CRYPTOGRAPHY
One of the best works on the history of cryptography is The Codebreakers
by David Kahn [KAH]. In that book, Kahn discusses cryptography
from prehistory to World War II. The first codes and ciphers
were in written form, and used to protect the privacy of communications
sent by courier or mail through hostile or unknown territory. Some
of these were reasonably good, but most were not difficult to break
using manual methods, provided that the interceptor had sufficient
cipher text and perhaps some probable text. The use of radio, especially
by the military, increased the need for cryptography, as well as increasing
the rewards for those who could break the encryption schemes in use. Kahn's
documentation of the efforts of those who broke some very complex
encryption schemes, like the German Enigma and the Japanese Purple
ciphers, lends great insight into the kind of process cryptanalysis really
is.
Kahn points out the kinds of mistakes the inventors and users of cryptographic
algorithms tend to make that reduce the security of their communications. For
example, German users of Enigma tended to choose a three-letter indicator
for their messages that consisted of three consecutive letters on
the keyboard. This substantially reduced the number of keys that
had to be searched to determine the one that they were using. While
the designer of an algorithm may calculate the great number of combinations
of keys that there are, the cryptanalyst looks at ways to isolate
parts of the key so that the difficulty of a solution is much less
than the size of the key space indicates. The difference in mind
set between the concealer of secrets and the one who pries into them
has caused many an inventor of an encryption algorithm to be overconfident.
The job of the cryptanalyst is a tedious one. He tries all kinds
of operations in an attempt to unscramble the cipher text in front of him. Sometimes
the search is fruitless. Sometimes the search yields something that
looks like a meaningful language. It is this ability to recognize
a meaningful message when it comes out of the various operations that
the cryptanalyst tries that makes the whole process possible. It
is also helpful for the cryptanalyst to know some probable plain text
that is contained in a message. This is almost always the case. For
example, military messages even now have a very stereotyped format,
with the from, to, and date indicators in the same places in the message.
The cryptanalyst almost always knows what language to expect a message
to be written in, and this is a great help. Natural languages contain
a great deal of redundancy. A message that is only 90% recovered
is usually readable. Natural languages also have very consistent
statistical properties that are very useful in cryptanalysis, especially
when the cryptanalysis is automated. The only time that these things
don't help the cryptanalyst is in the ``one-time pad.''
III. ELEMENTS OF ENCRYPTION
Several basic elements form the basis of a multitude of possible
encryption algorithms, either alone or in combination [BEK][BAL]. Although
any one of these elements may be used as an encryption method all by
itself, the most secure methods of encryption combine several
of them. For example, substitution and permutation are basic elements
of both the DES and MPJ encryption algorithms. These algorithms, in turn,
are best used with some form of feedback.
A. Substitution
A substitution cipher simply replaces each symbol from the plain
text alphabet with a symbol from a cipher text alphabet. The plain
text alphabet can be a natural language alphabet, ASCII, EBCDIC, Baudot,
the set {0,1}, or any other finite set. Cipher text alphabets, likewise,
may be any finite set. To be able to easily apply computer methods
to the manipulation of these alphabets, it is preferable to use alphabets
that are groups of binary digits (bits). This is not a severe limitation,
since a correspondence can be set up between any finite set and a
set of binary numbers.
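
As an illustration of substitution over an alphabet of bit groups, the
following C sketch (my own illustration, not an algorithm from this
thesis) builds a key-dependent table that is a permutation of all 256
byte values and substitutes every byte of a buffer through it. The
swap schedule used to shuffle the table is arbitrary; any invertible
table of this kind works the same way, and the receiver simply builds
the inverse table to undo the substitution.

#include <stdio.h>

/* Build a key-dependent substitution table over the 8-bit alphabet.
 * The table is a permutation of 0..255, so each cipher byte
 * corresponds to exactly one plain byte and the mapping is invertible.
 * The key-driven shuffle below is an arbitrary choice for illustration.
 */
static void build_table(unsigned char table[256],
                        const unsigned char *key, int keylen)
{
    int i, j = 0;
    unsigned char tmp;

    for (i = 0; i < 256; i++)          /* start with the identity mapping */
        table[i] = (unsigned char)i;

    for (i = 0; i < 256; i++) {        /* key-driven shuffle of the table */
        j = (j + table[i] + key[i % keylen]) & 0xFF;
        tmp = table[i];
        table[i] = table[j];
        table[j] = tmp;
    }
}

/* Substitute every byte of buf in place. */
static void substitute(unsigned char *buf, long len,
                       const unsigned char table[256])
{
    for (long i = 0; i < len; i++)
        buf[i] = table[buf[i]];
}

int main(void)
{
    unsigned char table[256];
    unsigned char msg[] = "ATTACK AT DAWN";
    const unsigned char key[] = "example key";

    build_table(table, key, (int)(sizeof key - 1));
    substitute(msg, (long)(sizeof msg - 1), table);

    for (long i = 0; i < (long)(sizeof msg - 1); i++)
        printf("%02X ", msg[i]);     /* print the cipher text bytes in hex */
    printf("\n");
    return 0;
}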
1. Monoalphabetic
A monoalphabetic substitution cipher is one in which each letter of
the plain text alphabet is always replaced by the same cipher
text letter. The cipher text alphabet need not be in any particular
order. A special case of the monoalphabetic substitution cipher is the
Caesar cipher. This cipher replaces each letter with the letter that
is n letters later in the alphabet. This cipher is so named because
it was reportedly first used by Julius Caesar [KAH]. With only 26 possible
keys, this is obviously not very secure. In fact, any substitution
cipher that substitutes one letter for another is vulnerable to solution
by frequency analysis, and can be solved, given enough cipher text
with the same key. How much cipher text is enough depends on the
nature of the plain text, but for plain English, as many characters
as there are letters in the alphabet is sufficient for the probabilities
of occurrences of letters to betray enough letter identities to make
solution probable. While ciphers of this type no longer have any
value for serious cryptography, they do make fun puzzles for elementary
school children. For example, the following cryptogram is presented
for your solution:
UIJT JT B DBFTBS TVCTUJUVUJPO DJQIFS/ XPVME ZPV USVTU ZPVS TFDSFUT
UP JU@
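
For readers who would like to check the cryptogram above, the short C
sketch below (an illustration of mine, not part of the thesis) shifts
each letter back by one place in the alphabet; the punctuation marks
were also shifted one ASCII position, so '/' becomes '.' and '@'
becomes '?'.

#include <stdio.h>
#include <ctype.h>

/* Caesar cipher: shift each letter 'shift' places around the alphabet.
 * The two punctuation marks in the example above were shifted one
 * ASCII position ('.' -> '/', '?' -> '@'), so they are handled too.
 */
static char caesar(char c, int shift)
{
    if (isupper((unsigned char)c))
        return (char)('A' + (c - 'A' + shift % 26 + 26) % 26);
    if (shift > 0) {
        if (c == '.') return '/';
        if (c == '?') return '@';
    } else {
        if (c == '/') return '.';
        if (c == '@') return '?';
    }
    return c;
}

int main(void)
{
    const char *ct = "UIJT JT B DBFTBS TVCTUJUVUJPO DJQIFS/ "
                     "XPVME ZPV USVTU ZPVS TFDSFUT UP JU@";

    for (const char *p = ct; *p; p++)
        putchar(caesar(*p, -1));     /* shift back by one to decipher */
    putchar('\n');
    return 0;
}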
Substitution ciphers can be made more secure by substituting more
than one letter at a time. For example, the substitution
could be made on digraphs or trigraphs (groups of two or three letters). A
more practical example is the Data Encryption Standard. DES is actually
a substitution cipher with an effective alphabet size of 2^64,
or about 1.84 x 10^19, and the MPJ Encryption Algorithm is a
substitution cipher with an effective alphabet size of 2^128,
or about 3.4 x 10^38. Not only is it difficult to get that
large of a sample to analyze for statistics, but the memory required
to store the substitution table is impractical even for computer systems
using large optical storage disks.
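To put these numbers in perspective, a complete lookup table for the
64-bit DES alphabet would need 2^64 entries of 8 bytes each, or roughly
1.5 x 10^20 bytes, and a table for the 128-bit MPJ alphabet would need
2^128 entries of 16 bytes each, or roughly 5.4 x 10^39 bytes. (These
are back-of-the-envelope figures, intended only to show the scale
involved.)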
2. Polyalphabetic
A polyalphabetic substitution cipher is one in which each letter of
the plain text alphabet may be replaced by a letter from one of many
cipher alphabets. The selection of which alphabet to take the substitution
from may be determined in many ways. The most usual method is to
select the cipher alphabet according to a combination of a message indicator
and the number of characters from the beginning of the message. The
classic example of this type of cipher is the rotor-based machine. The
substitution occurs by physical wire crossings within rotors. After
each character is enciphered, one or more rotors are rotated by a
ratchet mechanism. The message indicator, which is part of the key,
determines the starting position of the rotors. This effectively
allows the use of more cipher alphabets than there are letters in
the message. Because of this, the use of statistical analysis on
any one message to guess at the substitutions is useless. There is,
however, a good way to attack this kind of cipher when it is used heavily.
This is by analyzing the statistics of the first character in each
message, then the second, etc. This is called analyzing the messages
``in depth.''
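
To make the rotor mechanism concrete, the following C sketch (my own
illustration, using an arbitrary keyboard-order wiring rather than any
historical rotor) simulates a single rotor that steps one position
after every character; the starting position of the rotor serves as
the message indicator.

#include <stdio.h>

/* Single-rotor polyalphabetic substitution (illustrative sketch, not
 * a historical machine).  WIRING is an arbitrary permutation of the
 * alphabet standing in for the wire crossings inside a rotor.  The
 * rotor steps one position after every character, so successive
 * letters are enciphered with different alphabets.
 */
static const char WIRING[] = "QWERTYUIOPASDFGHJKLZXCVBNM";
static char inverse[26];                 /* inverse wiring, for deciphering */

static void build_inverse(void)
{
    for (int i = 0; i < 26; i++)
        inverse[WIRING[i] - 'A'] = (char)('A' + i);
}

static char rotor(char c, int pos, int decipher)
{
    const char *map = decipher ? inverse : WIRING;
    int in  = (c - 'A' + pos) % 26;              /* contact the letter enters at  */
    int out = (map[in] - 'A' - pos + 26) % 26;   /* contact it leaves, un-rotated */
    return (char)('A' + out);
}

static void run(const char *in, char *out, int indicator, int decipher)
{
    int pos = indicator;                 /* message indicator = start position */
    for (; *in; in++, out++) {
        *out = rotor(*in, pos, decipher);
        pos = (pos + 1) % 26;            /* ratchet: step after each character */
    }
    *out = '\0';
}

int main(void)
{
    char ct[64], pt[64];

    build_inverse();
    run("ATTACKATDAWN", ct, 7, 0);       /* encipher with indicator 7     */
    run(ct, pt, 7, 1);                   /* decipher with the same key    */
    printf("cipher: %s\nplain:  %s\n", ct, pt);
    return 0;
}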
Although the rotor-based polyalphabetic ciphers used by Germany and
Japan in World War II contributed greatly to their losses because
their messages were read regularly by the Allies, it is possible to
create a more secure polyalphabetic substitution with computer technology.
The primary weakness of the mechanical rotor machines was the infrequent
changing of many of the parts of the key such as rotor wiring and
ratchet construction. A more general method of using a different
set of alphabets for each message would be very secure. Unfortunately,
it would result in keys larger than the message. Since the minimum
key length for absolute, provable cryptographic security, as determined
by C. E. Shannon [SHA], is exactly the length of the message, this
solution is not very efficient with respect to key management.
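
The limiting case is the ``one-time pad'' mentioned earlier. The C
sketch below (my own illustration) exclusive-ORs each message byte
with one byte of a pad that must be truly random, never reused, and
exactly as long as the message, which is precisely the key management
burden described above.

#include <stdio.h>

/* One-time pad (illustrative sketch): each plain text byte is
 * exclusive-ORed with one pad byte.  Applying the same operation
 * again with the same pad recovers the plain text.  The pad must be
 * truly random, kept secret, never reused, and as long as the message.
 */
static void one_time_pad(unsigned char *buf, const unsigned char *pad, long len)
{
    for (long i = 0; i < len; i++)
        buf[i] ^= pad[i];
}

int main(void)
{
    unsigned char msg[] = "ATTACK AT DAWN";
    /* In real use the pad bytes must come from a true random source. */
    const unsigned char pad[sizeof msg] = {
        0x3A, 0x91, 0x5C, 0x07, 0xE2, 0x48, 0xB3, 0x6D,
        0x19, 0xF4, 0x2B, 0x80, 0xC5, 0x5E, 0x00
    };
    long len = (long)(sizeof msg - 1);

    one_time_pad(msg, pad, len);         /* encipher                     */
    for (long i = 0; i < len; i++)
        printf("%02X ", msg[i]);
    printf("\n");

    one_time_pad(msg, pad, len);         /* decipher with the same pad   */
    printf("%s\n", msg);
    return 0;
}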
B. Permutation
Permutation is taking the same symbols and writing them in a different
order to confuse an interceptor. There are many ways of doing this
geometrically. For example, words may be written vertically then transcribed
horizontally, or they may be written on a strip of paper wrapped along
the length of a rod of a given diameter, then unwrapped for delivery.
In general, a permutation operation may be considered as an operation
on a block of symbols, where each symbol's position is changed to
another position within the block according to some rule. The blocks
may overlap each other if the permutation is undone from the end of
the message to the beginning.
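
As a concrete example of a block permutation, the following C sketch
(my own illustration, with an arbitrary eight-symbol block and key
table) moves each symbol of a block to a new position given by the
table; building the inverse table lets the receiver undo the
permutation.

#include <stdio.h>

#define BLOCK 8

/* Keyed permutation of an 8-symbol block (illustrative sketch).
 * perm[i] gives the output position that receives input symbol i;
 * inv[] is built so the receiver can undo the permutation.
 */
static const int perm[BLOCK] = { 5, 2, 7, 0, 3, 6, 1, 4 };

static void permute(const char *in, char *out, const int *p)
{
    for (int i = 0; i < BLOCK; i++)
        out[p[i]] = in[i];
}

static void build_inverse(const int *p, int *inv)
{
    for (int i = 0; i < BLOCK; i++)
        inv[p[i]] = i;                   /* position p[i] came from position i */
}

int main(void)
{
    int inv[BLOCK];
    char ct[BLOCK + 1] = { 0 }, pt[BLOCK + 1] = { 0 };

    build_inverse(perm, inv);
    permute("ATTACKAT", ct, perm);       /* scramble one block           */
    permute(ct, pt, inv);                /* unscramble with the inverse  */
    printf("cipher: %s\nplain:  %s\n", ct, pt);
    return 0;
}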
The rule(s) determining how the permutation is done may be fixed,
or they may depend in some way on a key. Although historical permutation
ciphers tended to be rather simple, computer methods allow them to