📄 phelix.tex
字号:
\documentclass[10pt]{llncs}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{graphicx}
\usepackage{xspace}
\usepackage{url}
\usepackage[latin1]{inputenc}
\usepackage[a4paper,dvips,hmargin=4cm,vmargin=3.5cm,twosideshift=0cm]{geometry}
\setcounter{topnumber}{4}
\renewcommand{\topfraction}{0.9}
\setcounter{bottomnumber}{4}
\renewcommand{\bottomfraction}{0.9}
\setcounter{totalnumber}{4}
\renewcommand{\textfraction}{0.1}
\setcounter{dbltopnumber}{4}
\renewcommand{\dbltopfraction}{0.9}
\newcommand{\purl}{\protect\url}
\newcommand{\xor}{\ifmmode\oplus\else\textsc{xor}\xspace\fi}
\newcommand{\xorComma}{\textsc{xor},\xspace}
\newcommand{\xored}{\textsc{xor}ed\xspace}
\newcommand{\xoring}{\textsc{xor}ing\xspace}
\newcommand{\xors}{\textsc{xor}s\xspace}
\long\def\comment#1{} \pagestyle{plain}
\title{Phelix}
\subtitle{Fast Encryption and Authentication\\
in a Single Cryptographic Primitive}
\author{%
Doug Whiting\inst{1} \and %
Bruce Schneier\inst{2} \and %
Stefan Lucks\inst{3} \and
Fr閐閞ic Muller\inst{4} %
}
\institute{ %
HiFn, \purl{dwhiting@hifn.com} \and %
Counterpane Internet Security, \purl{schneier@counterpane.com} \and %
Universit鋞 Mannheim, \purl{lucks@th.informatik.uni-mannheim.de} \and%
DCSSI Crypto Lab, \purl{frederic.muller@sgdn.pm.gouv.fr}
}
\date{}
\begin{document}
\maketitle
\begin{abstract}
Phelix\footnote{Pronounced ``felix'' (rhymes with ``helix'').} is a
high-speed stream cipher with a built-in MAC functionality. It is
efficient in both hardware and software. On current Pentium CPUs,
Phelix has a per-packet overhead of less than 900 clocks, plus a
per-byte cost well under 8 clocks per byte, comparing very favorably
with the best AES (encryption-only) implementations, even for small
packets.
\end{abstract}
Keywords: Stream cipher, MAC, authentication, encryption.
\section{Introduction}
Securing data in transmission is the most common real-life cryptographic
problem. Basic security services require both encryption and
authentication. This is almost always done using a symmetric
cipher---public-key systems are only used to set up symmetric keys---and a
Message Authentication Code (MAC).
The AES process provided a number of very good block cipher designs, as well
as a new block cipher standard. The cryptographic community learned a lot
during the selection process about the engineering criteria for a good
cipher. AES candidates were compared in performance and cost in many
different implementation settings. We learned more about the importance of
fast rekeying and tiny-memory implementations, the cost of S-boxes and
circuit-depth for hardware implementations, the slowness of multiplication
on some platforms, and other performance considerations.
The community also learned about the differences between cryptanalysis in theory
and cryptanalysis in practice. Many block cipher modes restrict the types
of attack that can be performed on the underlying block cipher. Yet the
generally accepted attack model for block ciphers is very liberal. Any
method that distinguishes the block cipher from a random permutation is
considered an attack. Each block cipher operation must protect against all
types of attack. The resulting overengineering leads to inefficiencies.
Computer network properties like synchronization and error correction have
eliminated the traditional synchronization problems of stream-cipher modes
like OFB. Furthermore, stream ciphers have different implementation
properties that restrict the cryptanalyst. They only receive their inputs
once (a key and a nonce) and then produce a long stream of pseudorandom
data. A stream cipher can start with a strong cryptographic operation to
thoroughly mix the key and nonce into a state, and then use that state and a
simpler mixing operation to produce the keystream. If the attacker tries to
manipulate the inputs to the cipher he encounters the strong cryptographic
operation. Alternatively, he can analyze the keystream, but this is a static
analysis only. As far as we know, static attacks are much less powerful than
dynamic attacks. As there are fewer cryptographic requirements to fulfill,
we believe that the keystream generation function can be made significantly
faster, per message byte, than a block cipher can be. Given the suitability
of stream ciphers for many practical tasks and the potential for faster
implementations, we believe that stream ciphers are a fruitful area of
research.
Additionally, a stream cipher is often implemented---and from a
cryptographic point of view, should always be implemented---together with a
MAC. Encryption and authentication go hand in hand, and significant
vulnerabilities can result if encryption is implemented without
authentication. Outside the cryptographic literature, not using a proper MAC
is one of the commonly encountered errors in stream cipher systems. A stream
cipher with a built-in MAC is much more likely to be used correctly, because
it provides a MAC without the associated performance penalties.
Phelix is an attempt to combine all these lessons. It is closely related
to a prior cipher, Helix \cite{FWSKLK:Helix03}, proposed by some of the same authors.
\section{An Overview of Phelix}
Phelix is a combined stream cipher and MAC function, and directly provides
authenticated encryption functionality. By incorporating the plaintext
into the stream cipher state, Phelix can provide authentication
functionality without extra cost \cite{Golic:StreamModes01}.
The design strength of Phelix is 128 bits, which means we expect that no
attack on the cipher exists that requires fewer than $2^{128}$ Phelix block
function evaluations to be carried out. Phelix can process data in less than
7 clock cycles per byte on a Pentium M CPU, more than twice as fast as the
best known AES implementation.
Phelix uses a 256-bit key and a 128-bit nonce. The key is secret, and the
nonce is typically public knowledge. Phelix is optimized for 32-bit
platforms; all operations are on 32-bit words. The only operations used are
addition modulo $2^{32}$, exclusive or, and rotation by fixed numbers of
bits. The design philosophy of Phelix can be summarized as ``many simple
rounds.''
Phelix has a state that consists of nine words of 32 bits each. The state is
broken up into two groups: 5 ``active'' state words, which participate in the
block update function, and 4 ``old'' state words that are only used in the
keystream output function. A single round of Phelix consists of adding (or
\xoring) one active state word into the next, and rotating the first word.
This is shown in Figure~\ref{fig:round}, where the active state words are shown
as vertical lines.
%
\begin{figure}[tbp]
\begin{center}
\includegraphics{phelix.2}
\end{center}
\caption{A single round of Phelix}\label{fig:round}
\end{figure}
%
Multiple rounds are applied in a cyclical pattern to the active state. The
horizontal lines of the rounds wind themselves in helical fashion through
the five active state words. Twenty rounds make up one block (see
Figure~\ref{fig:block}).
%
\begin{figure}[tbp]
\begin{center}
\includegraphics[height=7.5in]{phelix.1}
\end{center}
\caption{One block of Phelix encryption}\label{fig:block}
\end{figure}
%
Phelix actually uses two intertwined helices; a single block contains two
full turns of each of the helices. The name Phelix was derived by a
contraction of sorts, from the fact that there are five ``strands'' in the
helices, so that the structure can be thought of as a penta-helix.
During each block, several other activities occur. During block $i$, one word
of keystream is generated ($S_i$), two words of key material are added
($X_{i,0}$ and $X_{i,1}$), and one word of plaintext is added ($P_i$). The
output state of one block is used as input to the next, so the computations
shown in figure~\ref{fig:block} are all that is required to process 4 bytes
of the message. As with any stream cipher, the ciphertext is created by
\xoring the plaintext with the keystream (not shown in the figure).
At the start of an encryption, a starting state is derived from the key and
nonce. The key words $X_{i,j}$ depend on the key, the length of the input
key, the nonce, and the block number $i$. State-guessing attacks are made
more difficult by adding key material at double the rate at which key stream
material is extracted. At the end of the message, some extra processing is
done, after which a 128-bit MAC tag is produced to authenticate the message.
The structure of Phelix is very similar to that of Helix \cite{FWSKLK:Helix03}.
In fact, the block
functions of the two ciphers are identical, except for the keystream output
function. The enlarged internal state of Phelix, along
with moving the keystream output point to force larger plaintext diffusion,
would seem to increase the security margin significantly. These changes were
made largely in response to \cite{Muller:Helix04}.
\section{Definition of Phelix}\label{sec:definition}
The Phelix encryption function takes as input a variable-length key $U$ of up
to 256 bits (32 bytes), a 128-bit (16-byte) nonce $N$, and a plaintext $P$. It produces a
ciphertext message and a tag that provides authentication. The decryption
function takes the key, nonce, ciphertext, and tag, and produces either the
plaintext message or an error if the authentication failed.
\subsection{Preliminaries}
Phelix operates on 32-bit words, while the inputs and outputs are 8-bit
bytes. In all situations, Phelix uses the least-significant-byte-first
convention. A sequence of bytes $x_i$ is identified with a sequence of words
$X_j$ by the relations
\begin{flalign*}
&&X_j &:= \sum_{k=0}^3 x_{(4j+k)} \cdot 2^{8k} & x_i &:= \left\lfloor
\frac{X_{\lfloor i/4 \rfloor}}{2^{8(i \bmod 4)}} \right\rfloor \bmod 2^8&&
\end{flalign*}
These two equations are complementary and show the conversion both ways.
Let $\ell(x)$ denote the length of a string of bytes $x$. The input key $U$
consists of a sequence of bytes $u_0, u_1, \ldots, u_{\ell(U)-1}$ with $0
\leq \ell(U) \leq 32$. The key is processed through the key mixing function,
defined in section~\ref{sec:keymixing}, to produce the working key which
consists of 8 words $K_0,\ldots,K_7$.
The nonce $N$ consists of 16 bytes, interpreted as 4 words $N_0,\ldots,N_3$.
Applications which use shorter nonces should zero-pad their nonce
to the full 16-byte length.
The plaintext $P$ and ciphertext $C$ are both sequences of bytes of the same
length, with the restriction that $0 \leq \ell(P) < 2^{64}$. Both are
manipulated as a sequence of words, $P_i$ and $C_i$, respectively. The last
word of the plaintext and ciphertext might be only partially used. The
``extra'' plaintext bytes in the last word are taken to be zero. The ``extra''
ciphertext bytes are irrelevant and never used. Note that the cipher is
specified for zero-length plaintexts; in this case, no
data is encrypted and only a MAC is generated.
\subsection{A Block}
Phelix consists of a sequence of blocks. The blocks are numbered sequentially,
which assigns each block a unique number $i$. At the start of block $i$, the
active state consists of five words: $Z^{(i)}_0,\ldots,Z^{(i)}_4$; at the end
of the block, the active state consists of $Z^{(i+1)}_0,\ldots,Z^{(i+1)}_4$,
which forms the input to the next block with number $i+1$. Block $i$ also uses
as input two key words $X_{i,0}$ and $X_{i,1}$, the plaintext word $P_i$, and
a previous state word $Z^{(i-4)}_4$.
The complete block function is illustrated in figure~\ref{fig:block}. All values
are 32-bit words; exclusive or is denoted by $\oplus$, addition modulo
$2^{32}$ is denoted by $\boxplus$, and rotation by $\lll$. The block function
actually consists of two applications of a ``half-block'' function $H$, defined as:
\begin{tabbing}
\hspace{1.0in} \= Function \= $H(w_0,w_1,w_2,w_3,w_4,$ \= $K_0,K_1)$ \+ \\
Begin\= \+ \\
$ w_0 := w_0 \boxplus (w_3 \oplus K_0); $ \> $ w_3 := w_3 \lll 15; $ \\
$ w_1 := w_1 \boxplus \;\,w_4; $ \> $ w_4 := w_4 \lll 25; $ \\
$ w_2 := w_2 \oplus \;\,w_0; $ \> $ w_0 := w_0 \lll\;\:9; $ \\
$ w_3 := w_3 \oplus \;\,w_1; $ \> $ w_1 := w_1 \lll 10; $ \\
$ w_4 := w_4 \boxplus \;\,w_2; $ \> $ w_2 := w_2 \lll 17; $ \\
\\
$ w_0 := w_0 \oplus (w_3 \boxplus K_1);$\> $ w_3 := w_3 \lll 30; $ \\
$ w_1 := w_1 \oplus \;\,w_4; $ \> $ w_4 := w_4 \lll 13; $ \\
$ w_2 := w_2 \boxplus \;\,w_0; $ \> $ w_0 := w_0 \lll 20; $ \\
$ w_3 := w_3 \boxplus \;\,w_1; $ \> $ w_1 := w_1 \lll 11; $ \\
$ w_4 := w_4 \oplus \;\,w_2; $ \> $ w_2 := w_2 \lll\;\:5; $ \\
Return $(w_0,w_1,w_2,w_3,w_4);$ \- \\
End.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -