Network Working Group                                           J. Touch
Request for Comments: 1810                                           ISI
Category: Informational                                        June 1995

                       Report on MD5 Performance

Status of this Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

Abstract

   MD5 is an authentication algorithm, which has been proposed as the
   default authentication option in IPv6.  When enabled, the MD5
   algorithm operates over the entire data packet, including header.
   This RFC addresses how fast MD5 can be implemented in software and
   hardware, and whether it supports currently available IP bandwidth.
   MD5 can be implemented in existing hardware technology at 256 Mbps,
   and in software at 87 Mbps.  These rates cannot support current IP
   rates, e.g., 100 Mbps TCP and 130 Mbps UDP over ATM.  If MD5 cannot
   support existing network bandwidth using existing technology, it will
   not scale as network speeds increase in the future.  This RFC is
   intended to alert the IP community about the performance limitations
   of MD5, and to suggest that alternatives be considered for use in
   high speed IP implementations.

Introduction

   MD5 is an authentication algorithm, which has been proposed as one
   authentication option in IPv6 [1].  RFC 1321 describes the MD5
   algorithm and gives a reference implementation [3].  When enabled,
   the MD5 algorithm operates over the entire data packet, including
   header (with dummy values for volatile fields).  This RFC addresses
   how fast MD5 can be implemented in software and hardware, and whether
   it supports currently available IP bandwidth.

   This RFC considers the general issue of checksumming and security at
   high speed in IPv6.  IPv6 has no header checksum (which IPv4 has
   [5]), but proposes an authentication digest over the entire body of
   the packet (including header where volatile fields are zeroed) [1].
   This RFC specifically addresses the performance of that
   authentication mechanism.

Touch                        Informational                      [Page 1]

RFC 1810               Report on MD5 Performance               June 1995

Measurements

   The performance of MD5 was measured.  The code was an optimized
   version of the MD5 reference implementation from the RFC [3], and is
   available for anonymous FTP [7].  The following are the results of
   the performance test "md5 -t", modified to prohibit on-chip caching
   of the data block:

      87 Mbps   DEC Alpha (190 MHz)
      33 Mbps   HP 9000/720
      48 Mbps   IBM RS/6000 7006 (PPC 601 @80 MHz)
      31 Mbps   Intel i486/66 NetBSD
      44 Mbps   Intel Pentium/90 NeXTStep
      52 Mbps   SGI/IP-20 IRIX 5.2
      37 Mbps   Sun SPARC-10/51, SPARC-20/50 SunOS 4.1.3
      57 Mbps   Sun SPARC-20/71 SunOS 4.1.3

   These rates do not keep up with currently available IP bandwidth,
   e.g., 100 Mbps TCP and 130 Mbps UDP over a Fore SBA-200 ATM host
   interface in a Sun SPARC-20/71.

   Values as high as 100 Mbps have been reported for the DEC Alpha (190
   MHz).  These values reflect on-chip caching of the data.  It is not
   clear at this time whether in-memory, off-chip cache, or on-chip
   cache performance measures are more relevant to IP performance.

Analysis of the MD5 Algorithm

   The MD5 algorithm is a block-chained hashing algorithm.  The first
   block is hashed with an initial seed, resulting in a hash.  The hash
   is summed with the seed, and that result becomes the seed for the
   next block.  When the last block is computed, its "next-seed" value
   becomes the hash for the entire stream.  Thus, the seed for each
   block depends on both the hash and the seed of its preceding block.
   As a result, blocks cannot be hashed in parallel.

   Each 16-word (64-byte) block is hashed via 64 basic steps, using a
   4-word intermediate hash, and collapsing the intermediate hash at the
   end.  The 64 steps are 16 groups of 4 steps, one step per
   intermediate hash word.
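
   The block chaining described above can be sketched in Python.  This
   is a minimal illustration that assumes the constants, padding, and
   step ordering of RFC 1321 [3]; it is not the optimized code that was
   measured.  The names hash_block, md5_chained, SEED, T, and S are
   chosen here for illustration only.

```python
import math
import struct

MASK = 0xFFFFFFFF
# Sine table of RFC 1321: floor(2^32 * |sin(i + 1)|) for i = 0..63.
T = [int(abs(math.sin(i + 1)) * 2**32) & MASK for i in range(64)]
# Rotation amounts; each group of four repeats within its round.
S = ([7, 12, 17, 22] * 4 + [5, 9, 14, 20] * 4 +
     [4, 11, 16, 23] * 4 + [6, 10, 15, 21] * 4)
# Initial seed (the A, B, C, D starting values of RFC 1321).
SEED = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476)


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def hash_block(seed, block):
    """Run the 64 basic steps over one 64-byte block, starting from seed."""
    x = struct.unpack("<16I", block)
    a, b, c, d = seed
    for i in range(64):
        if i < 16:
            f, k = (b & c) | (~b & d), i
        elif i < 32:
            f, k = (d & b) | (~d & c), (5 * i + 1) % 16
        elif i < 48:
            f, k = b ^ c ^ d, (3 * i + 5) % 16
        else:
            f, k = c ^ (b | (~d & MASK)), (7 * i) % 16
        new_b = (b + rotl((a + f + x[k] + T[i]) & MASK, S[i])) & MASK
        # Rotate the roles of the four intermediate hash words.
        a, b, c, d = d, new_b, b, c
    return a, b, c, d


def md5_chained(message):
    """Hash a byte string block by block, chaining the seed."""
    padded = message + b"\x80" + b"\x00" * ((55 - len(message)) % 64)
    padded += struct.pack("<Q", (8 * len(message)) & 0xFFFFFFFFFFFFFFFF)
    seed = SEED
    for off in range(0, len(padded), 64):
        h = hash_block(seed, padded[off:off + 64])
        # The hash is summed with the seed to form the next seed.
        seed = tuple((s + v) & MASK for s, v in zip(seed, h))
    return struct.pack("<4I", *seed)


print(md5_chained(b"abc").hex())  # 900150983cd24fb0d6963f7d28e17f72
```

   Each call to hash_block needs the seed produced by the previous
   call, so the loop in md5_chained cannot be split across blocks.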

   This RFC uses the following notation (as from RFC-1321 [3]):

      A,B,C,D   intermediate hash words
      X[i]      input data block
      T[i]      sine table lookup
      << i      rotate i bits
      F         logical functions of 3 args

   The subscripts to X, T, and << are fixed for each step, and are
   omitted here.  There are four different logical functions, also
   omitted.  Each 4-step group looks like:

      A = B + ((A + F(B,C,D) + X[i] + T[i]) << i)
      D = A + ((D + F(A,B,C) + X[i] + T[i]) << i)
      C = D + ((C + F(D,A,B) + X[i] + T[i]) << i)
      B = C + ((B + F(C,D,A) + X[i] + T[i]) << i)

   Note that this has the general form shown below.  Due to the
   complexity of the function 'f', these equations cannot be transformed
   into a less serial set.

      A = f(D); B = f(A); C = f(B); D = f(C)

   Each step is composed of two table lookups, one rotation, a
   3-component logical operation, and 4 additions.  The best
   parallelization possible leaves F(x,y,z) to the last step, waiting as
   long as possible for the result from the previous step.  The
   resulting tree is shown below.

      (t0)  B*  C   C   D  X   T
            |   |   |   |  |   |
            |   |   |   |  |   |
             \ /     \ /    \ /
       t1     op     op  A   +
                \   /     \ /
                 \ /       |
       t2         op       +
                    \     /
                     \   /
                      \ /
       t3              +
                       |
       t4             <<       B*
                        \     /
                         \   /
                          \ /
       t5                  +
                           |
                          A**

            Binary operation tree

                        X   T
                        |   |
                         \ /
      (t0)  B* C C D  A   +
            |  | | |   \ /
             \ | |/     |
              \\//      |
       t1      op       +
                 \     /
                  \   /
                   \ /
       t2           +        B*
                     \      /
                      <<   /
                        \ /
       t3                +
                         |
                        A**

            Optimized hardware tree

   This diagram assumes that each operation takes one unit time.  The
   tree shows the items that depend on the previous step as B*, and the
   item that the next step depends on as A**.  Sequences of the binary
   operation tree cannot be overlapped, but the optimized hardware tree
   can (by one time step).

   There are 4 steps processed per word of input, ignoring inter-block
   processing.  The speed of the overall algorithm depends on how fast
   we can process these 4 steps, vs. the bandwidth of one word of input
   being processed.
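
   The timing of the binary operation tree can be checked with a short
   Python sketch.  It assumes, as the diagram does, that every operation
   takes one unit time and that all inputs, including B*, are available
   at t0.  The tuple encoding and the names F_TREE, STEP, and
   finish_times are illustrative assumptions, not part of RFC 1321.

```python
from collections import Counter

# One step, A = B + ((A + F(B,C,D) + X + T) << s), written as a tree of
# two-input operations.  Leaves are strings; inner nodes are tuples of
# (label, left, right).  F is split into three two-input logical ops,
# following the binary operation tree in the diagram.
F_TREE = ("op", ("op", "B", "C"), ("op", "C", "D"))
STEP = ("+", ("<<", ("+", F_TREE, ("+", "A", ("+", "X", "T"))), "s"), "B")


def finish_times(node, out):
    """Return when node is ready; append each inner node's time to out."""
    if isinstance(node, str):
        return 0  # every leaf is assumed to be available at t0
    _, left, right = node
    t = 1 + max(finish_times(left, out), finish_times(right, out))
    out.append(t)
    return t


times = []
depth = finish_times(STEP, times)
width = Counter(times)  # number of operations finishing at each time

print("time units per step:", depth)
print("operations finishing at t1:", width[1])
```

   Under these assumptions the sketch reports a depth of 5 time units,
   with three operations finishing at t1 and fewer at every later time.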

   The binary tree takes 5 time units per step of the algorithm, and
   permits at best 3-way parallelism (at time t1).  In software, this
   means it takes 5 * 4 = 20 instructions per word input.  A computer
   capable of M MIPS can support a data bandwidth of M/20 * 32 Mbps,
   i.e., a rate in Mbps equal to 1.6 times its MIPS rate.  Therefore, a
   100 MIPS machine can support a 160 Mbps stream.

      Parallel software rate in Mbps = 1.6 * MIPS rate

   This assumes that register reads and writes are overlapped with
   computation entirely.  Without any parallelism, there are 8
   operations per step, and 4 steps per word, so 32 operations per word,
   i.e., the data rate in Mbps would be identical to the MIPS rate: