⭐ 欢迎来到虫虫下载站! | 📦 资源下载 📁 资源专辑 ℹ️ 关于我们
⭐ 虫虫下载站

📄 ideafast.txt

📁 加密解密算法的资料 加密解密算法的资料 加密解密算法的资料
💻 TXT
字号:
From: olson@umbc.edu (Bryan G. Olson)Newsgroups: sci.cryptSubject: A Quick IDEA, was: Speed of DES/IDEA implementationsDate: 7 Dec 1993 21:49:41 -0500A while ago I posted a message claiming a speed of 238,000bytes/sec for an implementation of IDEA on a 33Mh 486.  Below isan explanation and some code to show how it works.  The basictrick should be useful on many (but not all) processors.  Iexpect only those familiar with IDEA and its referenceimplementation will be able to follow the discussion.  See:Lai, Xueja and Massey, James L.  A Proposal for a New BlockEncryption Standard, Eurocrypt 90For those who have been asking for the code, sorry I keptputting it off.  I wanted to get it out of Turbo Pascalideal-mode, but I never had the time.Colin Plum wrote IDEA-386 code which is included in PGP2.3a and uses the same tricks.  I don't know who's isfaster, but I expect they will be very close.  Nowhere's how it's done.A major bottleneck in software IDEA is the mul() routine, whichis used 34 times per 64 bit block.  The routine performsmultiplication in the multiplicative group mod 2^16+1.  The twofactors are each in a 16 bit word, and the output is also in a 16bit word.  Note that 0 is not a member of the multiplicativegroup and 2^16 does not fit in 16 bits. We therefor use the 0word to represent 2^16.  Now group elements map one to one ontoall possible 16 bit words, since 2^16+1 is prime.Here is (essentially) the reference implementation from [Lai].unsigned mul( unsigned a, unsigned b ) {  long int p ;  long unsigned q ;    if( a==0 ) p= 0x00010001 - b ;    else if( b==0 ) p= 0x00010001 - a ;    else {        q= a*b;        p= (q & 0xffff) - (q>>16)        if( p<0 ) p= p + 0x00010001 ;      }    return (unsigned)(p & 0xffff) ;}Note the method of reducing a 32 bit word modulo 2^16-1.  Wesubtract the high word from the low word, and add the modulusback if the result is less than 0.  [Lai] contains a proof thatthis works, and you can convince yourself fairly easily.To speed up this routine, we note that the tests for a=0 and b=0will rarely be false.  With the possible exception of the first 2of the 34 multiplications, 0 should be no more likely than any ofthe other 65535 numbers.  Note that if (and only if) either a orb is 0 then q will also be 0, and we can check for this in oneinstruction if our processor sets a zero flag for multiplication(as the 68000 does but 80x86 does not).  Fortunately p will also be zero after the subtraction if and onlyif either a or b is 0.  Proof: r will be zero when the high orderword of q equals the low order word, and that happens when q isdivisible by 00010001 hex.  Since 00010001h = 2^16+1 is prime,this happens if either a or b is a multiple of 2^16+1, and 0 isthe only such multiple which will fit in a 16 bit word.The speed-up strategy is to proceed under the assumption that aand b are not 0, check to be sure in one instruction, andrecompute if the assumption was wrong.  Here's some 8086assembler code:    mov  ax, [a]    mul  [b]        ; ax is implied. q is now in DX AX    sub  ax, dx     ; mod 2^16+1     jnz  not0       ; Jump if neither op was 0. Usually taken.    mov  ax, 1      ; recompute result knowing one op is 0.    sub  ax, [a]    sub  ax, [b]    jmp  out        ; Just jump over adding the carry.not0:    adc  ax, 0      ; If r<0 add 1, otherwise do nothing.out:                ; Result is now in axNote that when r<0 we add 1 instead of 2^16+1 since the 2^16 partoverflows out of the result.  The "adc  ax, 0" does all the workof checking for a negative result and adding the modulus ifneeded.The multiplication takes 9 instructions, 4 of which are rarelyexecuted.  I believe similar tricks are possible on manyprocessors.  The one drawback to the check-after-multiply tacticis that we can't let the multiply overwrite the only copy of anoperand.Note that most software implementations of IDEA will run atslightly different speeds when 0's come up in the multiplyroutine.  The reference implementation is faster on 0, this oneis faster on non-zero.  This may be a problem for some real-timestuff, and also suggests an attack based on timing.Finally, below is an implementation of the complete encryptionfunction in 8086 assembler, to replace the cipher_idea() functionin PGP.  It takes the same parameters as the function from PGP,and uses the c language calling conventions.  I tested it usingthe debug features of the idea.c file in PGP.  You will need toadd segment/assume directives.  This version uses no global dataand should be reentrant.The handling of zero multipliers is outside the inner loop sothat a short conditional jump can loop back to the beginning. Forward conditional jumps are usually not taken and backwardjumps are usually taken, which is consistent with 586 branchprediction (or so I've heard).  Stalls where the output of oneinstruction is needed for the next seem unavoidable.Last I heard, IDEA was patent pending.  My code is up for grabs,although I would get a kick out being credited if you use it.On the other hand Colin's code is already tested and readyto assemble and link with PGP.--Bryan____________________CODE STARTS BELOW THIS LINE_________;  Called as: asmcrypt( inbuff, outbuff, zkey ) just like PGPPROC    _asmcrypt        ; establish parameter and local space on stack        ; follow c language calling conventions        ARG  inblock:Word, outblock:Word, zkey:Word        LOCAL sx1:Word,sx4:Word,skk:Word,done8:Word =stacksize        push bp        mov  bp, sp        sub  sp, stacksize ;      push ax     ; My compiler assumes these are not saved. ;      push bx ;      push cx ;      push dx        push si        push di; Put the 16 bit sub-blocks in registers and/or local variables        mov  si, [inblock]        mov  ax, [si]        mov  [sx1], ax       ; x1  is in ax and sx1        mov  di, [si+2]      ; x2  is in di        mov  bx, [si+4]      ; x3  is in bx        mov  dx, [si+6]        mov  [sx4], dx       ; x4  is in sx4        mov  si, [zkey]      ; si points to next subkey        mov  [done8], si        add  [done8], 96     ; we will be finished with 8 rounds                             ; when si=done8@@loop:                      ; 8 rounds of this        add  di, [si+2]      ; x2+=zkey[2]  is in di        add  bx, [si+4]      ; x3+=zkey[4]  is in bx        mul  [Word si]       ;x1 *= zkey[0]        sub  ax, dx        jz  @@x1             ; if 0, use special case multiply        adc  ax, 0@@x1out:        mov  [sx1], ax       ; x1 is in ax and sx1        xor  ax, bx          ; ax= x1^x3        mul  [Word si+8]     ; compute kk        sub  ax, dx          ; if 0, use special case multiply        jz  @@kk        adc  ax, 0@@kkout:        mov  cx, ax          ; kk is in cx        mov  ax, [sx4]       ; x4 *= zkey[6]        mul  [Word si+6]        sub  ax, dx        jz   @@x4            ; if 0, use special case multiply        adc  ax, 0@@x4out:        mov  [sx4], ax       ; x4 is in sx4 and ax        xor  ax, di          ; x4^x2        add  ax, cx          ; kk+(x2^x4)        mul  [Word si+10]    ; compute t1        sub  ax, dx        jz  @@t1             ; if 0, use special case multiply        adc  ax, 0@@t1out:                     ; t1 is in ax        add  cx, ax          ; t2 is in cx   kk+t1        xor  [sx4], cx       ; x4 in sx4        xor  di, cx          ; new x3 in di        xor  bx, ax          ; new x2 in bx        xchg bx, di          ; x2 in di, x3 in bx        xor  ax, [sx1]       ; x1 in ax        mov  [sx1], ax       ; and [sx1]        add  si, 12          ; point to next subkey        cmp  si, [done8]        jne  @@loop        jmp  @@out8;------------------------------------------; Special case multiplications, when one factor is 0@@x1:   mov  ax, 1        sub  ax, [sx1]        sub  ax, [Word si]        jmp  @@x1out@@kk:   mov  ax, [sx1]       ; rebuild overwritten operand        xor  ax, bx        neg  ax        inc  ax        sub  ax, [si+8]        jmp  @@kkout@@x4:   mov  ax, 1        sub  ax, [sx4]        sub  ax, [Word si+6]        jmp  @@x4out@@t1:   mov  ax, [sx4]       ; rebuild        xor  ax, di        add  ax, cx        neg  ax        inc  ax        sub  ax, [si+10]        jmp  @@t1out;---------------------------------------------------;   8 rounds are done, now that extra pseudo-round@@out8:        push di        mov  di, [outblock]        mul  [Word si]        sub  ax, dx        jnz  @@o1n           ; jump over special case code        mov  ax, 1        sub  ax, [sx1]        sub  ax, [si]        jmp  @@o1out@@o1n:  adc  ax, 0@@o1out:  mov [di], ax       ; final ciphertext block 1        mov  ax, [sx4]        mul  [Word si+6]        sub  ax, dx        jnz  @@o4n           ; jump over special case code        mov  ax, 1        sub  ax, [sx4]        sub  ax, [si+6]        jmp  @@o4out@@o4n:  adc  ax, 0@@o4out: mov  [di+6], ax     ; final ciphertext block 4        add  bx, [si+2]        mov  [di+2], bx      ; final ciphertext block 2        pop  ax        add  ax, [si+4]        mov  [di+4], ax      ; final ciphertext block 3;  Restore the stack and return        pop  di        pop  si;       pop  dx;       pop  cx;       pop  bx;       pop  ax        mov  sp, bp        pop  bp        retENDP    _asmcrypt

⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -