readme

Copyright 1997, 1999, 2000, 2001, 2002 Free Software Foundation, Inc.

This file is part of the GNU MP Library.

The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at your
option) any later version.

The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details.

You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
02111-1307, USA.


This directory contains mpn functions for 64-bit V9 SPARC


RELEVANT OPTIMIZATION ISSUES

Notation:
  IANY = shift/add/sub/logical/sethi
  IADDLOG = add/sub/logical/sethi
  MEM = ld*/st*
  FA = fadd*/fsub*/f*to*/fmov*
  FM = fmul*

UltraSPARC-1/2 can issue four instructions per cycle, with these restrictions:
* Two IANY instructions, but only one of these may be a shift.  If there is a
  shift and an IANY instruction, the shift must precede the IANY instruction.
* One FA.
* One FM.
* One branch.
* One MEM.
* IANY/IADDLOG/MEM must be insn 1, 2, or 3 in an issue bundle.  Taken branches
  should not be in slot 4, since that makes the delay insn come from a separate
  bundle.
* If two IANY/IADDLOG instructions are to be executed in the same cycle and one
  of these is setting the condition codes, that instruction must be the second
  one.

To summarize, ignoring branches, these are the bundles that can reach the peak
execution speed:

insn1   iany     iany     mem      iany   iany     mem    iany   iany     mem
insn2   iaddlog  mem      iany     mem    iaddlog  iany   mem    iaddlog  iany
insn3   mem      iaddlog  iaddlog  fa     fa       fa     fm     fm       fm
insn4   fa/fm    fa/fm    fa/fm    fm     fm       fm     fa     fa       fa

The 64-bit integer multiply instruction mulx takes from 5 cycles to 35 cycles,
depending on the position of the most significant bit of the first source
operand.  When used for 32x32->64 multiplication, it needs 20 cycles.
Furthermore, it stalls the processor while executing.  We stay away from that
instruction, and instead use floating-point operations.

Integer conditional move instructions cannot dual-issue with other integer
instructions.  No conditional move can issue 1-5 cycles after a load.  (Or
something similarly bizarre.)  Useless.

The UltraSPARC-3 pipeline seems similar, but is somewhat more rigid.  Branches
execute slower, and there may be other new stalls.  Integer multiply doesn't
halt the CPU and also has a much lower latency.  But it's still not pipelined,
and thus useless for our needs.

STATUS

(Timings are for UltraSPARC-1/2.  UltraSPARC-3 is a few cycles slower.)

* mpn_lshift, mpn_rshift: The current code runs at 2.0 cycles/limb.  The IEU0
  functional unit is saturated with shifts.

* mpn_add_n, mpn_sub_n: The current code runs at 4 cycles/limb.  The 4
  instruction recurrence is the speed limiter.

* mpn_addmul_1: The current code runs at 14 cycles/limb asymptotically.  The
  code sustains 4 instructions/cycle.  It might be possible to invent a better
  way of summing the intermediate 49-bit operands, but it is unlikely that it
  will save enough instructions to save an entire cycle.

  The load-use of the `rlimb' operand is not scheduled far enough ahead for
  good L2 cache performance.  Since UltraSPARC-1/2 L1 cache is direct mapped,
  we miss to L2 very often.  The load-use of the std/ldx pairs via the stack
  is somewhat over-scheduled.

  It would be possible to save two instructions: (1) The `mov' could be avoided
  if the std/ldx were less scheduled.  (2) The ldx of `rlimb' could be split
  into two `ld' instructions, saving the shifts/masks.

* mpn_mul_1: The current code is a straightforward edit of the mpn_addmul_1
  code.  It would be possible to shave one or two cycles from it, with some
  labour.

* mpn_submul_1: Braindead code just calling mpn_mul_1 + mpn_sub_n.  It would be
  possible to either match the mpn_addmul_1 performance, or in the worst case
  use one more instruction group.

* mpn_Xmul_2: These could be made to run at 9 cycles/limb.  Straightforward
  generalization of mpn_Xmul_1.
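
The UltraSPARC-1/2 issue restrictions listed under RELEVANT OPTIMIZATION
ISSUES can be modeled as a small checker.  The following is only a sketch in
C, not part of GMP: the names `insn_class', `insn' and `bundle_ok' are
hypothetical, IANY is modeled as "shift or IADDLOG", and the advice about
taken branches in slot 4 is not modeled since it is a recommendation rather
than a hard limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction classes following the README notation.
   IANY is represented as either CLS_SHIFT or CLS_IADDLOG. */
enum insn_class { CLS_SHIFT, CLS_IADDLOG, CLS_MEM, CLS_FA, CLS_FM, CLS_BRANCH };

struct insn {
    enum insn_class cls;
    bool sets_cc;               /* true if it writes the condition codes */
};

/* Return true if the n instructions in b[] satisfy the stated issue
   restrictions for a single bundle (simplified model, an assumption). */
static bool bundle_ok(const struct insn *b, size_t n)
{
    int n_int = 0, n_mem = 0, n_fa = 0, n_fm = 0, n_br = 0;
    bool first_int_sets_cc = false;

    assert(b != NULL || n == 0);
    if (n > 4)                  /* at most four instructions per cycle */
        return false;

    for (size_t i = 0; i < n; i++) {
        bool is_int = b[i].cls == CLS_SHIFT || b[i].cls == CLS_IADDLOG;

        /* IANY/IADDLOG/MEM must be insn 1, 2, or 3. */
        if ((is_int || b[i].cls == CLS_MEM) && i >= 3)
            return false;

        if (is_int) {
            /* Only one shift, and it must precede the other IANY. */
            if (b[i].cls == CLS_SHIFT && n_int > 0)
                return false;
            /* A condition-code setter must be the second of a pair. */
            if (n_int == 1 && first_int_sets_cc)
                return false;
            if (n_int == 0)
                first_int_sets_cc = b[i].sets_cc;
            n_int++;
        }

        switch (b[i].cls) {
        case CLS_MEM:    n_mem++; break;
        case CLS_FA:     n_fa++;  break;
        case CLS_FM:     n_fm++;  break;
        case CLS_BRANCH: n_br++;  break;
        default:         break;
        }
    }
    /* Two IANY, one MEM, one FA, one FM, one branch at most. */
    return n_int <= 2 && n_mem <= 1 && n_fa <= 1 && n_fm <= 1 && n_br <= 1;
}
```

Under this model a bundle such as shift, addcc, ldx, faddd is accepted, while
add, shift, ldx, fmuld is rejected because the shift follows another IANY
instruction.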

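The notes above say that mulx is avoided in favour of floating-point
operations, and that mpn_addmul_1 sums intermediate 49-bit operands.  The
following C sketch shows one way such a scheme can work; it is an
illustration, not the actual assembly.  The split used here (one operand in
two 32-bit halves, the other in four 16-bit pieces) is an assumption, chosen
because each 32x16 product fits in 48 bits and each sum of two products fits
in 49 bits, so every value is exact in a 53-bit double mantissa.  The names
`u128_sketch', `add_shifted' and `umul_fp_sketch' are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit result type for this sketch. */
typedef struct { uint64_t hi, lo; } u128_sketch;

/* Add s * 2^shift into the 128-bit accumulator acc. */
static void add_shifted(u128_sketch *acc, uint64_t s, unsigned shift)
{
    assert(shift < 128);
    if (shift >= 64) {
        acc->hi += s << (shift - 64);
        return;
    }
    uint64_t lo_part = s << shift;
    uint64_t hi_part = shift ? (s >> (64 - shift)) : 0;
    uint64_t new_lo = acc->lo + lo_part;
    acc->hi += hi_part + (new_lo < acc->lo);    /* carry out of low word */
    acc->lo = new_lo;
}

/* 64x64 -> 128-bit multiply whose partial products are all formed with
   double-precision multiplies and adds (assumed splitting, see above). */
static u128_sketch umul_fp_sketch(uint64_t u, uint64_t v)
{
    double u0 = (double)(uint32_t)u;            /* low 32 bits of u  */
    double u1 = (double)(uint32_t)(u >> 32);    /* high 32 bits of u */
    double v0 = (double)(v & 0xffff);           /* 16-bit pieces of v */
    double v1 = (double)((v >> 16) & 0xffff);
    double v2 = (double)((v >> 32) & 0xffff);
    double v3 = (double)(v >> 48);

    /* Each product is below 2^48; each two-term sum is below 2^49. */
    double w0  = u0 * v0;
    double w16 = u0 * v1;
    double w32 = u0 * v2 + u1 * v0;
    double w48 = u0 * v3 + u1 * v1;
    double w64 = u1 * v2;
    double w80 = u1 * v3;

    /* Convert back to integers (exact) and sum with the right weights. */
    u128_sketch r = { 0, 0 };
    add_shifted(&r, (uint64_t)w0, 0);
    add_shifted(&r, (uint64_t)w16, 16);
    add_shifted(&r, (uint64_t)w32, 32);
    add_shifted(&r, (uint64_t)w48, 48);
    add_shifted(&r, (uint64_t)w64, 64);
    add_shifted(&r, (uint64_t)w80, 80);
    return r;
}
```

Calling umul_fp_sketch(u, v) returns the high and low 64-bit words of u*v.
The integer summation step at the end corresponds to the part that the
mpn_addmul_1 note describes as hard to shorten further.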