
📄 howto_sse2.lyx

PADDUSW Add Unsigned with Saturation on Word
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PADDUSW adds the unsigned words of the source operand to the unsigned words of the destination operand and stores the result in the destination. If a sum exceeds the range of an unsigned word (overflow), it is saturated to 0FFFFh. Because both operands are unsigned, a sum can never fall below zero, so no lower bound applies.
\layout Standard

\begin_inset Graphics
	filename images_mmx/NewPADDS.gif
	scale 55
	keepAspectRatio
\end_inset 
\layout Standard

SSE-
\layout Standard

Usage: instruction destination, source
\layout Itemize

ADDPS xmm1, xmm2/m128
\layout Standard

Latency: 4 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Performs a SIMD add of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.
\layout Standard

\begin_inset Graphics
	filename images_sse/NewPADDPS.gif
	lyxscale 55
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize

ADDSS xmm1, xmm2/m32
\layout Standard

Latency: 4 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Adds the low single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register.
The three high-order doublewords of the destination operand remain unchanged.
\layout Standard

\begin_inset Graphics
	filename images_sse/NewPADDSS.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Standard

SSE2-
\layout Itemize

ADDPD xmm, xmm
\layout Standard

Latency: 4 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Adds the two packed double-precision floating-point values in parallel.
\layout Itemize

ADDSD xmm, xmm
\layout Standard

Latency: 4 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Adds the double-precision floating-point values in the lower 64 bits.
\layout Subsubsection

Subtraction
\layout Standard

MMX-
\layout Standard

Usage: instruction destination, source
\layout Standard

Destination: MMX register.
\layout Standard

Source: MMX register or 64-bit memory operand.
\layout Itemize

PSUBB Subtraction with Wrap-around on Byte
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PSUBB subtracts the bytes of the source operand from the bytes of the destination operand and stores the result in the destination. When the result is too large or too small to be represented in a byte, the result wraps around and the lower 8 bits are written to the destination register.
\layout Itemize

PSUBW Subtraction with Wrap-around on Word
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PSUBW subtracts the words of the source operand from the words of the destination operand and stores the result in the destination. When the result is too large or too small to be represented in a word, the result wraps around and the lower 16 bits are written to the destination register.
\layout Itemize

PSUBD Subtraction with Wrap-around on Doubleword
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PSUBD subtracts the doublewords of the source operand from the doublewords of the destination operand and stores the result in the destination.
When the result is too large or too small to be represented in a doubleword, the result wraps around and the lower 32 bits are written to the destination register.
\layout Standard

\begin_inset Graphics
	filename images_mmx/NewPSUB.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize

PSUBSB Subtract Signed with Saturation on Byte
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PSUBSB subtracts the signed bytes of the source operand from the signed bytes of the destination operand, and writes the results to the destination register. If a result falls outside the range of a signed byte, it is saturated: to 7Fh on overflow and to 80h on underflow.
\layout Itemize

PSUBSW Subtract Signed with Saturation on Word
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PSUBSW subtracts the signed words of the source operand from the signed words of the destination operand, and writes the results to the destination register. If a result falls outside the range of a signed word, it is saturated: to 7FFFh on overflow and to 8000h on underflow.
\layout Itemize

PSUBUSB Subtract Unsigned with Saturation on Byte
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PSUBUSB subtracts the unsigned bytes of the source operand from the unsigned bytes of the destination operand and writes the results to the destination register. If the result element is less than zero (a negative value), it is saturated to 00h.
\layout Itemize

PSUBUSW Subtract Unsigned with Saturation on Word
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PSUBUSW subtracts the unsigned words of the source operand from the unsigned words of the destination operand and writes the results to the destination register.
If the result element is less than zero (a negative value), it is saturated to 0000h.
\layout Standard

\begin_inset Graphics
	filename images_mmx/NewPSUBS.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Standard

SSE-
\layout Itemize

SUBPS xmm1, xmm2/m128
\layout Standard

Latency: 4 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Performs a SIMD subtract of the four packed single-precision floating-point values in the source operand (second operand) from the four packed single-precision floating-point values in the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.
\layout Standard

\begin_inset Graphics
	filename images_sse/NewPSUBPS.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize

SUBSS xmm1, xmm2/m32
\layout Standard

Latency: 4 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Subtracts the low single-precision floating-point value in the source operand (second operand) from the low single-precision floating-point value in the destination operand (first operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register.
The three high-order doublewords of the destination operand remain unchanged.
\layout Standard

\begin_inset Graphics
	filename images_sse/NewPSUBSS.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Standard

SSE2-
\layout Itemize

SUBPD xmm, xmm
\layout Standard

Latency: 4 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Subtracts the two packed double-precision floating-point values in parallel.
\layout Itemize

SUBSD xmm, xmm
\layout Standard

Latency: 4 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Subtracts the double-precision floating-point values in the lower 64 bits.
\layout Subsubsection

Multiplication
\layout Standard

MMX-
\layout Standard

Usage: instruction destination, source
\layout Standard

Destination: MMX register.
\layout Standard

Source: MMX register or 64-bit memory operand.
\layout Itemize

PMULHW Packed Multiply High on Words
\layout Standard

Latency: 8 Throughput: 1
\layout Standard

Purpose:
\layout Standard

The PMULHW instruction multiplies the four signed words of the destination operand with the four signed words of the source operand. The high-order 16 bits of the 32-bit intermediate results are written to the destination operand.
\layout Itemize

PMULLW Packed Multiply Low on Words
\layout Standard

Latency: 8 Throughput: 1
\layout Standard

Purpose:
\layout Standard

The PMULLW instruction multiplies the four signed or unsigned words of the destination operand with the four signed or unsigned words of the source operand. The low-order 16 bits of the 32-bit intermediate results are written to the destination operand.
\layout Standard

\begin_inset Graphics
	filename images_mmx/NewPMUL.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize

PMADDWD Packed Multiply and Add
\layout Standard

Latency: 8 Throughput: 1
\layout Standard

Purpose:
\layout Standard

The PMADDWD instruction multiplies the four signed words of the destination operand by the four signed words of the source operand, giving four 32-bit intermediate products. Adjacent products are then summed, so the result is two 32-bit doublewords.
The products of the two high-order word pairs are summed and stored in the upper doubleword of the destination operand. The products of the two low-order word pairs are summed and stored in the lower doubleword of the destination operand.
\layout Standard

A result doubleword of PMADDWD wraps around to 80000000h only when both words contributing to it are 8000h in both the source and the destination operand.
\layout Standard

\begin_inset Graphics
	filename images_mmx/NewPMADD.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Standard

SSE-
\layout Itemize

MULPS xmm1, xmm2/m128
\layout Standard

Latency: 6 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Performs a SIMD multiply of the four packed single-precision floating-point values from the source operand (second operand) and the destination operand (first operand), and stores the packed single-precision floating-point results in the destination operand. The source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register.
\layout Standard

\begin_inset Graphics
	filename images_sse/NewPMULPS.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize

MULSS xmm1, xmm2/m32
\layout Standard

Latency: 6 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Multiplies the low single-precision floating-point value from the source operand (second operand) by the low single-precision floating-point value in the destination operand (first operand), and stores the single-precision floating-point result in the destination operand. The source operand can be an XMM register or a 32-bit memory location. The destination operand is an XMM register.
The three high-order doublewords of the destination operand remain unchanged.
\layout Standard

\begin_inset Graphics
	filename images_sse/NewPMULSS.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Standard

SSE2-
\layout Itemize

MULPD xmm, xmm
\layout Standard

Latency: 6 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Multiplies the two packed double-precision floating-point values in parallel.
\layout Itemize

MULSD xmm, xmm
\layout Standard

Latency: 6 Throughput: 2
\layout Standard

Purpose:
\layout Standard

Multiplies the double-precision floating-point values in the lower 64 bits.
\layout Subsection

Comparison Instructions
\layout Standard

MMX-
\layout Standard

Usage: instruction destination, source
\layout Standard

Destination: MMX register.
\layout Standard

Source: MMX register or 64-bit memory operand.
\layout Itemize

PCMPEQB Packed Compare for Equal, Byte
\layout Standard

Latency: 2 Throughput: 1
\layout Standard

Purpose:
\layout Standard

PCMPEQB compares the bytes in the destination operand to the corresponding bytes in the source operand. If the data elements are equal, the corresponding data element in the destination is set to all ones. If they are not equal, the corresponding data element is set to all zeros.
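\layout Standard

The byte-wise behaviour of PCMPEQB described above can be sketched as a scalar reference model in plain C. This is only an illustration of the semantics, not the instruction itself: the helper name pcmpeqb_model and the use of 8-element byte arrays to stand in for a 64-bit MMX operand are assumptions made for this example.

```c
#include <stdint.h>

/* Scalar model of PCMPEQB on one 64-bit MMX operand, viewed as 8 bytes.
   pcmpeqb_model is a hypothetical helper name, not part of any library. */
static void pcmpeqb_model(uint8_t dst[8], const uint8_t src[8])
{
    for (int i = 0; i < 8; ++i) {
        /* Equal bytes become all ones (0FFh), unequal bytes all zeros. */
        dst[i] = (dst[i] == src[i]) ? 0xFF : 0x00;
    }
}
```

For instance, with destination bytes 1, 2, 3, 4, 5, 6, 7, 8 and source bytes 1, 0, 3, 0, 5, 0, 7, 0, the model leaves FF 00 FF 00 FF 00 FF 00 in the destination. The resulting byte mask is typically combined with logical instructions to select elements without branching.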
