doublewords from the destination operand into four signed words in the destination register. If the signed value of a doubleword is outside the range of a signed word, it is saturated (to 7FFFh on overflow, to 8000h on underflow).
\layout Itemize
PACKSSWB Pack Words into Bytes (Signed with Saturation)
\layout Standard
Latency : 2 Throughput : 1
\layout Standard
Purpose:
\layout Standard
PACKSSWB packs four signed words from the source operand and four signed words from the destination operand into eight signed bytes in the destination register. If the signed value of a word is outside the range of a signed byte, it is saturated (to 7Fh on overflow, to 80h on underflow).
\layout Standard
\begin_inset Graphics
	filename images_mmx/NewPACKSS.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize
PACKUSWB Pack Words into Bytes (Unsigned with Saturation)
\layout Standard
Latency : 2 Throughput : 1
\layout Standard
Purpose:
\layout Standard
PACKUSWB packs and saturates four signed words of the source operand and four signed words of the destination operand into eight unsigned bytes. The result is written to the destination operand. If the signed value of a word is outside the range of an unsigned byte, it is saturated (to 0FFh on overflow, to 00h on underflow).
\layout Standard
\begin_inset Graphics
	filename images_mmx/NewPACKUS.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize
PUNPCKHBW Unpack (interleave) High-order Bytes
\layout Standard
Latency : 2 Throughput : 1
\layout Standard
Purpose:
\layout Standard
PUNPCKHBW interleaves the four high-order bytes of the source operand and the four high-order bytes of the destination operand and writes them to the destination. When unpacking from a memory operand, the full 64-bit operand is accessed from memory.
PUNPCKHBW uses only the high-order 32 bits.
\layout Standard
Note: If the source operand is all zeros, this instruction converts bytes to unsigned words.
\layout Itemize
PUNPCKHDQ Unpack (interleave) High-order Doublewords
\layout Standard
Latency : 2 Throughput : 1
\layout Standard
Purpose:
\layout Standard
PUNPCKHDQ interleaves the high-order doubleword of the source operand and the high-order doubleword of the destination operand and writes them to the destination. When unpacking from a memory operand, the full 64-bit operand is accessed from memory. PUNPCKHDQ uses only the high-order 32 bits.
\layout Itemize
PUNPCKHWD Unpack (interleave) High-order Words
\layout Standard
Latency : 2 Throughput : 1
\layout Standard
Purpose:
\layout Standard
PUNPCKHWD interleaves the two high-order words of the source operand and the two high-order words of the destination operand and writes them to the destination. When unpacking from a memory operand, the full 64-bit operand is accessed from memory. PUNPCKHWD uses only the high-order 32 bits.
\layout Standard
\begin_inset Graphics
	filename images_mmx/NewPUNPCKH.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize
PUNPCKLBW Unpack (interleave) Low-order Bytes
\layout Standard
Latency : 2 Throughput : 1
\layout Standard
Purpose:
\layout Standard
PUNPCKLBW interleaves the four low-order bytes of the source operand and the four low-order bytes of the destination operand and writes them to the destination. When unpacking from a memory operand, only 32 bits are accessed.
\layout Itemize
PUNPCKLDQ Unpack (interleave) Low-order Doublewords
\layout Standard
Latency : 2 Throughput : 1
\layout Standard
Purpose:
\layout Standard
PUNPCKLDQ interleaves the low-order doubleword of the source operand and the low-order doubleword of the destination operand and writes them to the destination.
When unpacking from a memory operand, only 32 bits are accessed.
\layout Itemize
PUNPCKLWD Unpack (interleave) Low-order Words
\layout Standard
Latency : 2 Throughput : 1
\layout Standard
Purpose:
\layout Standard
PUNPCKLWD interleaves the two low-order words of the source operand and the two low-order words of the destination operand and writes them to the destination. When unpacking from a memory operand, only 32 bits are accessed.
\layout Standard
\begin_inset Graphics
	filename images_mmx/NewPUNPCKL.gif
	lyxscale 65
	scale 55
	keepAspectRatio
\end_inset 
\layout Standard
SSE-
\layout Itemize
CVTPS2PI mm,xmm/mem64 
\layout Standard
Latency : 7 Throughput : 2
\layout Standard
Purpose:
\layout Standard
CVTPS2PI converts two packed single-precision FP values from the source operand to two packed signed doublewords in the destination operand.
\layout Standard
The destination operand is an MMX register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input values are in the low quadword.
\layout Itemize
CVTSD2SS xmm1,xmm2/mem64 
\layout Standard
Purpose:
\layout Standard
CVTSD2SS converts a double-precision FP value from the source operand to a single-precision FP value in the low doubleword of the destination operand. The upper 3 doublewords are left unchanged.
\layout Standard
The destination operand is an XMM register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input value is in the low quadword.
\layout Itemize
CVTSI2SS xmm,r/m32 
\layout Standard
Latency : 11 Throughput : 2
\layout Standard
Purpose:
\layout Standard
CVTSI2SS converts a signed doubleword from the source operand to a single-precision FP value in the low doubleword of the destination operand. The upper 3 doublewords are left unchanged.
\layout Standard
The destination operand is an XMM register.
The source can be either a general purpose register or a 32-bit memory location.
\layout Itemize
CVTSS2SI reg32,xmm/mem32 
\layout Standard
Latency : 8 Throughput : 2
\layout Standard
Purpose:
\layout Standard
CVTSS2SI converts a single-precision FP value from the source operand to a signed doubleword in the destination operand.
\layout Standard
The destination operand is a general purpose register. The source can be either an XMM register or a 32-bit memory location. If the source is a register, the input value is in the low doubleword.
\layout Itemize
CVTTPS2PI mm,xmm/mem64 
\layout Standard
Latency : 7 Throughput : 2
\layout Standard
Purpose:
\layout Standard
CVTTPS2PI converts two packed single-precision FP values in the source operand to two packed signed doublewords in the destination operand. If the result is inexact, it is truncated (rounded toward zero).
\layout Standard
The destination operand is an MMX register. The source can be either an XMM register or a 64-bit memory location. If the source is a register, the input values are in the low quadword.
\layout Itemize
CVTTSS2SI reg32,xmm/mem32
\layout Standard
Latency : 8 Throughput : 2
\layout Standard
Purpose:
\layout Standard
CVTTSS2SI converts a single-precision FP value in the source operand to a signed doubleword in the destination operand. If the result is inexact, it is truncated (rounded toward zero).
\layout Standard
The destination operand is a general purpose register. The source can be either an XMM register or a 32-bit memory location. If the source is a register, the input value is in the low doubleword.
\layout Itemize
UNPCKHPS xmm1,xmm2/m128 
\layout Standard
Latency : 6 Throughput : 2
\layout Standard
Purpose:
\layout Standard
UNPCKHPS performs an interleaved unpack of the high-order data elements of the source and destination operands, saving the result in xmm1.
It ignores the lower half of the sources.
\layout Standard
The operation of this instruction is:
\layout Standard
dst[31-0] = dst[95-64]
\layout Standard
dst[63-32] = src[95-64]
\layout Standard
dst[95-64] = dst[127-96]
\layout Standard
dst[127-96] = src[127-96]
\layout Itemize
UNPCKLPS xmm1,xmm2/m128 
\layout Standard
Latency : 4 Throughput : 2
\layout Standard
Purpose:
\layout Standard
UNPCKLPS performs an interleaved unpack of the low-order data elements of the source and destination operands, saving the result in xmm1. It ignores the upper half of the sources.
\layout Standard
The operation of this instruction is:
\layout Standard
dst[31-0] = dst[31-0]
\layout Standard
dst[63-32] = src[31-0]
\layout Standard
dst[95-64] = dst[63-32]
\layout Standard
dst[127-96] = src[63-32]
\layout Itemize
SHUFPS xmm1,xmm2/m128,imm8 
\layout Standard
Latency : 6 Throughput : 2
\layout Standard
Purpose:
\layout Standard
SHUFPS moves two of the packed single-precision FP values from the destination operand into the low quadword of the destination operand; the upper quadword is generated by moving two of the single-precision FP values from the source operand into the destination. The select (third) operand selects which of the values are moved to the destination register.
\layout Standard
The select operand is an 8-bit immediate: bits 0 and 1 select the value to be moved from the destination operand to the low doubleword of the result, bits 2 and 3 select the value to be moved from the destination operand to the second doubleword of the result, bits 4 and 5 select the value to be moved from the source operand to the third doubleword of the result, and bits 6 and 7 select the value to be moved from the source operand to the high doubleword of the result.
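\layout Standard
The SHUFPS selection rules and the UNPCKHPS/UNPCKLPS lane assignments can be checked with a small scalar model. The following is only a sketch in plain Python, not SSE code: the helper names shufps, unpcklps and unpckhps are hypothetical, and each 128-bit register is assumed to be a list of four values whose index 0 holds bits 31-0.

```python
def shufps(dst, src, imm8):
    """Scalar model of SHUFPS; index 0 of each list is bits 31-0."""
    return [
        dst[imm8 & 3],         # bits 0-1 select from dst -> low doubleword
        dst[(imm8 >> 2) & 3],  # bits 2-3 select from dst -> second doubleword
        src[(imm8 >> 4) & 3],  # bits 4-5 select from src -> third doubleword
        src[(imm8 >> 6) & 3],  # bits 6-7 select from src -> high doubleword
    ]


def unpcklps(dst, src):
    """Scalar model of UNPCKLPS: interleave the two low elements."""
    return [dst[0], src[0], dst[1], src[1]]


def unpckhps(dst, src):
    """Scalar model of UNPCKHPS: interleave the two high elements."""
    return [dst[2], src[2], dst[3], src[3]]


if __name__ == "__main__":
    a = [0.0, 1.0, 2.0, 3.0]      # stands in for xmm1 (destination)
    b = [10.0, 11.0, 12.0, 13.0]  # stands in for xmm2 (source)
    print(shufps(a, b, 0x1B))
    print(shufps(a, a, 0x1B))
    print(unpcklps(a, b))
    print(unpckhps(a, b))
```

\layout Standard
In this model, using the same register as both operands with the immediate 1Bh reverses the four elements, and E4h (with identical operands) leaves them unchanged.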
\layout Standard
SSE2-
\layout Itemize
CVTDQ2PS xmm, xmm
\layout Standard
Latency: 5 Throughput: 2
\layout Standard
Purpose:
\layout Standard
Parallel convert signed dword integers to single precision floating point values
\layout Itemize
CVTPS2DQ xmm, xmm
\layout Standard
Latency: 5 Throughput: 2
\layout Standard
Purpose:
\layout Standard
Parallel convert single precision floating point values to signed dword integers
\layout Itemize
CVTSD2SI r32, xmm 
\layout Standard
Latency: 8 Throughput: 2
\layout Standard
Purpose:
\layout Standard
Convert double precision floating point value to signed dword integer (XMM only) 
\layout Itemize
CVTPS2PD xmm, xmm 
\layout Standard
Latency: 10 Throughput: 4
\layout Standard
Purpose:
\layout Standard
Parallel convert single precision floating point values to double precision floating point values
\layout Itemize
CVTPD2PS xmm, xmm 
\layout Standard
Latency: 10 Throughput: 2
\layout Standard
Purpose:
\layout Standard
Parallel convert double precision floating point values to single precision floating point values
\layout Itemize
CVTSS2SD xmm, xmm 
\layout Standard
Latency: 14 Throughput: 3
\layout Standard
Purpose:
\layout Standard
Convert single precision floating point value to double precision floating point value
\layout Itemize
CVTSD2SS xmm, xmm 
\layout Standard
Latency: 16 Throughput: 4
\layout Standard
Purpose:
\layout Standard
Convert double precision floating point value to single precision floating point value
\layout Itemize
PUNPCKHQDQ xmm1, xmm2/m128 
\layout Standard
Source: an XMM register or 128-bit memory location.
\layout Standard
Destination: an XMM register
\layout Standard
Latency: 4 Throughput: 2
\layout Standard
Purpose:
\layout Standard
Interleaves the high quadword of the source operand and the high quadword of the destination operand and writes them to the destination register.
The low quadwords of the source operands are ignored.
\layout Standard
\begin_inset Graphics
	filename images_sse2/NewPUNPCKLQDQ.gif
	lyxscale 55
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize
PUNPCKLQDQ xmm1, xmm2/m128 
\layout Standard
Source: an XMM register or 128-bit memory location.
\layout Standard
Destination: an XMM register
\layout Standard
Latency: 4 Throughput: 2
\layout Standard
Purpose:
\layout Standard
Interleaves the low quadword of the source operand and the low quadword of the destination operand and writes them to the destination register. The high quadwords of the source operands are ignored.
\layout Standard
\begin_inset Graphics
	filename images_sse2/NewPUNPCKHQDQ.gif
	lyxscale 55
	scale 55
	keepAspectRatio
\end_inset 
\layout Itemize
PSHUFHW xmm1, xmm2/m128, imm8 
\layout Standard
Source: an XMM register or 128-bit memory location.
\layout Standard
Destination: an XMM register
\layout Standard
Latency: 2 Throughput: 2
\layout Standard
Purpose: