📄 pnggccrd.c
/* pnggccrd.c - mixed C/assembler version of utilities to read a PNG file
 *
 * For Intel x86 CPU (Pentium-MMX or later) and GNU C compiler.
 *
 * See http://www.intel.com/drg/pentiumII/appnotes/916/916.htm
 * and http://www.intel.com/drg/pentiumII/appnotes/923/923.htm
 * for Intel's performance analysis of the MMX vs. non-MMX code.
 *
 * libpng 1.0.7 - July 1, 2000
 * For conditions of distribution and use, see copyright notice in png.h
 * Copyright (c) 1998, 1999, 2000 Glenn Randers-Pehrson
 * Copyright (c) 1998, Intel Corporation
 *
 * Based on MSVC code contributed by Nirav Chhatrapati, Intel Corp., 1998.
 * Interface to libpng contributed by Gilles Vollant, 1999.
 * GNU C port by Greg Roelofs, 1999.
 *
 * Lines 2350-4300 converted in place with intel2gas 1.3.1:
 *
 *   intel2gas -mdI pnggccrd.c.partially-msvc -o pnggccrd.c
 *
 * and then cleaned up by hand.  See http://hermes.terminal.at/intel2gas/ .
 *
 * NOTE:  A sufficiently recent version of GNU as (or as.exe under DOS/Windows)
 *        is required to assemble the newer MMX instructions such as movq.
 *        For djgpp, see
 *
 *           ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/bnu281b.zip
 *
 *        (or a later version in the same directory).  For Linux, check your
 *        distribution's web site(s) or try these links:
 *
 *           http://rufus.w3.org/linux/RPM/binutils.html
 *           http://www.debian.org/Packages/stable/devel/binutils.html
 *           ftp://ftp.slackware.com/pub/linux/slackware/slackware/slakware/d1/
 *             binutils.tgz
 *
 *        For other platforms, see the main GNU site:
 *
 *           ftp://ftp.gnu.org/pub/gnu/binutils/
 *
 *        Version 2.5.2l.15 is definitely too old...
 */

/*
 * NOTES (mostly by Greg Roelofs)
 * =====
 *
 * 19991006:
 *  - fixed sign error in post-MMX cleanup code (16- & 32-bit cases)
 *
 * 19991007:
 *  - additional optimizations (possible or definite):
 *     x [DONE] write MMX code for 64-bit case (pixel_bytes == 8) [not tested]
 *     - write MMX code for 48-bit case (pixel_bytes == 6)
 *     - figure out what's up with 24-bit case (pixel_bytes == 3):
 *        why subtract 8 from width_mmx in the pass 4/5 case?
 *        (only width_mmx case)
 *     x [DONE] replace pixel_bytes within each block with the true
 *        constant value (or are compilers smart enough to do that?)
 *     - rewrite all MMX interlacing code so it's aligned with
 *        the *beginning* of the row buffer, not the end.  This
 *        would not only allow one to eliminate half of the memory
 *        writes for odd passes (i.e., pass == odd), it may also
 *        eliminate some unaligned-data-access exceptions (assuming
 *        there's a penalty for not aligning 64-bit accesses on
 *        64-bit boundaries).  The only catch is that the "leftover"
 *        pixel(s) at the end of the row would have to be saved,
 *        but there are enough unused MMX registers in every case,
 *        so this is not a problem.  A further benefit is that the
 *        post-MMX cleanup code (C code) in at least some of the
 *        cases could be done within the assembler block.
 *     x [DONE] the "v3 v2 v1 v0 v7 v6 v5 v4" comments are confusing,
 *        inconsistent, and don't match the MMX Programmer's Reference
 *        Manual conventions anyway.  They should be changed to
 *        "b7 b6 b5 b4 b3 b2 b1 b0," where b0 indicates the byte that
 *        was lowest in memory (e.g., corresponding to a left pixel)
 *        and b7 is the byte that was highest (e.g., a right pixel).
 *
 * 19991016:
 *  - Brennan's Guide notwithstanding, gcc under Linux does *not*
 *     want globals prefixed by underscores when referencing them--
 *     i.e., if the variable is const4, then refer to it as const4,
 *     not _const4.  This seems to be a djgpp-specific requirement.
 *     Also, such variables apparently *must* be declared outside
 *     of functions; neither static nor automatic variables work if
 *     defined within the scope of a single function, but both
 *     static and truly global (multi-module) variables work fine.
 *
 * 19991023:
 *  - fixed png_combine_row() non-MMX replication bug (odd passes only?)
 *  - switched from string-concatenation-with-macros to cleaner method of
 *     renaming global variables for djgpp--i.e., always use prefixes in
 *     inlined assembler code (== strings) and conditionally rename the
 *     variables, not the other way around.  Hence _const4, _mask8_0, etc.
 *
 * 19991024:
 *  - fixed mmxsupport()/png_do_interlace() first-row bug
 *     This one was severely weird:  even though mmxsupport() doesn't touch
 *     ebx (where "row" pointer was stored), it nevertheless managed to zero
 *     the register (even in static/non-fPIC code--see below), which in turn
 *     caused png_do_interlace() to return prematurely on the first row of
 *     interlaced images (i.e., without expanding the interlaced pixels).
 *     Inspection of the generated assembly code didn't turn up any clues,
 *     although it did point at a minor optimization (i.e., get rid of
 *     mmx_supported_local variable and just use eax).  Possibly the CPUID
 *     instruction is more destructive than it looks?  (Not yet checked.)
 *  - "info gcc" was next to useless, so compared fPIC and non-fPIC assembly
 *     listings...  Apparently register spillage has to do with ebx, since
 *     it's used to index the global offset table.  Commenting it out of the
 *     input-reg lists in png_combine_row() eliminated compiler barfage, so
 *     ifdef'd with __PIC__ macro:  if defined, use a global for unmask
 *
 * 19991107:
 *  - verified CPUID clobberage:  12-char string constant ("GenuineIntel",
 *     "AuthenticAMD", etc.) placed in EBX:ECX:EDX.  Still need to polish.
 *
 * 19991120:
 *  - made "diff" variable (now "_dif") global to simplify conversion of
 *     filtering routines (running out of regs, sigh).  "diff" is still used
 *     in interlacing routines, however.
 *  - fixed up both versions of mmxsupport() (ORIG_THAT_USED_TO_CLOBBER_EBX
 *     macro determines which is used); original not yet tested.
 *
 * 20000319:
 *  - fixed a register-name typo in png_do_read_interlace(), default (MMX) case,
 *     pass == 4 or 5, that caused visible corruption of interlaced images
 *
 *  - When compiling with gcc, be sure to use  -fomit-frame-pointer
 */

#define PNG_INTERNAL
#include "png.h"

#if defined(PNG_ASSEMBLER_CODE_SUPPORTED) && defined(PNG_USE_PNGGCCRD)

int mmxsupport(void);

static int mmx_supported = 2;

#ifdef PNG_USE_LOCAL_ARRAYS
static const int png_pass_start[7] = {0, 4, 0, 2, 0, 1, 0};
static const int png_pass_inc[7]   = {8, 8, 4, 4, 2, 2, 1};
static const int png_pass_width[7] = {8, 4, 4, 2, 2, 1, 1};
#endif

// djgpp and Win32 add their own underscores to global variables,
// so define them without:
#if defined(__DJGPP__) || defined(WIN32)
#  define _unmask      unmask
#  define _const4      const4
#  define _const6      const6
#  define _mask8_0     mask8_0
#  define _mask16_1    mask16_1
#  define _mask16_0    mask16_0
#  define _mask24_2    mask24_2
#  define _mask24_1    mask24_1
#  define _mask24_0    mask24_0
#  define _mask32_3    mask32_3
#  define _mask32_2    mask32_2
#  define _mask32_1    mask32_1
#  define _mask32_0    mask32_0
#  define _mask48_5    mask48_5
#  define _mask48_4    mask48_4
#  define _mask48_3    mask48_3
#  define _mask48_2    mask48_2
#  define _mask48_1    mask48_1
#  define _mask48_0    mask48_0
#  define _FullLength  FullLength
#  define _MMXLength   MMXLength
#  define _dif         dif
#endif

/* These constants are used in the inlined MMX assembly code.
   Ignore gcc's "At top level: defined but not used" warnings.
 */

#ifdef __PIC__
static int _unmask;   // not enough regs when compiling with -fPIC, so...
#endif

static unsigned long long _mask8_0  = 0x0102040810204080LL;

static unsigned long long _mask16_1 = 0x0101020204040808LL;
static unsigned long long _mask16_0 = 0x1010202040408080LL;

static unsigned long long _mask24_2 = 0x0101010202020404LL;
static unsigned long long _mask24_1 = 0x0408080810101020LL;
static unsigned long long _mask24_0 = 0x2020404040808080LL;

static unsigned long long _mask32_3 = 0x0101010102020202LL;
static unsigned long long _mask32_2 = 0x0404040408080808LL;
static unsigned long long _mask32_1 = 0x1010101020202020LL;
static unsigned long long _mask32_0 = 0x4040404080808080LL;

static unsigned long long _mask48_5 = 0x0101010101010202LL;
static unsigned long long _mask48_4 = 0x0202020204040404LL;
static unsigned long long _mask48_3 = 0x0404080808080808LL;
static unsigned long long _mask48_2 = 0x1010101010102020LL;
static unsigned long long _mask48_1 = 0x2020202040404040LL;
static unsigned long long _mask48_0 = 0x4040808080808080LL;

static unsigned long long _const4   = 0x0000000000FFFFFFLL;
//static unsigned long long _const5 = 0x000000FFFFFF0000LL;     // NOT USED
static unsigned long long _const6   = 0x00000000000000FFLL;

// These are used in the row-filter routines and should/would be local
// variables if not for gcc addressing limitations.

static png_uint_32 _FullLength;
static png_uint_32 _MMXLength;
static int _dif;

void /* PRIVATE */
png_read_filter_row_c(png_structp png_ptr, png_row_infop row_info,
   png_bytep row, png_bytep prev_row, int filter);

#if defined(PNG_HAVE_ASSEMBLER_COMBINE_ROW)

/* Combines the row recently read in with the previous row.
   This routine takes care of alpha and transparency if requested.
   This routine also handles the two methods of progressive display
   of interlaced images, depending on the mask value.
   The mask value describes which pixels are to be combined with
   the row.  The pattern always repeats every 8 pixels, so just 8
   bits are needed.
   A one indicates the pixel is to be combined; a zero indicates
   the pixel is to be skipped.  This is in addition to any alpha or
   transparency value associated with the pixel.  If you want all
   pixels to be combined, pass 0xff (255) in mask.  */

/* Use this routine for the x86 platform - it uses a faster MMX routine
   if the machine supports MMX. */

void /* PRIVATE */
png_combine_row(png_structp png_ptr, png_bytep row, int mask)
{
   png_debug(1,"in png_combine_row_asm\n");

   if (mmx_supported == 2)
      mmx_supported = mmxsupport();

/*
fprintf(stderr, "GRR DEBUG: png_combine_row() pixel_depth = %d, mask = 0x%02x, unmask = 0x%02x\n", png_ptr->row_info.pixel_depth, mask, ~mask);
fflush(stderr);
 */
   if (mask == 0xff)
   {
      png_memcpy(row, png_ptr->row_buf + 1,
         (png_size_t)((png_ptr->width * png_ptr->row_info.pixel_depth + 7) >> 3));
   }
   /* GRR: add "else if (mask == 0)" case?
    *      or does png_combine_row() not even get called in that case? */
   else
   {
      switch (png_ptr->row_info.pixel_depth)
      {
         case 1:        // png_ptr->row_info.pixel_depth
         {
            png_bytep sp;
            png_bytep dp;
            int s_inc, s_start, s_end;
            int m;
            int shift;
            png_uint_32 i;

            sp = png_ptr->row_buf + 1;
            dp = row;
            m = 0x80;
#if defined(PNG_READ_PACKSWAP_SUPPORTED)
            if (png_ptr->transformations & PNG_PACKSWAP)
            {
               s_start = 0;
               s_end = 7;
               s_inc = 1;
            }
            else
#endif
            {
               s_start = 7;
               s_end = 0;
               s_inc = -1;
            }

            shift = s_start;

            for (i = 0; i < png_ptr->width; i++)
            {
               if (m & mask)
               {
                  int value;

                  value = (*sp >> shift) & 0x1;
                  *dp &= (png_byte)((0x7f7f >> (7 - shift)) & 0xff);
                  *dp |= (png_byte)(value << shift);
               }

               if (shift == s_end)
               {
                  shift = s_start;
                  sp++;
                  dp++;
               }
               else
                  shift += s_inc;

               if (m == 1)
                  m = 0x80;
               else
                  m >>= 1;
            }
            break;
         }

         case 2:        // png_ptr->row_info.pixel_depth
         {
            png_bytep sp;
            png_bytep dp;
            int s_start, s_end, s_inc;
            int m;
            int shift;
            png_uint_32 i;
            int value;

            sp = png_ptr->row_buf + 1;
            dp = row;
            m = 0x80;
#if defined(PNG_READ_PACKSWAP_SUPPORTED)
            if (png_ptr->transformations & PNG_PACKSWAP)