/* pnggccrd.c - mixed C/assembler version of utilities to read a PNG file
*
* For Intel x86 CPU (Pentium-MMX or later) and GNU C compiler.
*
* See http://www.intel.com/drg/pentiumII/appnotes/916/916.htm
* and http://www.intel.com/drg/pentiumII/appnotes/923/923.htm
* for Intel's performance analysis of the MMX vs. non-MMX code.
*
* libpng version 1.2.8 - December 3, 2004
* For conditions of distribution and use, see copyright notice in png.h
* Copyright (c) 1998-2004 Glenn Randers-Pehrson
* Copyright (c) 1998, Intel Corporation
*
* Based on MSVC code contributed by Nirav Chhatrapati, Intel Corp., 1998.
* Interface to libpng contributed by Gilles Vollant, 1999.
* GNU C port by Greg Roelofs, 1999-2001.
*
* Lines 2350-4300 converted in place with intel2gas 1.3.1:
*
* intel2gas -mdI pnggccrd.c.partially-msvc -o pnggccrd.c
*
* and then cleaned up by hand. See http://hermes.terminal.at/intel2gas/ .
*
* NOTE: A sufficiently recent version of GNU as (or as.exe under DOS/Windows)
* is required to assemble the newer MMX instructions such as movq.
* For djgpp, see
*
* ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/bnu281b.zip
*
* (or a later version in the same directory). For Linux, check your
* distribution's web site(s) or try these links:
*
* http://rufus.w3.org/linux/RPM/binutils.html
* http://www.debian.org/Packages/stable/devel/binutils.html
* ftp://ftp.slackware.com/pub/linux/slackware/slackware/slakware/d1/
* binutils.tgz
*
* For other platforms, see the main GNU site:
*
* ftp://ftp.gnu.org/pub/gnu/binutils/
*
* Version 2.5.2l.15 is definitely too old...
*/
/*
* TEMPORARY PORTING NOTES AND CHANGELOG (mostly by Greg Roelofs)
* =====================================
*
* 19991006:
* - fixed sign error in post-MMX cleanup code (16- & 32-bit cases)
*
* 19991007:
* - additional optimizations (possible or definite):
* x [DONE] write MMX code for 64-bit case (pixel_bytes == 8) [not tested]
* - write MMX code for 48-bit case (pixel_bytes == 6)
* - figure out what's up with 24-bit case (pixel_bytes == 3):
* why subtract 8 from width_mmx in the pass 4/5 case?
* (only width_mmx case) (near line 1606)
* x [DONE] replace pixel_bytes within each block with the true
* constant value (or are compilers smart enough to do that?)
* - rewrite all MMX interlacing code so it's aligned with
* the *beginning* of the row buffer, not the end. This
* would not only allow one to eliminate half of the memory
* writes for odd passes (that is, pass == odd), it may also
* eliminate some unaligned-data-access exceptions (assuming
* there's a penalty for not aligning 64-bit accesses on
* 64-bit boundaries). The only catch is that the "leftover"
* pixel(s) at the end of the row would have to be saved,
* but there are enough unused MMX registers in every case,
* so this is not a problem. A further benefit is that the
* post-MMX cleanup code (C code) in at least some of the
* cases could be done within the assembler block.
* x [DONE] the "v3 v2 v1 v0 v7 v6 v5 v4" comments are confusing,
* inconsistent, and don't match the MMX Programmer's Reference
* Manual conventions anyway. They should be changed to
* "b7 b6 b5 b4 b3 b2 b1 b0," where b0 indicates the byte that
* was lowest in memory (e.g., corresponding to a left pixel)
* and b7 is the byte that was highest (e.g., a right pixel).
*
* 19991016:
* - Brennan's Guide notwithstanding, gcc under Linux does *not*
* want globals prefixed by underscores when referencing them--
* i.e., if the variable is const4, then refer to it as const4,
* not _const4. This seems to be a djgpp-specific requirement.
* Also, such variables apparently *must* be declared outside
* of functions; neither static nor automatic variables work if
* defined within the scope of a single function, but both
* static and truly global (multi-module) variables work fine.
*
* 19991023:
* - fixed png_combine_row() non-MMX replication bug (odd passes only?)
* - switched from string-concatenation-with-macros to cleaner method of
* renaming global variables for djgpp--i.e., always use prefixes in
* inlined assembler code (== strings) and conditionally rename the
* variables, not the other way around. Hence _const4, _mask8_0, etc.
*
* 19991024:
* - fixed mmxsupport()/png_do_read_interlace() first-row bug
* This one was severely weird: even though mmxsupport() doesn't touch
* ebx (where "row" pointer was stored), it nevertheless managed to zero
* the register (even in static/non-fPIC code--see below), which in turn
* caused png_do_read_interlace() to return prematurely on the first row of
* interlaced images (i.e., without expanding the interlaced pixels).
* Inspection of the generated assembly code didn't turn up any clues,
* although it did point at a minor optimization (i.e., get rid of
* mmx_supported_local variable and just use eax). Possibly the CPUID
* instruction is more destructive than it looks? (Not yet checked.)
* - "info gcc" was next to useless, so compared fPIC and non-fPIC assembly
* listings... Apparently register spillage has to do with ebx, since
* it's used to index the global offset table. Commenting it out of the
* input-reg lists in png_combine_row() eliminated compiler barfage, so
* ifdef'd with __PIC__ macro: if defined, use a global for unmask
*
* 19991107:
* - verified CPUID clobberage: 12-char string constant ("GenuineIntel",
* "AuthenticAMD", etc.) placed in ebx:edx:ecx (in that order). Still need
* to polish.
*
* 19991120:
* - made "diff" variable (now "_dif") global to simplify conversion of
* filtering routines (running out of regs, sigh). "diff" is still used
* in interlacing routines, however.
* - fixed up both versions of mmxsupport() (ORIG_THAT_USED_TO_CLOBBER_EBX
* macro determines which is used); original not yet tested.
*
* 20000213:
* - when compiling with gcc, be sure to use -fomit-frame-pointer
*
* 20000319:
* - fixed a register-name typo in png_do_read_interlace(), default (MMX) case,
* pass == 4 or 5, that caused visible corruption of interlaced images
*
* 20000623:
* - Various problems were reported with gcc 2.95.2 in the Cygwin environment,
* many of the form "forbidden register 0 (ax) was spilled for class AREG."
* This is explained at http://gcc.gnu.org/fom_serv/cache/23.html, and
* Chuck Wilson supplied a patch involving dummy output registers. See
* http://sourceforge.net/bugs/?func=detailbug&bug_id=108741&group_id=5624
* for the original (anonymous) SourceForge bug report.
*
* 20000706:
* - Chuck Wilson passed along these remaining gcc 2.95.2 errors:
* pnggccrd.c: In function `png_combine_row':
* pnggccrd.c:525: more than 10 operands in `asm'
* pnggccrd.c:669: more than 10 operands in `asm'
* pnggccrd.c:828: more than 10 operands in `asm'
* pnggccrd.c:994: more than 10 operands in `asm'
* pnggccrd.c:1177: more than 10 operands in `asm'
* They are all the same problem and can be worked around by using the
* global _unmask variable unconditionally, not just in the -fPIC case.
* Reportedly earlier versions of gcc also have the problem with more than
* 10 operands; they just don't report it. Much strangeness ensues, etc.
*
* 20000729:
* - enabled png_read_filter_row_mmx_up() (shortest remaining unconverted
* MMX routine); began converting png_read_filter_row_mmx_sub()
* - to finish remaining sections:
* - clean up indentation and comments
* - preload local variables
* - add output and input regs (order of former determines numerical
* mapping of latter)
* - avoid all usage of ebx (including bx, bh, bl) register [20000823]
* - remove "$" from addressing of Shift and Mask variables [20000823]
*
* 20000731:
* - global union vars causing segfaults in png_read_filter_row_mmx_sub()?
*
* 20000822:
* - ARGH, stupid png_read_filter_row_mmx_sub() segfault only happens with
* shared-library (-fPIC) version! Code works just fine as part of static
* library. Damn damn damn damn damn, should have tested that sooner.
* ebx is getting clobbered again (explicitly this time); need to save it
* on stack or rewrite asm code to avoid using it altogether. Blargh!
*
* 20000823:
* - first section was trickiest; all remaining sections have ebx -> edx now.
* (-fPIC works again.) Also added missing underscores to various Shift*
* and *Mask* globals and got rid of leading "$" signs.
*
* 20000826:
* - added visual separators to help navigate microscopic printed copies
* (http://pobox.com/~newt/code/gpr-latest.zip, mode 10); started working
* on png_read_filter_row_mmx_avg()
*
* 20000828:
* - finished png_read_filter_row_mmx_avg(): only Paeth left! (930 lines...)
* What the hell, did png_read_filter_row_mmx_paeth(), too. Comments not
* cleaned up/shortened in either routine, but functionality is complete
* and seems to be working fine.
*
* 20000829:
* - ahhh, figured out last(?) bit of gcc/gas asm-fu: if register is listed
* as an input reg (with dummy output variables, etc.), then it *cannot*
* also appear in the clobber list or gcc 2.95.2 will barf. The solution
* is simple enough...
*
* 20000914:
* - bug in png_read_filter_row_mmx_avg(): 16-bit grayscale not handled
* correctly (but 48-bit RGB just fine)
*
* 20000916:
* - fixed bug in png_read_filter_row_mmx_avg(), bpp == 2 case; three errors:
* - "_ShiftBpp.use = 24;" should have been "_ShiftBpp.use = 16;"
* - "_ShiftRem.use = 40;" should have been "_ShiftRem.use = 48;"
* - "psllq _ShiftRem, %%mm2" should have been "psrlq _ShiftRem, %%mm2"
*
* 20010101:
* - added new png_init_mmx_flags() function (here only because it needs to
* call mmxsupport(), which should probably become global png_mmxsupport());
* modified other MMX routines to run conditionally (png_ptr->asm_flags)
*
* 20010103:
* - renamed mmxsupport() to png_mmx_support(), with auto-set of mmx_supported,
* and made it public; moved png_init_mmx_flags() to png.c as internal func
*
* 20010104:
* - removed dependency on png_read_filter_row_c() (C code already duplicated
* within MMX version of png_read_filter_row()) so no longer necessary to
* compile it into pngrutil.o
*
* 20010310:
* - fixed buffer-overrun bug in png_combine_row() C code (non-MMX)
*
* 20020304:
* - eliminated incorrect use of width_mmx in pixel_bytes == 8 case
*
* 20040724:
* - more tinkering with clobber list at lines 4529 and 5033, to get
* it to compile on gcc-3.4.
*
* STILL TO DO:
* - test png_do_read_interlace() 64-bit case (pixel_bytes == 8)
* - write MMX code for 48-bit case (pixel_bytes == 6)
* - figure out what's up with 24-bit case (pixel_bytes == 3):
* why subtract 8 from width_mmx in the pass 4/5 case?
* (only width_mmx case) (near line 1606)
* - rewrite all MMX interlacing code so it's aligned with beginning
* of the row buffer, not the end (see 19991007 for details)
* x pick one version of mmxsupport() and get rid of the other
* - add error messages to any remaining bogus default cases
* - enable pixel_depth == 8 cases in png_read_filter_row()? (test speed)
* x add support for runtime enable/disable/query of various MMX routines
*/
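/*
 * Illustration only (not part of libpng): the 19991024 and 19991107 notes
 * above trace the first-row bug to CPUID overwriting ebx. The sketch below
 * shows one way to query the MMX feature bit without hand-written asm,
 * assuming a GCC- or Clang-style <cpuid.h> that provides __get_cpuid().
 * The name sketch_mmx_support() is hypothetical, and this is not how
 * png_mmx_support() itself is implemented. On non-x86 targets the sketch
 * simply reports "no MMX".
 */

```c
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>   /* compiler-provided helper header (assumed available) */
#endif

/* Hypothetical helper: returns 1 if CPUID reports MMX, 0 otherwise. */
static int sketch_mmx_support(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    /* __get_cpuid() returns 0 when the requested leaf is not supported.
     * All four registers are declared as outputs, so the compiler knows
     * that ebx is overwritten and will not keep a live value there. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((edx >> 23) & 1u);   /* leaf 1, EDX bit 23 = MMX */
#else
    return 0;                         /* not an x86 build: no MMX */
#endif
}
```

/*
 * A caller would test the result once, e.g. "if (sketch_mmx_support()) ...",
 * and fall back to the plain C routines otherwise.
 */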
#define PNG_INTERNAL
#include "png.h"
#if defined(PNG_USE_PNGGCCRD)
int PNGAPI png_mmx_support(void);
#ifdef PNG_USE_LOCAL_ARRAYS
static const int FARDATA png_pass_start[7] = {0, 4, 0, 2, 0, 1, 0};
static const int FARDATA png_pass_inc[7] = {8, 8, 4, 4, 2, 2, 1};
static const int FARDATA png_pass_width[7] = {8, 4, 4, 2, 2, 1, 1};
#endif
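/*
 * Illustration only (not part of libpng): a sketch of how the Adam7 tables
 * above can be read, assuming the usual meaning of png_pass_start (first
 * column touched by a pass) and png_pass_inc (column step of a pass). The
 * sketch_* names are hypothetical copies so the example stands alone.
 */

```c
#include <assert.h>

/* Copies of the two tables under hypothetical names. */
static const int sketch_pass_start[7] = {0, 4, 0, 2, 0, 1, 0};
static const int sketch_pass_inc[7]   = {8, 8, 4, 4, 2, 2, 1};

/* Number of pixels that pass `pass` (0..6) contributes to one row of an
 * image that is `width` pixels wide. */
static unsigned long sketch_pass_row_width(unsigned long width, int pass)
{
    unsigned long start, inc;

    assert(pass >= 0 && pass < 7);
    start = (unsigned long)sketch_pass_start[pass];
    inc   = (unsigned long)sketch_pass_inc[pass];

    if (width <= start)
        return 0;                        /* pass never reaches this image */
    return (width - start + inc - 1) / inc;   /* ceiling division */
}
```

/*
 * For an 8-pixel-wide image this gives 1, 1, 2, 2, 4, 4 and 8 pixels per
 * row for passes 0 through 6 respectively.
 */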
#if defined(PNG_ASSEMBLER_CODE_SUPPORTED)
/* djgpp, Win32, and Cygwin add their own underscores to global variables,
* so define them without: */
#if defined(__DJGPP__) || defined(WIN32) || defined(__CYGWIN__)
# define _mmx_supported mmx_supported
# define _const4 const4
# define _const6 const6
# define _mask8_0 mask8_0
# define _mask16_1 mask16_1
# define _mask16_0 mask16_0
# define _mask24_2 mask24_2
# define _mask24_1 mask24_1
# define _mask24_0 mask24_0
# define _mask32_3 mask32_3
# define _mask32_2 mask32_2
# define _mask32_1 mask32_1
# define _mask32_0 mask32_0
# define _mask48_5 mask48_5
# define _mask48_4 mask48_4
# define _mask48_3 mask48_3
# define _mask48_2 mask48_2
# define _mask48_1 mask48_1
# define _mask48_0 mask48_0
# define _LBCarryMask LBCarryMask
# define _HBClearMask HBClearMask
# define _ActiveMask ActiveMask
# define _ActiveMask2 ActiveMask2
# define _ActiveMaskEnd ActiveMaskEnd
# define _ShiftBpp ShiftBpp
# define _ShiftRem ShiftRem
#ifdef PNG_THREAD_UNSAFE_OK
# define _unmask unmask
# define _FullLength FullLength
# define _MMXLength MMXLength
# define _dif dif
# define _patemp patemp
# define _pbtemp pbtemp
# define _pctemp pctemp
#endif
#endif
/* These constants are used in the inlined MMX assembly code.
Ignore gcc's "At top level: defined but not used" warnings. */
/* GRR 20000706: originally _unmask was needed only when compiling with -fPIC,
* since that case uses the %ebx register for indexing the Global Offset Table
* and there were no other registers available. But gcc 2.95 and later emit
* "more than 10 operands in `asm'" errors when %ebx is used to preload unmask
* in the non-PIC case, so we'll just use the global unconditionally now.
*/
#ifdef PNG_THREAD_UNSAFE_OK
static int _unmask;
#endif
static unsigned long long _mask8_0 = 0x0102040810204080LL;
static unsigned long long _mask16_1 = 0x0101020204040808LL;
static unsigned long long _mask16_0 = 0x1010202040408080LL;
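/*
 * Illustration only (not part of libpng): each byte of _mask8_0 holds a
 * single bit, 0x80 in the lowest byte down to 0x01 in the highest. The
 * scalar sketch below shows one plausible use of such a constant: expand an
 * 8-bit pixel mask into eight per-byte select values. The MMX code that
 * actually consumes the constant is not shown in this excerpt, so treat the
 * exact sequence as an assumption. The sketch_* names are hypothetical.
 */

```c
#include <stdint.h>

/* Same value as _mask8_0, under a hypothetical name. */
static const uint64_t sketch_mask8_0 = 0x0102040810204080ULL;

/* Replicate `mask` into all eight bytes, AND with the constant, then turn
 * every nonzero byte into 0xFF. Byte i of the result (i = 0 is the least
 * significant byte, i.e. the lowest address on little-endian x86) is 0xFF
 * exactly when bit (7 - i) of `mask` is set. */
static uint64_t sketch_expand_mask8(uint8_t mask)
{
    uint64_t replicated = (uint64_t)mask * 0x0101010101010101ULL;
    uint64_t picked = replicated & sketch_mask8_0;
    uint64_t out = 0;
    int i;

    for (i = 0; i < 8; i++) {
        if ((picked >> (8 * i)) & 0xFFu)
            out |= (uint64_t)0xFF << (8 * i);
    }
    return out;
}
```

/*
 * With the result in `sel`, eight 8-bit pixels could then be merged as
 * (src & sel) | (dst & ~sel), which is the kind of select that a handful
 * of MMX logical instructions can do on a whole quadword at once.
 */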