/* pnggccrd.c - mixed C/assembler version of utilities to read a PNG file
 *
 * For Intel x86 CPU (Pentium-MMX or later) and GNU C compiler.
 *
 *     See http://www.intel.com/drg/pentiumII/appnotes/916/916.htm
 *     and http://www.intel.com/drg/pentiumII/appnotes/923/923.htm
 *     for Intel's performance analysis of the MMX vs. non-MMX code.
 *
 * libpng version 1.2.8 - December 3, 2004
 * For conditions of distribution and use, see copyright notice in png.h
 * Copyright (c) 1998-2004 Glenn Randers-Pehrson
 * Copyright (c) 1998, Intel Corporation
 *
 * Based on MSVC code contributed by Nirav Chhatrapati, Intel Corp., 1998.
 * Interface to libpng contributed by Gilles Vollant, 1999.
 * GNU C port by Greg Roelofs, 1999-2001.
 *
 * Lines 2350-4300 converted in place with intel2gas 1.3.1:
 *
 *   intel2gas -mdI pnggccrd.c.partially-msvc -o pnggccrd.c
 *
 * and then cleaned up by hand.  See http://hermes.terminal.at/intel2gas/ .
 *
 * NOTE:  A sufficiently recent version of GNU as (or as.exe under DOS/Windows)
 *        is required to assemble the newer MMX instructions such as movq.
 *        For djgpp, see
 *
 *           ftp://ftp.simtel.net/pub/simtelnet/gnu/djgpp/v2gnu/bnu281b.zip
 *
 *        (or a later version in the same directory).  For Linux, check your
 *        distribution's web site(s) or try these links:
 *
 *           http://rufus.w3.org/linux/RPM/binutils.html
 *           http://www.debian.org/Packages/stable/devel/binutils.html
 *           ftp://ftp.slackware.com/pub/linux/slackware/slackware/slakware/d1/
 *             binutils.tgz
 *
 *        For other platforms, see the main GNU site:
 *
 *           ftp://ftp.gnu.org/pub/gnu/binutils/
 *
 *        Version 2.5.2l.15 is definitely too old...
 */

/*
 * TEMPORARY PORTING NOTES AND CHANGELOG (mostly by Greg Roelofs)
 * =====================================
 *
 * 19991006:
 *  - fixed sign error in post-MMX cleanup code (16- & 32-bit cases)
 *
 * 19991007:
 *  - additional optimizations (possible or definite):
 *     x [DONE] write MMX code for 64-bit case (pixel_bytes == 8) [not tested]
 *     - write MMX code for 48-bit case (pixel_bytes == 6)
 *     - figure out what's up with 24-bit case (pixel_bytes == 3):
 *        why subtract 8 from width_mmx in the pass 4/5 case?
 *        (only width_mmx case) (near line 1606)
 *     x [DONE] replace pixel_bytes within each block with the true
 *        constant value (or are compilers smart enough to do that?)
 *     - rewrite all MMX interlacing code so it's aligned with
 *        the *beginning* of the row buffer, not the end.  This
 *        would not only allow one to eliminate half of the memory
 *        writes for odd passes (that is, pass == odd), it may also
 *        eliminate some unaligned-data-access exceptions (assuming
 *        there's a penalty for not aligning 64-bit accesses on
 *        64-bit boundaries).  The only catch is that the "leftover"
 *        pixel(s) at the end of the row would have to be saved,
 *        but there are enough unused MMX registers in every case,
 *        so this is not a problem.  A further benefit is that the
 *        post-MMX cleanup code (C code) in at least some of the
 *        cases could be done within the assembler block.
 *  x [DONE] the "v3 v2 v1 v0 v7 v6 v5 v4" comments are confusing,
 *     inconsistent, and don't match the MMX Programmer's Reference
 *     Manual conventions anyway.  They should be changed to
 *     "b7 b6 b5 b4 b3 b2 b1 b0," where b0 indicates the byte that
 *     was lowest in memory (e.g., corresponding to a left pixel)
 *     and b7 is the byte that was highest (e.g., a right pixel).
 *
 * 19991016:
 *  - Brennan's Guide notwithstanding, gcc under Linux does *not*
 *     want globals prefixed by underscores when referencing them--
 *     i.e., if the variable is const4, then refer to it as const4,
 *     not _const4.  This seems to be a djgpp-specific requirement.
 *     Also, such variables apparently *must* be declared outside
 *     of functions; neither static nor automatic variables work if
 *     defined within the scope of a single function, but both
 *     static and truly global (multi-module) variables work fine.
 *
 * 19991023:
 *  - fixed png_combine_row() non-MMX replication bug (odd passes only?)
 *  - switched from string-concatenation-with-macros to cleaner method of
 *     renaming global variables for djgpp--i.e., always use prefixes in
 *     inlined assembler code (== strings) and conditionally rename the
 *     variables, not the other way around.  Hence _const4, _mask8_0, etc.
 *
 * 19991024:
 *  - fixed mmxsupport()/png_do_read_interlace() first-row bug
 *     This one was severely weird:  even though mmxsupport() doesn't touch
 *     ebx (where "row" pointer was stored), it nevertheless managed to zero
 *     the register (even in static/non-fPIC code--see below), which in turn
 *     caused png_do_read_interlace() to return prematurely on the first row of
 *     interlaced images (i.e., without expanding the interlaced pixels).
 *     Inspection of the generated assembly code didn't turn up any clues,
 *     although it did point at a minor optimization (i.e., get rid of
 *     mmx_supported_local variable and just use eax).  Possibly the CPUID
 *     instruction is more destructive than it looks?  (Not yet checked.)
 *  - "info gcc" was next to useless, so compared fPIC and non-fPIC assembly
 *     listings...  Apparently register spillage has to do with ebx, since
 *     it's used to index the global offset table.  Commenting it out of the
 *     input-reg lists in png_combine_row() eliminated compiler barfage, so
 *     ifdef'd with __PIC__ macro:  if defined, use a global for unmask
 *
 * 19991107:
 *  - verified CPUID clobberage:  12-char string constant ("GenuineIntel",
 *     "AuthenticAMD", etc.) placed in ebx:ecx:edx.  Still need to polish.
 *
 * 19991120:
 *  - made "diff" variable (now "_dif") global to simplify conversion of
 *     filtering routines (running out of regs, sigh).  "diff" is still used
 *     in interlacing routines, however.
 *  - fixed up both versions of mmxsupport() (ORIG_THAT_USED_TO_CLOBBER_EBX
 *     macro determines which is used); original not yet tested.
 *
 * 20000213:
 *  - when compiling with gcc, be sure to use  -fomit-frame-pointer
 *
 * 20000319:
 *  - fixed a register-name typo in png_do_read_interlace(), default (MMX) case,
 *     pass == 4 or 5, that caused visible corruption of interlaced images
 *
 * 20000623:
 *  - Various problems were reported with gcc 2.95.2 in the Cygwin environment,
 *     many of the form "forbidden register 0 (ax) was spilled for class AREG."
 *     This is explained at http://gcc.gnu.org/fom_serv/cache/23.html, and
 *     Chuck Wilson supplied a patch involving dummy output registers.  See
 *     http://sourceforge.net/bugs/?func=detailbug&bug_id=108741&group_id=5624
 *     for the original (anonymous) SourceForge bug report.
 *
 * 20000706:
 *  - Chuck Wilson passed along these remaining gcc 2.95.2 errors:
 *       pnggccrd.c: In function `png_combine_row':
 *       pnggccrd.c:525: more than 10 operands in `asm'
 *       pnggccrd.c:669: more than 10 operands in `asm'
 *       pnggccrd.c:828: more than 10 operands in `asm'
 *       pnggccrd.c:994: more than 10 operands in `asm'
 *       pnggccrd.c:1177: more than 10 operands in `asm'
 *     They are all the same problem and can be worked around by using the
 *     global _unmask variable unconditionally, not just in the -fPIC case.
 *     Reportedly earlier versions of gcc also have the problem with more than
 *     10 operands; they just don't report it.  Much strangeness ensues, etc.
 *
 * 20000729:
 *  - enabled png_read_filter_row_mmx_up() (shortest remaining unconverted
 *     MMX routine); began converting png_read_filter_row_mmx_sub()
 *  - to finish remaining sections:
 *     - clean up indentation and comments
 *     - preload local variables
 *     - add output and input regs (order of former determines numerical
 *        mapping of latter)
 *     - avoid all usage of ebx (including bx, bh, bl) register [20000823]
 *     - remove "$" from addressing of Shift and Mask variables [20000823]
 *
 * 20000731:
 *  - global union vars causing segfaults in png_read_filter_row_mmx_sub()?
 *
 * 20000822:
 *  - ARGH, stupid png_read_filter_row_mmx_sub() segfault only happens with
 *     shared-library (-fPIC) version!  Code works just fine as part of static
 *     library.  Damn damn damn damn damn, should have tested that sooner.
 *     ebx is getting clobbered again (explicitly this time); need to save it
 *     on stack or rewrite asm code to avoid using it altogether.  Blargh!
 *
 * 20000823:
 *  - first section was trickiest; all remaining sections have ebx -> edx now.
 *     (-fPIC works again.)  Also added missing underscores to various Shift*
 *     and *Mask* globals and got rid of leading "$" signs.
 *
 * 20000826:
 *  - added visual separators to help navigate microscopic printed copies
 *     (http://pobox.com/~newt/code/gpr-latest.zip, mode 10); started working
 *     on png_read_filter_row_mmx_avg()
 *
 * 20000828:
 *  - finished png_read_filter_row_mmx_avg():  only Paeth left! (930 lines...)
 *     What the hell, did png_read_filter_row_mmx_paeth(), too.  Comments not
 *     cleaned up/shortened in either routine, but functionality is complete
 *     and seems to be working fine.
 *
 * 20000829:
 *  - ahhh, figured out last(?) bit of gcc/gas asm-fu:  if register is listed
 *     as an input reg (with dummy output variables, etc.), then it *cannot*
 *     also appear in the clobber list or gcc 2.95.2 will barf.  The solution
 *     is simple enough...
 *
 * 20000914:
 *  - bug in png_read_filter_row_mmx_avg():  16-bit grayscale not handled
 *     correctly (but 48-bit RGB just fine)
 *
 * 20000916:
 *  - fixed bug in png_read_filter_row_mmx_avg(), bpp == 2 case; three errors:
 *     - "_ShiftBpp.use = 24;"      should have been   "_ShiftBpp.use = 16;"
 *     - "_ShiftRem.use = 40;"      should have been   "_ShiftRem.use = 48;"
 *     - "psllq _ShiftRem, %%mm2"   should have been   "psrlq _ShiftRem, %%mm2"
 *
 * 20010101:
 *  - added new png_init_mmx_flags() function (here only because it needs to
 *     call mmxsupport(), which should probably become global png_mmxsupport());
 *     modified other MMX routines to run conditionally (png_ptr->asm_flags)
 *
 * 20010103:
 *  - renamed mmxsupport() to png_mmx_support(), with auto-set of mmx_supported,
 *     and made it public; moved png_init_mmx_flags() to png.c as internal func
 *
 * 20010104:
 *  - removed dependency on png_read_filter_row_c() (C code already duplicated
 *     within MMX version of png_read_filter_row()) so no longer necessary to
 *     compile it into pngrutil.o
 *
 * 20010310:
 *  - fixed buffer-overrun bug in png_combine_row() C code (non-MMX)
 *
 * 20020304:
 *  - eliminated incorrect use of width_mmx in pixel_bytes == 8 case
 *
 * 20040724:
 *  - more tinkering with clobber list at lines 4529 and 5033, to get
 *     it to compile on gcc-3.4.
 *
 * STILL TO DO:
 *     - test png_do_read_interlace() 64-bit case (pixel_bytes == 8)
 *     - write MMX code for 48-bit case (pixel_bytes == 6)
 *     - figure out what's up with 24-bit case (pixel_bytes == 3):
 *        why subtract 8 from width_mmx in the pass 4/5 case?
 *        (only width_mmx case) (near line 1606)
 *     - rewrite all MMX interlacing code so it's aligned with beginning
 *        of the row buffer, not the end (see 19991007 for details)
 *     x pick one version of mmxsupport() and get rid of the other
 *     - add error messages to any remaining bogus default cases
 *     - enable pixel_depth == 8 cases in png_read_filter_row()? (test speed)
 *     x add support for runtime enable/disable/query of various MMX routines
 */

#define PNG_INTERNAL
#include "png.h"

#if defined(PNG_USE_PNGGCCRD)

int PNGAPI png_mmx_support(void);

#ifdef PNG_USE_LOCAL_ARRAYS
static const int FARDATA png_pass_start[7] = {0, 4, 0, 2, 0, 1, 0};
static const int FARDATA png_pass_inc[7]   = {8, 8, 4, 4, 2, 2, 1};
static const int FARDATA png_pass_width[7] = {8, 4, 4, 2, 2, 1, 1};
#endif
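
/* Editorial sketch, not part of libpng:  the tables above index the seven
 * Adam7 passes.  Assuming png_pass_start[] holds the first column of each
 * pass and png_pass_inc[] its column stride (which the values suggest), the
 * pixel count of one pass row follows by ceiling division.  The names
 * sketch_pass_start, sketch_pass_inc, and sketch_pass_row_width are
 * hypothetical and exist only for this illustration.

```c
#include <assert.h>

// Copies of the start/stride tables, renamed so nothing collides.
static const int sketch_pass_start[7] = {0, 4, 0, 2, 0, 1, 0};
static const int sketch_pass_inc[7]   = {8, 8, 4, 4, 2, 2, 1};

// Number of pixels that pass `pass` (0..6) contributes to one of its rows
// in an image `width` pixels wide:  columns start, start+inc, ... < width.
static int sketch_pass_row_width(int width, int pass)
{
    assert(width >= 0 && pass >= 0 && pass < 7);
    return (width + sketch_pass_inc[pass] - 1 - sketch_pass_start[pass])
           / sketch_pass_inc[pass];
}
```

 * For a 10-pixel-wide image, for instance, pass 0 covers columns 0 and 8,
 * so sketch_pass_row_width(10, 0) evaluates to 2, while the final pass
 * (pass 6) touches every column and yields 10.
 */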

#if defined(PNG_ASSEMBLER_CODE_SUPPORTED)
/* djgpp, Win32, and Cygwin add their own underscores to global variables,
 * so define them without: */
#if defined(__DJGPP__) || defined(WIN32) || defined(__CYGWIN__)
#  define _mmx_supported  mmx_supported
#  define _const4         const4
#  define _const6         const6
#  define _mask8_0        mask8_0
#  define _mask16_1       mask16_1
#  define _mask16_0       mask16_0
#  define _mask24_2       mask24_2
#  define _mask24_1       mask24_1
#  define _mask24_0       mask24_0
#  define _mask32_3       mask32_3
#  define _mask32_2       mask32_2
#  define _mask32_1       mask32_1
#  define _mask32_0       mask32_0
#  define _mask48_5       mask48_5
#  define _mask48_4       mask48_4
#  define _mask48_3       mask48_3
#  define _mask48_2       mask48_2
#  define _mask48_1       mask48_1
#  define _mask48_0       mask48_0
#  define _LBCarryMask    LBCarryMask
#  define _HBClearMask    HBClearMask
#  define _ActiveMask     ActiveMask
#  define _ActiveMask2    ActiveMask2
#  define _ActiveMaskEnd  ActiveMaskEnd
#  define _ShiftBpp       ShiftBpp
#  define _ShiftRem       ShiftRem
#ifdef PNG_THREAD_UNSAFE_OK
#  define _unmask         unmask
#  define _FullLength     FullLength
#  define _MMXLength      MMXLength
#  define _dif            dif
#  define _patemp         patemp
#  define _pbtemp         pbtemp
#  define _pctemp         pctemp
#endif
#endif


/* These constants are used in the inlined MMX assembly code.
   Ignore gcc's "At top level: defined but not used" warnings. */

/* GRR 20000706:  originally _unmask was needed only when compiling with -fPIC,
 *  since that case uses the %ebx register for indexing the Global Offset Table
 *  and there were no other registers available.  But gcc 2.95 and later emit
 *  "more than 10 operands in `asm'" errors when %ebx is used to preload unmask
 *  in the non-PIC case, so we'll just use the global unconditionally now.
 */
#ifdef PNG_THREAD_UNSAFE_OK
static int _unmask;
#endif

static unsigned long long _mask8_0  = 0x0102040810204080LL;

static unsigned long long _mask16_1 = 0x0101020204040808LL;
static unsigned long long _mask16_0 = 0x1010202040408080LL;
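
/* Editorial sketch, not part of libpng:  the byte layout of _mask8_0 pairs
 * the byte lowest in memory (b0, a left pixel) with bit 0x80 of an 8-bit
 * pixel mask, and b7 with bit 0x01.  Assuming the MMX code (not shown in
 * this part of the file) broadcasts the mask byte to all eight byte lanes
 * and ANDs the result with this constant, a portable C equivalent might
 * look like the following.  The function name sketch_keep_bytes is
 * hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Returns a 64-bit value whose byte i (i = 0 is the lowest byte) is 0xFF
// when bit (0x80 >> i) of `unmask` is set, and 0x00 otherwise.
static uint64_t sketch_keep_bytes(uint8_t unmask)
{
    const uint64_t mask8_0 = 0x0102040810204080ULL;
    uint64_t lanes = unmask * 0x0101010101010101ULL;  // broadcast to 8 lanes
    uint64_t hits  = lanes & mask8_0;                 // nonzero lane == keep
    uint64_t keep  = 0;
    int i;

    for (i = 0; i < 8; ++i) {
        if ((hits >> (8 * i)) & 0xFF)
            keep |= (uint64_t)0xFF << (8 * i);
    }
    assert((keep & hits) == hits);  // every hit bit lies inside a kept byte
    return keep;
}
```

 * With unmask == 0xAA (bits 0x80, 0x20, 0x08, and 0x02 set), bytes 0, 2, 4,
 * and 6 are selected.  The 16-bit constants appear to follow the same
 * scheme with each select bit repeated across the two bytes of a pixel:
 * _mask16_0 for the first four pixels and _mask16_1 for the next four.
 */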