📄 polysorting.htm

📁 3D游戏编程领域专家撰写的启迪性文章之一
💻 HTM
📖 第 1 页 / 共 2 页
字号:
上一页 12
                }            }            // put the 1s back in the original data array first            memcpy(data, oneArray, 2 * numOnes);            // now put the 0s back            memcpy(data + (2 * numOnes), zeroArray, 2 * numZeros);            // mask out the next most significant bit next time through            mask <<= 1;        }                This code is considerably faster than either the bubble        sort or the linear sort.  The inner loop is executed 16 *        numData times, and consists of three indirect references, one        TEST, one MOV, one INC, and one JZ/JMP (depending on the branch        taken) plus loop overhead.  The outer loop is actually more        costly than the inner loop because of the two block memory        transfers.  However, the outer loop is only executed 16 times        in this example.  Even so, this is a lot of iterations through        the loop.  If we could somehow find a way to cut down on the 16        iterations required per data element while keeping the inner        loop very simple, we could get a big increase in performance.THE BYTE SORT (BASE 256 RADIX SORT)                The obvious solution to this problem is to increase the        base of each radix examined.  The next logical base to use is        base 256, which can be represented in a single byte.  If we        were to implement the byte sort the same way we implemented the        base 10 radix sort in the original radix sort example, we would        look for a value of 255 in the least significant byte of each        data element, then look for a 254 in each element, etc.  This        would yeild (numData * 256 * 2 iterations) of the inner loop if        we used 16 bit data.  This would be nowhere near as fast as a        bit sort.                However, we can apply a bit of ingenuity to the byte        sort algorithm by taking a hint from the implementation of the        bit sort.  In the bit sort, we had a separate array for each        possible value of a given radix.  In a base 2 sort, there were        two possible values for each radix, therefore we had two        arrays.  If we extend this concept to a base 256 sort, we would        need 256 arrays, one for each possible value of a radix of base        256.  Now, following the bitsort algorithm, we would go through        the least significant byte of the data, placing the data in the        appropriate array which corresponds to the value of the least        significant radix.  We would of course repeat the procedure for        the next most significant radix and so on until all radices        have been considered.  In the case of 16 bit data, this method        would yeild numData * 2 iterations through the inner loop,        which is an 8 fold speed increase over the bit sort.                However, there is a rather large problem in the        implementation of the byte sort: memory.  In the implementation         of the bit sort, it is fairly easy to organize code around two         arrays.  However, a byte sort necessitates that the radix being         examined be used as an index to an appropriate array (for         example, a radix of value 4 would need to be placed in the '4s'         array, a radix of value 33 would need to be placed in the '33s'         array, etc).  Therefore, a two dimensional data structure needs         to be used, with one dimension corresponding to the possible         values of radices, and the other index corresponding to the         actual data containing radices of a certain value.                          <-- RADIX  -->        ^   |00|01|02|03|04|05|06|07|08|09|0A|0B|0C|0D|0E|..|FC|FD|FE|FF        | --|-----------------------------------------------------------            00|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |..|  |  |  |        D --|-----------------------------------------------------------        A 01|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |..|  |  |  |        T --|-----------------------------------------------------------        A ..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..          --|-----------------------------------------------------------        | nn|  |  |  |  |  |  |  |  |  |  |  |  |  |  |  |..|  |  |  |        v --------------------------------------------------------------                Our dilemma lies in the fact that there is no way         whatsoever to determine the number of data that will contain a         given radix at a given time.  Therefore, there is no way of         knowing how large to make the dimension (nn) when initializing         the data structure.  We could always make 256 arrays of         [numData] elements each, but this would be extremely         inefficient in that we would be allocating 256 times the memory         we really need.                The solution to this problem lies in dynamic data         structures.  By setting up an array of 256 linked lists, we can         assure that we will not be allocating much more memory than we         need.  The total overhead for a singly linked list is 4 bytes         (the size of the starting list pointer), plus 4 bytes per data         element.  Therefore, the total amount of storage needed for an         array of 256 linked lists containing a total of numData         elements is 4 * (256 + numData), only 1024 bytes more than we         would need to store the data alone.                Now comes the task of selecting a data structure to fit         our needs.  The two types of singly linked lists that could be         used are stacks and queues.  I have no desire whatsoever to         teach a course in data structures in this file; suffice to say         that a stack is a last in first out (LIFO) structure, and a         queue is a first in first out structure (FIFO).  In other         words, the first value placed on a stack will be the last one         removed, and the first value placed in a queue will be the         first one removed.  I chose stacks because the operations to         place (push) and remove (pop) values on and off the stack are         faster than those for queues, and it is very easy to check to         see if a stack is empty.  The LIFO nature of stacks complicates         things a bit between loops, but within a loop a stack is by far         the speediest option.                Using stacks, the byte sort looks something like this        typedef struct         {            (polygon data goes here)            ...            // pointer to next polygon in the stack            polygon *next;        } polygon;        polygon *stack1[256], *stack2[256];     // our arrays of stacks        // this is the inner sort loop        for (count = 0; count < numData; count++)        {            index = poly[count]->z & 0xFF;      // consider only the                                                 // least significant radix            push(stack1[index], poly[count]);   // put the poly in its                                                 // proper place        }                That excruciatingly simple loop will sort the least         significant byte.  Now, sorting the next most significant byte         is a bit more complicated.  You must remember that the radix         sort algorithm states we must sort the previously sorted data         from greatest to least if we want our resultant data to be         sorted from greatest to least.  So we must start with the         highest value for the first radix and count backwards.  That         means we need to start with the data on stack 255 and count         down.        for (count = 255; count >= 0; count--)        {            while (!empty(stack1[count])            {                temp = pop(stack1[count]);                index = (temp->z & 0xFF00) >> 8; // next radix                push(stack2[index], temp);            }        }                After this loop, the data will be in stack2[] with the         smallest value at the top of stack2[0], and the largest value         at the bottom of stack2[255].  From here, you can have your         data sorted from least to greatest or greatest to least         depending on how you put it back in the original poly[] array.WELL...WHAT NOW?                If you are experienced with data structures, start         coding.  If you are a bit dilenquent in your knowledge, start         reading.  I recommend that you code the entire sort in         assembler.  If you are smart about the way you set things up,         your code will consist solely of MOV, AND, and SHL         instructions with a JMP thrown in every once in awhile for         good measure, and will occupy about 25 lines.  I have to         confess that the assembler version of my sort is not the most         highly optimized, simply because it is inherently so fast.  As         I said before, my sort takes a whopping 5% of runtime.  Since         the sort time of a radix sort is linear with respect to data         size, this figure is the same whether you are sorting 10         polygons or 10,000.  Before I spend long hours looking for         places to save a cycle or two, I am going over the slower parts         of my code and optimizing them.  However, I am positive I would         have to spend a great deal of time optimizing my sort if I had         used a slower algorithm.FINAL WORDS                I am not interested in doing your coding for you.  I am         happy to answer any questions you may have, but I will not         write your code for you, and I refuse to teach a course in data         structures.  If you don't understand basic stack operations,         please don't ask me to explain them to you.  Any mail         pertaining to this topic will be at best ignored, at worst         returned accompanied by a polemic.GREETS & THANKS        Siri                    - for being YOU!        Hurricane/OTM           - for continuing to give us direction        Zilym Limms/OTM         - for BWSB, and asm help        Phred/OTM               - 3d coalescence        Rest of OTM             - for surviving in the otherwise dead 602         Rich Beerman            - we were going to...make a game? :>        Jailcoder               - first mentioned 'byte sort' to me         Darkshade               - taught me the bit sort, many other things         Daredevil and Tran      - pmode/w         Otto Chrons             - for telling me what I am doing wrong         Barry Egerter           - showed me how cool 640x480x256 looks         Jmagic/Complex          - cool intro for TP94         Phantom/Sonic PC        - show us a 3d demo already, will you? :>          StarScream              - I want to see your vector code too :)        Kodiak_                 -\        ae-                     -          Error                   -  For continued faith in the potential         Omni                    -  of the North American scene         ior                     -/        Anyone else I forgot    - lame, cliche, copout greet phrase</pre>                 </td>          </tr>          </table>          </center>                      </BODY>		</HTML>
上一页 12
💿 文件大小 8 K
👤 上传用户 foxfsk
📂 所属分类文章/文档
📄 代码行数 492 行
💻 语言类型 HTM
🏷️ 相关标签

#3D游戏 #编程 #家
⌨️ 快捷键说明

复制代码 Ctrl + C
搜索代码 Ctrl + F
全屏模式 F11
切换主题 Ctrl + Shift + D
显示快捷键 ?
增大字号 Ctrl + =
减小字号 Ctrl + -