📄 polysorting.htm
字号:
} } // put the 1s back in the original data array first memcpy(data, oneArray, 2 * numOnes); // now put the 0s back memcpy(data + (2 * numOnes), zeroArray, 2 * numZeros); // mask out the next most significant bit next time through mask <<= 1; } This code is considerably faster than either the bubble sort or the linear sort. The inner loop is executed 16 * numData times, and consists of three indirect references, one TEST, one MOV, one INC, and one JZ/JMP (depending on the branch taken) plus loop overhead. The outer loop is actually more costly than the inner loop because of the two block memory transfers. However, the outer loop is only executed 16 times in this example. Even so, this is a lot of iterations through the loop. If we could somehow find a way to cut down on the 16 iterations required per data element while keeping the inner loop very simple, we could get a big increase in performance.THE BYTE SORT (BASE 256 RADIX SORT) The obvious solution to this problem is to increase the base of each radix examined. The next logical base to use is base 256, which can be represented in a single byte. If we were to implement the byte sort the same way we implemented the base 10 radix sort in the original radix sort example, we would look for a value of 255 in the least significant byte of each data element, then look for a 254 in each element, etc. This would yeild (numData * 256 * 2 iterations) of the inner loop if we used 16 bit data. This would be nowhere near as fast as a bit sort. However, we can apply a bit of ingenuity to the byte sort algorithm by taking a hint from the implementation of the bit sort. In the bit sort, we had a separate array for each possible value of a given radix. In a base 2 sort, there were two possible values for each radix, therefore we had two arrays. If we extend this concept to a base 256 sort, we would need 256 arrays, one for each possible value of a radix of base 256. Now, following the bitsort algorithm, we would go through the least significant byte of the data, placing the data in the appropriate array which corresponds to the value of the least significant radix. We would of course repeat the procedure for the next most significant radix and so on until all radices have been considered. In the case of 16 bit data, this method would yeild numData * 2 iterations through the inner loop, which is an 8 fold speed increase over the bit sort. However, there is a rather large problem in the implementation of the byte sort: memory. In the implementation of the bit sort, it is fairly easy to organize code around two arrays. However, a byte sort necessitates that the radix being examined be used as an index to an appropriate array (for example, a radix of value 4 would need to be placed in the '4s' array, a radix of value 33 would need to be placed in the '33s' array, etc). Therefore, a two dimensional data structure needs to be used, with one dimension corresponding to the possible values of radices, and the other index corresponding to the actual data containing radices of a certain value. <-- RADIX --> ^ |00|01|02|03|04|05|06|07|08|09|0A|0B|0C|0D|0E|..|FC|FD|FE|FF | --|----------------------------------------------------------- 00| | | | | | | | | | | | | | | |..| | | | D --|----------------------------------------------------------- A 01| | | | | | | | | | | | | | | |..| | | | T --|----------------------------------------------------------- A ..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|..|.. --|----------------------------------------------------------- | nn| | | | | | | | | | | | | | | |..| | | | v -------------------------------------------------------------- Our dilemma lies in the fact that there is no way whatsoever to determine the number of data that will contain a given radix at a given time. Therefore, there is no way of knowing how large to make the dimension (nn) when initializing the data structure. We could always make 256 arrays of [numData] elements each, but this would be extremely inefficient in that we would be allocating 256 times the memory we really need. The solution to this problem lies in dynamic data structures. By setting up an array of 256 linked lists, we can assure that we will not be allocating much more memory than we need. The total overhead for a singly linked list is 4 bytes (the size of the starting list pointer), plus 4 bytes per data element. Therefore, the total amount of storage needed for an array of 256 linked lists containing a total of numData elements is 4 * (256 + numData), only 1024 bytes more than we would need to store the data alone. Now comes the task of selecting a data structure to fit our needs. The two types of singly linked lists that could be used are stacks and queues. I have no desire whatsoever to teach a course in data structures in this file; suffice to say that a stack is a last in first out (LIFO) structure, and a queue is a first in first out structure (FIFO). In other words, the first value placed on a stack will be the last one removed, and the first value placed in a queue will be the first one removed. I chose stacks because the operations to place (push) and remove (pop) values on and off the stack are faster than those for queues, and it is very easy to check to see if a stack is empty. The LIFO nature of stacks complicates things a bit between loops, but within a loop a stack is by far the speediest option. Using stacks, the byte sort looks something like this typedef struct { (polygon data goes here) ... // pointer to next polygon in the stack polygon *next; } polygon; polygon *stack1[256], *stack2[256]; // our arrays of stacks // this is the inner sort loop for (count = 0; count < numData; count++) { index = poly[count]->z & 0xFF; // consider only the // least significant radix push(stack1[index], poly[count]); // put the poly in its // proper place } That excruciatingly simple loop will sort the least significant byte. Now, sorting the next most significant byte is a bit more complicated. You must remember that the radix sort algorithm states we must sort the previously sorted data from greatest to least if we want our resultant data to be sorted from greatest to least. So we must start with the highest value for the first radix and count backwards. That means we need to start with the data on stack 255 and count down. for (count = 255; count >= 0; count--) { while (!empty(stack1[count]) { temp = pop(stack1[count]); index = (temp->z & 0xFF00) >> 8; // next radix push(stack2[index], temp); } } After this loop, the data will be in stack2[] with the smallest value at the top of stack2[0], and the largest value at the bottom of stack2[255]. From here, you can have your data sorted from least to greatest or greatest to least depending on how you put it back in the original poly[] array.WELL...WHAT NOW? If you are experienced with data structures, start coding. If you are a bit dilenquent in your knowledge, start reading. I recommend that you code the entire sort in assembler. If you are smart about the way you set things up, your code will consist solely of MOV, AND, and SHL instructions with a JMP thrown in every once in awhile for good measure, and will occupy about 25 lines. I have to confess that the assembler version of my sort is not the most highly optimized, simply because it is inherently so fast. As I said before, my sort takes a whopping 5% of runtime. Since the sort time of a radix sort is linear with respect to data size, this figure is the same whether you are sorting 10 polygons or 10,000. Before I spend long hours looking for places to save a cycle or two, I am going over the slower parts of my code and optimizing them. However, I am positive I would have to spend a great deal of time optimizing my sort if I had used a slower algorithm.FINAL WORDS I am not interested in doing your coding for you. I am happy to answer any questions you may have, but I will not write your code for you, and I refuse to teach a course in data structures. If you don't understand basic stack operations, please don't ask me to explain them to you. Any mail pertaining to this topic will be at best ignored, at worst returned accompanied by a polemic.GREETS & THANKS Siri - for being YOU! Hurricane/OTM - for continuing to give us direction Zilym Limms/OTM - for BWSB, and asm help Phred/OTM - 3d coalescence Rest of OTM - for surviving in the otherwise dead 602 Rich Beerman - we were going to...make a game? :> Jailcoder - first mentioned 'byte sort' to me Darkshade - taught me the bit sort, many other things Daredevil and Tran - pmode/w Otto Chrons - for telling me what I am doing wrong Barry Egerter - showed me how cool 640x480x256 looks Jmagic/Complex - cool intro for TP94 Phantom/Sonic PC - show us a 3d demo already, will you? :> StarScream - I want to see your vector code too :) Kodiak_ -\ ae- - Error - For continued faith in the potential Omni - of the North American scene ior -/ Anyone else I forgot - lame, cliche, copout greet phrase</pre> </td> </tr> </table> </center> </BODY> </HTML>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -