muxtablegenerator.java
来自「一种将c高级语言转化给VHDL的编译器」· Java 代码 · 共 823 行 · 第 1/3 页
JAVA
823 行
/* * LA-CC 05-135 Trident 0.7.1Copyright NoticeCopyright 2006 (c) the Regents of the University of California.This Software was produced under a U.S. Government contract(W-7405-ENG-36) by Los Alamos National Laboratory, which is operatedby the University of California for the U.S. Department of Energy. TheU.S. Government is licensed to use, reproduce, and distribute thisSoftware. Permission is granted to the public to copy and use thisSoftware without charge, provided that this Notice and any statementof authorship are reproduced on all copies. Neither the Government northe University makes any warranty, express or implied, or assumes anyliability or responsibility for the user of this Software. */package fp.flowgraph;import java.util.*;import fp.util.*;import java.io.*;import fp.hardware.*;/** This class is used to generate the table of muxes that store the final * output from a modulo scheduled loop. The epilog block created by the modulo scheduler contains the end of several iterations of a loop. Each of these iterations should be trying to save values to some register for use by the rest of the circuit later or as output of the circuit. However each of these iterations will be attempting to write to the same registers and multiple writes to a register is not allowed, and even if it were, there is nothing to gaurantee that the correct value was stored. Also when the kernal exists it might start iterations in the epilog that are after what was supposed to be the final iteration. We need a way to make sure only the correct iteration's output is stored in the registers. This is performed with a tree of muxes, the inputs of which are the outputs from the different iterations and the select signals are made up of logic using the store instructions' predicates from each iteration. It is easiest to explain how the mux tree and its control is built, using an example. Let's say we have the following c code: for(int i=0; i<5; i++) x+=5; When the loop exits it will save a value to a register representing x. Now let us assume that after translating the c code to byte code we have a schedule that looks like this: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 where each of these numbers represents a cycle in a schedule of the byte code and for each cycle there is some set of instructions. Now let's assume after modulo scheduling we find an ii of 4 and can pipeline the loop like this: 15 14 13 12 11 15 10 14 9 13 8 12 7 11 15 6 10 14 5 9 13 4 8 12 3 7 11 15 2 6 10 14 1 5 9 13 0 4 8 12 3 7 11 15 2 6 10 14 1 5 9 13 0 4 8 12 3 7 11 2 6 10 1 5 9 0 4 8 3 7 2 6 1 5 0 4 3 2 1 0 iteration 5 4 3 2 1 For this, the kernal is this: 3 7 11 15 2 6 10 14 1 5 9 13 0 4 8 12 and the epilog is this: 15 14 13 12 11 15 10 14 9 13 8 12 7 11 15 6 10 14 5 9 13 4 8 12 There are three iterations running concurrently in the epilog and all three will try to save x. However only the fifth iteration, which in this case happens to be the last iteration of the epilog, should be written to x. What we need is a table of muxes to decide which iteration's output to write to the register. My mux trees are as flat as possible, to try and reduce the number of muxes. If there were 8 iterations in the epilog the mux tree would look like this: x / \ mux5 mux6 / \ / \ mux1 mux2 mux3 mux4 /\ /\ /\ /\ 1 2 3 4 5 6 7 8 Our three iteration example above would have two muxes, the first of which would choose between the two earliest iterations (the two shortest), and the second mux would choose between the first mux's output and third iteration and would store the result to the x register. My algorithm starts with a list of the outputs from each iteration. It takes the first two (if there are more than 1), creates a mux and saves the output from the mux to a new list. It then takes the next two outputs and does the same. It continues until there is only one or zero outputs in the list. If there is one, it copies this output directly to the new list and if there's zero, it has completed the first row of muxes. Then it repeats this process on the new list of mux outputs. it continues to repeat until there is only one mux output. The final step is to create a store instruction to store the last mux's output to the register. This process is performed on every primal operand in the loop that must be saved (i.e. all but the modPrims). In addition to creating the muxes we need logic to control the select signals on the muxes. We need to be able to choose the correct output to save to the register. We can know this by looking at the predicates on the store instructions for each iteration. In the above c-code example, there would be an instruction like this: x = aaa_store %tmp_1 | ~%tmp_2 where "~%tmp_2" (or when %tmp_2 is 0) is the predicate. Also, somewhere in the schedule there should be an instruction to calculate %tmp_2, which should look something like this: %tmp_2 = aaa_setlt %inc_1 %const5 Which means that if %inc_1 is less than 5 return true (actually 1) and otherwise return false (0). Each iteration will calculate the predicate separately, where in iteration 1, inc_1 will be 1 (at the end), %tmp_2 will be 1, and ~%tmp_2 will be 0, and the store will not occur. In iteration 2, inc_1 will be 2, and ~tmp_2 still 0, and in iteration 3, inc_1 will be 3, and ~tmp_2 again 0, up until iteration 5, when inc_1 will be 5 and ~tmp_2 will finally be 1 and the store will occur (and the loop will exit). But actually, if iteration 6 is allowed to occur (which in this case could happen) %inc_1 will be 6, and ~tmp_2 will still be 1 and it could be stored too. We don't want that value, however. What we need is the first true. So if we have conditions like the following: iteration: 1 2 3 4 5 6 7 8 predicate value: F F F F T T T T we want to write on the first iteration that was true (6). However, in while loops with strange conditions, it may be true in one iteration and false again in the next one. We still, however, want the first true iteration. The reason, more than five iterations may have started is that the location that the exit condition for the kernal can be in different places. Reffering back to the kernal from above: 3 7 11 15 2 6 10 14 1 5 9 13 0 4 8 12 If the exit condition is calculated somewhere between cycles 12 and 15, three later cycles will have already started, and if it was calculated between 8 and 11, two will later cycles will have started, and if was calculated between 4 and 7, 1 later cycle. In all of those conditions not only will the 5th iteration that we want be finishing in the epilog (actually if the exit condition is calculated between 12 and 15, then iteration 5 will be finished before the epilog, but I'll discuss that later), but also some later iterations will be finishing in the epilog and they will all want to store their values in the register and all their predicates will be true. To choose the correct iteration we can take advantage of the mutual exclusitivity of the even when one iteration should be stored as opposed to another. This means, if we have to choose between two different iterations with one mux like this: x /\ 1 2 the control is iteration 1's predicate. If it's false, the mux will select 2 and if it's true it will select 1, even if both predicates are true, and iteration 1 (the earlier iteration) was choosen simply due to its location on the input pins. When another layer is added, the select is only slightly more complicated. x / \ mux1 mux2 /\ /\ 1 2 3 4 The control for mux 1 is iteration 1's predicate and the control for mux 2 is iteration 3's predicate. The control for the output mux is iteration 1's predicate OR iteration 2's. If iteration 4 is the only one that is true it will be selected, because 3's predicate will evaluate to false, and since both 1 and 2 are false the output mux will choose the right input coming from mux 2. If 3 was correct it would be choosed by mux 2 similar to above even if 4 was also correct, and choosen in the output mux because 1 and 2 were both false. If 1 or 2 was true the output mux would choose them even if iteration 3 and/or 4 was also correct. When creating each mux, I also OR the two predicates from its input and save the result to be the control for the mux above and to be ORed with that mux's other input to create the control for the mux above that. However there are a few special cases that must be handled slightly differently. The first case is when the correct iteration actually finished in the kernal. In the above example kernal: 3 7 11 15 2 6 10 14 1 5 9 13 0 4 8 12 if the correct iteration evaluated its loop exit condition somewhere between cycles 12 and 15, when the kernal reached its end and exited, this iteration will have completely finished and the result will have been saved to register x. To handle this, I store the predicate used in the store in the kernal, and load this in the epilog. Then I use the inverse of this predicate as the final epilog store to x's predicate. That is, if the predicate for the kernal's store was true, the epilog store's predicate needs to be false. This, unfortunately however, involves a few steps. The first step is to evaluate the logic used in the kernal store's predicate. If it was ~%tmp_2, we must add the instruction: %tmp_3 = aaa_not %tmp_2 because, unfortunately, we cannot simply store ~%tmp_2. (And if the
⌨️ 快捷键说明
复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?