muxtablegenerator.java

来自「一种将c高级语言转化给VHDL的编译器」· Java 代码 · 共 823 行 · 第 1/3 页

JAVA
823
字号
/* * LA-CC 05-135 Trident 0.7.1Copyright NoticeCopyright 2006 (c) the Regents of the University of California.This Software was produced under a U.S. Government contract(W-7405-ENG-36) by Los Alamos National Laboratory, which is operatedby the University of California for the U.S. Department of Energy. TheU.S. Government is licensed to use, reproduce, and distribute thisSoftware. Permission is granted to the public to copy and use thisSoftware without charge, provided that this Notice and any statementof authorship are reproduced on all copies. Neither the Government northe University makes any warranty, express or implied, or assumes anyliability or responsibility for the user of this Software. */package fp.flowgraph;import java.util.*;import fp.util.*;import java.io.*;import fp.hardware.*;/** This class is used to generate the table of muxes that store the final *  output from a modulo scheduled loop.  The epilog block created by the modulo     scheduler contains the end of several iterations of a loop.  Each of these    iterations should be trying to save values to some register for use by the    rest of the circuit later or as output of the circuit.  However each of    these iterations will be attempting to write to the same registers and    multiple writes to a register is not allowed, and even if it were, there is    nothing to gaurantee that the correct value was stored.  Also when the    kernal exists it might start iterations in the epilog that are after what    was supposed to be the final iteration.  We need a way to make sure only the    correct iteration's output is stored in the registers.  This is performed    with a tree of muxes, the inputs of which are the outputs from the different    iterations and the select signals are made up of logic using the store    instructions' predicates from each iteration.          It is easiest to explain how the mux tree and its control is built, using an    example.  Let's say we have the following c code:        for(int i=0; i<5; i++)      x+=5;          When the loop exits it will save a value to a register representing x.  Now    let us assume that after translating the c code to byte code we have    a schedule that looks like this:         15     14     13     12     11     10     9     8     7     6     5     4     3     2     1     0         where each of these numbers represents a cycle in a schedule of the byte     code and for each cycle there is some set of instructions.  Now let's     assume after modulo scheduling we find an ii of 4 and can pipeline the loop     like this:          15     14     13     12      11 15     10 14     9  13     8  12     7  11   15     6  10   14     5   9   13     4   8   12     3   7   11   15     2   6   10   14     1   5    9   13     0   4    8   12         3    7   11   15	 2    6   10   14	 1    5    9   13	 0    4    8   12	      3    7   11	      2    6   10	      1    5    9	      0    4    8	           3    7		   2    6		   1    5		   0    4                        3			2			1			0          iteration     5   4    3    2    1                   For this, the kernal is this:          3   7   11   15     2   6   10   14     1   5    9   13     0   4    8   12          and the epilog is this:          15     14     13     12      11 15     10 14     9  13     8  12     7  11   15     6  10   14     5   9   13     4   8   12          There are three iterations running concurrently in the epilog and all three     will try to save x.  However only the fifth iteration, which in this case     happens to be the last iteration of the epilog, should be written to x.      What we need is a table of muxes to decide which iteration's output to     write to the register.  My mux trees are as flat as possible, to try and     reduce the number of muxes.  If there were 8 iterations in the epilog the     mux tree would look like this:                        x               /        \           mux5          mux6         /     \       /     \      mux1    mux2   mux3   mux4      /\      /\     /\     /\     1  2    3  4   5  6   7  8          Our three iteration example above would have two muxes, the first of which     would choose between the two earliest iterations (the two shortest), and     the second mux would choose between the first mux's output and third     iteration and would store the result to the x register.            My algorithm starts with a list of the outputs from each iteration.  It     takes the first two (if there are more than 1), creates a mux and saves the     output from the mux to a new list.  It then takes the next two outputs and     does the same.  It continues until there is only one or zero outputs in the     list.  If there is one, it copies this output directly to the new list and     if there's zero, it has completed the first row of muxes.  Then it repeats     this process on the new list of mux outputs.  it continues to repeat until     there is only one mux output.  The final step is to create a store     instruction to store the last mux's output to the register.  This process     is performed on every primal operand in the loop that must be saved (i.e.     all but the modPrims).            In addition to creating the muxes we need logic to control the select     signals on the muxes.  We need to be able to choose the correct output to     save to the register.  We can know this by looking at the predicates on the     store instructions for each iteration.  In the above c-code example, there     would be an instruction like this:          x = aaa_store %tmp_1  | ~%tmp_2          where "~%tmp_2" (or when %tmp_2 is 0) is the predicate.  Also, somewhere in      the schedule there should be an instruction to calculate %tmp_2, which      should look something like this:          %tmp_2 = aaa_setlt %inc_1 %const5          Which means that if %inc_1 is less than 5 return true (actually 1) and      otherwise return false (0).  Each iteration will calculate the predicate     separately, where in iteration 1, inc_1 will be 1 (at the end), %tmp_2 will      be 1, and     ~%tmp_2 will be 0, and the store will not occur.  In iteration 2, inc_1     will be 2, and ~tmp_2 still 0, and in iteration 3, inc_1 will be 3, and     ~tmp_2 again 0, up until iteration 5, when inc_1 will be 5 and ~tmp_2 will     finally be 1 and the store will occur (and the loop will exit).  But     actually, if iteration 6 is allowed to occur (which in this case could     happen) %inc_1 will be 6, and ~tmp_2 will still be 1 and it could be stored     too.  We don't want that value, however.  What we need is the first true.      So if we have conditions like the following:          iteration:         1   2   3   4   5   6   7   8     predicate value:   F   F   F   F   T   T   T   T          we want to write on the first iteration that was true (6).  However, in     while loops with strange conditions, it may be true in one iteration and     false again in the next one.  We still, however, want the first true     iteration. The reason, more than five iterations may have started is that     the location that the exit condition for the kernal can be in different     places.  Reffering back to the kernal from above:          3   7   11   15     2   6   10   14     1   5    9   13     0   4    8   12         If the exit condition is calculated somewhere between cycles 12 and 15,     three later cycles will have already started, and if it was calculated     between 8 and 11, two will later cycles will have started, and if was     calculated between 4 and 7, 1 later cycle.  In all of those conditions not     only will the 5th iteration that we want be finishing in the epilog     (actually if the exit condition is calculated between 12 and 15, then     iteration 5 will be finished before the epilog, but I'll discuss that     later), but also some later iterations will be finishing in the epilog and     they will all want to store their values in the register and all their     predicates will be true.  To choose the correct iteration we can take     advantage of the mutual exclusitivity of the even when one iteration should     be stored as opposed to another.  This means, if we have to choose between     two different iterations with one mux like this:            x      /\     1  2          the control is iteration 1's predicate.  If it's false, the mux will select     2 and if it's true it will select 1, even if both predicates are true, and     iteration 1 (the earlier iteration) was choosen simply due to its location     on the input pins.  When another layer is added, the select is only     slightly more complicated.                 x         /    \      mux1    mux2      /\      /\     1  2    3  4           The control for mux 1 is iteration 1's predicate and the control for     mux 2 is iteration 3's predicate.  The control for the output mux is     iteration 1's predicate OR iteration 2's.  If iteration 4 is the only one     that is true it will be selected, because 3's predicate will evaluate to     false, and since both 1 and 2 are false the output mux will choose the      right input coming from mux 2.  If 3 was correct it would be choosed by     mux 2 similar to above even if 4 was also correct, and choosen in the     output mux because 1 and 2 were both false. If 1 or 2 was true the output     mux would choose them even if iteration 3 and/or 4 was also correct.  When     creating each mux, I also OR the two predicates from its input and save the     result to be the control for the mux above and to be ORed with that mux's     other input to create the control for the mux above that.            However there are a few special cases that must be handled slightly     differently.  The first case is when the correct iteration actually     finished in the kernal.  In the above example kernal:          3   7   11   15     2   6   10   14     1   5    9   13     0   4    8   12          if the correct iteration evaluated its loop exit condition somewhere     between cycles 12 and 15, when the kernal reached its end and exited, this     iteration will have completely finished and the result will have been saved      to register x.  To handle this, I store the predicate used in the store in      the kernal, and load this in the epilog.  Then I use the inverse of this      predicate as the final epilog store to x's predicate.  That is, if the     predicate for the kernal's store was true, the epilog store's predicate     needs to be false.  This, unfortunately however, involves a few steps. The     first step is to evaluate the logic used in the kernal store's predicate.      If it was ~%tmp_2, we must add the instruction:          %tmp_3 = aaa_not %tmp_2          because, unfortunately, we cannot simply store ~%tmp_2.  (And if the

⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?