integer index as argument. Indexes to the \cp may be contained in
instructions as well as in other components of a class file and in \cp
entries themselves.

Methods and fields contain a signature, symbolically defining their
types. Access flags like \texttt{public static final} occur in
several places and are encoded by an integer bit mask, e.g.
\texttt{public static final} corresponds to the Java expression

\begin{verbatim}
  int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
\end{verbatim}

As mentioned in section \ref{sec:format} already, several components
may contain \emph{attribute} objects: classes, fields, methods, and
\texttt{Code} objects (introduced in section \ref{sec:code2}). The
latter is an attribute itself that contains the actual byte code
array, the maximum stack size, the number of local variables, a table
of handled exceptions, and some optional debugging information coded
as \texttt{LineNumberTable} and \texttt{LocalVariableTable}
attributes. Attributes are in general specific to some data structure,
i.e. no two components share the same kind of attribute, though this
is not explicitly forbidden. In the figure the \texttt{Attribute}
classes are marked with the component they belong to.

\subsection{Class repository}

Using the provided \texttt{Repository} class, reading class files into
a \texttt{JavaClass} object is quite simple:

\begin{verbatim}
  JavaClass clazz = Repository.lookupClass("java.lang.String");
\end{verbatim}

The repository also contains methods providing the dynamic equivalent
of the \texttt{instanceof} operator, and other useful routines:

\begin{verbatim}
  if(Repository.instanceOf(clazz, super_class)) {
    ...
  }
\end{verbatim}

\subsubsection{Accessing class file data}

Information within the class file components may be accessed like Java
Beans via intuitive set/get methods. All of them also define a
\texttt{toString()} method so that implementing a simple class viewer
is very easy.
In fact, all of the examples used here have been produced
this way:

{\small \begin{verbatim}
  System.out.println(clazz);
  printCode(clazz.getMethods());
  ...
  public static void printCode(Method[] methods) {
    for(int i=0; i < methods.length; i++) {
      System.out.println(methods[i]);

      Code code = methods[i].getCode();
      if(code != null) // Non-abstract method
        System.out.println(code);
    }
  }
\end{verbatim}}

\subsubsection{Analyzing class data}

Last but not least, \jc supports the \emph{Visitor} design
pattern \cite{design}, so one can write visitor objects to traverse
and analyze the contents of a class file. Included in the distribution
is a class \texttt{JasminVisitor} that converts class files into the
Jasmin assembler language \cite{jasmin}.

\subsection{ClassGen}
\label{sec:cgapi}

This part of the API (package \path{org.apache.bcel.generic}) supplies
an abstraction level for creating or transforming class files
dynamically. It makes the static constraints of Java class files, such
as the hard-coded byte code addresses, generic. The generic \cpe, for
example, is implemented by the class \texttt{ConstantPoolGen} which
offers methods for adding different types of constants. Accordingly,
\texttt{ClassGen} offers an interface to add methods, fields, and
attributes. Figure \ref{fig:umlcg} gives an overview of this part of
the API.

\begin{figure}[htbp]
  \begin{center}
    \leavevmode
    \epsfysize0.93\textheight
    \epsfbox{eps/classgen.eps}
    \caption{UML diagram of the ClassGen API}\label{fig:umlcg}
  \end{center}
\end{figure}

\subsubsection{Types}

We abstract from the concrete details of the type signature syntax
(see section \ref{sec:types}) by introducing the \texttt{Type} class,
which is used, for example, by methods to define their return and
argument types. Concrete sub-classes are \texttt{BasicType},
\texttt{ObjectType}, and \texttt{ArrayType} which consists of the
element type and the number of dimensions. For commonly used types the
class offers some predefined constants.
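
For orientation, the raw signature syntax that \texttt{Type} hides can
be inspected with the JDK alone. The following is a minimal sketch, not
part of the API described here: it assumes a Java 7 or later runtime and
uses \texttt{java.lang.invoke.MethodType} from the standard library; the
class name \texttt{SignatureSketch} and the helper \texttt{descriptor()}
are hypothetical names chosen for illustration.

```java
import java.lang.invoke.MethodType;

public class SignatureSketch {
    /**
     * Builds the textual JVM signature (descriptor) of a method from its
     * return type and parameter types, using only the JDK.
     */
    public static String descriptor(Class<?> returnType, Class<?>... paramTypes) {
        return MethodType.methodType(returnType, paramTypes)
                         .toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Signature of: public static void main(String[] argv)
        String sig = descriptor(void.class, String[].class);
        System.out.println(sig); // ([Ljava/lang/String;)V

        // The reverse direction: parse the textual signature back into
        // type objects (a null loader means the system class loader).
        MethodType parsed = MethodType.fromMethodDescriptorString(sig, null);
        System.out.println(parsed.returnType() == void.class);
        System.out.println(parsed.parameterType(0) == String[].class);
    }
}
```

The sketch only shows what the textual form looks like and that it can
be converted in both directions; the \texttt{Type} class offers the
corresponding abstraction within the API itself.
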
For example, the method
signature of the \texttt{main} method as shown in section
\ref{sec:types} is represented by:

\begin{verbatim}
  Type   return_type = Type.VOID;
  Type[] arg_types   = new Type[] { new ArrayType(Type.STRING, 1) };
\end{verbatim}

\texttt{Type} also contains methods to convert types into textual
signatures and vice versa. The sub-classes contain implementations of
the routines and constraints specified by the Java Language
Specification \cite{gosling}.

\subsubsection{Generic fields and methods}

Fields are represented by \texttt{FieldGen} objects, which may be
freely modified by the user. If they have the access rights
\texttt{static final}, i.e. are constants and of basic type, they may
optionally have an initializing value.

Generic methods offer methods to add the exceptions the method may
throw, local variables, and exception handlers. The latter two are
represented by user-configurable objects as well. Because exception
handlers and local variables contain references to byte code
addresses, they also take the role of an \emph{instruction targeter}
in our terminology. Instruction targeters contain a method
\texttt{updateTarget()} to redirect a reference. Generic
(non-abstract) methods refer to \emph{instruction lists} that consist
of instruction objects. References to byte code addresses are
implemented by handles to instruction objects. This is explained in
more detail in the following sections.

The maximum stack size needed by the method and the maximum number of
local variables used may be set manually or computed automatically via
the \texttt{setMaxStack()} and \texttt{setMaxLocals()} methods.

\subsubsection{Instructions}

Modeling instructions as objects may look somewhat odd at first sight,
but in fact enables programmers to obtain a high-level view of
control flow without handling details like concrete byte code offsets.
Instructions consist of a tag, i.e. an opcode, their length in bytes
and an offset (or index) within the byte code.
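
To make the relation between length and offset concrete, here is a
minimal stand-alone sketch. It does not use the instruction classes of
the API: \texttt{OffsetSketch} and \texttt{Insn} are hypothetical names
introduced only for illustration, and the instruction sequence is not
meant to be a valid method body. The opcodes and lengths are those of
the standard JVM instructions named in the code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OffsetSketch {
    /** Hypothetical stand-in for an instruction object (not an API class). */
    public static final class Insn {
        final String name;  // mnemonic, for display only
        final int opcode;   // the tag
        final int length;   // opcode byte plus operand bytes

        public Insn(String name, int opcode, int length) {
            this.name = name;
            this.opcode = opcode;
            this.length = length;
        }
    }

    /** Derives each instruction's byte offset by summing the lengths before it. */
    public static int[] offsets(List<Insn> code) {
        int[] result = new int[code.size()];
        int pos = 0;
        for (int i = 0; i < code.size(); i++) {
            result[i] = pos;
            pos += code.get(i).length;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Insn> code = new ArrayList<>();
        code.add(new Insn("bipush", 0x10, 2));
        code.add(new Insn("goto",   0xa7, 3));
        code.add(new Insn("nop",    0x00, 1));
        code.add(new Insn("return", 0xb1, 1));
        System.out.println(Arrays.toString(offsets(code))); // [0, 2, 5, 6]

        // Inserting a single instruction at the front shifts every later offset.
        code.add(0, new Insn("aconst_null", 0x01, 1));
        System.out.println(Arrays.toString(offsets(code))); // [0, 1, 3, 6, 7]
    }
}
```

Because every insertion changes the offsets of all following
instructions, it is convenient to treat offsets as derived data rather
than something the programmer maintains by hand.
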
Since many instructions
are immutable, the \texttt{InstructionConstants} interface offers
shareable predefined ``fly-weight'' constants to use.

Instructions are grouped via sub-classing; the type hierarchy of
instruction classes is illustrated by (incomplete) figure
\ref{fig:umlinstr} in the appendix. The most important family of
instructions consists of the \emph{branch instructions}, e.g.
\texttt{goto}, which branch to targets somewhere within the byte code.
Obviously, this makes them candidates for playing an
\texttt{InstructionTargeter} role, too. Instructions are further
grouped by the interfaces they implement; there are, e.g.,
\texttt{TypedInstruction}s, like \texttt{ldc}, that are associated
with a specific type, or \texttt{ExceptionThrower} instructions that
may raise exceptions when executed.

All instructions can be traversed via \texttt{accept(Visitor v)} methods,
i.e., the Visitor design pattern. These methods use a special trick,
however, that allows the handling of certain instruction groups to be
merged. The \texttt{accept()} methods do not only call the
corresponding \texttt{visit()} method, but first call the
\texttt{visit()} methods of their respective super classes and
implemented interfaces, i.e. the most specific \texttt{visit()} call
comes last. Thus one can group the handling of, say, all
\texttt{BranchInstruction}s into one single method.

For debugging purposes it may even make sense to ``invent'' your own
instructions. In a sophisticated code generator like the one used as a
backend of the Barat framework \cite{barat} one often has to insert
temporary \texttt{nop} (No operation) instructions. When examining
the produced code it may be very difficult to trace back where the
\texttt{nop} was actually inserted. One could think of a derived
\texttt{nop2} instruction that contains additional debugging
information.
When the instruction list is dumped to byte code, the
extra data is simply dropped.

One could also think of new byte code instructions operating on
complex numbers that are replaced by normal byte code at load time
or are recognized by a new JVM.

\subsubsection{Instruction lists}
\label{sec:il}

An \emph{instruction list} is implemented by a list of
\emph{instruction handles} encapsulating instruction objects.
References to instructions in the list are thus not implemented by
direct pointers to instructions but by pointers to instruction
\emph{handles}. This makes appending, inserting and deleting areas of
code very simple. Since we use symbolic references, computation of
concrete byte code offsets does not need to occur until finalization,
i.e. until the user has finished the process of generating or
transforming code. We will use the terms instruction handle and
instruction synonymously throughout the rest of the paper.
Instruction handles may carry additional user-defined data, attached
with the \texttt{addAttribute()} method.

\paragraph{Appending.}
One can append instructions or other instruction lists anywhere in an
existing list. The instructions are appended after the given
instruction handle. All append methods return a new instruction
handle which may then be used as the target of a branch instruction,
e.g.:

{\small \begin{verbatim}
  InstructionList il = new InstructionList();
  ...
  GOTO g = new GOTO(null);
  il.append(g);
  ...
  InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
  g.setTarget(ih);
\end{verbatim}}

\paragraph{Inserting.}
Instructions may be inserted anywhere into an existing list. They are
inserted before the given instruction handle. All insert methods
return a new instruction handle which may then be used as the start
address of an exception handler, for example.

{\small \begin{verbatim}
  InstructionHandle start = il.insert(insertion_point,
                                      InstructionConstants.NOP);
  ...
  mg.addExceptionHandler(start, end, handler, "java.io.IOException");
\end{verbatim}}

\paragraph{Deleting.}
Deletion of instructions is also very straightforward; all instruction
handles and the contained instructions within a given range are
removed from the instruction list and disposed of. The \texttt{delete()}
method may however throw a \texttt{TargetLostException} when there are
instruction targeters still referencing one of the deleted
instructions. The user is forced to handle such exceptions in a
\texttt{try-catch} block and redirect these references elsewhere. The
\emph{peep hole} optimizer described in section \ref{sec:nop} gives a
detailed example of this.

{\small \begin{verbatim}
  try {
    il.delete(first, last);
  } catch(TargetLostException e) {
    InstructionHandle[] targets = e.getTargets();
    for(int i=0; i < targets.length; i++) {
      InstructionTargeter[] targeters = targets[i].getTargeters();
      for(int j=0; j < targeters.length; j++)
        targeters[j].updateTarget(targets[i], new_target);
    }
  }
\end{verbatim}}

\paragraph{Finalizing.}
When the instruction list is ready to be dumped to pure byte code, all
symbolic references must be mapped to real byte code offsets. This is
done by the \texttt{getByteCode()} method which is called by default
by \texttt{MethodGen.getMethod()}. Afterwards you should call
\texttt{dispose()} so that the instruction handles can be reused
internally. This helps to reduce memory usage.

\begin{verbatim}
  InstructionList il = new InstructionList();
  ClassGen cg = new ClassGen("HelloWorld", "java.lang.Object",
                             "<generated>", ACC_PUBLIC | ACC_SUPER,