integer index as argument.  Indexes to the \cp may be contained in
instructions as well as in other components of a class file and in \cp
entries themselves.

Methods and fields contain a signature, symbolically defining their
types.  Access flags like \texttt{public static final} occur in
several places and are encoded by an integer bit mask, e.g.
\texttt{public static final} corresponds to the Java expression
\begin{verbatim}
  int access_flags = ACC_PUBLIC | ACC_STATIC | ACC_FINAL;
\end{verbatim}
As already mentioned in section \ref{sec:format}, several components
may contain \emph{attribute} objects: classes, fields, methods, and
\texttt{Code} objects (introduced in section \ref{sec:code2}).  The
latter is an attribute itself that contains the actual byte code
array, the maximum stack size, the number of local variables, a table
of handled exceptions, and some optional debugging information coded
as \texttt{LineNumberTable} and \texttt{LocalVariableTable}
attributes. Attributes are in general specific to some data structure,
i.e. no two components share the same kind of attribute, though this
is not explicitly forbidden. In the figure the \texttt{Attribute}
classes are marked with the component they belong to.

\subsection{Class repository}

Using the provided \texttt{Repository} class, reading class files into
a \texttt{JavaClass} object is quite simple:
\begin{verbatim}
  JavaClass clazz = Repository.lookupClass("java.lang.String");
\end{verbatim}
The repository also contains methods providing the dynamic equivalent
of the \texttt{instanceof} operator, and other useful routines:
\begin{verbatim}
  if(Repository.instanceOf(clazz, super_class)) {
    ...
  }
\end{verbatim}

\subsubsection{Accessing class file data}

Information within the class file components may be accessed like Java
Beans via intuitive set/get methods.  All of them also define a
\texttt{toString()} method so that implementing a simple class viewer
is very easy.
In fact, all of the examples used here have been produced
this way:
{\small \begin{verbatim}
  System.out.println(clazz);
  printCode(clazz.getMethods());
  ...
  public static void printCode(Method[] methods) {
    for(int i=0; i < methods.length; i++) {
      System.out.println(methods[i]);
      Code code = methods[i].getCode();
      if(code != null) // Non-abstract method
        System.out.println(code);
    }
  }
\end{verbatim}}

\subsubsection{Analyzing class data}

Last but not least, \jc supports the \emph{Visitor} design
pattern \cite{design}, so one can write visitor objects to traverse
and analyze the contents of a class file. Included in the distribution
is a class \texttt{JasminVisitor} that converts class files into the
Jasmin assembler language \cite{jasmin}.

\subsection{ClassGen}\label{sec:cgapi}

This part of the API (package \path{org.apache.bcel.generic}) supplies
an abstraction level for creating or transforming class files
dynamically.  It makes the static constraints of Java class files, such
as the hard-coded byte code addresses, generic.  The generic \cpe, for
example, is implemented by the class \texttt{ConstantPoolGen} which
offers methods for adding different types of constants.  Accordingly,
\texttt{ClassGen} offers an interface to add methods, fields, and
attributes.  Figure \ref{fig:umlcg} gives an overview of this part of
the API.

\begin{figure}[htbp]
  \begin{center}
    \leavevmode
    \epsfysize0.93\textheight
    \epsfbox{eps/classgen.eps}
    \caption{UML diagram of the ClassGen API}\label{fig:umlcg}
  \end{center}
\end{figure}

\subsubsection{Types}

We abstract from the concrete details of the type signature syntax
(see \ref{sec:types}) by introducing the \texttt{Type} class, which is
used, for example, by methods to define their return and argument
types.  Concrete sub-classes are \texttt{BasicType},
\texttt{ObjectType}, and \texttt{ArrayType}, which consists of the
element type and the number of dimensions. For commonly used types the
class offers some predefined constants.
For example, the method
signature of the \texttt{main} method as shown in section
\ref{sec:types} is represented by:
\begin{verbatim}
  Type   return_type = Type.VOID;
  Type[] arg_types   = new Type[] { new ArrayType(Type.STRING, 1) };
\end{verbatim}
\texttt{Type} also contains methods to convert types into textual
signatures and vice versa. The sub-classes contain implementations of
the routines and constraints specified by the Java Language
Specification \cite{gosling}.

\subsubsection{Generic fields and methods}

Fields are represented by \texttt{FieldGen} objects, which may be
freely modified by the user.  If they have the access rights
\texttt{static final}, i.e. are constants and of basic type, they may
optionally have an initializing value.

Generic methods offer methods to add exceptions that the method may
throw, local variables, and exception handlers.  The latter two are
represented by user-configurable objects as well.  Because exception
handlers and local variables contain references to byte code
addresses, they also take the role of an \emph{instruction targeter}
in our terminology.  Instruction targeters contain a method
\texttt{updateTarget()} to redirect a reference.  Generic
(non-abstract) methods refer to \emph{instruction lists} that consist
of instruction objects.  References to byte code addresses are
implemented by handles to instruction objects. This is explained in
more detail in the following sections.

The maximum stack size needed by the method and the maximum number of
local variables used may be set manually or computed automatically via
the \texttt{setMaxStack()} and \texttt{setMaxLocals()} methods.

\subsubsection{Instructions}

Modeling instructions as objects may look somewhat odd at first sight,
but in fact enables programmers to obtain a high-level view of
control flow without handling details like concrete byte code offsets.
Instructions consist of a tag, i.e.
an opcode, their length in bytes
and an offset (or index) within the byte code. Since many instructions
are immutable, the \texttt{InstructionConstants} interface offers
shareable predefined ``fly-weight'' constants to use.

Instructions are grouped via sub-classing; the type hierarchy of
instruction classes is illustrated by the (incomplete) figure
\ref{fig:umlinstr} in the appendix.  The most important family of
instructions are the \emph{branch instructions}, e.g.  \texttt{goto},
that branch to targets somewhere within the byte code.  Obviously,
this makes them candidates for playing an \texttt{InstructionTargeter}
role, too. Instructions are further grouped by the interfaces they
implement; there are, e.g., \texttt{TypedInstruction}s such as
\texttt{ldc} that are associated with a specific type, or
\texttt{ExceptionThrower} instructions that may raise exceptions when
executed.

All instructions can be traversed via \texttt{accept(Visitor v)} methods,
i.e., the Visitor design pattern. There is however a special trick
in these methods that allows merging the handling of certain
instruction groups. The \texttt{accept()} methods not only call the
corresponding \texttt{visit()} method, but also call the \texttt{visit()}
methods of their respective super classes and implemented interfaces
first, i.e. the most specific \texttt{visit()} call is last. Thus one
can group the handling of, say, all \texttt{BranchInstruction}s into
one single method.

For debugging purposes it may even make sense to ``invent'' your own
instructions. In a sophisticated code generator like the one used as a
backend of the Barat framework \cite{barat} one often has to insert
temporary \texttt{nop} (No operation) instructions.  When examining
the produced code it may be very difficult to trace back where the
\texttt{nop} was actually inserted.  One could think of a derived
\texttt{nop2} instruction that contains additional debugging
information.
When the instruction list is dumped to byte code, the
extra data is simply dropped.

One could also think of new byte code instructions operating on
complex numbers that are replaced by normal byte code upon load-time
or are recognized by a new JVM.

\subsubsection{Instruction lists}\label{sec:il}

An \emph{instruction list} is implemented by a list of
\emph{instruction handles} encapsulating instruction objects.
References to instructions in the list are thus not implemented by
direct pointers to instructions but by pointers to instruction
\emph{handles}. This makes appending, inserting and deleting areas of
code very simple. Since we use symbolic references, computation of
concrete byte code offsets does not need to occur until finalization,
i.e.  until the user has finished the process of generating or
transforming code.  We will use the terms instruction handle and
instruction synonymously throughout the rest of the paper.

Instruction handles may contain additional user-defined data using the
\texttt{addAttribute()} method.

\paragraph{Appending.}
One can append instructions or other instruction lists anywhere in an
existing list.  The instructions are appended after the given
instruction handle.  All append methods return a new instruction
handle which may then be used as the target of a branch instruction,
e.g.:
{\small \begin{verbatim}
  InstructionList il = new InstructionList();
  ...
  GOTO g = new GOTO(null);
  il.append(g);
  ...
  InstructionHandle ih = il.append(InstructionConstants.ACONST_NULL);
  g.setTarget(ih);
\end{verbatim}}

\paragraph{Inserting.}
Instructions may be inserted anywhere into an existing list.  They are
inserted before the given instruction handle.  All insert methods
return a new instruction handle which may then be used as the start
address of an exception handler, for example.
{\small \begin{verbatim}
  InstructionHandle start = il.insert(insertion_point,
                                      InstructionConstants.NOP);
  ...
  mg.addExceptionHandler(start, end, handler, "java.io.IOException");
\end{verbatim}}

\paragraph{Deleting.}
Deletion of instructions is also very straightforward; all instruction
handles and the contained instructions within a given range are
removed from the instruction list and disposed of.  The \texttt{delete()}
method may however throw a \texttt{TargetLostException} when there are
instruction targeters still referencing one of the deleted
instructions.  The user is forced to handle such exceptions in a
\texttt{try-catch} block and redirect these references elsewhere. The
\emph{peep hole} optimizer described in section \ref{sec:nop} gives a
detailed example of this.
{\small \begin{verbatim}
  try {
    il.delete(first, last);
  } catch(TargetLostException e) {
    InstructionHandle[] targets = e.getTargets();
    for(int i=0; i < targets.length; i++) {
      InstructionTargeter[] targeters = targets[i].getTargeters();
      for(int j=0; j < targeters.length; j++)
         targeters[j].updateTarget(targets[i], new_target);
    }
  }
\end{verbatim}}

\paragraph{Finalizing.}
When the instruction list is ready to be dumped to pure byte code, all
symbolic references must be mapped to real byte code offsets.  This is
done by the \texttt{getByteCode()} method, which is called by default
by \texttt{MethodGen.getMethod()}. Afterwards you should call
\texttt{dispose()} so that the instruction handles can be reused
internally. This helps to reduce memory usage.
\begin{verbatim}
  InstructionList il = new InstructionList();
  ClassGen  cg = new ClassGen("HelloWorld", "java.lang.Object",
                              "<generated>", ACC_PUBLIC | ACC_SUPER,