\documentclass[12pt,twoside]{article}
\usepackage{epsf,a4wide,moreverb,url}
\usepackage{palatino}

\newcommand\jc{{\sffamily BCEL }}
\newcommand\cp{{constant pool }}
\newcommand\cpe{constant pool}
\newcommand\jvm{{Java Virtual Machine }}
\newcommand\jvme{{Java Virtual Machine}}
\newcommand\vm{{Virtual Machine }}
\newcommand\href[2]{#2}

\begin{document}

\title{Byte Code Engineering Library (BCEL)\\
  Description and usage manual\\
  {\small \textbf{Version 1.0}}}
\author{{\Large Markus Dahm}\\\\
  \href{mailto:markus.dahm@inf.fu-berlin.de}{\texttt{markus.dahm@berlin.de}}}
\maketitle
%\tableofcontents

\begin{abstract}
Extensions and improvements of the programming language Java and its
related execution environment (Java Virtual Machine, JVM) are the
subject of a large number of research projects and proposals. There
are projects, for instance, to add parameterized types to Java, to
implement ``Aspect-Oriented Programming'', to perform sophisticated
static analysis, and to improve the run-time performance.

Since Java classes are compiled into portable binary class files
(called \emph{byte code}), the most convenient and
platform-independent way to implement these improvements is not to
write a new compiler or change the JVM, but to transform the byte
code. These transformations can be performed either after
compile-time or at load-time. Many programmers do this by
implementing their own specialized byte code manipulation tools, which
are, however, limited in their reusability.

To deal with the necessary class file transformations, we introduce an
API that helps developers to conveniently implement their
transformations.
\end{abstract}

\section{Introduction}\label{sec:intro}

The Java language \cite{gosling} has become very popular and many
research projects deal with further improvements of the language or
its run-time behavior. The possibility to extend a language with new
concepts is surely a desirable feature, but implementation issues
should be hidden from the user.
Fortunately, the concepts of the \jvm permit the user-transparent
implementation of such extensions with relatively little effort.

Because the target language of Java is an interpreted language with a
small and easy-to-understand set of instructions (the \emph{byte
code}), developers can implement and test their concepts in a very
elegant way. One can write a plug-in replacement for the system's
class loader, which is responsible for dynamically loading class files
at run-time and passing the byte code to the \vm (see section
\ref{sec:classloaders}). Class loaders may thus be used to intercept
the loading process and transform classes before they are actually
executed by the JVM \cite{classloader}. While the original class
files always remain unaltered, the behavior of the class loader may be
reconfigured for every execution or instrumented dynamically.

The \jc API (Byte Code Engineering Library), formerly known as
JavaClass, is a toolkit for the static analysis and dynamic creation
or transformation of Java class files. It enables developers to
implement the desired features on a high level of abstraction without
handling all the internal details of the Java class file format and
thus re-inventing the wheel every time. \jc is written entirely in
Java and freely available under the terms of the Apache Software
License.\footnote{The distribution is available at
  \url{http://jakarta.apache.org/bcel/}, including several code
  examples and javadoc manuals.}

This paper is structured as follows: We give a brief description of
the \jvm and the class file format in section \ref{sec:jvm}. Section
\ref{sec:api} introduces the \jc API. Section \ref{sec:application}
describes some typical application areas and example projects. The
appendix contains code examples that are too long to be presented in
the main part of this paper.
All examples are included in the downloadable distribution.

\subsection{Related work}

There are a number of proposals and class libraries that have some
similarities with \textsc{BCEL}: The JOIE \cite{joie} toolkit can
be used to instrument class loaders with dynamic behavior. Similarly,
``Binary Component Adaptation'' \cite{bca} allows components to be
adapted and evolved on-the-fly. Han Lee's ``Byte-code Instrumenting
Tool'' \cite{bit} allows the user to insert calls to analysis methods
anywhere in the byte code. The Jasmin language \cite{jasmin} can be
used to hand-write or generate pseudo-assembler code. D-Java
\cite{classfile} and JCF \cite{inside} are class viewing tools.

In contrast to these projects, \jc is intended to be a general purpose
tool for ``byte code engineering''. It gives full control to the
developer on a high level of abstraction and is not restricted to any
particular application area.

\section{The Java Virtual Machine}\label{sec:jvm}

Readers already familiar with the \jvm and the Java class file format
may want to skip this section and proceed with section \ref{sec:api}.

Programs written in the Java language are compiled into a portable
binary format called \emph{byte code}. Every class is represented by
a single class file containing class related data and byte code
instructions. These files are loaded dynamically into an interpreter
(\jvme, JVM) and executed.

Figure \ref{fig:jvm} illustrates the procedure of compiling and
executing a Java class: The source file (\texttt{HelloWorld.java}) is
compiled into a Java class file (\texttt{HelloWorld.class}), loaded by
the byte code interpreter and executed. In order to implement
additional features, researchers may want to transform class files
(drawn with bold lines) before they actually get executed.
This application area is one of the main issues of this article.

\begin{figure}[htbp]
  \begin{center}
    \leavevmode
    \epsfxsize\textwidth
    \epsfbox{eps/jvm.eps}
    \caption{Compilation and execution of Java classes}
    \label{fig:jvm}
  \end{center}
\end{figure}

Note that the general term ``Java'' carries two meanings: on the one
hand, the Java programming language; on the other hand, the Java
Virtual Machine, which is not targeted exclusively by the Java
language but may be used by other languages as well (e.g. Eiffel
\cite{eiffel}, or Ada \cite{ada}). We assume the reader to be
familiar with the Java language and to have a general understanding of
the Virtual Machine.

\subsection{Java class file format}\label{sec:format}

Giving a full overview of the design issues of the Java class file
format and the associated byte code instructions is beyond the scope
of this paper. We will just give a brief introduction covering the
details that are necessary for understanding the rest of this
paper. The format of class files and the byte code instruction set are
described in more detail in the ``\jvm Specification''
\cite{jvm}\footnote{Also available online at
  \url{http://www.javasoft.com/docs/books/vmspec/index.html}}, and in
\cite{jasmin}. In particular, we will not deal with the security
constraints that the \jvm has to check at run-time, i.e. the byte code
verifier.

Figure \ref{fig:classfile} shows a simplified example of the contents
of a Java class file: It starts with a header containing a ``magic
number'' (\texttt{0xCAFEBABE}) and the version number, followed by the
\emph{\cpe}, which can be roughly thought of as the text segment of an
executable, the \emph{access rights} of the class encoded by a bit
mask, a list of interfaces implemented by the class, lists containing
the fields and methods of the class, and finally the \emph{class
attributes}, e.g. the \texttt{SourceFile} attribute telling the name
of the source file. Attributes are a way of putting additional, e.g.
user-defined, information into class file data structures. For
example, a custom class loader may evaluate such attribute data in
order to perform its transformations. The JVM specification declares
that unknown, i.e. user-defined, attributes must be ignored by any \vm
implementation.

\begin{figure}[htbp]
  \begin{center}
    \leavevmode
    \epsfxsize\textwidth
    \epsfbox{eps/classfile.eps}
    \caption{Java class file format}
    \label{fig:classfile}
  \end{center}
\end{figure}

Because all of the information needed to dynamically resolve the
symbolic references to classes, fields and methods at run-time is
coded with string constants, the \cp in fact makes up the largest
portion of an average class file, approximately 60\% \cite{statistic}.
The byte code instructions themselves just make up 12\%.

The upper right box shows a ``zoomed'' excerpt of the \cpe, while the
rounded box below depicts some instructions that are contained within
a method of the example class. These instructions represent the
straightforward translation of the well-known statement:

\begin{verbatim}
    System.out.println("Hello, world");
\end{verbatim}

The first instruction loads the contents of the field \texttt{out} of
class \texttt{java.lang.System} onto the operand stack. This is an
instance of the class \texttt{java.io.PrintStream}. The \texttt{ldc}
(``load constant'') instruction pushes a reference to the string
\texttt{"Hello, world"} onto the stack. The next instruction invokes
the instance method \texttt{println}, which takes both values as
parameters (instance methods always implicitly take an instance
reference as their first argument).

Instructions, other data structures within the class file and
constants themselves may refer to constants in the \cpe. Such
references are implemented via fixed indexes encoded directly into the
instructions.
This is illustrated for some items of the figure
emphasized with a surrounding box.

For example, the \texttt{invokevirtual} instruction refers to a
\texttt{MethodRef} constant that contains information about the name
of the called method, the signature (i.e. the encoded argument and
return types), and to which class the method belongs. In fact, as
emphasized by the boxed value, the \texttt{MethodRef} constant itself
just refers to other entries holding the real data, e.g. it refers to
a \texttt{ConstantClass} entry containing a symbolic reference to the
class \texttt{java.io.PrintStream}. To keep the class file compact,
such constants are typically shared by different instructions.
Similarly, a field is represented by a \texttt{Fieldref} constant that
includes information about the name, the type and the containing class
of the field.

The \cp basically holds the following types of constants: References
to methods, fields and classes, strings, integers, floats, longs, and
doubles.

\subsection{Byte code instruction set}\label{sec:code}

The JVM is a stack-oriented interpreter that creates a local stack
frame of fixed size for every method invocation. The size of the local
stack has to be computed by the compiler. Values may also be stored
intermediately in a frame area containing \emph{local variables}, which
can be used like a set of registers. These local variables are
numbered from 0 to 65535, i.e. a method has a maximum of 65536 local
variables. The stack frames of caller and callee method overlap,
i.e. the caller pushes arguments onto the operand stack
and the called method receives them in local variables.

The byte code instruction set currently consists of 212 instructions;
44 opcodes are marked as reserved and may be used for future
extensions or intermediate optimizations within the Virtual
Machine.
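
The frame model described at the beginning of this subsection can be
sketched with a toy interpreter. The following is only an illustration
under simplifying assumptions: the class \texttt{ToyFrame} and its
\texttt{Op} enumeration are hypothetical names that belong neither to
the JVM nor to the BCEL API, only four integer instructions are
modelled, and the operand stack size is passed in by hand instead of
being computed by a compiler. The instruction sequence
\texttt{iload\_0}, \texttt{iload\_1}, \texttt{iadd}, \texttt{ireturn}
corresponds to a static method that returns the sum of its two
\texttt{int} arguments.

\begin{verbatim}
/** Toy model of a JVM stack frame (hypothetical, not BCEL code). */
final class ToyFrame {
    /** Hand-picked subset of instruction mnemonics without operands. */
    enum Op { ILOAD_0, ILOAD_1, IADD, IRETURN }

    private final int[] locals; // local variables, used like registers
    private final int[] stack;  // operand stack of fixed size
    private int sp = 0;         // index of the next free stack slot

    private ToyFrame(int maxLocals, int maxStack) {
        locals = new int[maxLocals];
        stack = new int[maxStack];
    }

    /** The callee receives the caller's arguments in local variables. */
    static int invoke(Op[] code, int maxStack, int... args) {
        ToyFrame frame = new ToyFrame(args.length, maxStack);
        System.arraycopy(args, 0, frame.locals, 0, args.length);
        return frame.run(code);
    }

    /** Executes the instructions in order until IRETURN is reached. */
    private int run(Op[] code) {
        for (Op op : code) {
            switch (op) {
                case ILOAD_0: stack[sp++] = locals[0]; break;
                case ILOAD_1: stack[sp++] = locals[1]; break;
                case IADD: {
                    int right = stack[--sp]; // pop both operands ...
                    int left = stack[--sp];
                    stack[sp++] = left + right; // ... and push the sum
                    break;
                }
                case IRETURN: return stack[--sp];
            }
        }
        throw new IllegalStateException("missing IRETURN");
    }

    public static void main(String[] argv) {
        // Models: static int add(int a, int b) { return a + b; }
        Op[] add = { Op.ILOAD_0, Op.ILOAD_1, Op.IADD, Op.IRETURN };
        System.out.println(invoke(add, 2, 3, 4));
    }
}
\end{verbatim}

A fixed-size array is used for the operand stack on purpose: pushing
more values than the declared maximum fails immediately, which mirrors
the requirement that the stack size be known before the method runs.
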
The instruction set can be roughly grouped as follows:

\begin{description}
\item[Stack operations:] Constants can be pushed onto the stack either
  by loading them from the \cp with the \texttt{ldc} instruction or with
  special ``short-cut'' instructions where the operand is encoded into
  the instructions, e.g. \texttt{iconst\_0} or \texttt{bipush} (push
  byte value).

\item[Arithmetic operations:] The instruction set of the \jvm
  distinguishes its operand types using different instructions to
  operate on values of specific type. Arithmetic operations starting
  with \texttt{i}, for example, denote an integer operation, e.g.
  \texttt{iadd}, which adds two integers and pushes the result back
  onto the stack. The Java types \texttt{boolean}, \texttt{byte},
  \texttt{short}, and \texttt{char} are handled as integers by the JVM.

\item[Control flow:] There are branch instructions like \texttt{goto}
  and \texttt{if\_icmpeq}, which compares two integers for
  equality. There is also a \texttt{jsr} (jump sub-routine) and
  \texttt{ret} pair of instructions that is used to implement the
  \texttt{finally} clause of \texttt{try-catch} blocks. Exceptions may
  be thrown with the \texttt{athrow} instruction.

  Branch targets are coded as offsets from the current byte code
  position, i.e. with an integer number.