📄 luajit_intro.html
字号:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"><html><head><title>Introducing LuaJIT</title><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><meta name="Author" content="Mike Pall"><meta name="Copyright" content="Copyright (C) 2005-2007, Mike Pall"><meta name="Language" content="en"><link rel="stylesheet" type="text/css" href="bluequad.css" media="screen"><link rel="stylesheet" type="text/css" href="bluequad-print.css" media="print"></head><body><div id="site"><a href="http://luajit.org/"><span>Lua<span id="logo">JIT</span></span></a></div><div id="head"><h1>Introducing LuaJIT</h1></div><div id="nav"><ul><li><a href="index.html">Index</a></li><li><a href="luajit.html">LuaJIT</a><ul><li><a href="luajit_features.html">Features</a></li><li><a href="luajit_install.html">Installation</a></li><li><a href="luajit_run.html">Running</a></li><li><a href="luajit_api.html">API Extensions</a></li><li><a class="current" href="luajit_intro.html">Introduction</a></li><li><a href="luajit_performance.html">Performance</a></li><li><a href="luajit_debug.html">Debugging</a></li><li><a href="luajit_changes.html">Changes</a></li></ul></li><li><a href="coco.html">Coco</a><ul><li><a href="coco_portability.html">Portability</a></li><li><a href="coco_api.html">API Extensions</a></li><li><a href="coco_changes.html">Changes</a></li></ul></li><li><a href="dynasm.html">DynASM</a><ul><li><a href="dynasm_features.html">Features</a></li><li><a href="dynasm_examples.html">Examples</a></li></ul></li><li><a href="http://luajit.org/download.html">Download <span class="ext">»</span></a></li></ul></div><div id="main"><p>This is a little essay that tries to answer the question:<em>'So, how does LuaJIT really work?'</em>.</p><p>I tried to avoid going into all the gory details, but at thesame time provide a deep enough explanation, to let you findyour way around LuaJIT's inner workings.</p><p>The learning curve is maybe a little bit steep for newbies andcompiler gurus will certainly fall asleep after two paragraphs.It's difficult to strike a balance here.</p><h2>Acronym Soup</h2><p>As the name says LuaJIT is a <em>Just-In-Time</em> (JIT) compiler.This means that functions are compiled on demand, i.e. when theyare run first. This ensures both a quick application startupand helps to avoid useless work, too. E.g. unused functionsare not compiled at all.</p><p>The other alternative is known as <em>Ahead-Of-Time</em> (AOT)compilation. Here everything is compiled before running any function.This is the classic way for many languages, such as C or C++.</p><p>In fact plain Lua allows you to pre-compile Lua source code intoLua bytecode and store it in a binary file that can be runlater on. This is used only in specific settings (e.g. memory limitedembedded systems), because the Lua bytecode compiler is really fast.The ability to run source files right away is part of what makesa dynamic language (aka scripting language) so powerful.</p><p>JIT compilation has a few other advantages for dynamic languagesthat AOT compilation can only provide with a massive amountof code analysis. More can be found in the literature.One particular advantage is explained later.</p><h2>Quick, JIT — Run!</h2><p>JIT compilation happens mostly invisible. You'll probably nevernotice that a compilation is going on. Part of the secret isthat everything happens in little pieces intermixed with runningthe application itself inbetween. The other part of the secretis that JIT compilation can be made pretty fast.</p><p>Most applications quickly converge to a stable state whereeverything that really needs to be compiled is compiledright away. Only occasional isolated compiles happen later on.</p><p>Even though the name doesn't suggest it, LuaJIT <em>can</em> operatein AOT mode, too. But this is completely under user control(see <a href="luajit_api.html#jit_compile"><tt>jit.compile()</tt></a>)and doesn't happen automatically.</p><p>Unless you have good reason to suspect that AOT compilationmight help for a specific application, I wouldn't bother though.Compilation speed is usually a non-argument, because LuaJITis extremely fast. Compilation times are typically in the<em>microsecond range</em> for individual Lua functions.</p><h2>Starting Up</h2><p>The next few paragraphs may not be exactly breaking news to you,if you are familiar with JIT compilers. Still, please read on,because some terms are introduced that are used later on.</p><p>When you start LuaJIT everything proceeds like in standard Lua:the Lua core is initialized, the standard libraries are loaded andthe command line is analyzed. Then usually the first Lua sourcecode file is loaded and is translated to Lua bytecode. And finallythe function for the initial main chunk is run ...</p><h2>Kicking the Compiler</h2><p>This is where LuaJIT kicks in:</p><p>All Lua functions carry an additional <em>status code</em> for LuaJIT.Initially this is set to 'NONE', i.e. the function has not beenlooked at (yet). If a function is run with this setting,the LuaJIT <em>compiler pipeline</em> is started up.</p><p>If you haven't loaded any special LuaJIT modules and optimizationis not turned on, the compiler pipeline only consists of the<em>compiler backend</em>.</p><p>The compiler backend is the low-level encoding engine that translatesbytecode instructions to machine code instructions. Without anyfurther hints from other modules, the backend more or less does a1:1 translation. I.e. a single variant of a bytecode instructioncorresponds to a single piece of machine code.</p><p>If all goes well, these little code pieces are put together,a function prologue is slapped on and voila: your Lua functionhas been translated to machine code. Of course things are notthat simple when you look closer, but hey — this isthe theory.</p><p>Anyway, the status code for the function is set to 'OK' and themachine code is run. If this function runs another Lua functionwhich has not been compiled, that one is compiled, too. And so on.</p><h2>Call Gates</h2><p>Ok, so what happens when a function is called repeatedly? After allthis is the most common case.</p><p>Simple: The status code is checked again. This time it's set to 'OK',so the machine code can be run directly. Well — that's not thewhole truth: for calls that originate in a JIT compiled functiona better mechanism, tentatively named <em>call gates</em> is used.</p><p>Every function has a call gate field (a function pointer). By defaultit's set to a function that does the above checks and runs thecompiler. But as soon as a function is compiled, the call gateis modified to point to the just compiled machine code.</p><p>Calling a function is then as easy as calling the code that thecall gate points to. But due to special (faster) calling conventionsthis function pointer cannot be used directly from C. So calls froma non-compiled function or from a C function use an extra entrycall gate which in turn calls the real call gate. But this isreally a non-issue since most calls in typical applicationsare intra-JIT calls.</p><h2>The Compiler Pipeline</h2><p>The compiler pipeline has already been mentioned. This soundsmore complicated than it is. Basically this is a coroutine thatruns a <em>frontend</em> function which in turn calls all functionsfrom the <em>pipeline table</em>.</p><p>The pipeline table is sorted by <em>priorities</em>. The standardbackend has priority 0. Positive priorities are run before thebackend and negative priorities are run after the backend. Modulescan dynamically attach or detach themselves to the pipeline withthe library function <tt>jit.attach()</tt>.</p><p>So a typical optimizer pass better have a positive priority,because it needs to be run before the backend is run. E.g. theLuaJIT optimizer module registers itself with priority 50.</p><p>On the other hand a typical helper module for debugging —a machine code disassembler — needs to be run after thebackend and is attached with a negative priority.</p><p>One special case occurs when compilation fails. This can be due toan internal error (ouch) or on purpose. E.g. the optimizer modulechecks some characteristics of the function to be compiled andmay decide that it's just not worth it. In this case a statusother than OK is passed back to the pipeline frontend.</p><p>The easiest thing would be to abort pipeline processing and justgive up. But this would remove the ability to trace the progressof the compiler (which better include failed compilations, too).So there is a special rule that odd priorities are still run,but even priorities are not. That's why e.g. <tt>-j trace</tt>registers itself with priority -99.</p><h2>The Optimizer</h2><p>Maybe it hasn't become clear from the above description,but a module can attach any Lua or C function to the compilerpipeline. In fact all of the loadable modules are Lua modules.Only the backend itself is written in C.</p><p>So, yes — the LuaJIT optimizer is written in pure Lua!</p><p>And no, don't worry, it's quite fast. One reason for this isthat a very simple <em>abstract interpretation</em> algorithmis used. It mostly ignores control flow and/or basic blockboundaries.</p><p>Thus the results of the analysis are really only <em>hints</em>.The backend <em>must</em> check the preconditions (the contracts)for these hints (e.g. the object type). Still, the generatedhints are pretty accurate and quite useful to speed up thecompiled code (see below).</p><p>Explaining how abstract interpretation works is not within thescope for this short essay. You may want to have a look at theoptimizer source code and/or read some articles or books onthis topic. The canonical reference is<a href="http://www2.imm.dtu.dk/~riis/PPA/ppa.html"><span class="ext">»</span> Principles of Program Analysis</a>.Ok, so this one is a bit more on the theoretical side (a grossunderstatement). Try a search engine with the keywords "abstractinterpretation", too.</p><p>Suffice to say the optimizer generates hints and passes theseon to the backend. The backend then decides to encode differentforms for the same bytecode instruction, to combine severalinstructions or to inline code for C functions. If the hintsfrom the optimizer are good, the resulting code will performbetter because shorter code paths are used for the typical cases.</p><h2>The JIT Advantage</h2><p>One important feature of the optimizer is that it takes 'live'function arguments into account. Since the JIT compiler iscalled just before the function is run, the arguments for thisfirst invocation are already present. This can be used to greatadvantage in a <em>dynamically typed language</em>, such as Lua.</p><p>Here's a trivial example:</p><pre>function foo(t, k) return t[k]end</pre><p>Without knowing the most likely arguments for the functionthere's not much to optimize.</p><p>Ok, so 't' is most likely a table. But it could be userdata, too.In fact it could be any type since the introduction of genericmetatables for types.</p><p>And more importantly 'k' can be a number, a stringor any other type. Oh and let's not forget about metamethods ...</p><p>If you know a bit about Lua internals, it should be clear by nowthat the code for this function could potentially branch to halfof the Lua core. And it's of course impossible to inline allthese cases.</p><p>On the other hand if it's <em>known</em> (or there's a good hint)that 't' is a table and that 'k' is a positive integer, then thereis a high likeliness that the key 'k' is in the array partof the table. This lookup can be done with just a few machine codeinstructions.</p><p>Of course the preconditions for this fast path have to be checked(unless there are definitive hints). But if the hints are right,the code runs a lot faster (about a factor of 3 in this casefor the pure table lookup).</p><h2>Optimizing the Optimizer</h2><p>A question that surely popped up in your mind while readingthe above section: does the optimizer optimize itself? I.e.is the optimizer module compiled?</p><p>The current answer is no. Mainly because the compiler pipelineis single-threaded only. It's locked during compilation andany parallel attempt to JIT compile a function results ina 'DELAYED' status code. In fact all modules that attach tothe compiler pipeline disable compilation for the entiremodule (because LuaJIT would do that anyway). The main chunkof modules loaded with <tt>require()</tt> is never compiled,so there is no chicken-and-egg problem here.</p><p>Of course you could do an AOT compilation in the main chunk ofthe optimizer module. But then only with the plain backend.Recompiling it later on with the optimizer attached doesn't work,because a function cannot be compiled twice (I plan to liftthis restriction).</p><p>The other question is whether it pays off to compile the optimizerat all? Honestly, I haven't tried, because the current optimizeris really simple. It runs very quickly, even under the bytecodeinterpreter.</p><h2>That's All Folks</h2><p>Ok, that's all for now. I'll extend this text later on withnew topics that come up in questions. Keep on asking theseon the mailing list if you are interested.</p><p>Thank you for your attention!</p><br class="flush"></div><div id="foot"><hr class="hide">Copyright © 2005-2007 Mike Pall<span class="noprint">·<a href="contact.html">Contact</a></span></div></body></html>
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -