📄 mal_instruction.mx
字号:
@' The contents of this file are subject to the MonetDB Public License@' Version 1.1 (the "License"); you may not use this file except in@' compliance with the License. You may obtain a copy of the License at@' http://monetdb.cwi.nl/Legal/MonetDBLicense-1.1.html@'@' Software distributed under the License is distributed on an "AS IS"@' basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the@' License for the specific language governing rights and limitations@' under the License.@'@' The Original Code is the MonetDB Database System.@'@' The Initial Developer of the Original Code is CWI.@' Portions created by CWI are Copyright (C) 1997-2007 CWI.@' All Rights Reserved.@a M. Kersten@v 1.0@* MonetDB Assembly Language (MAL)The primary textual interface to the Monetdb kernelis a simple, assembly-like language, called MAL. The language reflects the virtual machine architecture around thekernel libraries and has been designed for speed of parsing,ease of analysis, and ease of target compilation by query compilers.The language is not meant as a primary programming language,or scripting language. Such use is even discouraged.Furthermore, a MAL program is considered a specificationof intended computation and data flow behavior. It should beunderstood that its actual evaluation depends on the executionparadigm choosen in the scenario. The program blocks can bothbe interpreted as ordered sequences of assembler instructions,or as a representation of a data-flow graph that should be resolvedin a dataflow driven manner.The language syntax uses a functional style definition of actions andmark those that affect the flow explicitly.Flow of control keywords identify a point to chancethe interpretation paradigm and denote a synchronization point.MAL is the target language for query compilers, such as theSQL and XQuery front-ends. Even simple SQL queries generate a long sequence of MAL instructions.They represent both the administrative actions to ensure binding and transactioncontrol, the flow dependencies to produce the query result,and the steps needed to prepare the result set for delivery tothe front-end.Only when the algebraic structure is too limited (e.g. updates),or the database back-end lacks feasible builtin bulk operators,one has to rely on more detailed flow of control primitives.But even in that case, the basic blocks to be processedby a MAL back-end are considered large, e.g. tens of simplebulk assignment instructions.The remainder of this chapter provide a concise overview of thelanguage features and illustrative examples.@menu* MAL Literals::* MAL Variables::* MAL Instructions::* MAL Flow-of-control::* MAL Functions::* MAL Factories::* MAL Type System::* Boxed variables::* Property Management::@end menu@node MAL Literals, MAL Variables, MAL Reference,MAL Reference@+ MAL Literals Literals in MAL follow the lexical conventions of the programminglanguage C.A default type is attached, e.g. the literal 1 is typedas an @sc{int} value.Likewise, the literal 3.14 is typed @sc{flt} rather than @sc{dbl}.A literal can be coerced to another type by tagging it with a typeclassifier, provided a coercion operation is defined.For example, @sc{1:lng} marks the literal as of type @sc{lng}.and @sc{"1999-12-10":date} creates a @sc{date} literal.MonetDB comes with the hardwired types @sc{bit, bte, chr, wrd, sht, int, lng, oid, flt, dbl, str} and @sc{bat}, the bat identifier.The kernel code has been optimized to deal with these types efficiently,i.e. without unnecessary function call overheadsIn addition, the system supportstemporal types @sc{date, daytime, time, timestamp, timezone},extensions to deal with IPv4 addresses and URLs using @sc{inet, url},and several types to interact more closely with the kernel @sc{lock, semphore}.This list can be extended with user defined types.@node MAL Variables, MAL Instructions, MAL Literals, MAL Reference@+ MAL VariablesVariables are denoted by identifers andimplicitly defined upon first use. They takeon a type through a type classifier or inherit it fromthe context in which they are first used, see @xref{MAL Type System}. Variables are organized into two classes, starting with and withoutan underscore. The latter are reserved as MAL parser tempoaries, whose name aligns with an entry in the symbol table.In general they can not be used in MAL programs, but they may becomevisible in MAL program listings or during debugging.@node MAL Instructions, MAL Flow-of-control, MAL Variables, MAL Reference@+ InstructionsA MAL instruction has purposely a simple format. It is syntactically represented by an assignment, wherean expression (function call) delivers results to multiple target variables. The assignment patterns recognized are illustrated below.@example(t1,..,t32) := module.fcn(a1,..,a32);t1 := module.fcn(a1,..,a32);t1 := v1 operator v2;t1 := literal;(t1,..,tn) := (a1,..,an);@end exampleOperators are grouped into user defined modules.Ommission of the module name is interpreter as the @sc{user} module.Simple binary arithmetic operations are merely provided as a short-hand,e.g. the expression @sc{t:=2+2} is converted directly into @sc{t:= calc.+(2,2)}.Target variables are optional. The compiler introduces temporary variables to hold the result of the expression upon need. They won't show up when you list the MAL program unless itis used elsewhere.For parsing simplicity, each instruction fits on a single line.Comments start with a sharp '#' and continues to the end of the line.They are retained in the internal code representation to easedebugging of compiler generated MAL programs.The data structure to represent a MAL block is kept simple.It carries a sequence of MAL statements and a symbol table.The MAL instruction record is a code byte string overlaid with theinstruction pattern, which contains references into the symbol tablesand administrative data for the interpreter. This method leads to a large allocated block, which can be easily freed.Variable- and statement- block together describe thestatic part of a MAL procedure. It carries enough information to produce a listing and to aid symbolic debugging.@node MAL Flow-of-control, MAL Functions, MAL Instructions, MAL Reference@+ MAL Flow-of-controlThe flow of control within a MAL program block can be changed bytagging a statement with either @sc{return}, @sc{yield},@sc{barrier}, @sc{catch}, @sc{leave}, @sc{redo}, or @sc{exit}.The flow modifiers @sc{return} and @sc{yield} mark the endof a call and return one or more results to the calling environment.The @sc{return} and @sc{yield} are followed by a target listor an assignment, which is executed first.The @sc{barrier} (@sc{catch}) and @sc{exit} pair mark aguarded statement block. They may be nested to form a properhierarchy identified by their primary target variable,also called the control variable.The @sc{leave} and @sc{redo} are conditional flow modifiers.The control variable is used after the assignment statement hasbeen evaluated to decide on the flow-of-control action to be taken.Built-in controls exists for booleans and numeric values.The barrier block is opened when the control variable holdstrue, when its numeric value >= 0, or when it is a non-emptystring. The @sc{nil} value blocks entry in all cases.Once inside the barrier you have an option to prematurely@sc{leave} it at the exit statementor to @sc{redo} interpretation just after the correspondingbarrier statement. Much like 'break' and 'continue' statementsin the programming language C.The action is taken when the condition is met.The @sc{exit} marks the exit for a block. Its optional assignmentcan be used to re-initialize the barrier control variablesor wrap-up any related administration.The barrier blocks can be properly nested to forma hierarchy of basic blocks.The control flow within and between blocks issimple enough to deal with during an optimizer stage.The @sc{redo} and @sc{leave} statements markthe partial end of a block. Statements within theseblocks can be re-arranged according to the data-flowdependencies. The order of partial blocks can notbe changed that easily. It depends on the mutualexclusion of the data flows within each partial block.Common guarded blocks in imperative languages arethe for-loop and if-then-else constructs.They can be simulated as follows.Consider the statement @sc{for(i=1;i<10;i++) print(i)}.The (optimized) MAL block to implement this becomes:@example i:= 1;barrier B:= i<10; io.print(i); i:= i+1;redo B:= i<10;exit B;@end exampleTranslation of the statement @sc{if(i<1) print("ok"); else print("wrong");}becomes:@example i:=1;barrier ifpart:= i<1; io.print("ok");exit ifpart;barrier elsepart:= i>=1; io.print("wrong");exit elsepart;@end exampleNote that both guarded blocks can be interchanged withoutaffecting the outcome. Moreover, neither block wouldhave been entered if the variable happens to be assigned @sc{nil}.The primitives are sufficient to model a wide variety of iterators,whose pattern look like:@examplebarrier i:= M.newIterator(T); elm:= M.getElement(T,i); ... leave i:= M.noMoreElements(T); ... redo i:= M.hasMoreElements(T);exit i:= M.exitIterator(T);@end exampleThe semantics obeyed by the iterator implementations is as follows.The redo expression updates the target variable @emph{ i} and controlproceeds at the first statement after the barrier when thebarrier is opened by @emph{ i}. If the barrier could not bere-opened, execution proceeds with the first statement afterthe redo.Likewise, the leave control statement skips to the exitwhen the control variable @emph{ i} shows a closed barrier block.Otherwise, it continues with the next instruction.Note, in both failed cases the control variable ispossibly changed.A recurring situation is to iterate over the elements ina BAT. This is supported by an iterator implementation for BATsas follows:@examplebarrier (idx,hd,tl):= bat.newIterator(B); ... redo (idx,hd,tl):= bat.hasMoreElements(B);exit (ids,hd,tl);@end exampleWhere idx is an integer to denote the row in the BAT,hd and tl denote values of the current element.@{@+ The MAL instruction recordsThe data structure to represent a MAL block is kept simple.It carries a sequence of MAL statements and a variable table.Each instruction contains references to elements in thesymbol table.The MAL instruction is a code byte string overlaid with theinstruction pattern. This method leads to a largeallocated block, which can be easily freed, andpattern makes it possible to accommodate a variable argument list.Variable- and stmt- block together describe thestatic part of a MAL procedure. It carries carry enoughinformation to producea listing and to aid symbolic debugging.Ideally, the listing of a MAL program is identical to the source.This costs some space, but will improve readability and permitsinstruction sequences generated internally also to be keptaround as ASCII text for later inclusion.WARNING. The way we lay out the instructions means thatyou can prepare only one instruction at a time, because you don'tknow how many arguments may be needed upfront.[TODO, rather then breaking the original input into pieces,it would be worthwhile to keep track of location of theinstruction in the source.]@h#ifndef _MAL_INSTR_H#define _MAL_INSTR_H#include "mal_type.h"#include "mal_stack.h"#include "mal_properties.h"#define isaSignature(P) ((P)->token >=COMMANDsymbol)#ifdef MALprofiler#ifdef HAVE_SYS_TIMES_H# include <sys/times.h>#endif#endif#define DEBUG_MAL_INSTR#define MAXARG 9#define STMT_INCREMENT 32#define STMT_MAXIMUM 1<<16#define MAXVARS 32typedef struct SYMDEF { struct SYMDEF *peer; /* where to look next */ struct SYMDEF *skip; /* skip to next different symbol */ str name; int kind; struct MALBLK *def; /* the details of the MAL fcn */} *Symbol, SymRecord;typedef struct VARRECORD { str name; /* argname or lexical value repr */ malType type; /* internal type signature */ int gdktype; /* for backend */ bit isaconstant; /* value cannot change */ bit isatypevar; /* denotes a type variable */ bit fixtype; /* the type has been fixed */ bit isudftype; /* type defined in program */ bit cleanup; /* remove upon function return */ bit isused; /* in-out argument to function */ int tmpindex; /* temporary variable */ short scope; /* block id where it is declared */ short depth; /* ... depth in nesting */ short beginLifespan, endLifespan, lastUpdate; /* for optimizers */ PropertySet props; /* private list of (name,value) pairs */ ValRecord value;} *VarPtr, VarRecord;/* type check status is kept around to improve type checking efficiency */#define TYPE_ERROR -1#define TYPE_UNKNOWN 0#define TYPE_DYNAMIC 1#define TYPE_BIND 2#define TYPE_RESOLVED 4#define QUICKCLEANUP 1#define GARBAGECONTROL 2#define VARARGS 1 /* deal with variable arguments */#define VARRETS 2/* all functions return a string */typedef str (*MALfcn) ();typedef struct { bit token; /* instruction type */ bit barrier; /* flow of control modifier takes: BARRIER, LEAVE, REDO, EXIT, CATCH, RAISE*/ bit typechk; /* type check status */ bit gc; /* garbage control flags */ bit polymorphic; /* complex type analysis */ bit varargs; /* variable number of arguments or targets */ MALfcn fcn; /* resolved function address */ struct MALBLK *blk; /* resolved MAL function address */ int jump; /* controlflow program counter */@-The MAL instruction representation. The strings should not be garbage collectedupon destruction of the definition. They are part of the global namespace.@h str modname; /* module context */ str fcnname; /* function name */ short argc, retc, maxarg; /* total and result argument count */ short argv[1]; /* at least one entry */} *InstrPtr, InstrRecord;@-For performance analysis we keep track of the number of callsand the total time spent while executing the instruction.(See mal_profiler.mx)The performance structures are separately administered, becausethey are only used in limited curcumstances.@htypedef struct PERF {#ifdef HAVE_TIMES struct tms timer; /* timing information */#endif time_t clock; /* clock */ lng clk; /* microseconds clock */ long counter; long ticks; /* micro seconds spent */ bit trace; /* facilitate filter-based profiling */} *ProfPtr, ProfRecord;typedef struct MALBLK { str binding; /* related C-function */ str help; /* supportive commentary */ PropertySet props; /* private */ struct MALBLK *alternative; int vtop; /* next free slot */ int vsize; /* size of variable arena */ VarRecord **var; /* Variable table */ int stop; /* next free slot */ int ssize; /* byte size of arena */ InstrPtr *stmt; /* Instruction location */ int errors; /* left over errors */ int typefixed; /* no undetermined instruction */ int flowfixed; /* all flow instructions are fixed */ ProfPtr profiler; struct MALBLK *history;/* of optimizer actions */} *MalBlkPtr, MalBlkRecord;@-Allocation of space assumes a rather exotic number of arguments.Access to module and function name are cast in macros to preparefor separate name space management.@h#define getModuleId(P) (P)->modname#define setModuleId(P,S) (P)->modname= S#define setModuleScope(P,S) {(P)->modname= (S)==NULL?NULL: (S)->name;}#define getFunctionId(P) (P)->fcnname#define setFunctionId(P,S) (P)->fcnname= S#define garbageControl(P) ((P)->gc & GARBAGECONTROL)#define needsCleanup(P) ((P)->gc & QUICKCLEANUP)#define getInstrPtr(M,I) (M)->stmt[I]#define getSignature(S) getInstrPtr((S)->def,0)#define getFcnName(M) getFunctionId(getInstrPtr(M,0))#define getArgCount(M) getInstrPtr(M,0)->argc#define getModName(M) getModuleId(getInstrPtr(M,0))#define getPrgSize(M) (M)->stop#define getVar(M,I) (M)->var[I]#define getVarTmp(M,I) (M)->var[I]->tmpindex#define isConstant(M,I) ((M)->var[I]->isaconstant)#define isTypeVar(M,I) ((M)->var[I]->isatypevar)#define isTmpVar(M,I) (!(M)->var[I]->name || *(M)->var[I]->name == TMPMARKER)#define getVarScope(M,I) ((M)->var[I]->scope)#define getVarProperties(M,I) ((M)->var[I]->props)#define getVarDepth(M,I) ((M)->var[I]->depth)#define isFixed(M,I) ((M)->var[I]->fixtype)#define freezeVarType(M,I) getVar(M,I)->isudftype = 1;#define setFixed(M,I) ((M)->var[I]->fixtype= 1)#define setVarCleanup(M,I) ((M)->var[I]->cleanup)#define isVarGarbage(M,I) ((M)->var[I]->cleanup)#define setVarUsed(M,I,V) ((M)->var[I]->isused= V)#define isVarUsed(M,I) ((M)->var[I]->isused)#define getVarConstant(M,I) ((M)->var[I]->value)#define getVarType(M,I) ((M)->var[I]->type)#define getVarGDKType(M,I) ((M)->var[I]->gdktype)#define getLastUpdate(M,I) ((M)->var[I]->lastUpdate)#define getEndLifespan(M,I) ((M)->var[I]->endLifespan)#define getBeginLifespan(M,I) ((M)->var[I]->beginLifespan)#define getDestVar(P) (P)->argv[0]#define setDestVar(P,X) (P)->argv[0] =X#define setDestType(M,P,V) setVarType((M),getDestVar(P),V)#define getDestType(M,P) destinationType((M),(P))#define getArg(P,I) (P)->argv[I]#define setArg(P,I,R) (P)->argv[I]= R#define getArgName(M,P,I) getVarName((M),(P)->argv[I])#define getArgType(M,P,I) getVarType((M),(P)->argv[I])mal_export InstrPtr newInstruction(MalBlkPtr mb, int kind);mal_export InstrPtr copyInstruction(InstrPtr p);mal_export void oldmoveInstruction(InstrPtr dst, InstrPtr src);mal_export void clrInstruction(InstrPtr p);mal_export void freeInstruction(InstrPtr p);mal_export void clrFunction(InstrPtr p);mal_export Symbol newSymbol(str nme, int kind);mal_export void freeSymbol(Symbol s);mal_export void freeSymbolList(Symbol s);mal_export void printSignature(stream *fd, Symbol s, int flg);mal_export MalBlkPtr newMalBlk(int maxvars, int maxstmts);mal_export void resetMalBlk(MalBlkPtr mb, int stop);mal_export void newMalBlkStmt(MalBlkPtr mb, int maxstmts);mal_export void prepareMalBlk(MalBlkPtr mb, str s);mal_export void freeMalBlk(MalBlkPtr mb);mal_export MalBlkPtr copyMalBlk(MalBlkPtr);mal_export void addtoMalBlkHistory(MalBlkPtr mb);mal_export void showMalBlkHistory(MalBlkPtr mb);mal_export MalBlkPtr getMalBlkHistory(MalBlkPtr mb,int idx);mal_export void expandMalBlk(MalBlkPtr mb, int lines);mal_export void trimMalBlk(MalBlkPtr mb);mal_export void moveInstruction(MalBlkPtr mb, int pc, int target);mal_export void insertInstruction(MalBlkPtr mb, InstrPtr p, int pc);mal_export void removeInstruction(MalBlkPtr mb, InstrPtr p);mal_export void removeInstructionBlock(MalBlkPtr mb, int pc, int cnt);mal_export str operatorName(int i);mal_export int findVariable(MalBlkPtr mb, str name);mal_export int findTmpVariable(MalBlkPtr mb, int type);mal_export int findVariableLength(MalBlkPtr mb, str name, int len);mal_export malType getType(MalBlkPtr mb, str nme);mal_export str getArgDefault(MalBlkPtr mb, InstrPtr p, int idx);mal_export str getVarName(MalBlkPtr mb, int i);mal_export str getRefName(MalBlkPtr mb, int i);mal_export int newVariable(MalBlkPtr mb, str name, malType type);mal_export void renameVariable(MalBlkPtr mb, int i, str name);mal_export void copyVariable(MalBlkPtr dst, MalBlkPtr src, VarPtr v);mal_export void removeVariable(MalBlkPtr mb, int varid);mal_export int newTmpVariable(MalBlkPtr mb, malType type);mal_export int newTmpSink(MalBlkPtr mb, malType type);mal_export int newTypeVariable(MalBlkPtr mb, malType type);mal_export void delVariable(MalBlkPtr mb, int varid);
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -