opt_mergetable.mx

来自「一个内存数据库的源代码这是服务器端还有客户端」· MX 代码 · 共 473 行
473 行
@' The contents of this file are subject to the MonetDB Public License@' Version 1.1 (the "License"); you may not use this file except in@' compliance with the License. You may obtain a copy of the License at@' http://monetdb.cwi.nl/Legal/MonetDBLicense-1.1.html@'@' Software distributed under the License is distributed on an "AS IS"@' basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the@' License for the specific language governing rights and limitations@' under the License.@'@' The Original Code is the MonetDB Database System.@'@' The Initial Developer of the Original Code is CWI.@' Portions created by CWI are Copyright (C) 1997-2007 CWI.@' All Rights Reserved.@f opt_mergetable@a M. Kersten@- Merge Tables A merge association table (MAT) descriptor defines an ordered collection of type compatible BATs, whose union represents a single (virtual) BAT.The MAT may relate to a partitioned BAT, but could also be an arbitrarycollection of temporary BATs within a program fragment.The MAT definition lives within the scope of a single block.The MAT optimizer simply expands the plan todeal with its components on an individual basis.Only when a blocking or aggregate operator is encounted,the underlying BAT is materialized.The MAT object can not be passed as an argument to any functionwithout first being materialized. Simply becausethe MAT is not known by the type system and none of the lowerlevel operations is currently aware of their existence.In the first approach of the MAT optimizer we assume that the last BAT in the MAT sequence is used as an accumulator.Furthermore, no semantic knowledge is used to reduce thepossible superflous (semi)joins. Instead, we limit expansionto a single argument. The last one in the argument list.This is changed at a later stage when a cost-based evaluationbe decide differently.To illustrate, consider @example	m0:= bat.new(:void,:int);	m1:= bat.new(:void,:int);	m2:= bat.new(:void,:int);	b := mat.new(m0,m1,m2);	s := algebra.select(b,1,3);	i := algebra.count(s);	io.print(s);	io.print(i);	c0 := bat.new(:int,:int);	c1 := bat.new(:int,:int);	c := mat.new(c0,c1);	j := algebra.join(b,s);	io.print(j);@end exampleThe selection and aggregate operations can simply be rewritten into a MAT.@example	_33 := algebra.select(m0,1,3);	_34 := algebra.select(m1,1,3);	_35 := algebra.select(m2,1,3);    s := mat.new(_33,_34,_35);    i := 0;    _36 := aggr.count(_33);    i := calc.+(i,_36);    _37 := aggr.count(_34);    i := calc.+(i,_37);    _38 := aggr.count(_35);    i := calc.+(i,_38);    io.print(i);@end exampleThe print operation does not have MAT semantics yet, whichcalls for collection of the MAT components in a single BAT first.@example    s := mat.pack(_33,_34,_35);    io.print(s);@end exampleFor the join we have to generate all possible combinations,not knowing anything about the properties of the components.The current heuristic is to limit expansion to a singleargument. This leads to @example    b := mat.pack(m0,m1,m2);    _39 := algebra.join(b,c0);    _40 := algebra.join(b,c1);    j := mat.new(_39,_40);@end exampleThe drawback of this scheme is the explosion in MAL statements.It is choosen as a basis for experimentation. In a fullblownsystem we would also use iterators to avoid it.@{Also consider the MAT as a variable property. This is neededto pass a MAT as an argument to functions. If the function doesnot accept a MAT, the argument should be packed, otherwisewe can simply let it pass.We have to add routines to deal with duplicate elimination.Moreover, the other optimizers should become aware of the MAToperations. The operations mat.new() and mat.range() can beremoved as a last step in the optimizer. It is now handy fordebugging.@malpattern optimizer.mergetable():straddress OPTmergetable;pattern optimizer.mergetable(mod:str, fcn:str):straddress OPTmergetablecomment "Resolve the multi-table definitions";@h#ifndef _MAL_MERGETABLE_#define _MAL_MERGETABLE_#include "opt_prelude.h"#include "opt_support.h"#include "mal_builder.h"#include "pbm.h"/* #define DEBUG_OPT_MERGETABLE     show partial result */#endif@c#include "mal_config.h"#include "opt_mergetable.h"static intisMATalias(int idx, int mvar[], int top){	int i;	for(i =0; i<top; i++)		if( mvar[i]== idx) return i;	return -1;}@-Packing the BATs into a single one is handled here@cInstrPtr MATpackAll(MalBlkPtr mb, InstrPtr  r, int m, InstrPtr *mat, int *mvar, int mtop){	int l,k;	for(l=mat[m]->retc; l< mat[m]->argc; l++){		k= isMATalias( getArg(mat[m],l),mvar,mtop);		if( k< 0)			r= pushArgument(mb,r, getArg(mat[m],l));		else			r= MATpackAll(mb,r, k, mat,mvar,mtop);	}	return r;}void MATcollect(int m, InstrPtr *mat, int *mvar, int mtop, int *newvar, int *n){	int i,j,k;	for(i= mat[m]->retc; i<mat[m]->argc; i++)		if( (j= isMATalias( getArg(mat[m],i), mvar, mtop)) >= 0){			for(k=0; k<mtop; k++)			if( mvar[k]== j)				MATcollect(k,mat,mvar,mtop,newvar,n);		} else { newvar[*n]=  getArg(mat[m],i); *n = *n+1;}}intOPTmergetableImplementation(MalBlkPtr mb, MalStkPtr stk, InstrPtr p){	InstrPtr *old=0, q,r, mat[256];	int oldtop,i,j,k,m,mtop=0, mvar[256],tpe;	int size,match,actions=0;#ifdef DEBUG_OPT_MERGETABLE	stream_printf(GDKout,"Start MAT optimizer\n");	printFunction(GDKout, mb, 0);#endif	old = mb->stmt;	oldtop= mb->stop;	size = (mb->stop *1.2 < mb->ssize)? mb->ssize: mb->stop *1.2;	mb->stmt = (InstrPtr *) GDKzalloc(size  * sizeof(InstrPtr));	mb->ssize = size ;	mb->stop = 0;	for( i=0; i<oldtop; i++){		p= old[i];		if( (getModuleId(p)== matRef && getFunctionId(p)==newRef) ||			(getModuleId(p)== matRef && getFunctionId(p)==packRef) ){			mat[mtop]= p;			mvar[mtop] = getArg(p,0);			mtop++;			pushInstruction(mb,p);			continue;		}@-A mat.expand() operation looks up the components of the partitionedbat and prepares the instruction to expand it.@c		if( (getModuleId(p)== matRef && getFunctionId(p)==expandRef) &&			isConstant(mb,getArg(p,1)) && 			getVarType(mb,getArg(p,1))== TYPE_str){			openBox("pbm");			k= PBMfindPBAT(getVarConstant(mb,getArg(p,1)).val.sval);			if( k< 0){				pushInstruction(mb,p);				continue;			}			pushInstruction(mb,p);			p= copyInstruction(p);			p->argc--;			for(; k>=0; k= partitions[k].next){				r = newInstruction(mb, ASSIGNsymbol);				setModuleId(r,bbpRef);				setFunctionId(r,bindRef);				j= newTmpVariable(mb, partitions[k].type );				getArg(r,0)= j;				pushStr(mb,r,BBPname(partitions[k].bid));				pushInstruction(mb,r);				pushArgument(mb,p,j);			}			getFunctionId(p)=newRef;			pushInstruction(mb,p);			mat[mtop]= p;			mvar[mtop] = getArg(p,0);			mtop++;			actions++;			continue;		}@-If the instruction does not contain MAT references it can be added.Otherwise we have to decide on either packing them or replacement.@c		match=0;		for(j=p->retc; j<p->argc; j++)		if(  isMATalias(getArg(p,j),mvar,mtop) >= 0) {			m=i; match++;		}		if( match== 0 ){			pushInstruction(mb,p);			continue;		}@-Pack all but one MAT argument together.Remove their definitions afterwards.@c		if( match>1 ){			for( k= p->retc; k<p->argc && match>1; k++, match--)			if(	(m=isMATalias(getArg(p,k),mvar,mtop)) >= 0){#ifdef DEBUG_OPT_MERGETABLE				stream_printf(GDKout,"Dependency resolution k=%d\n",k);				printInstruction(GDKout,mb,p,0);#endif				r = newInstruction(mb, ASSIGNsymbol);				setModuleId(r,matRef);				setFunctionId(r,packRef);				getArg(r,0)= getArg(mat[m],0);				r= MATpackAll(mb, r, m, mat, mvar, mtop);				pushInstruction(mb,r);				getArg(p,k)= getArg(r,0);				for(j=m; j<mtop-1; j++){					mat[j]= mat[j+1];					mvar[j]= mvar[j+1];				}				mtop--; 			}			if( match== 0)				pushInstruction(mb,p);		}@-Look at the depth of the MAT definition to limit the explosion.@-		if( MATdepth(m,mat,mvar,mtop)>1){		}Left with an instruction with at most one MAT we can replace it.Not all instructions can be replaced by the sequence. We have togroup them and check for them individually.At first we look for the first argument. Others are dealt withthrough the default case.@c		if(( (getModuleId(p)== algebraRef && getFunctionId(p)== selectRef) ||			(getModuleId(p)== algebraRef && getFunctionId(p)==uselectRef) ||			(getModuleId(p)== algebraRef && getFunctionId(p)==likeselectRef) ||			(getModuleId(p)== batRef && getFunctionId(p)==reverseRef) ||			(getModuleId(p)== batRef && getFunctionId(p)==mirrorRef) ||			(getModuleId(p)== algebraRef && getFunctionId(p)==markTRef) ||			(getModuleId(p)== algebraRef && getFunctionId(p)==kdifferenceRef) ||			(getModuleId(p)== algebraRef && getFunctionId(p)== joinRef) ||			(getModuleId(p)== algebraRef && getFunctionId(p)== semijoinRef) ||			(getModuleId(p)== batRef && getFunctionId(p)== setAccessRef) ||			(getModuleId(p)== batRef && getFunctionId(p)== appendRef)  ||			(getModuleId(p)== batRef && getFunctionId(p)== deleteRef)  ||			getModuleId(p)== calcRef 		)  && (m=isMATalias(getArg(p,1),mvar,mtop)) >= 0){			r = newInstruction(mb, ASSIGNsymbol);			setModuleId(r,matRef);			setFunctionId(r,newRef);			getArg(r,0)= getArg(p,0);			tpe= getArgType(mb,p,0);			for(k=1; k< mat[m]->argc; k++){				if( k<mat[m]->argc-1 &&  (					getFunctionId(p)== insertRef ||					getFunctionId(p)== appendRef ))					continue;				q= copyInstruction(p);				getArg(q,1) = getArg(mat[m],k);				getArg(q,0) = newTmpVariable(mb, tpe);				pushInstruction(mb,q);				r= pushArgument(mb,r,getArg(q,0));			}			if( getFunctionId(p)== insertRef ||				getFunctionId(p)== appendRef ||				getFunctionId(p)== deleteRef )					match=1;			else {				match =0;				for(j=0;j<mtop; j++)					if( mvar[j]== getArg(p,0)) match++;				if( match == 0){					mvar[mtop] = getArg(p,0);					mat[mtop++]= r;					pushInstruction(mb,r);				}			}			if( match)				freeInstruction(r);			actions++;			continue;		} @-The insertions are sent to the last component of the MAT.Selection of the proper componetn based on range descriptors will follow.@c		if(getModuleId(p)== batRef && getFunctionId(p)== insertRef &&		(m=isMATalias(getArg(p,1),mvar,mtop)) >= 0){			getArg(p,1) = getArg(mat[m],mat[m]->argc-1);			pushInstruction(mb,p);			continue;		} @-The MAT argument could also occur as the second one.@c		if( getModuleId(p)==algebraRef && getFunctionId(p)== joinRef &&			(m=isMATalias(getArg(p,2),mvar,mtop)) >= 0){			r = newInstruction(mb, ASSIGNsymbol);			setModuleId(r,matRef);			setFunctionId(r,newRef);			getArg(r,0)= getArg(p,0);			tpe= getArgType(mb,p,0);			for(k=1; k< mat[m]->argc; k++){				q= copyInstruction(p);				getArg(q,2) = getArg(mat[m],k);				getArg(q,0) = newTmpVariable(mb, tpe);				pushInstruction(mb,q);				r= pushArgument(mb,r,getArg(q,0));			}			match =0;			for(j=0;j<mtop; j++)			if( mvar[j]== getArg(p,0)) match++;			if( match == 0){				mvar[mtop] = getArg(p,0);				mat[mtop++]= r;				pushInstruction(mb,r);			}			actions++;			continue;		} @-Handle the rewrite v:=aggr.count(b) and sum()@c		if( ((getModuleId(p)==aggrRef && getFunctionId(p)== countRef) || 			(getModuleId(p)==aggrRef && getFunctionId(p)==sumRef && p->argc==2)) &&			(m=isMATalias(getArg(p,1),mvar,mtop)) >= 0){			r = newInstruction(mb,ASSIGNsymbol);			getArg(r,0)= getArg(p,0);			pushInt(mb,r,0);			pushInstruction(mb,r);			for(k=1; k< mat[m]->argc; k++){				int v= newTmpVariable(mb,TYPE_int);				q= newInstruction(mb,ASSIGNsymbol);				setModuleId(q,aggrRef);				setFunctionId(q,getFunctionId(p));				getArg(q,0)= v;				q= pushArgument(mb,q,getArg(mat[m],k));				pushInstruction(mb,q);				q= newInstruction(mb,ASSIGNsymbol);				setModuleId(q,calcRef);				setFunctionId(q,plusRef);				getArg(q,0)= getArg(r,0);				q= pushArgument(mb,q,getArg(r,0));				q= pushArgument(mb,q,v);				pushInstruction(mb,q);			}			continue;		} @-And the min/max is as easy@c		if( ((getModuleId(p)==aggrRef && getFunctionId(p)== maxRef) || 			(getModuleId(p)==aggrRef && getFunctionId(p)==minRef )) &&			p->argc==3 &&			(m=isMATalias(getArg(p,1),mvar,mtop)) >= 0){			r = newInstruction(mb,ASSIGNsymbol);			getArg(r,0)= getArg(p,0);			pushNil(mb,r,getArgType(mb,p,0));			pushInstruction(mb,r);			for(k=1; k< mat[m]->argc; k++){				int v= newTmpVariable(mb,TYPE_int);				q= newInstruction(mb,ASSIGNsymbol);				setModuleId(q,aggrRef);				setFunctionId(q,getFunctionId(p));				getArg(q,0)= v;				q= pushArgument(mb,q,getArg(mat[m],k));				pushInstruction(mb,q);				q= newInstruction(mb,ASSIGNsymbol);				setModuleId(q,calcRef);				setFunctionId(q,plusRef);				getArg(q,0)= getArg(r,0);				q= pushArgument(mb,q,getArg(r,0));				q= pushArgument(mb,q,v);				pushInstruction(mb,q);			}			continue;		} @-All other instructions should be checked for a MAT dependency.It require the MAT to be materialized. We drop the MATafterwards for further consideration.@c		for( k= p->retc; k<p->argc; k++)		if(	(m=isMATalias(getArg(p,k),mvar,mtop)) >= 0){#ifdef DEBUG_OPT_MERGETABLE			stream_printf(GDKout,"Dependency resolution k=%d\n",k);			printInstruction(GDKout,mb,p,0);#endif			r = newInstruction(mb, ASSIGNsymbol);			setModuleId(r,matRef);			setFunctionId(r,packRef);			getArg(r,0)= getArg(mat[m],0);			r= MATpackAll(mb, r, m, mat, mvar, mtop);			if(	!( getModuleId(p) == matRef && getFunctionId(p)== newRef) )				pushInstruction(mb,r);			else freeInstruction(r);			for(j=m; j<mtop-1; j++){				mat[j]= mat[j+1];				mvar[j]= mvar[j+1];			}			mtop--; 			actions++;			break;		}		pushInstruction(mb,p);	}	GDKfree(old);@-As a final optimization, we could remove the mal.new definitions,because they are not needed for the execution.For the time being, they are no-ops.@c	(void) stk; #ifdef DEBUG_OPT_MERGETABLE	stream_printf(GDKout,"Result of multi table optimizer\n");	printFunction(GDKout, mb, 0);#endif	return actions;}@include optimizerWrapper.mx@h@:exportOptimizer(mergetable)@@c@:wrapOptimizer(mergetable,OPT_CHECK_ALL)@@}
opt_mergetable.mx - 源码说明

本页面展示了「一个内存数据库的源代码这是服务器端还有客户端」中的 opt_mergetable.mx 源码文件，采用 MX 编程语言编写，共 473 行代码。您可以在线阅读完整代码内容，也可以返回资源详情页下载完整源码包进行本地学习和开发。
虫虫下载站收录了大量与内存数据库相关的技术资源，包括源代码、技术文档、电路图等，是电子工程师和嵌入式开发者的专业学习平台。
⌨️ 快捷键说明

复制代码Ctrl + C
搜索代码Ctrl + F
全屏模式F11
增大字号Ctrl + =
减小字号Ctrl + -
显示快捷键?