📄 mal_mes.mx
字号:
@' The contents of this file are subject to the MonetDB Public License@' Version 1.1 (the "License"); you may not use this file except in@' compliance with the License. You may obtain a copy of the License at@' http://monetdb.cwi.nl/Legal/MonetDBLicense-1.1.html@'@' Software distributed under the License is distributed on an "AS IS"@' basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the@' License for the specific language governing rights and limitations@' under the License.@'@' The Original Code is the MonetDB Database System.@'@' The Initial Developer of the Original Code is CWI.@' Portions created by CWI are Copyright (C) 1997-2007 CWI.@' All Rights Reserved.@f mal_mes@a M. Kersten@* Materialized Execution SchedulingOne of the 'short commings' of the BAT storage engine is that is usesa simplified look on the size of things. It simply maps huge files intomemory and its operators mindlessly constructs huge intermediate results.This design decision was consciously taken. Running into such resourcelimits should not be dealt with at all levels of the system, butproperly factored out to a level where better control is available.The infrastrucuture has been stabilized enough to move ahead intotackling these problems at the scheduler level. It is the placewhere runtime information becomes available as MAL instructionproperties, e.g. the size of the the BATs being accessed.To set the stage, consider the following MAL snippet@example r:= algebra.select(R, 0, 100000); s:= algebra.select(S, 0, 200000); t:= algebra.select(s, 0,30); p:= algebra.prod(r,s); u:= main.f("something"); w:= calc.+(p,4); v:= algebra.sum(w);barrier z:= algebra.count(p); ...@end exampleAssume that after this block p is not used anymore.To reduce resource contention a decision should be takenat statement p. Let's look at those decisions in more detail.If the product p is expected to become large, we have to break R and/or Sinto pieces, such that their partial result satisfies our limits. The chopper module provides the necessary functionality for this.If p is broken into pieces, then all the following steps in theflowgraph towards the consumers of p should comprise of 'p-invariant'steps and the destination of p has an incremental variant.Likewise, between the point where the operands are broken intopieces and consumed, it should have p-invariant statements.The latter is easy, we simply replace the product by a nested loop(no further optimization yet). This will ensure that the prelude is not affected.@example r:= select(R, 0, 100000); s:= select(S, 0, 200000); t:= select(s, 0,30);barrier (b1,rf):= chop.newChunkIterator(r,1024);barrier (b2,sf):= chop.newChunkIterator(s,1024); p:= prod(rf,sf); u:= f("something"); w:= +(p,4); v:= sum(w);barrier z:= count(p); ...redo (b2,sf):= chop.hasMoreChunks(s,1024);redo (b1,rf):= chop.hasMoreChunks(r,1024);exit b2;exit b1;@end exampleTo make this work, the function f() should be moved out of the way,e.g. by pushing it out of the loop, or determine it is side-effect freewith respect to p.Moving can be done when there is no interference with the other statements. This property often occurs in the optimizer framework and can be checkedusing the dataflow graph.The update statement +(p,4) now also becomes dependent on a partial result and all code depending on it should be moved into the bodyof the iteration as well.Barrier and catch blocks are tricky. For catch blocks we shoulddetect if an exception could be raised on which we have to react.End, the exception handling should have a global effect.For barrier loops, all operations become basically partial.The example implementation addresses only part of this problemspace. In particular, it will concentrate on plansgenerated by the SQL compiler, whichhave a limited operator set and relatively clean behavior.This means we don;t have blocks to deal with and the resultof an operation is often immediately consumed.The underlying action involves an exchange operator, which should beaware of the nested structure of the program.@+ The strawman's approachThe analysis of basic algebraic blocks is quite complicated, becauseit effectively requires semantic knowledge of both MAL executionand the pecularities of each (user defined) function.An easier way out of this problem runs as follows. If p is too largethe operations in which it is used receive the information piece-wise.This means we have to define those functions in a library anyway, becausethe GDK kernel implementation does not support incremental execution.Any occurrence of such a function call becomes the target of a rewriteby redoing the operation p(). Conceptually the first step in the caseabove is to transform it to.@example r:= select(R, 0, 100000); s:= select(S, 0, 200000); t:= select(s, 0,30); u:= f("something"); w:= bat(p);barrier (b1,rf):= chop.newChunkIterator(r,1024);barrier (b2,sf):= chop.newChunkIterator(s,1024); p:= prod(rf,sf); wi:= +(p,4); insert(w,wi)redo (b2,sf):= chop.hasMoreChunks(s,1024);redo (b1,rf):= chop.hasMoreChunks(r,1024);exit b2;exit b1; v:= sum(w);barrier z:= count(p); ...@end exampleOfcourse, this rewrites does not solve the problem, because nowthe accumulator bat w becomes too large and the process shouldbe recursively applied to its uses.The leads to@example r:= select(R, 0, 100000); s:= select(S, 0, 200000); t:= select(s, 0,30); u:= f("something"); w:= bat(p); v:= 0;barrier (b1,rf):= chop.newChunkIterator(r,1024);barrier (b2,sf):= chop.newChunkIterator(s,1024); p:= prod(rf,sf); wi:= +(p,4); vc:= sum(wi); v:= v+vc;redo (b2,sf):= chop.hasMoreChunks(s,1024);redo (b1,rf):= chop.hasMoreChunks(r,1024);exit b2;exit b1;barrier z:= count(p); ...@end exampleThe last statement should also be replaced, becauseit uses a large intermediate result as well.Since the operation v:=sum(w) could be anywhere in theplan, we should guarantee that the repeated execution ofthe iterator still produces the same result. This simplycalls for making a readonly copy of the operands going into theproduct operation.Second case.@example p:= prod(r,s); c:= count(p); print(c); t:= select(p,X,Y); c:= count(t);@end example@+ Optimization issuesOnce a target operation has been isolated for chopping it upinto a loop, it becomes mandatory to find an optimal choice.This could be focussed on minimizing the number of loop stepsup to minimizing the total footprint of its body.The latter seems identical to the knapsack problem and tooexpensive to consider in our setting. The first rule to apply is to maximize the volume covered in eachstep, which means you find fragment size P and Q, such that |r|x|s|/(PxQ) is minimized and still fits the resources provided,e.g. the system cache.This under the global constraint to leave room for the body to work, whichcould be determined calculating the maximum volume needed.(is a separate scheduler support routine)The strawman's approach is a classical exchange of storage forcpu cycles. It is used as our first implementation under theassumption the need for chopping the same operand twice is a rare event.The simplicity of the algorithm makes the optimizer much faster and lesserror prone.@+ Preliminary results.The test suite contains a simple test (tst950) that measuresthe gross impact of this process. It aims at a cross-productof 16Kx16K bats. In M4 this could not be accomodated due to memoryconsumption.@+ Implementation focusThe first implementation of this optimizer is geared at instructionsproduced by the SQL front-end.See for SQL specific optimization rules the corresponding source code,called sql_optimizer.@{@h#ifndef _MAL_MES_H#define _MAL_MES_H#include "mal.h"#include "mal_scheduler.h"#include "mal_interpreter.h" /* for showErrors() */#include "mal_function.h"#define MAL_MES_DEBUG 1#endif@c#include "mal_config.h"#include "mal_mes.h"#include "mal_builder.h"@-The cross operator is by large the most 'dangerous', because itspotential claim on memory is huge.Replacing it with an iterator version avoids materialization ofthis intermediate, before it is being consumed by the otheroperators.A critical issue is to respect the resources available.This will be collected at the start of the scheduler calland is available as global information.@c#ifdef WIN32#ifndef LIBMAL_MES#define mal_mes_export extern __declspec(dllimport)#else#define mal_mes_export extern __declspec(dllexport)#endif#else#define mal_mes_export extern#endifmal_mes_export str meo_loop_optimizer(MalBlkPtr mb, MalStkPtr stk, InstrPtr p);mal_mes_export str meo_loop_optimizerImpl(MalBlkPtr mb, MALfcn f, str name);strmeo_loop_optimizerImpl(MalBlkPtr mb, MALfcn f, str name){ int i, l, lb, r, rb; InstrPtr *oldblk; InstrPtr *garbage; InstrPtr ql, qr; int oldtop = 0, g = 0; InstrPtr p; (void) name; oldblk = (InstrPtr *) mb->stmt; oldtop = mb->stop; mb->stmt = alloca(mb->stop * 2 * sizeof(InstrPtr)); mb->stop = 0; garbage = alloca(mb->stop * sizeof(InstrPtr)); for (i = 0; i < oldtop; i++) { p = oldblk[i]; if (p->fcn != f) { pushInstruction(mb, p); continue; }#ifdef MAL_MES_DEBUG printf("meo_loop_optimizer attempt\n"); printInstruction(GDKout, mb, p, LIST_MAL_ALL);#endif /* craft the new code block */ lb = newTmpVariable(mb, TYPE_bit); l = newTmpVariable(mb, getVarType(mb, p->argv[1])); rb = newTmpVariable(mb, TYPE_bit); r = newTmpVariable(mb, getVarType(mb, p->argv[2])); ql = newFcnCall(mb, GDKstrdup("chop"), GDKstrdup("newIterator")); ql->argv[0] = lb; ql->argv[1] = l; ql->retc = 2; ql->argv[2] = p->argv[1]; ql->argc = 3; ql->barrier = BARRIERsymbol; qr = newFcnCall(mb, GDKstrdup("chop"), GDKstrdup("newIterator")); qr->argv[0] = rb; qr->argv[1] = r; qr->retc = 2; qr->argv[2] = p->argv[2]; qr->argc = 3; qr->barrier = BARRIERsymbol; /* and the intelligence should come here */ /* redo part */ qr = newFcnCall(mb, GDKstrdup("chop"), GDKstrdup("hasMoreElements")); qr->argv[0] = rb; qr->argv[1] = r; qr->retc = 2; qr->argc = 3; qr->barrier = REDOsymbol; ql = newFcnCall(mb, GDKstrdup("chop"), GDKstrdup("hasMoreElements")); ql->argv[0] = lb; ql->argv[1] = l; ql->retc = 2; ql->argc = 3; ql->barrier = REDOsymbol; /* exit part */ ql = newAssignment(mb); ql->argv[0] = l; ql->argc = 1; ql->barrier = EXITsymbol; qr = newAssignment(mb); qr->argv[0] = r; qr->argc = 1; qr->barrier = EXITsymbol; } /* if we fail, garbage collection is easy */ for (i = 0; i < g; i++) if (garbage[i]) { } return MAL_SUCCEED;}strmeo_loop_optimizer(MalBlkPtr mb, MalStkPtr stk, InstrPtr p){ MALfcn f; (void) stk; (void) p; f = findMALSymbol("algebra", "cross")->def->stmt[0]->fcn; meo_loop_optimizerImpl(mb, f, "cross"); f = findMALSymbol("algebra", "join")->def->stmt[0]->fcn; meo_loop_optimizerImpl(mb, f, "join"); return MAL_SUCCEED;}@- Strawman ImplementationThe strawman implementation requires a few distinctive parts to beglued together.@}
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -