📄 gdk_scanselect.mx
字号:
@' The contents of this file are subject to the MonetDB Public License@' Version 1.1 (the "License"); you may not use this file except in@' compliance with the License. You may obtain a copy of the License at@' http://monetdb.cwi.nl/Legal/MonetDBLicense-1.1.html@'@' Software distributed under the License is distributed on an "AS IS"@' basis, WITHOUT WARRANTY OF ANY KIND, either express or implied. See the@' License for the specific language governing rights and limitations@' under the License.@'@' The Original Code is the MonetDB Database System.@'@' The Initial Developer of the Original Code is CWI.@' Portions created by CWI are Copyright (C) 1997-2007 CWI.@' All Rights Reserved.@f gdk_scanselect@a S. Manegold, M. L. Kersten, P. Boncz@* Fast sequential scan@-This module provides BAT_scanselect, used by gdk_batop.mx's BAT_select to doselections that cannot be done any better than doing a sequential scan. Thefunction needed to be "sourced-out" from gdk_batop.mx, as the fully expandedcode grows to large to be (conveniently) compiled in a single file (seebelow).@+ Background and IdeaIn tests with Q1 of TPC-H showed, a range select that produces a big result(~98% of its input) was much slower than expected. A profiler told us,that the (CPU-)time was lost in BUNfastins, or more precisely in ATOMput:Thought being expanded on the selection attribute's type to usetype-specific predicates/compare functions, storing the result BUNs in theresult BAT was done in a generic way. Hence, type-checking, etc. was donefor each result BUN twice (head & tail). This type check (performed byATOMput) turned out to be the major CPU bottleneck. On medusa (R12k/300),the total cost of a uselect on [void,date] was made up by ~87% CPUtime and ~13% memory access time.The idea to solve the problem was to expand the function's code also for thetype being store in the result, both head & tail. Thinking about it, wethought why not completely eliminate all branches (except the actualpredicate, of course) from the inner loop (BATloopFast) --- we do have Mx,so let's (ab?)use it!This means@itemize@itemExpanding on whether we created the result BAT (at least) as big as the input BAT, or not (see BAT_select in gdk_batop.mx). In the first case ("nocheck"), we do not have to check for a "BAT overflow" in the inner loop. Further, we need to set bn->batBuns->free only once we're done.@itemExpanding on the output-tail-type which is either identical to input-tail-type or TYPE_void (uselect).@itemExpand on output-head-type; usually inherited from input-head-type.@itemExpanding on input-head-type for data read. Taking special care of void/oid.@itemExpanding on the type of predicate: either equi, or range. The latter with both, one, or none of upper bound and lower bound. In case of a range select we skip all nil.@itemExpanding on input-tail-type for data read and comparison. Most is straight forward. Taking special care of equi selects on str when there are no doubles in the heap.@end itemizeFor more details, see the comment next to the code below.@-Well, to get this feasible (i.e., both the Mx code still readable tomorrowand the generated C code compilable), we came up with some kind of "designidea":The toplevel BAT_scanselect get expanded by a set of cascading levels of Mxmacros that dispatch to the different alternatives. To give the compiler achance to "concentrate on the important parts", only branches are created,here. The actual working block (i.e., the inner BATloopFast) are enclosed ina separate function each. Each leaf of the decision tree calls therespective function, and each function is called by exactly on leaf.A second set of cascading levels of Mx macros take care of generating theworking functions. For convenience, the matching macros of both sets areplaced next to each other.Altogether, we currently generate 1230(!) different scanselect loops. Somecompilers (e.g., gcc 2.95.3 on IRIX) try to inline all these functions intothe decision-/dispatching- tree, and fail with "Branch at has out of rangetarget". We "help" these compiler a bit by explicitly switching offinlining for this file (it doesn't make sense here, anyway).And finally, what do we achieve by all this effort?Well, the on medusa, CPU costs are reduced by (up to) factor 8(!), resultingin an overall improvement of about factor 4.5.@{@+ Implementation@h#ifndef _GDK_SCANSELECT_H#define _GDK_SCANSELECT_H#include "gdk.h"@- "Tools"First, we need some tools. These are the type-expanded replacementsfor some generic macros from gdk.mx, gdk_bat.mx, and gdk_atom.mx.Eventually, they should be moved there to be used in other places as well.VOIDput & SIMPLEput should go to gdk_atoms where ATOMput is defined.@h#ifdef NOEXPAND_CHR#ifndef NOEXPAND_BIT#define NOEXPAND_BIT 1#endif#ifndef NOEXPAND_BTE#define NOEXPAND_BTE 1#endif#endif#ifdef NOEXPAND_INT#ifndef NOEXPAND_FLT#define NOEXPAND_FLT 1#endif#ifndef NOEXPAND_BAT#define NOEXPAND_BAT 1#endif#if SIZEOF_OID == SIZEOF_INT && !defined(NOEXPAND_OID)#define NOEXPAND_OID 1#endif#if SIZEOF_WRD == SIZEOF_INT && !defined(NOEXPAND_WRD)#define NOEXPAND_WRD 1#endif#if SIZEOF_PTR == SIZEOF_INT && !defined(NOEXPAND_PTR)#define NOEXPAND_PTR 1#endif#endif#ifdef NOEXPAND_LNG#ifndef NOEXPAND_DBL#define NOEXPAND_DBL 1#endif#if SIZEOF_OID == SIZEOF_LNG && !defined(NOEXPAND_OID)#define NOEXPAND_OID 1#endif#if SIZEOF_WRD == SIZEOF_LNG && !defined(NOEXPAND_WRD)#define NOEXPAND_WRD 1#endif#if SIZEOF_PTR == SIZEOF_LNG && !defined(NOEXPAND_PTR)#define NOEXPAND_PTR 1#endif#endif#if defined(NOEXPAND_CHR) || defined(SKIP_TYPE_EXPANSIONS)#define ATOM_PUT_CHR(tpe,hp,dst,src) \ if (tpe == TYPE_chr) { \ *(chr*) dst = *(chr*) src; \ } else#else#define ATOM_PUT_CHR(tpe,hp,dst,src)#endif#if defined(NOEXPAND_BTE) || defined(SKIP_TYPE_EXPANSIONS)#define ATOM_PUT_BTE(tpe,hp,dst,src) \ if (tpe == TYPE_bte) { \ *(bte*) dst = *(bte*) src; \ } else#else#define ATOM_PUT_BTE(tpe,hp,dst,src)#endif#if defined(NOEXPAND_SHT) || defined(SKIP_TYPE_EXPANSIONS)#define ATOM_PUT_SHT(tpe,hp,dst,src) \ if (tpe == TYPE_sht) { \ *(sht*) dst = *(sht*) src; \ } else#else#define ATOM_PUT_SHT(tpe,hp,dst,src)#endif#if defined(NOEXPAND_INT) || defined(SKIP_TYPE_EXPANSIONS)#define ATOM_PUT_INT(tpe,hp,dst,src) \ if (tpe == TYPE_int) { \ *(int*) dst = *(int*) src; \ } else#else#define ATOM_PUT_INT(tpe,hp,dst,src)#endif#if defined(NOEXPAND_LNG) || defined(SKIP_TYPE_EXPANSIONS)#define ATOM_PUT_LNG(tpe,hp,dst,src) \ if (tpe == TYPE_lng) { \ *(lng*) dst = *(lng*) src; \ } else#else#define ATOM_PUT_LNG(tpe,hp,dst,src)#endif#define ATOM_PUT(tpe,hp,dst,src) \ ATOM_PUT_CHR(tpe,hp,dst,src) \ ATOM_PUT_SHT(tpe,hp,dst,src) \ ATOM_PUT_INT(tpe,hp,dst,src) \ ATOM_PUT_LNG(tpe,hp,dst,src) \ if (tpe == TYPE_bat) { \ BBPincref(*(bat*) src, TRUE); \ *(bat*) dst = *(bat*) src; \ } else if (tpe == TYPE_str) { \ if (strPut(hp, (var_t*) dst, (str) src) == 0) { \ goto bunins_failed; \ } \ } else { \ if (BATatoms[tpe].atomPut) { \ if ((*BATatoms[tpe].atomPut)(hp, (var_t*) dst, (str) src) == 0) { \ goto bunins_failed; \ } \ } else { \ memcpy(dst, src, (size_t) ATOMsize(tpe)); \ } \ ATOMfix(tpe, src); \ }@h#define SIMPLE_PUT(tpe,hp,dst,src) { *(tpe*) (ptr) (dst) = *(tpe*) (ptr) (src); }#define VOID_PUT(tpe,hp,dst,src) {}@-Type-specificHT_bunfastins_nocheck_noinc, HT_bunfastins_nocheck & HT_bunfastinsshould go to gdk.mx, andHT_BUNfastinsshould go to gdk_bat.mxreplacing their generic versions there.@h@= T_BUNfastins#define @1@4_bunfastins_nocheck_noinc(b, p, h, t) \ do { \ @2_PUT(@3, (b)->hheap, BUNhloc(b, p), h); \ @5_PUT(@6, (b)->theap, BUNtloc(b, p), t); \ } while (0)#define @1@4_bunfastins_nocheck(b, p, h, t, s) \ do { \ (b)->batBuns->free += s; \ (b)->batCount ++; \ @1@4_bunfastins_nocheck_noinc(b, p, h, t); \ } while (0)#define @1@4_bunfastins(b, h, t) \ do { \ REGISTER BUN _p = BUNlast(b); \ REGISTER int _bunsize = BUNsize(b); \ if ((b)->batBuns->free + _bunsize > (b)->batBuns->size) { \ if (BATextend((b), BATgrows(b)) == NULL) { \ goto bunins_failed; \ } \ _p = BUNlast(b); \ } \ @1@4_bunfastins_nocheck(b, _p, h, t, _bunsize); \ } while (0)@-Expand on tail type@= H_BUNfastins @:T_BUNfastins(@1,@2,@3,void,VOID,void)@ @:T_BUNfastins(@1,@2,@3,chr,SIMPLE,chr)@ @:T_BUNfastins(@1,@2,@3,bte,SIMPLE,bte)@ @:T_BUNfastins(@1,@2,@3,sht,SIMPLE,sht)@ @:T_BUNfastins(@1,@2,@3,int,SIMPLE,int)@ @:T_BUNfastins(@1,@2,@3,lng,SIMPLE,lng)@ @:T_BUNfastins(@1,@2,@3,any,ATOM,(b)->ttype)@@-Expand on head type@h@:H_BUNfastins(void,VOID,void)@@:H_BUNfastins(chr,SIMPLE,chr)@@:H_BUNfastins(bte,SIMPLE,bte)@@:H_BUNfastins(sht,SIMPLE,sht)@@:H_BUNfastins(int,SIMPLE,int)@@:H_BUNfastins(lng,SIMPLE,lng)@@:H_BUNfastins(any,ATOM,(b)->htype)@@-Type-mappers for convenience@= HT_bunfastins_typemap#define @1@2_bunfastins_nocheck_noinc(b, p, h, t) @1@3_bunfastins_nocheck_noinc(b, p, h, t)#define @1@2_bunfastins_nocheck(b, p, h, t, s) @1@3_bunfastins_nocheck(b, p, h, t, s)#define @1@2_bunfastins(b, h, t) @1@3_bunfastins(b, h, t)#define @2@1_bunfastins_nocheck_noinc(b, p, h, t) @3@1_bunfastins_nocheck_noinc(b, p, h, t)#define @2@1_bunfastins_nocheck(b, p, h, t, s) @3@1_bunfastins_nocheck(b, p, h, t, s)#define @2@1_bunfastins(b, h, t) @3@1_bunfastins(b, h, t)@h@:HT_bunfastins_typemap(void,flt,int)@@:HT_bunfastins_typemap(chr,flt,int)@@:HT_bunfastins_typemap(sht,flt,int)@@:HT_bunfastins_typemap(int,flt,int)@@:HT_bunfastins_typemap(lng,flt,int)@@:HT_bunfastins_typemap(any,flt,int)@@:HT_bunfastins_typemap(void,dbl,lng)@@:HT_bunfastins_typemap(chr,dbl,lng)@@:HT_bunfastins_typemap(sht,dbl,lng)@@:HT_bunfastins_typemap(int,dbl,lng)@@:HT_bunfastins_typemap(lng,dbl,lng)@@:HT_bunfastins_typemap(any,dbl,lng)@#if SIZEOF_OID == SIZEOF_INT@:HT_bunfastins_typemap(void,oid,int)@@:HT_bunfastins_typemap(chr,oid,int)@@:HT_bunfastins_typemap(bte,oid,int)@@:HT_bunfastins_typemap(sht,oid,int)@@:HT_bunfastins_typemap(int,oid,int)@@:HT_bunfastins_typemap(wrd,oid,int)@@:HT_bunfastins_typemap(lng,oid,int)@@:HT_bunfastins_typemap(flt,oid,int)@@:HT_bunfastins_typemap(dbl,oid,int)@@:HT_bunfastins_typemap(any,oid,int)@#else@:HT_bunfastins_typemap(void,oid,lng)@@:HT_bunfastins_typemap(chr,oid,lng)@@:HT_bunfastins_typemap(bte,oid,lng)@@:HT_bunfastins_typemap(sht,oid,lng)@@:HT_bunfastins_typemap(int,oid,lng)@@:HT_bunfastins_typemap(wrd,oid,lng)@@:HT_bunfastins_typemap(lng,oid,lng)@@:HT_bunfastins_typemap(flt,oid,lng)@@:HT_bunfastins_typemap(dbl,oid,lng)@@:HT_bunfastins_typemap(any,oid,lng)@#endif /* SIZEOF_OID == SIZEOF_INT */#if SIZEOF_WRD == SIZEOF_INT@:HT_bunfastins_typemap(void,wrd,int)@@:HT_bunfastins_typemap(chr,wrd,int)@/*@:HT_bunfastins_typemap(bte,wrd,int)@*/@:HT_bunfastins_typemap(sht,wrd,int)@@:HT_bunfastins_typemap(int,wrd,int)@@:HT_bunfastins_typemap(lng,wrd,int)@/*@:HT_bunfastins_typemap(flt,wrd,int)@*//*@:HT_bunfastins_typemap(dbl,wrd,int)@*/@:HT_bunfastins_typemap(any,wrd,int)@#else@:HT_bunfastins_typemap(void,wrd,lng)@@:HT_bunfastins_typemap(chr,wrd,lng)@/*@:HT_bunfastins_typemap(bte,wrd,lng)@*/@:HT_bunfastins_typemap(sht,wrd,lng)@@:HT_bunfastins_typemap(int,wrd,lng)@@:HT_bunfastins_typemap(lng,wrd,lng)@/*@:HT_bunfastins_typemap(flt,wrd,lng)@*//*@:HT_bunfastins_typemap(dbl,wrd,lng)@*/@:HT_bunfastins_typemap(any,wrd,lng)@#endif /* SIZEOF_WRD == SIZEOF_INT */@-For backward compatibility:(also to be move to gdk.mx and gdk_bat.mx, respectively)@h/*#define bunfastins_nocheck(b, p, h, t, s) anyany_bunfastins_nocheck(b, p, h, t, s)#define bunfastins(b, h, t) anyany_bunfastins(b, h, t)BAT *BUNfastins(BAT *b, ptr h, ptr t) { return anyany_BUNfastins(b,h,t);}*/BAT *BAT_scanselect(BAT *b, BAT *bn, ptr tl, ptr th, bit li, bit hi, int equi, int lval, int hval, int nocheck);#endif /* _GDK_SCANSELECT_H */@c#include "monetdb_config.h"#include "gdk.h"#if defined(__sgi) && defined(__GNUC__) && (SIZEOF_VOID_P == 8)#define SKIP_TYPE_EXPANSIONS#endif#include "gdk_scanselect.h"@-With all the expansions enabled, (our) GNU compiler (gcc 3.2.1)on medusa (IRIX64) fails to produce (correctly) working 64-bit code.Apparently, the jumps from the return statements in the leaves of thedecision tree to the end of the dispatcher function (BAT_scanselect())get too long --- at least the Mserver segfaults on these returns.An attempt to use labels and goto's to provide "stepping stones" did nothelp.Hence, we skip some type-expansions in this very case.@c@- The actual BAT_scanselect@= seqscanPrologue BUN p,q; int xx, t = b->ttype; ptr nil = ATOMnilptr(t); (void) nil; (void) tl; (void) th;@= seqscanEpilogue if (!bn->batDirty) bn->batDirty = TRUE; /* Just to silence compilers (Intel's icc) that otherwise might * complain about "declared but never referenced" labels * (condition should never be true). * (A "dead" goto between the return and the label makes (other) * compilers (Sun) complain about never reached code...) */ if (!bn) goto bunins_failed; return bn;bunins_failed: BBPreclaim(bn); return NULL;@c@-The templates for the inner loops that do the actual work.
⌨️ 快捷键说明
复制代码
Ctrl + C
搜索代码
Ctrl + F
全屏模式
F11
切换主题
Ctrl + Shift + D
显示快捷键
?
增大字号
Ctrl + =
减小字号
Ctrl + -