Proposal for function-manager redesign                  19-Nov-2000
--------------------------------------

We know that the existing mechanism for calling Postgres functions needs
to be redesigned. It has portability problems because it makes
assumptions about parameter passing that violate ANSI C; it fails to
handle NULL arguments and results cleanly; and "function handlers" that
support a class of functions (such as fmgr_pl) can only be done via a
really ugly, non-reentrant kluge. (Global variable set during every
function call, forsooth.) Here is a proposal for fixing these problems.

In the past, the major objections to redoing the function-manager
interface have been (a) it'll be quite tedious to implement, since every
built-in function and everyplace that calls such functions will need to
be touched; (b) such wide-ranging changes will be difficult to make in
parallel with other development work; (c) it will break existing
user-written loadable modules that define "C language" functions. While
I have no solution to the "tedium" aspect, I believe I see an answer to
the other problems: by use of function handlers, we can support both old
and new interfaces in parallel for both callers and callees, at some
small efficiency cost for the old styles. That way, most of the changes
can be done on an incremental file-by-file basis --- we won't need a
"big bang" where everything changes at once. Support for callees
written in the old style can be left in place indefinitely, to provide
backward compatibility for user-written C functions.


Changes in pg_proc (system data about a function)
-------------------------------------------------

A new column "proisstrict" will be added to the system pg_proc table.
This is a boolean value which will be TRUE if the function is "strict",
that is it always returns NULL when any of its inputs are NULL. The
function manager will check this field and skip calling the function when
it's TRUE and there are NULL inputs.
This allows us to remove explicit
NULL-value tests from many functions that currently need them (not to
mention fixing many more that need them but don't have them). A function
that is not marked "strict" is responsible for checking whether its inputs
are NULL or not. Most builtin functions will be marked "strict".

An optional WITH parameter will be added to CREATE FUNCTION to allow
specification of whether user-defined functions are strict or not. I am
inclined to make the default be "not strict", since that seems to be the
more useful case for functions expressed in SQL or a PL language, but
am open to arguments for the other choice.


The new function-manager interface
----------------------------------

The core of the new design is revised data structures for representing
the result of a function lookup and for representing the parameters
passed to a specific function invocation. (We want to keep function
lookup separate from function call, since many parts of the system apply
the same function over and over; the lookup overhead should be paid once
per query, not once per tuple.)


When a function is looked up in pg_proc, the result is represented as

typedef struct
{
    PGFunction  fn_addr;    /* pointer to function or handler to be called */
    Oid         fn_oid;     /* OID of function (NOT of handler, if any) */
    short       fn_nargs;   /* 0..FUNC_MAX_ARGS, or -1 if variable arg count */
    bool        fn_strict;  /* function is "strict" (NULL in => NULL out) */
    bool        fn_retset;  /* function returns a set (over multiple calls) */
    void       *fn_extra;   /* extra space for use by handler */
    MemoryContext fn_mcxt;  /* memory context to store fn_extra in */
    Node       *fn_expr;    /* expression parse tree for call, or NULL */
} FmgrInfo;

For an ordinary built-in function, fn_addr is just the address of the C
routine that implements the function.
Otherwise it is the address of a
handler for the class of functions that includes the target function.
The handler can use the function OID and perhaps also the fn_extra slot
to find the specific code to execute. (fn_oid = InvalidOid can be used
to denote a not-yet-initialized FmgrInfo struct. fn_extra will always
be NULL when an FmgrInfo is first filled by the function lookup code, but
a function handler could set it to avoid making repeated lookups of its
own when the same FmgrInfo is used repeatedly during a query.) fn_nargs
is the number of arguments expected by the function, fn_strict is its
strictness flag, and fn_retset shows whether it returns a set; all of
these values come from the function's pg_proc entry. If the function is
being called as part of a SQL expression, fn_expr will point to the
expression parse tree for the function call; this can be used to extract
parse-time knowledge about the actual arguments.

FmgrInfo already exists in the current code, but has fewer fields. This
change should be transparent at the source-code level.


During a call of a function, the following data structure is created
and passed to the function:

typedef struct
{
    FmgrInfo   *flinfo;         /* ptr to lookup info used for this call */
    Node       *context;        /* pass info about context of call */
    Node       *resultinfo;     /* pass or return extra info about result */
    bool        isnull;         /* function must set true if result is NULL */
    short       nargs;          /* # arguments actually passed */
    Datum       arg[FUNC_MAX_ARGS];     /* Arguments passed to function */
    bool        argnull[FUNC_MAX_ARGS]; /* T if arg[i] is actually NULL */
} FunctionCallInfoData;
typedef FunctionCallInfoData* FunctionCallInfo;

flinfo points to the lookup info used to make the call. Ordinary functions
will probably ignore this field, but function class handlers will need it
to find out the OID of the specific function being called.

context is NULL for an "ordinary" function call, but may point to additional
info when the function is called in certain contexts.
(For example, the
trigger manager will pass information about the current trigger event here.)
If context is used, it should point to some subtype of Node; the particular
kind of context is indicated by the node type field. (A callee should
always check the node type before assuming it knows what kind of context is
being passed.) fmgr itself puts no other restrictions on the use of this
field.

resultinfo is NULL when calling any function from which a simple Datum
result is expected. It may point to some subtype of Node if the function
returns more than a Datum. (For example, resultinfo is used when calling a
function that returns a set, as discussed below.) Like the context field,
resultinfo is a hook for expansion; fmgr itself doesn't constrain the use
of the field.

nargs, arg[], and argnull[] hold the arguments being passed to the function.
Notice that all the arguments passed to a function (as well as its result
value) will now uniformly be of type Datum. As discussed below, callers
and callees should apply the standard Datum-to-and-from-whatever macros
to convert to the actual argument types of a particular function. The
value in arg[i] is unspecified when argnull[i] is true.

It is generally the responsibility of the caller to ensure that the
number of arguments passed matches what the callee is expecting; except
for callees that take a variable number of arguments, the callee will
typically ignore the nargs field and just grab values from arg[].

The isnull field will be initialized to "false" before the call. On
return from the function, isnull is the null flag for the function result:
if it is true the function's result is NULL, regardless of the actual
function return value.
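
The protocol described so far can be sketched in a few lines of C. This is
only an illustration under stated assumptions, not the actual fmgr code: the
structs are cut down to the fields used here (context and resultinfo are
omitted), Datum is a stand-in typedef, FUNC_MAX_ARGS has an arbitrary value,
and call_function, first_nonnull, int32_to_datum and datum_to_int32 are
hypothetical names invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FUNC_MAX_ARGS 8         /* arbitrary value chosen for this sketch */

typedef uintptr_t Datum;        /* stand-in for the real Datum typedef */

struct FunctionCallInfoData;
typedef struct FunctionCallInfoData *FunctionCallInfo;
typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);

/* Reduced FmgrInfo: only the fields this sketch needs */
typedef struct
{
    PGFunction  fn_addr;
    short       fn_nargs;
    bool        fn_strict;
} FmgrInfo;

/* Reduced FunctionCallInfoData: context and resultinfo are left out */
typedef struct FunctionCallInfoData
{
    FmgrInfo   *flinfo;
    bool        isnull;
    short       nargs;
    Datum       arg[FUNC_MAX_ARGS];
    bool        argnull[FUNC_MAX_ARGS];
} FunctionCallInfoData;

/* Hypothetical stand-ins for the Datum conversion macros */
static Datum int32_to_datum(int32_t v) { return (Datum) (uint32_t) v; }
static int32_t datum_to_int32(Datum d) { return (int32_t) (uint32_t) d; }

/* A strict callee: it never looks at argnull[] or isnull */
static Datum
int4pl(FunctionCallInfo fcinfo)
{
    return int32_to_datum(datum_to_int32(fcinfo->arg[0]) +
                          datum_to_int32(fcinfo->arg[1]));
}

/* A nonstrict callee: returns its first non-NULL argument, or NULL if none */
static Datum
first_nonnull(FunctionCallInfo fcinfo)
{
    int         i;

    for (i = 0; i < fcinfo->nargs; i++)
    {
        if (!fcinfo->argnull[i])
            return fcinfo->arg[i];
    }
    fcinfo->isnull = true;
    return (Datum) 0;
}

/* Hypothetical caller: applies the strictness rule, then invokes the callee */
static Datum
call_function(FmgrInfo *flinfo, short nargs,
              const Datum *args, const bool *nulls, bool *isnull)
{
    FunctionCallInfoData fcinfo;
    Datum       result;
    int         i;

    memset(&fcinfo, 0, sizeof(fcinfo));
    fcinfo.flinfo = flinfo;
    fcinfo.nargs = nargs;
    fcinfo.isnull = false;      /* initialized to false before the call */

    for (i = 0; i < nargs; i++)
    {
        if (flinfo->fn_strict && nulls[i])
        {
            *isnull = true;     /* strict function with a NULL input: skip it */
            return (Datum) 0;
        }
        fcinfo.arg[i] = args[i];
        fcinfo.argnull[i] = nulls[i];
    }

    result = flinfo->fn_addr(&fcinfo);
    *isnull = fcinfo.isnull;    /* null flag overrides the returned Datum */
    return result;
}
```

Given an FmgrInfo initialized as { int4pl, 2, true }, call_function adds the
two arguments when neither is NULL and reports a NULL result without calling
int4pl at all when either one is NULL. With { first_nonnull, 2, false } the
callee is always invoked and decides for itself what to do about NULL inputs.
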
Note that simple "strict" functions can ignore
both isnull and argnull[], since they won't even get called when there
are any TRUE values in argnull[].

FunctionCallInfo replaces FmgrValues plus a bunch of ad-hoc parameter
conventions, global variables (fmgr_pl_finfo and CurrentTriggerData at
least), and other uglinesses.


Callees, whether they be individual functions or function handlers,
shall always have this signature:

Datum function (FunctionCallInfo fcinfo);

which is represented by the typedef

typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);

The function is responsible for setting fcinfo->isnull appropriately
as well as returning a result represented as a Datum. Note that since
all callees will now have exactly the same signature, and will be called
through a function pointer declared with exactly that signature, we
should have no portability or optimization problems.


Function coding conventions
---------------------------

As an example, int4 addition goes from old-style

int32
int4pl(int32 arg1, int32 arg2)
{
    return arg1 + arg2;
}

to new-style

Datum
int4pl(FunctionCallInfo fcinfo)
{
    /* we assume the function is marked "strict", so we can ignore
     * NULL-value handling */

    return Int32GetDatum(DatumGetInt32(fcinfo->arg[0]) +
                         DatumGetInt32(fcinfo->arg[1]));
}

This is, of course, much uglier than the old-style code, but we can
improve matters with some well-chosen macros for the boilerplate parts.
I propose below macros that would make the code look like

Datum
int4pl(PG_FUNCTION_ARGS)
{
    int32   arg1 = PG_GETARG_INT32(0);
    int32   arg2 = PG_GETARG_INT32(1);

    PG_RETURN_INT32( arg1 + arg2 );
}

This is still more code than before, but it's fairly readable, and it's
also amenable to machine processing --- for example, we could probably
write a script that scans code like this and extracts argument and result
type info for comparison to the pg_proc table.

For the standard data types float4, float8, and int8, these macros should
hide the indirection and space allocation involved, so that the function's
code is not
explicitly aware that these types are pass-by-reference. This
will offer a considerable gain in readability, and it also opens up the
opportunity to make these types be pass-by-value on machines where it's
feasible to do so. (For example, on an Alpha it's pretty silly to make int8
be pass-by-ref, since Datum is going to be 64 bits anyway. float4 could
become pass-by-value on all machines...)

Here are the proposed macros and coding conventions:

The definition of an fmgr-callable function will always look like

    Datum
    function_name(PG_FUNCTION_ARGS)
    {
        ...
    }

"PG_FUNCTION_ARGS" just expands to "FunctionCallInfo fcinfo". The main
reason for using this macro is to make it easy for scripts to spot function
definitions. However, if we ever decide to change the calling convention
again, it might come in handy to have this macro in place.

A nonstrict function is responsible for checking whether each individual
argument is null or not, which it can do with PG_ARGISNULL(n) (which is
just "fcinfo->argnull[n]"). It should avoid trying to fetch the value
of any argument that is null.

Both strict and nonstrict functions can return NULL, if needed, with
    PG_RETURN_NULL();
which expands to
    { fcinfo->isnull = true; return (Datum) 0; }

Argument values are ordinarily fetched using code like
    int32   name = PG_GETARG_INT32(number);

For float4, float8, and int8, the PG_GETARG macros will hide the
pass-by-reference nature of the data types; for example PG_GETARG_FLOAT4
expands to
    (* (float4 *) DatumGetPointer(fcinfo->arg[number]))
and would typically be called like this:
    float4  arg = PG_GETARG_FLOAT4(0);
Note that "float4" and "float8" are the recommended typedefs to use, not
"float32data" and "float64data", and the macros are named accordingly.
But 64-bit ints should be declared as "int64".

Non-null values are returned with a PG_RETURN_XXX macro of the appropriate
type.
For example, PG_RETURN_INT32 expands to
    return Int32GetDatum(x)
PG_RETURN_FLOAT4, PG_RETURN_FLOAT8, and PG_RETURN_INT64 hide the
pass-by-reference nature of their datatypes.

fmgr.h will provide PG_GETARG and PG_RETURN macros for all the basic data
types. Modules or header files that define specialized SQL datatypes
(eg, timestamp) should define appropriate macros for those types, so that
functions manipulating the types can be coded in the standard style.

For non-primitive data types (particularly variable-length types) it won't
be very practical to hide the pass-by-reference nature of the data type,
so the PG_GETARG and PG_RETURN macros for those types won't do much more
than DatumGetPointer/PointerGetDatum plus the appropriate typecast (but see
TOAST discussion, below). Functions returning such types will need to