readme

From "PostgreSQL7.4.6 for Linux" · code · 408 lines · page 1 of 2

from prepared statements simply reference the prepared statements' trees,
and won't actually need any storage allocated in their private contexts.


Transient contexts during execution
-----------------------------------

When creating a prepared statement, the parse and plan trees will be built
in a temporary context that's a child of MessageContext (so that it will
go away automatically upon error).  On success, the finished plan is
copied to the prepared statement's private context, and the temp context
is released; this allows planner temporary space to be recovered before
execution begins.  (In simple-Query mode we'll not bother with the extra
copy step, so the planner temp space stays around till end of query.)

The top-level executor routines, as well as most of the "plan node"
execution code, will normally run in a context that is created by
ExecutorStart and destroyed by ExecutorEnd; this context also holds the
"plan state" tree built during ExecutorStart.  Most of the memory
allocated in these routines is intended to live until end of query,
so this is appropriate for those purposes.  The executor's top context
is a child of PortalContext, that is, the per-portal context of the
portal that represents the query's execution.

The main improvement needed in the executor is that expression evaluation
--- both for qual testing and for computation of targetlist entries ---
needs to not leak memory.  To do this, each ExprContext (expression-eval
context) created in the executor will now have a private memory context
associated with it, and we'll arrange to switch into that context when
evaluating expressions in that ExprContext.  The plan node that owns the
ExprContext is responsible for resetting the private context to empty
when it no longer needs the results of expression evaluations.  Typically
the reset is done at the start of each tuple-fetch cycle in the plan node.

Note that this design gives each plan node its own expression-eval memory
context.
This appears necessary to handle nested joins properly, since
an outer plan node might need to retain expression results it has computed
while obtaining the next tuple from an inner node --- but the inner node
might execute many tuple cycles and many expressions before returning a
tuple.  The inner node must be able to reset its own expression context
more often than once per outer tuple cycle.  Fortunately, memory contexts
are cheap enough that giving one to each plan node doesn't seem like a
problem.

A problem with running index accesses and sorts in a query-lifespan context
is that these operations invoke datatype-specific comparison functions,
and if the comparators leak any memory then that memory won't be recovered
till end of query.  The comparator functions all return bool or int32,
so there's no problem with their result data, but there can be a problem
with leakage of internal temporary data.  In particular, comparator
functions that operate on TOAST-able data types will need to be careful
not to leak detoasted versions of their inputs.  This is annoying, but
it appears a lot easier to make the comparators conform than to fix the
index and sort routines, so that's what I propose to do for 7.1.  Further
cleanup can be left for another day.

There will be some special cases, such as aggregate functions.  nodeAgg.c
needs to remember the results of evaluation of aggregate transition
functions from one tuple cycle to the next, so it can't just discard
all per-tuple state in each cycle.
The easiest way to handle this seems
to be to have two per-tuple contexts in an aggregate node, and to
ping-pong between them, so that at each tuple one is the active allocation
context and the other holds any results allocated by the prior cycle's
transition function.

Executor routines that switch the active CurrentMemoryContext may need
to copy data into their caller's current memory context before returning.
I think there will be relatively little need for that, because of the
convention of resetting the per-tuple context at the *start* of an
execution cycle rather than at its end.  With that rule, an execution
node can return a tuple that is palloc'd in its per-tuple context, and
the tuple will remain good until the node is called for another tuple
or told to end execution.  This is pretty much the same state of affairs
that exists now, since a scan node can return a direct pointer to a tuple
in a disk buffer that is only guaranteed to remain good that long.

A more common reason for copying data will be to transfer a result from
per-tuple context to per-run context; for example, a Unique node will
save the last distinct tuple value in its per-run context, requiring a
copy step.

Another interesting special case is VACUUM, which needs to allocate
working space that will survive its forced transaction commits, yet
be released on error.  Currently it does that through a "portal",
which is essentially a child context of TopMemoryContext.  While that
way still works, it's ugly since xact abort needs special processing
to delete the portal.  Better would be to use a context that's a child
of PortalContext and hence is certain to go away as part of normal
processing.  (Eventually we might have an even better solution from
nested transactions, but this'll do fine for now.)


Mechanisms to allow multiple types of contexts
----------------------------------------------

We may want several different types of memory contexts with different
allocation policies but similar external behavior.
To handle this,
memory allocation functions will be accessed via function pointers,
and we will require all context types to obey the conventions given here.
(This is not very far different from the existing code.)

A memory context will be represented by an object like

typedef struct MemoryContextData
{
    NodeTag        type;           /* identifies exact kind of context */
    MemoryContextMethods methods;
    struct MemoryContextData *parent;     /* NULL if no parent (toplevel context) */
    struct MemoryContextData *firstchild; /* head of linked list of children */
    struct MemoryContextData *nextchild;  /* next child of same parent */
    char          *name;           /* context name (just for debugging) */
} MemoryContextData, *MemoryContext;

This is essentially an abstract superclass, and the "methods" pointer is
its virtual function table.  Specific memory context types will use
derived structs having these fields as their first fields.  All the
contexts of a specific type will have methods pointers that point to the
same static table of function pointers, which will look like

typedef struct MemoryContextMethodsData
{
    Pointer     (*alloc) (MemoryContext c, Size size);
    void        (*free_p) (Pointer chunk);
    Pointer     (*realloc) (Pointer chunk, Size newsize);
    void        (*reset) (MemoryContext c);
    void        (*delete) (MemoryContext c);
} MemoryContextMethodsData, *MemoryContextMethods;

Alloc, reset, and delete requests will take a MemoryContext pointer
as parameter, so they'll have no trouble finding the method pointer
to call.  Free and realloc are trickier.
To make those work, we will
require all memory context types to produce allocated chunks that
are immediately preceded by a standard chunk header, which has the
layout

typedef struct StandardChunkHeader
{
    MemoryContext mycontext;         /* Link to owning context object */
    Size          size;              /* Allocated size of chunk */
} StandardChunkHeader;

It turns out that the existing aset.c memory context type does this
already, and probably any other kind of context would need to have the
same data available to support realloc, so this is not really creating
any additional overhead.  (Note that if a context type needs more
per-allocated-chunk information than this, it can make an additional
nonstandard header that precedes the standard header.  So we're not
constraining context-type designers very much.)

Given this, the pfree routine will look something like

    StandardChunkHeader * header =
        (StandardChunkHeader *) ((char *) p - sizeof(StandardChunkHeader));

    (*header->mycontext->methods->free_p) (p);

We could do it as a macro, but the macro would have to evaluate its
argument twice, which seems like a bad idea (the current pfree macro
does not do that).  This is already saving two levels of function call
compared to the existing code, so I think we're doing fine without
squeezing out that last little bit ...


More control over aset.c behavior
---------------------------------

Currently, aset.c allocates an 8K block upon the first allocation in
a context, and doubles that size for each successive block request.
That's good behavior for a context that might hold *lots* of data, and
the overhead wasn't bad when we had only a few contexts in existence.
With dozens if not hundreds of smaller contexts in the system, we will
want to be able to fine-tune things a little better.

The creator of a context will be able to specify an initial block size
and a maximum block size.
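
As a rough illustration of that sizing policy (each new block twice the
size of the previous one, but never larger than the maximum the creator
asked for), here is a minimal sketch.  The function name next_block_size
is hypothetical and is not taken from aset.c; the real allocator may
round or adjust sizes in ways this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of aset.c: given the size of the block
 * most recently allocated in a context, return the size to request for
 * the next block.  The size doubles each time, capped at max_block_size.
 */
size_t
next_block_size(size_t last_block_size, size_t max_block_size)
{
    assert(last_block_size > 0 && last_block_size <= max_block_size);

    /* Compare against max/2 first so the doubling cannot overflow. */
    if (last_block_size > max_block_size / 2)
        return max_block_size;
    return last_block_size * 2;
}
```

The first block would use the creator's initial block size directly; this
helper only covers the second and later block requests.
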
Selecting smaller values will prevent wastage
of space in contexts that aren't expected to hold very much (an example is
the relcache's per-relation contexts).

Also, it will be possible to specify a minimum context size.  If this
value is greater than zero then a block of that size will be grabbed
immediately upon context creation, and cleared but not released during
context resets.  This feature is needed for ErrorContext (see above),
but will most likely not be used for other contexts.

We expect that per-tuple contexts will be reset frequently and typically
will not allocate very much space per tuple cycle.  To make this usage
pattern cheap, the first block allocated in a context is not given
back to malloc() during reset, but just cleared.  This avoids malloc
thrashing.


Other notes
-----------

The original version of this proposal suggested that functions returning
pass-by-reference datatypes should be required to return a value freshly
palloc'd in their caller's memory context, never a pointer to an input
value.  I've abandoned that notion since it clearly is prone to error.
In the current proposal, it is possible to discover which context a
chunk of memory is allocated in (by checking the required standard chunk
header), so nodeAgg can determine whether or not it's safe to reset
its working context; it doesn't have to rely on the transition function
to do what it's expecting.
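
The chunk-header lookup mentioned above can be sketched in a few lines
of standalone C.  This is only an illustration under stated assumptions:
every name below (SketchContext, SketchMethods, SketchChunkHeader,
sketch_owner and so on) is hypothetical, the single context type simply
wraps malloc(), and the header is assumed to be a size that keeps the
chunk suitably aligned, which holds on common platforms but is not
checked here.  It is not the PostgreSQL implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not PostgreSQL source code. */
typedef struct SketchContext SketchContext;

typedef struct SketchMethods
{
    void       *(*alloc) (SketchContext *c, size_t size);
    void        (*free_p) (void *chunk);
} SketchMethods;

struct SketchContext
{
    const SketchMethods *methods;   /* virtual function table */
    const char *name;               /* context name (just for debugging) */
    size_t      live_chunks;        /* chunks currently allocated */
};

/* Standard header that immediately precedes every allocated chunk. */
typedef struct SketchChunkHeader
{
    SketchContext *mycontext;       /* link to owning context */
    size_t      size;               /* requested size of chunk */
} SketchChunkHeader;

/* Step back from a chunk pointer to its header. */
SketchChunkHeader *
sketch_header(void *chunk)
{
    return (SketchChunkHeader *)
        ((char *) chunk - sizeof(SketchChunkHeader));
}

/* Report which context owns a chunk, given only the chunk pointer. */
SketchContext *
sketch_owner(void *chunk)
{
    return sketch_header(chunk)->mycontext;
}

/* malloc-backed methods for one simple context type */
void *
simple_alloc(SketchContext *c, size_t size)
{
    SketchChunkHeader *h = malloc(sizeof(SketchChunkHeader) + size);

    if (h == NULL)
        return NULL;
    h->mycontext = c;
    h->size = size;
    c->live_chunks++;
    return (char *) h + sizeof(SketchChunkHeader);
}

void
simple_free(void *chunk)
{
    SketchChunkHeader *h = sketch_header(chunk);

    h->mycontext->live_chunks--;
    free(h);
}

const SketchMethods simple_methods = { simple_alloc, simple_free };

/* Context-taking entry point: dispatch through the context's table. */
void *
sketch_alloc(SketchContext *c, size_t size)
{
    return c->methods->alloc(c, size);
}

/* pfree-style entry point: find the owner via the header, then dispatch. */
void
sketch_free(void *chunk)
{
    assert(chunk != NULL);
    sketch_owner(chunk)->methods->free_p(chunk);
}
```

A caller in nodeAgg's position could compare sketch_owner(result) with
its own working context before resetting that context; if they match,
the result would have to be copied elsewhere first.
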
