Re: [HACKERS] Parallel bitmap heap scan - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Parallel bitmap heap scan |
Date | |
Msg-id | CAFiTN-uqMdXQdKDkD94WVYWoPnRCmA+sy_23--iq2ZPemniz6Q@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Parallel bitmap heap scan (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Parallel bitmap heap scan
|
List | pgsql-hackers |
On Tue, Feb 14, 2017 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote: > What is the point of having a TBMSharedIterator contain a TIDBitmap > pointer? All the values in that TIDBitmap are populated from the > TBMSharedInfo, but it seems to me that the fields that are copied over > unchanged (like status and nchunks) could just be used directly from > the TBMSharedInfo, and the fields that are computed (like relpages and > relchunks) could be stored directly in the TBMSharedIterator. > tbm_shared_iterate() is already separate code from tbm_iterate(), so > it can be changed to refer to whichever data structure contains the > data it needs. Done > > Why is TBMSharedInfo separate from TBMSharedIteratorState? It seems > like you could merge those two into a single structure. Done > > I think we can simplify things here a bit by having shared iterators > not support single-page mode. In the backend-private case, > tbm_begin_iterate() really doesn't need to do anything with the > pagetable in the TBM_ONE_PAGE case, but for a shared iterator, we've > got to anyway copy the pagetable into shared memory. So it seems to > me that it would be simpler just to transform it into a standard > iteration array while we're at it, instead of copying it into entry1. > In other words, I suggest removing both "entry1" and "status" from > TBMSharedInfo and making tbm_prepare_shared_iterate smarter to > compensate. Done > > I think "dsa_data" is poorly named; it should be something like > "dsapagetable" in TIDBitmap and just "pagetable" in TBMSharedInfo. I > think you should should move the "base", "relpages", and "relchunks" > into TBMSharedIterator and give them better names, like "ptbase", > "ptpages" and "ptchunks". "base" isn't clear that we're talking about > the pagetable's base as opposed to anything else, and "relpages" and > "relchunks" are named based on the fact that the pointers are relative > rather than named based on what data they point at, which doesn't seem > like the right choice. Done > > I suggest putting the parallel functions next to each other in the > file: tbm_begin_iterate(), tbm_prepare_shared_iterate(), > tbm_iterate(), tbm_shared_iterate(), tbm_end_iterate(), > tbm_end_shared_iterate(). Done > > + if (tbm->dsa == NULL) > + return pfree(pointer); > > Don't try to return a value of type void. The correct spelling of > this is { pfree(pointer); return; }. Formatted appropriately across 4 > lines, of course. Fixed > > + /* > + * If TBM is in iterating phase that means pagetable is already created > + * and we have come here during tbm_free. By this time we are already > + * detached from the DSA because the GatherNode would have been shutdown. > + */ > + if (tbm->iterating) > + return; > > This seems like something of a kludge, although not a real easy one to > fix. The problem here is that tidbitmap.c ideally shouldn't have to > know that it's being used by the executor or that there's a Gather > node involved, and certainly not the order of operations around > shutdown. It should really be the problem of 0002 to handle this kind > of problem, without 0001 having to know anything about it. It strikes > me that it would probably be a good idea to have Gather clean up its > children before it gets cleaned up itself. So in ExecShutdownNode, we > could do something like this: > > diff --git a/src/backend/executor/execProcnode.c > b/src/backend/executor/execProcnode.c > index 0dd95c6..5ccc2e8 100644 > --- a/src/backend/executor/execProcnode.c > +++ b/src/backend/executor/execProcnode.c > @@ -815,6 +815,8 @@ ExecShutdownNode(PlanState *node) > if (node == NULL) > return false; > > + planstate_tree_walker(node, ExecShutdownNode, NULL); > + > switch (nodeTag(node)) > { > case T_GatherState: > @@ -824,5 +826,5 @@ ExecShutdownNode(PlanState *node) > break; > } > > - return planstate_tree_walker(node, ExecShutdownNode, NULL); > + return false; > } > > Also, in ExecEndGather, something like this: > > diff --git a/src/backend/executor/nodeGather.c > b/src/backend/executor/nodeGather.c > index a1a3561..32c97d3 100644 > --- a/src/backend/executor/nodeGather.c > +++ b/src/backend/executor/nodeGather.c > @@ -229,10 +229,10 @@ ExecGather(GatherState *node) > void > ExecEndGather(GatherState *node) > { > + ExecEndNode(outerPlanState(node)); /* let children clean up first */ > ExecShutdownGather(node); > ExecFreeExprContext(&node->ps); > ExecClearTuple(node->ps.ps_ResultTupleSlot); > - ExecEndNode(outerPlanState(node)); > } > > /* > > With those changes and an ExecShutdownBitmapHeapScan() called from > ExecShutdownNode(), it should be possible (I think) for us to always > have the bitmap heap scan node shut down before the Gather node shuts > down, which I think would let you avoid having a special case for this > inside the TBM code. Done (gather_shutdown_children_first.patch does what you have mentioned above and 0002 implement the BitmapHeapScanShutdown function). > > + char *ptr; > + dsaptr = dsa_allocate(tbm->dsa, size + sizeof(dsa_pointer)); > + tbm->dsa_data = dsaptr; > + ptr = dsa_get_address(tbm->dsa, dsaptr); > + memset(ptr, 0, size + sizeof(dsa_pointer)); > + *((dsa_pointer *) ptr) = dsaptr; > > Hmm. Apparently, we need a dsa_allocate_and_zero() or dsa_allocate0() > function. This pattern seems likely to come up a lot, and I don't > think we should require every caller to deal with it. Done > > I don't see why you think you need to add sizeof(dsa_pointer) to the > allocation and store an extra copy of the dsa_pointer in the > additional space. You are already storing it in tbm->dsa_data and Done with the help of two pointers as discussed in another mail [1]. [1] https://www.postgresql.org/message-id/CA%2BTgmoaWsCJ5L2du9i1RQaegaNLgOYF2KgRFu%2B7sUAeQc_xBFg%40mail.gmail.com > > It seems shaky to me that tbm->iterating can be set when we've created > either a shared or a backend-private iterator. For example, > tbm_iterate() asserts that tbm->iterating is set as a way of checking > that spages and schunks will be set. But that's not guaranteed any > more with these changes, because we might've built the shared version > of the iteration arrays rather than the backend-private version. > Maybe you ought to change it from a bool to an enum: > TBM_NOT_ITERATING, TBM_ITERATING_PRIVATE, TBM_ITERATING_SHARED. And > then update all of the asserts and tests to check for the specific > state they care about. Done > > + while (schunkbit < PAGES_PER_CHUNK) > + { > + int wordnum = WORDNUM(schunkbit); > + int bitnum = BITNUM(schunkbit); > + > + if ((chunk->words[wordnum] & ((bitmapword) 1 << bitnum)) != 0) > + break; > + schunkbit++; > + } > > How about refactoring this chunk of code into a static inline function > and having both tbm_iterate() and tbm_shared_iterate() call it? > Probably something like static inline void > tbm_advance_schunkbit(PageTableEntry *chunk, int *schunkbit). Good idea, done this way > > + /* scan bitmap to extract individual offset numbers */ > + ntuples = 0; > + for (wordnum = 0; wordnum < WORDS_PER_PAGE; wordnum++) > + { > + bitmapword w = page->words[wordnum]; > + > + if (w != 0) > + { > + int off = wordnum * BITS_PER_BITMAPWORD + 1; > + > + while (w != 0) > + { > + if (w & 1) > + output->offsets[ntuples++] = (OffsetNumber) off; > + off++; > + w >>= 1; > + } > + } > + } > > Similarly, this looks like it could be refactored into a shared static > inline function as well, instead of duplicating it. Done. Attached patches: interface_dsa_allocate0.patch : Provides new interface dsa_allocate0, which is used by 0001 gather_shutdown_childeren_first.patch : Processing the child node before shuting down the gather, 0002 will use that part to shutdown BitmapNode before gather. 0001: tidbimap change to provide the interfaces for shared iterator. 0002: actual parallel bitmap heap scan by using interfaces of 0001. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: