Re: [HACKERS] Parallel bitmap heap scan - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: [HACKERS] Parallel bitmap heap scan |
Date | |
Msg-id | CAFiTN-tGOBVVLRiBs7aS3hyUh=XJKdkzbWs879nXCgUL0JQNdg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Parallel bitmap heap scan (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Parallel bitmap heap scan
Re: [HACKERS] Parallel bitmap heap scan |
List | pgsql-hackers |
On Wed, Feb 1, 2017 at 11:09 PM, Robert Haas <robertmhaas@gmail.com> wrote: Here are the latest version of the patch, which address all the comments given by Robert. > https://www.postgresql.org/message-id/20161017201558.cr34stc6zvxy3mst@alap3.anarazel.de > that you should try to share only the iteration arrays, I believe that > he meant the results of tbm_begin_iterate() -- that is, > iterator->spageptr, chunkptr, schunkbit, spages, and schunks. I'm > perhaps putting words into his mouth, but what he said was that we > could avoid sharing "the whole underlying hash". But the patch set > you've implemented here doesn't behave that way. Instead, you've got > the array containing the hash table elements shared (which is good) > plus there's a sort of dummy hash table in every worker which copies > some but not all of the members of the original hash table, leading to > the otherwise-unnecessary if-test that Haribabu is complaining about. > So the hash table is kinda-shared-kinda-not-shared, which I don't > *think* is really what Andres had in mind. > > In short, I think SH_COPY (implemented here as pagetable_copy) needs > to go away. The work of tbm_begin_iterate() should be done before we > begin the shared scan and the results of that work should be shared. I have removed the SH_COPY and now leader performs the tbm_begin_shared_iterate before waking up the worker. Basically, the leader will create the page and chunk array and that is the array of the "relptr" (offlist, suggested by Robert). > What happens right now (it appears) is that every backend does that > work based on the same hash table and we just assume they all get the > same answer. And we really do assume that, because > pbms_parallel_iterate() assumes it can shuttle private state back and > forth between iterator in different backends and nothing will break; > but those backends aren't actually using the same iteration arrays. > They are using different iteration arrays that should have the same > contents because they were all derived from the same semi-shared hash > table. That's pretty fragile, and a waste of CPU cycles if the hash > table is large (every backend does its own sort). > > On a related note, I think it's unacceptable to make the internal > details of TBMIterator public. You've moved it from tidbitmap.c to > tidbitmap.h so that nodeBitmapHeapScan.c can tinker with the guts of > the TBMIterator, but that's not OK. Those details need to remain > private to tidbitmap.c. Fixed pbms_parallel_iterate() is a nasty kludge; we > need some better solution. The knowledge of how a shared iterator > should iterate needs to be private to tidbitmap.c, not off in the > executor someplace. And I think the entries need to actually be > updated in shared memory directly, not copied back and forth between a > backend-private iterator and a shared iterator. I have fixed this, now there is new function called tbm_shared_iterate which will directly iterate using shared iterator. So now no need to copy member back and forth between local and shared iterator. > > Also, pbms_parallel_iterate() can't hold a spinlock around a call to > tbm_iterate(). Note the coding rules for spinlocks mentioned in > spin.h and src/backend/storage/lmgr/README. I think the right thing > to do here is to use an LWLock in a new tranche (add an entry to > BuiltinTrancheIds). Done that way. > > In 0002, you have this: > > + tb->alloc = palloc(sizeof(SH_ALLOCATOR)); > > This should be using MemoryContextAlloc(ctx, ...) rather than palloc. > Otherwise the allocator object is in a different context from > everything else in the hash table. But TBH, I don't really see why we > want this to be a separate object. Why not just put > HashAlloc/HashFree/args into SH_TYPE directly? That avoids some > pointer chasing and doesn't really seem to cost anything (except that > SH_CREATE will grow a slightly longer argument sequence). Done as suggested > > I think maybe we should rename the functions to element_allocate, > element_free, and element_allocator_ctx, or something like that. The > current names aren't capitalized consistently with other things in > this module, and putting the word "element" in there would make it > more clear what the purpose of this thing is. Fixed as per the suggestion -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: