Re: Parallel bitmap heap scan - Mailing list pgsql-hackers
From: Dilip Kumar
Subject: Re: Parallel bitmap heap scan
Msg-id: CAFiTN-txEPkB8pJEYqe8E-HRi_4BYo9kKXvbfVm1w7fOxgEdYw@mail.gmail.com
In response to: Re: Parallel bitmap heap scan (Dilip Kumar <dilipbalaut@gmail.com>)
Responses: Re: [HACKERS] Parallel bitmap heap scan
List: pgsql-hackers
On Tue, Nov 22, 2016 at 9:05 AM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> On Fri, Nov 18, 2016 at 9:59 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>
> Thanks for the review. I have worked on these comments.
>
>> In pbms_is_leader(), I didn't clearly understand the significance of
>> the for-loop. If it is a worker, it can call
>> ConditionVariablePrepareToSleep() followed by
>> ConditionVariableSleep(). Once it comes out of
>> ConditionVariableSleep(), isn't it guaranteed that the leader has
>> finished the bitmap? If yes, then it looks like it is not necessary
>> to iterate again and go back through the pbminfo->state checking.
>> Also, with this, the variable queuedSelf might not be needed. But I
>> might be missing something here.
>
> I have taken this logic from the example posted on the condition
> variable thread [1]:
>
> for (;;)
> {
>     ConditionVariablePrepareToSleep(cv);
>     if (condition for which we are waiting is satisfied)
>         break;
>     ConditionVariableSleep();
> }
> ConditionVariableCancelSleep();
>
> [1] https://www.postgresql.org/message-id/CA%2BTgmoaj2aPti0yho7FeEf2qt-JgQPRWb0gci_o1Hfr%3DC56Xng%40mail.gmail.com
>
> So it appears to me that even after we come out of
> ConditionVariableSleep(), we need to verify our condition and only
> then break.
>
>> Not sure what happens if a worker calls
>> ConditionVariable[Prepare]Sleep() but the leader has already called
>> ConditionVariableBroadcast(). Does the for loop have something to do
>> with this? But this can happen even with the current for-loop, it
>> seems.
>
> If the leader has already called ConditionVariableBroadcast(), then
> after ConditionVariablePrepareToSleep() we will check the condition
> again before calling ConditionVariableSleep(), and that condition
> check is done under SpinLockAcquire(&pbminfo->state_mutex).
>
> However, I think there is one problem in my code (I think you might be
> pointing at the same thing): after ConditionVariablePrepareToSleep(),
> if pbminfo->state is already PBM_FINISHED, I am not resetting needWait
> to false, which can lead to the problem.

I have fixed the defect I mentioned above.
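For reference, a minimal sketch of what the corrected wait loop could
look like. This is not the actual patch: the ParallelBitmapInfo layout,
the cv field, and the PBM_INITIAL/PBM_INPROGRESS states are assumptions
for the example (state, state_mutex, PBM_FINISHED, and needWait come
from the discussion above), and the condition-variable calls follow the
in-development pattern quoted from [1], whose final API may differ:

    #include "postgres.h"
    #include "storage/condition_variable.h"
    #include "storage/spin.h"

    /* Hypothetical shared state, per the discussion above. */
    typedef enum PBMState
    {
        PBM_INITIAL,                /* no leader chosen yet */
        PBM_INPROGRESS,             /* leader is building the bitmap */
        PBM_FINISHED                /* bitmap is ready for all workers */
    } PBMState;

    typedef struct ParallelBitmapInfo
    {
        slock_t         state_mutex;
        PBMState        state;
        ConditionVariable cv;
    } ParallelBitmapInfo;

    static bool
    pbms_is_leader(ParallelBitmapInfo *pbminfo)
    {
        bool        leader = false;

        for (;;)
        {
            bool        needWait = false;

            ConditionVariablePrepareToSleep(&pbminfo->cv);

            /* Re-check the shared state under the spinlock every time. */
            SpinLockAcquire(&pbminfo->state_mutex);
            if (pbminfo->state == PBM_INITIAL)
            {
                /* First one in: become the leader and build the bitmap. */
                pbminfo->state = PBM_INPROGRESS;
                leader = true;
            }
            else if (pbminfo->state != PBM_FINISHED)
                needWait = true;    /* leader is still building the bitmap */
            /* else PBM_FINISHED: needWait stays false -- the fixed defect */
            SpinLockRelease(&pbminfo->state_mutex);

            if (!needWait)
                break;

            ConditionVariableSleep(&pbminfo->cv);
        }
        ConditionVariableCancelSleep();

        return leader;
    }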
>>> #3. Bitmap processing (iterate and process the pages).
>>> In this phase each worker will iterate over the page and chunk
>>> arrays and select heap pages one by one. If prefetch is enabled,
>>> there will be two iterators. Since multiple workers are iterating
>>> over the same page and chunk arrays, we need to have a shared
>>> iterator, so we grab a spin lock and iterate within the lock, so
>>> that each worker gets a different page to process.
>>
>> tbm_iterate() call under SpinLock:
>> For parallel tbm iteration, tbm_iterate() is called while a SpinLock
>> is held. Generally we try to keep the code inside a SpinLock limited
>> to a few lines, and that too without the occurrence of a function
>> call. Although the tbm_iterate() code itself looks safe under a
>> spinlock, I was checking whether we can squeeze SpinLockAcquire() and
>> SpinLockRelease() closer to each other. One thought is: in
>> tbm_iterate(), acquire the SpinLock before the while loop that
>> iterates over lossy chunks. Then, if both chunk and per-page data
>> remain, release the spinlock just before returning (the first return
>> stmt). And then, just before scanning the bitmap of an exact page,
>> i.e. just after "if (iterator->spageptr < tbm->npages)", save the
>> page handle, increment iterator->spageptr, release the SpinLock, and
>> then use the saved page handle to iterate over the page bitmap.
>
> The main reason to keep the spin lock out of this function is to avoid
> changes inside it; also, this function takes a local iterator as
> input, which has no spin lock associated with it. But that can be
> changed; we can pass the shared iterator to it.
>
> I will think about this logic and try to update it in the next
> version.

This issue is still not addressed. The logic inside tbm_iterate() uses
the same variables, such as spageptr, in multiple places; IMHO this
complete logic needs to be done under one spin lock.

>> prefetch_pages() call under SpinLock:
>> Here again, prefetch_pages() is called while the
>> pbminfo->prefetch_mutex SpinLock is held. Effectively, heavy
>> functions like PrefetchBuffer() would get called while under the
>> SpinLock. These can even ereport(). One option is to use a mutex lock
>> for this purpose, but I think that would slow things down. Moreover,
>> the complete set of prefetch pages would be scanned by a single
>> worker, and the others might wait for this one. Instead, what I am
>> thinking is: grab the pbminfo->prefetch_mutex SpinLock only while
>> incrementing pbminfo->prefetch_pages. The rest, viz. iterating over
>> the prefetch pages and doing the PrefetchBuffer(), need not be
>> synchronised using this pbminfo->prefetch_mutex SpinLock.
>> pbms_parallel_iterate() already has its own iterator spinlock. The
>> only thing is, workers may not do the actual PrefetchBuffer()
>> sequentially. One of them might shoot ahead and prefetch 3-4 pages
>> while another is lagging with a sequentially lower page number; but I
>> believe this is fine, as long as they all prefetch all the required
>> blocks.
>
> I agree with your point, and will try to fix this as well.

I have fixed this part.
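For reference, a rough sketch of that narrowed critical section (again,
not the actual patch): the spinlock is held only for the
pbminfo->prefetch_pages increment, and PrefetchBuffer() runs outside
the lock. The prefetch_mutex and prefetch_pages fields come from the
discussion above; the prefetch_target and prefetch_iterator fields and
the pbms_parallel_iterate() signature are assumptions for the example:

    #include "postgres.h"
    #include "nodes/execnodes.h"
    #include "nodes/tidbitmap.h"
    #include "storage/bufmgr.h"
    #include "storage/spin.h"

    static void
    pbms_prefetch_pages(BitmapHeapScanState *node,
                        ParallelBitmapInfo *pbminfo)
    {
        for (;;)
        {
            TBMIterateResult *tbmpre;
            bool        claimed = false;

            /* Hold the spinlock only for the shared counter increment. */
            SpinLockAcquire(&pbminfo->prefetch_mutex);
            if (pbminfo->prefetch_pages < pbminfo->prefetch_target)
            {
                pbminfo->prefetch_pages++;
                claimed = true;
            }
            SpinLockRelease(&pbminfo->prefetch_mutex);

            if (!claimed)
                break;          /* prefetch target already reached */

            /* The shared iterator is protected by its own spinlock. */
            tbmpre = pbms_parallel_iterate(node->prefetch_iterator,
                                           pbminfo->prefetch_iterator);
            if (tbmpre == NULL)
                break;          /* iterator exhausted */

            /* Heavy call (it may even ereport()) now runs unlocked. */
            PrefetchBuffer(node->ss.ss_currentRelation, MAIN_FORKNUM,
                           tbmpre->blockno);
        }
    }

With this shape, workers may issue PrefetchBuffer() calls out of block
order, which, as noted above, is fine as long as every required block
gets prefetched by someone.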
--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com