Re: [HACKERS] Parallel bitmap heap scan - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: [HACKERS] Parallel bitmap heap scan
Date
Msg-id CAFiTN-su5SW6ekgrOYV7SLMNDuZzMT4wqP1Sp1=NLXpFAGnUQA@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel bitmap heap scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Parallel bitmap heap scan
Re: [HACKERS] Parallel bitmap heap scan
List pgsql-hackers
On Sun, Feb 26, 2017 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I'm not entirely sure about the calling convention for
> tbm_free_shared_area() but the rest seems OK.

Changed
>
>> 2. Now tbm_free is not freeing any of the shared members which can be
>> accessed by other worker so tbm_free is safe to call from
>> ExecEndBitmapHeapScan without any safety check or ref count.
>
> That also seems fine. We ended up with something very similar in the
> Parallel Index Scan patch.
>
>> 0002:
>> 1. We don't need ExecShutdownBitmapHeapScan anymore because now we are
>> not freeing any shared member in ExecEndBitmapHeapScan.
>> 2. In ExecReScanBitmapHeapScan we will call tbm_free_shared_area to
>> free the shared members of the TBM.
>> 3. After that, we will free TBMSharedIteratorState what we allocated
>> using tbm_prepare_shared_iterate.
>
> Check.  But I think tbm_free_shared_area() should also free the object
> itself, instead of making the caller do that separately.

Right, done that way.
>
> +        if (DsaPointerIsValid(node->pstate->tbmiterator))
> +        {
> +            /* First we free the shared TBM members using the shared state */
> +            tbm_free_shared_area(dsa, node->pstate->tbmiterator);
> +            dsa_free(dsa, node->pstate->tbmiterator);
> +        }
> +
> +        if (DsaPointerIsValid(node->pstate->prefetch_iterator))
> +            dsa_free(dsa, node->pstate->prefetch_iterator);
>
> The fact that these cases aren't symmetric suggests that your
> abstraction is leaky.  I'm guessing that you can't call
> tbm_free_shared_area because the two iterators share one copy of the
> underlying iteration arrays, and the TBM code isn't smart enough to
> avoid freeing them twice.  You're going to have to come up with a
> better solution to that problem; nodeBitmapHeapScan.c shouldn't know
> about the way the underlying storage details are managed.  (Maybe you
> need to reference-count the iterator arrays?)

Converted iterator arrays to structure and maintained the refcount. I
had to do the same thing for pagetable also because that is also
shared across iterator.

>
> +    if (node->inited)
> +        goto start_iterate;
>
> My first programming teacher told me not to use goto.  I've
> occasionally violated that rule, but I need a better reason than you
> have here.  It looks very easy to avoid.

Changed
>
> +            pbms_set_parallel(outerPlanState(node));
>
> I think this should be a flag in the plan, and the planner should set
> it correctly, instead of having it be a flag in the executor that the
> executor sets.  Also, the flag itself should really be called
> something that involves the word "shared" rather than "parallel",
> because the bitmap will not be created in parallel, but it will be
> shared.
Done
>
> Have you checked whether this patch causes any regression in the
> non-parallel case?  It adds a bunch of "if" statements, so it might.
> Hopefully not, but it needs to be carefully tested.

With new patch I tested in my local machine, perform multiple
executions and it doesn't show any regression.  Attached file
perf_result contains the explain analyze output for one of the query
(head, patch with 0 workers and patch with 2 workers).  I will perform
the test with all the TPC-H queries which using bitmap plan on the
bigger machine and will post the results next week.
>
> @@ -48,10 +48,11 @@
>  #include "utils/snapmgr.h"
>  #include "utils/tqual.h"
>
> -
>  static TupleTableSlot *BitmapHeapNext(BitmapHeapScanState *node);
>
> Unnecessary.
Fixed
>
> +static bool pbms_is_leader(ParallelBitmapState *pbminfo);
> +static void pbms_set_parallel(PlanState *node);
>
> I don't think this "pbms" terminology is very good.  It's dissimilar
> to the way other functions in this file are named.  Maybe
> BitmapShouldInitializeSharedState().
Changed
>
> I think that some of the bits that this function makes conditional on
> pstate should be factored out into inline functions.  Like:
>
> -            if (node->prefetch_target >= node->prefetch_maximum)
> -                 /* don't increase any further */ ;
> -            else if (node->prefetch_target >= node->prefetch_maximum / 2)
> -                node->prefetch_target = node->prefetch_maximum;
> -            else if (node->prefetch_target > 0)
> -                node->prefetch_target *= 2;
> -            else
> -                node->prefetch_target++;
> +            if (!pstate)
> +            {
> +                if (node->prefetch_target >= node->prefetch_maximum)
> +                     /* don't increase any further */ ;
> +                else if (node->prefetch_target >= node->prefetch_maximum / 2)
> +                    node->prefetch_target = node->prefetch_maximum;
> +                else if (node->prefetch_target > 0)
> +                    node->prefetch_target *= 2;
> +                else
> +                    node->prefetch_target++;
> +            }
> +            else if (pstate->prefetch_target < node->prefetch_maximum)
> +            {
> +                SpinLockAcquire(&pstate->prefetch_mutex);
> +                if (pstate->prefetch_target >= node->prefetch_maximum)
> +                     /* don't increase any further */ ;
> +                else if (pstate->prefetch_target >=
> +                         node->prefetch_maximum / 2)
> +                    pstate->prefetch_target = node->prefetch_maximum;
> +                else if (pstate->prefetch_target > 0)
> +                    pstate->prefetch_target *= 2;
> +                else
> +                    pstate->prefetch_target++;
> +                SpinLockRelease(&pstate->prefetch_mutex);
> +            }
>
> I suggest creating an inline function like BitmapAdjustPrefetch() for
> this logic, and letting the code call that.  The function can look
> something like this: if (pstate == NULL) { non-parallel stuff; return;
> } parallel stuff follows...
>
> And similarly for the other cases where you've made the logic
> conditional.  This will make it more clear what's happening
> post-patch, I think, and will also help keep the level of indentation
> from getting out-of-control in certain places.  In fact, maybe you
> should submit a preliminary refactoring patch that moves these chunks
> of logic into functions and then the main patch can apply over top of
> that.

Done.

>
> +    bool                 inited;
>
> Suggest: initialized

Done
>
> - * ----------------
> + *        pscan_len           size of the shared memory for parallel bitmap
> + *        inited               is node is ready to iterate
> + *        stbmiterator       shared iterator
> + *        sprefetch_iterator shared iterator for prefetching
> + *        pstate               shared state for parallel bitmap scan
> + *----------------
>
> No need to change number of dashes.
Fixed
>
> + *     PBMState information : Current status of the TIDBitmap creation during
> + *                            parallel bitmap heap scan.
>
> If you look for existing places where comments are formatted like
> this, I bet you won't find many.  Copy the surrounding style more.

Done as surrounding structure, also changed the name to
SharedBitmapState, I think that is better name for the purpose.

>
> +    dsa_pointer    tbmiterator;
> +    dsa_pointer    prefetch_iterator;
> +    slock_t        prefetch_mutex;
> +    int            prefetch_pages;
> +    int            prefetch_target;
> +    bool        prefetching;
> +    slock_t        state_mutex;
> +    PBMState    state;
> +    ConditionVariable cv;
> +    char        phs_snapshot_data[FLEXIBLE_ARRAY_MEMBER];
>
> I think it is probably not a good idea to have two separate mutexes
> here.  They'll be in the same cache line, so it won't be much faster
> than having one mutex, and the state mutex won't be acquired very
> often so you won't really gain anything anyway.  I think you can just
> merge the mutexes into one called 'mutex'.

Done.
>
> +                    /* Allow only one process to prefetch */
>
> If this is a good idea, there should be a comment explaining why.

Done
>
> +    TBMSharedIterator    *stbmiterator;
> +    TBMSharedIterator    *sprefetch_iterator;
>
> Maybe shared_iterator and shared_prefetch_iterator.
Done


0001- same as previous with some changes for freeing the shared memory stuff.
0002- nodeBitmapHeapScan refactoring, this applies independently
0003- actual parallel bitmap stuff applies on top of 0001 and 0002


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: [HACKERS] IF (NOT) EXISTS in psql-completion
Next
From: Dilip Kumar
Date:
Subject: Re: [HACKERS] Proposal : Parallel Merge Join