Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From amul sul
Subject Re: [HACKERS] Parallel Append implementation
Date
Msg-id CAAJ_b97kLNW8Z9nvc_JUUG5wVQUXvG=f37WsX8ALF0A=KAHh3w@mail.gmail.com
In response to Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: [HACKERS] Parallel Append implementation  (Robert Haas <robertmhaas@gmail.com>)
Re: [HACKERS] Parallel Append implementation  (Rafia Sabih <rafia.sabih@enterprisedb.com>)
List pgsql-hackers
On Tue, Nov 21, 2017 at 2:22 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 21 November 2017 at 12:44, Rafia Sabih <rafia.sabih@enterprisedb.com> wrote:
>> On Mon, Nov 13, 2017 at 12:54 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> Thanks a lot Robert for the patch. I will have a look. Quickly tried
>>> to test some aggregate queries with a partitioned pgbench_accounts
>>> table, and it is crashing. Will get back with the fix, and any other
>>> review comments.
>>>
>>> Thanks
>>> -Amit Khandekar
>>
>> I was trying to get the performance of this patch at commit id -
>> 11e264517dff7a911d9e6494de86049cab42cde3 and TPC-H scale factor 20
>> with the following parameter settings,
>> work_mem = 1 GB
>> shared_buffers = 10GB
>> effective_cache_size = 10GB
>> max_parallel_workers_per_gather = 4
>> enable_partitionwise_join = on
>>
>> and the details of the partitioning scheme is as follows,
>> tables partitioned = lineitem on l_orderkey and orders on o_orderkey
>> number of partitions in each table = 10
>>
>> As per the explain outputs PA was used in following queries- 1, 3, 4,
>> 5, 6, 7, 8, 10, 12, 14, 15, 18, and 21.
>> Unfortunately, at the time of executing any of these query, it is
>> crashing with the following information in  core dump of each of the
>> workers,
>>
>> Program terminated with signal 11, Segmentation fault.
>> #0  0x0000000010600984 in pg_atomic_read_u32_impl (ptr=0x3ffffec29294)
>> at ../../../../src/include/port/atomics/generic.h:48
>> 48 return ptr->value;
>>
>> In case this a different issue as you pointed upthread, you may want
>> to have a look at this as well.
>> Please let me know if you need any more information in this regard.
>
> Right, for me the crash had occurred with a similar stack, although
> the real crash happened in one of the workers. Attached is the script
> file
> pgbench_partitioned.sql to create a schema with which I had reproduced
> the crash.
>
> The query that crashed :
> select sum(aid), avg(aid) from pgbench_accounts;
>
> Set max_parallel_workers_per_gather to at least 5.
>
> Also attached is v19 patch rebased.
>

I've spent some time debugging this crash. The crash happens in ExecAppend()
because the subnode in the node->appendplans array is referenced using an
incorrect (out-of-bounds) array index in the following code:

        /*
         * figure out which subplan we are currently processing
         */
        subnode = node->appendplans[node->as_whichplan];

This incorrect value gets assigned to node->as_whichplan in
choose_next_subplan_for_worker().

The following change on the v19 patch fixes it for me:

--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/executor/nodeAppend.c
@@ -489,11 +489,9 @@ choose_next_subplan_for_worker(AppendState *node)
    }

    /* Pick the plan we found, and advance pa_next_plan one more time. */
-   node->as_whichplan = pstate->pa_next_plan;
+   node->as_whichplan = pstate->pa_next_plan++;
    if (pstate->pa_next_plan == node->as_nplans)
        pstate->pa_next_plan = append->first_partial_plan;
-   else
-       pstate->pa_next_plan++;

    /* If non-partial, immediately mark as finished. */
    if (node->as_whichplan < append->first_partial_plan)

The attached patch makes the same change to Amit's ParallelAppend_v19_rebased.patch.

Regards,
Amul
