Thread: [HACKERS] Re: [BUGS] BUG #14657: Server process segmentation fault in v10, May10th dev snapshot

On 2017/05/18 10:49, Amit Langote wrote:
> On 2017/05/18 2:14, Dilip Kumar wrote:
>> On Wed, May 17, 2017 at 7:41 PM,  <sveinn.sveinsson@gmail.com> wrote:
>>> (gdb) bt
>>> #0  0x000000000061ab1b in list_nth ()
>>> #1  0x00000000005e4081 in ExecLockNonLeafAppendTables ()
>>> #2  0x00000000005f4d52 in ExecInitMergeAppend ()
>>> #3  0x00000000005e0365 in ExecInitNode ()
>>> #4  0x00000000005f35a7 in ExecInitLimit ()
>>> #5  0x00000000005e00f3 in ExecInitNode ()
>>> #6  0x00000000005dd207 in standard_ExecutorStart ()
>>> #7  0x00000000006f96d2 in PortalStart ()
>>> #8  0x00000000006f5c7f in exec_simple_query ()
>>> #9  0x00000000006f6fac in PostgresMain ()
>>> #10 0x0000000000475cdc in ServerLoop ()
>>> #11 0x0000000000692ffa in PostmasterMain ()
>>> #12 0x0000000000476600 in main ()
> 
> Thanks for the test case Sveinn and thanks Dilip for analyzing.
> 
>> Seems like the issue is that the plans under multiple subroots are
>> pointing to the same partitioned_rels.
> 
> That's correct.
> 
>> If I am not getting it wrong, in "set_plan_refs(PlannerInfo *root, Plan
>> *plan, int rtoffset)" the rtoffset is specific to the subroot. The
>> problem is that set_plan_refs, called for different subroots, updates
>> the same partitioned_rels list and makes its values completely wrong,
>> which ultimately makes ExecLockNonLeafAppendTables access an
>> out-of-bounds "rte" index.
> 
> Yes.
> 
>> set_plan_refs
>> {
>> [clipped]
>> case T_MergeAppend:
>> {
>>     [clipped]
>>
>>     foreach(l, splan->partitioned_rels)
>>     {
>>          lfirst_int(l) += rtoffset;
>>
>>
>> I think the solution should be for create_merge_append_path to make a
>> copy of the partitioned_rels list?
> 
> Yes, partitioned_rels should be copied.
> 
>> Attached patch fixes the problem but I am not completely sure about the fix.
> 
> Thanks for creating the patch, although I think a better fix would be to
> make get_partitioned_child_rels() do the list_copy.  That way, any other
> users of partitioned_rels will not suffer the same issue.  Attached patch
> implements that, along with a regression test.
> 
> Added to the open items.

Oops, forgot to cc -hackers.  Patch attached again.
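
For readers following the analysis quoted above, here is a minimal,
self-contained sketch of the failure mode. It does not use PostgreSQL's
List API: IntList, int_list_make, adjust_offsets and both index_seen_*
functions are hypothetical stand-ins, and the offsets passed in are made-up
values. It only shows how adjusting one shared list in place from two
callers stacks the offsets, and how giving each caller a private copy
avoids that.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an integer list of range-table indexes. */
typedef struct IntList
{
    int    *items;
    size_t  len;
} IntList;

/* Build a list that owns a private copy of vals[0..len-1]. */
static IntList *
int_list_make(const int *vals, size_t len)
{
    IntList *l;

    assert(len > 0);
    l = malloc(sizeof(IntList));
    assert(l != NULL);
    l->items = malloc(len * sizeof(int));
    assert(l->items != NULL);
    memcpy(l->items, vals, len * sizeof(int));
    l->len = len;
    return l;
}

static void
int_list_free(IntList *l)
{
    free(l->items);
    free(l);
}

/* Same shape as the loop quoted above: shift every index in place. */
static void
adjust_offsets(IntList *l, int rtoffset)
{
    for (size_t i = 0; i < l->len; i++)
        l->items[i] += rtoffset;
}

/* Two plans point at one list, so the second adjustment stacks on the first. */
int
index_seen_when_shared(int rtoffset_a, int rtoffset_b)
{
    const int start[] = {1};
    IntList *shared = int_list_make(start, 1);
    int      result;

    adjust_offsets(shared, rtoffset_a);    /* plan under subroot A */
    adjust_offsets(shared, rtoffset_b);    /* plan under subroot B, same list */
    result = shared->items[0];
    int_list_free(shared);
    return result;
}

/* Each plan gets its own copy, so only its own offset is applied. */
int
index_seen_when_copied(int rtoffset_a, int rtoffset_b)
{
    const int start[] = {1};
    IntList *orig = int_list_make(start, 1);
    IntList *for_a = int_list_make(orig->items, orig->len);
    IntList *for_b = int_list_make(orig->items, orig->len);
    int      result;

    adjust_offsets(for_a, rtoffset_a);
    adjust_offsets(for_b, rtoffset_b);
    result = for_b->items[0];              /* what plan B ends up with */
    int_list_free(for_a);
    int_list_free(for_b);
    int_list_free(orig);
    return result;
}
```

With the made-up offsets 3 and 5, index_seen_when_shared(3, 5) returns 9,
whereas plan B expected 6, which is what index_seen_when_copied(3, 5)
returns. Against a range table with fewer than 9 entries, the shared result
points past the end.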

Thanks,
Amit

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

The patch fixed the problem, thanks a lot.
Regards,
Sveinn.

On Thu, May 18, 2017 01:53, Amit Langote wrote:
> On 2017/05/18 10:49, Amit Langote wrote:
>> On 2017/05/18 2:14, Dilip Kumar wrote:
>>> On Wed, May 17, 2017 at 7:41 PM,  <sveinn.sveinsson@gmail.com> wrote:
>>>> (gdb) bt
>>>> #0  0x000000000061ab1b in list_nth ()
>>>> #1  0x00000000005e4081 in ExecLockNonLeafAppendTables ()
>>>> #2  0x00000000005f4d52 in ExecInitMergeAppend ()
>>>> #3  0x00000000005e0365 in ExecInitNode ()
>>>> #4  0x00000000005f35a7 in ExecInitLimit ()
>>>> #5  0x00000000005e00f3 in ExecInitNode ()
>>>> #6  0x00000000005dd207 in standard_ExecutorStart ()
>>>> #7  0x00000000006f96d2 in PortalStart ()
>>>> #8  0x00000000006f5c7f in exec_simple_query ()
>>>> #9  0x00000000006f6fac in PostgresMain ()
>>>> #10 0x0000000000475cdc in ServerLoop ()
>>>> #11 0x0000000000692ffa in PostmasterMain ()
>>>> #12 0x0000000000476600 in main ()
>> Thanks for the test case Sveinn and thanks Dilip for analyzing.
>>
>>> Seems like the issue is that the plans under multiple subroots are
>>> pointing to the same partitioned_rels.
>> That's correct.
>>
>>> If I am not getting it wrong "set_plan_refs(PlannerInfo *root, Plan
>>> *plan, int rtoffset)" the rtoffset is specific to the subroot. Now,
>>> problem is that set_plan_refs called for different subroot is updating
>>> the same partition_rel info and make this value completely wrong which
>>> will ultimately make ExecLockNonLeafAppendTables to access the out of
>>> bound "rte" index.
>> Yes.
>>
>>> set_plan_refs
>>> {
>>> [clipped]
>>> case T_MergeAppend:
>>> {
>>>     [clipped]
>>>
>>>     foreach(l, splan->partitioned_rels)
>>>     {
>>>          lfirst_int(l) += rtoffset;
>>>
>>>
>>> I think the solution should be that create_merge_append_path make the
>>> copy of partitioned_rels list?
>> Yes, partitioned_rels should be copied.
>>
>>> Attached patch fixes the problem but I am not completely sure about the fix.
>> Thanks for creating the patch, although I think a better fix would be to
>> make get_partitioned_child_rels() do the list_copy.  That way, any other
>> users of partitioned_rels will not suffer the same issue.  Attached patch
>> implements that, along with a regression test.
>>
>> Added to the open items.
> Oops, forgot to cc -hackers.  Patch attached again.
>
> Thanks,
> Amit




On Thu, May 18, 2017 at 7:23 AM, Amit Langote
<Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Oops, forgot to cc -hackers.  Patch attached again.

Maybe we should add a comment explaining why the copy is needed.

We still have the same copy shared across multiple append paths, and
set_plan_refs would change it underneath those. That may not be a
problem right now, but it could become one in the future. Another option,
which consumes a bit less memory, is to make the copy at planning time,
only if the path gets selected as the cheapest path.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



On Fri, May 19, 2017 at 6:07 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> We still have the same copy shared across multiple append paths, and
> set_plan_refs would change it underneath those. That may not be a
> problem right now, but it could become one in the future.

I agree.  I think it's better for the path-creation functions to copy
the list, so that there is no surprising sharing of substructure.
set_plan_refs() obviously expects this data to be unshared, and this
seems like the best way to ensure that's true in all cases.

Committed that way.
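
The two placements of the copy discussed in this thread can be contrasted
with a small sketch. Everything here is hypothetical and only loosely
modelled on the discussion: IntList, SketchPath, SketchPlan and the make_*
functions are illustrative names, not PostgreSQL structures or functions.
Option A copies in the path-creation function, so no two paths share
substructure. Option B lets paths share the list and copies only when a
chosen path is turned into a plan, which allocates less but leaves the
paths themselves aliased.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an integer list of range-table indexes. */
typedef struct IntList
{
    int    *items;
    size_t  len;
} IntList;

/* Return a freshly allocated list with the same contents as src. */
static IntList *
int_list_copy(const IntList *src)
{
    IntList *dst = malloc(sizeof(IntList));

    assert(dst != NULL && src->len > 0);
    dst->items = malloc(src->len * sizeof(int));
    assert(dst->items != NULL);
    memcpy(dst->items, src->items, src->len * sizeof(int));
    dst->len = src->len;
    return dst;
}

/* Hypothetical path and plan nodes that carry the list. */
typedef struct SketchPath { IntList *partitioned_rels; } SketchPath;
typedef struct SketchPlan { IntList *partitioned_rels; } SketchPlan;

/* Option A: the path-creation function copies, so every path owns its list. */
SketchPath
make_path_copying(const IntList *rels)
{
    SketchPath path = { int_list_copy(rels) };

    return path;
}

/* Option B, step 1: paths simply point at the caller's list. */
SketchPath
make_path_sharing(IntList *rels)
{
    SketchPath path = { rels };

    return path;
}

/* Option B, step 2: copy once, when the chosen path becomes a plan. */
SketchPlan
make_plan_copying(const SketchPath *path)
{
    SketchPlan plan = { int_list_copy(path->partitioned_rels) };

    return plan;
}
```

Under option A an in-place adjustment through one path cannot be observed
through another. Under option B the plan is safe to adjust, but anything
that modified the list through a path would still affect every other path
built from the same list.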

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company