Re: BUG in 10.1 - dsa_area could not attach to a segment that has been freed - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG in 10.1 - dsa_area could not attach to a segment that has been freed
Date
Msg-id CAEepm=0gw062AW2WgiZ5c7sfPU-aLUkv35AQg0PJO8ViRoPepA@mail.gmail.com
In response to Re: BUG in 10.1 - dsa_area could not attach to a segment that has been freed  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: BUG in 10.1 - dsa_area could not attach to a segment that has been freed  (Alexander Voytsekhovskyy <av@mobile-ua.com>)
List pgsql-bugs
On Thu, Nov 30, 2017 at 10:18 AM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Thu, Nov 30, 2017 at 9:34 AM, Alexander Voytsekhovskyy
> <young.inbox@gmail.com> wrote:
>> Thanks for helping, here is one more try
>>
>> #0  get_segment_by_index (area=area@entry=0x556026700be8, index=1) at
>> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/utils/mmgr/dsa.c:1736
>> #1  0x00005560252c2b90 in dsa_get_address (area=area@entry=0x556026700be8,
>> dp=dp@entry=1099511685280) at
>> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/utils/mmgr/dsa.c:945
>> #2  0x00005560250a2c2b in tbm_attach_shared_iterate
>> (dsa=dsa@entry=0x556026700be8, dp=1099511685280) at
>> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/nodes/tidbitmap.c:1503
>> #3  0x0000556025066c7b in BitmapHeapNext (node=node@entry=0x556026460710) at
>> /build/postgresql-10-qAeTPy/postgresql-10-10.1/build/../src/backend/executor/nodeBitmapHeapscan.c:176
>
> Thank you for the report and the back trace.  I think this might be a
> manifestation of the problem I just described[1] on -hackers.
> Depending on the shape of a multi-Gather query plan and therefore the
> order of control flow, you might finish up using the DSA area that
> belongs to a different Gather node and then find that it goes away too
> soon.  Investigating.

I haven't managed to reproduce this, but I was coincidentally
investigating a bug that appears to explain it.  I think what happened
is that a background worker was first to execute BitmapHeapNext and
allocated a dsa_pointer, and then the leader process reached
BitmapHeapNext and called tbm_attach_shared_iterate, which tried to
dereference it, but the leader had es_query_dsa set to another Gather
node's DSA area (whichever Gather most recently ran ExecInitParallelPlan).  That
requires a certain order of execution and timing that I'm not sure how
to reach.  I have posted a patch that should fix it over here:

https://www.postgresql.org/message-id/CAEepm%3D0Mv9BigJPpribGQhnHqVGYo2%2BkmzekGUVJJc9Y_ZVaYA%40mail.gmail.com
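To make the suspected failure mode concrete, here is a small standalone
sketch.  It is not PostgreSQL code: toy_area, toy_pointer and the
toy_* functions are hypothetical stand-ins, and the 40-bit offset
split is an assumption (it is at least consistent with the back trace
above, where dp=1099511685280 arrives in get_segment_by_index with
index=1).  It only shows why a pointer allocated in one Gather node's
area cannot be resolved through another Gather node's area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real structures from dsa.c. */
#define TOY_MAX_SEGMENTS 4
#define TOY_OFFSET_BITS  40     /* assumed split: high bits = segment index */

typedef uint64_t toy_pointer;

typedef struct toy_area
{
    char   *segments[TOY_MAX_SEGMENTS];  /* NULL = segment absent or freed */
    size_t  sizes[TOY_MAX_SEGMENTS];
} toy_area;

static unsigned
toy_segment_index(toy_pointer dp)
{
    return (unsigned) (dp >> TOY_OFFSET_BITS);
}

static uint64_t
toy_offset(toy_pointer dp)
{
    return dp & ((UINT64_C(1) << TOY_OFFSET_BITS) - 1);
}

static toy_pointer
toy_make_pointer(unsigned index, uint64_t offset)
{
    assert(offset < (UINT64_C(1) << TOY_OFFSET_BITS));
    return ((toy_pointer) index << TOY_OFFSET_BITS) | offset;
}

/* Returns NULL where a real implementation would report an error. */
static void *
toy_get_address(const toy_area *area, toy_pointer dp)
{
    unsigned index = toy_segment_index(dp);
    uint64_t offset = toy_offset(dp);

    if (index >= TOY_MAX_SEGMENTS || area->segments[index] == NULL)
        return NULL;            /* this area has no such segment */
    if (offset >= area->sizes[index])
        return NULL;
    return area->segments[index] + offset;
}

/*
 * A worker allocates in Gather A's area; the leader then resolves the same
 * pointer through whichever area was set up most recently (Gather B's).
 * Returns 1 if the wrong-area lookup fails and the right-area lookup works.
 */
static int
toy_demo_wrong_area(void)
{
    static char seg0_a[64], seg1_a[65536], seg0_b[64];
    toy_area gather_a = {{seg0_a, seg1_a}, {sizeof seg0_a, sizeof seg1_a}};
    toy_area gather_b = {{seg0_b}, {sizeof seg0_b}};
    const toy_area *es_query_dsa = &gather_b;   /* stale choice of area */
    toy_pointer dp = toy_make_pointer(1, 57504); /* made in gather_a */

    void *wrong = toy_get_address(es_query_dsa, dp);
    void *right = toy_get_address(&gather_a, dp);

    return wrong == NULL && right == (void *) (seg1_a + 57504);
}
```

Calling toy_demo_wrong_area() returns 1: the lookup through the stale
area finds no segment 1, while the same pointer resolves fine in the
area it was allocated from.  In this toy model the cure is simply to
resolve each pointer through the area it came from.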

Are you able to provide a minimal reproducer, an anonymised partial
dump, or perhaps try out the patch on a copy of your database?

-- 
Thomas Munro
http://www.enterprisedb.com

