Re: dsa_allocate() faliure - Mailing list pgsql-performance

From Rick Otten
Subject Re: dsa_allocate() faliure
Msg-id CAMAYy4J7EZaDephWc4vRxVq4RCuz9xLw_OQXgyTGVRYreW-UEA@mail.gmail.com
In response to Re: dsa_allocate() faliure  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: dsa_allocate() faliure
List pgsql-performance
If I do a "set max_parallel_workers_per_gather=0;" before I run the query in that session, it runs just fine.
If I set it to 2, the query dies with the dsa_allocate error.

I'll use that as a workaround until 10.2 comes out.  Thanks!  I have something that will help.
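
A minimal sketch of that workaround in a psql session, assuming the query lives in failing_query.sql as in the original report. The SHOW and RESET lines are additions for illustration and are not part of what was tested in this thread:

```sql
-- Disable parallel query for this session only; other sessions are unaffected.
SET max_parallel_workers_per_gather = 0;

-- Optional check that the setting took effect (illustrative addition).
SHOW max_parallel_workers_per_gather;

-- Run the query that previously failed with the dsa_allocate error.
\i failing_query.sql

-- Restore the session default afterwards (illustrative addition).
RESET max_parallel_workers_per_gather;
```

Because SET only changes the current session, nothing in postgresql.conf needs to be touched, and the setting can simply be dropped once the server is upgraded to a release containing the fix.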


On Mon, Jan 29, 2018 at 3:52 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rick Otten <rottenwindfish@gmail.com> writes:
>> I'm wondering if there is anything I can tune in my PG 10.1 database to
>> avoid these errors:
>
>> $  psql -f failing_query.sql
>> psql:failing_query.sql:46: ERROR:  dsa_allocate could not find 7 free pages
>> CONTEXT:  parallel worker
>
> Hmm.  There's only one place in the source code that emits that message
> text:
>
>         /*
>          * Ask the free page manager for a run of pages.  This should always
>          * succeed, since both get_best_segment and make_new_segment should
>          * only return a non-NULL pointer if it actually contains enough
>          * contiguous freespace.  If it does fail, something in our backend
>          * private state is out of whack, so use FATAL to kill the process.
>          */
>         if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
>             elog(FATAL,
>                  "dsa_allocate could not find %zu free pages", npages);
>
> Now maybe that comment is being unreasonably optimistic, but it sure
> appears that this is supposed to be a can't-happen case, in which case
> you've found a bug.

This is probably the bug fixed here:

https://www.postgresql.org/message-id/E1eQzIl-0004wM-K3%40gemulon.postgresql.org

That was back-patched, so 10.2 will contain the fix.  The bug was not
in dsa.c itself, but in the parallel query code that mixed up DSA
areas, corrupting them.  The problem comes up when the query plan has
multiple Gather nodes (and a particular execution pattern) -- is that
the case here, in the EXPLAIN output?  That seems plausible given the
description of a 50-branch UNION.  The only workaround until 10.2
would be to reduce max_parallel_workers_per_gather to 0 to prevent
parallelism completely for this query.

--
Thomas Munro
http://www.enterprisedb.com
