Thread: dsa_allocate() failure

dsa_allocate() failure

From
Rick Otten
Date:
I'm wondering if there is anything I can tune in my PG 10.1 database to avoid these errors:

$  psql -f failing_query.sql
psql:failing_query.sql:46: ERROR:  dsa_allocate could not find 7 free pages
CONTEXT:  parallel worker

I tried throttling back the number of parallel workers to just 2, that didn't help.

The query is joining two views that each have 50 or so underlying queries, unioned, in them.  Unfortunately, due to an invalid index, it is sequentially scanning some of the tables.  I can't fix the indexes until a few create materialized view commands that are currently running (and have been running for 6 days) finish, or until I kill them, because they are holding a lock that blocks any attempt to reindex.

So that leaves me looking for some tunable (hopefully one that doesn't require a restart) which will fix this by adding sufficient resources to the system to allow dsa_allocate() to find enough (contiguous?) pages.  My system seems to have plenty of extra capacity.
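(For reference, the parallel-related knobs, and whether they need a restart, can be listed from the standard pg_settings view - a sketch:)

    -- "context" shows whether a setting is per-session ('user'),
    -- reload-only ('sighup'), or restart-only ('postmaster'):
    SELECT name, setting, context
    FROM pg_settings
    WHERE name LIKE '%parallel%';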

There was a thread on pgsql-hackers in December where someone else was seeing a similar error but couldn't reproduce it consistently.  I've run the above query hundreds of times over the last 24 hours, and it only fails when I select just the right parameters - and then it fails every time I run it with those parameters.

In that thread someone speculated it had to do with running many parallel bitmap heap scans in one query.  I count 98 in the query plan.

I'm hoping there is a "magic X tunable" which I just need to bump up a little to let queries like this run without the fatal failure.

Re: dsa_allocate() failure

From
Tom Lane
Date:
Rick Otten <rottenwindfish@gmail.com> writes:
> I'm wondering if there is anything I can tune in my PG 10.1 database to
> avoid these errors:

> $  psql -f failing_query.sql
> psql:failing_query.sql:46: ERROR:  dsa_allocate could not find 7 free pages
> CONTEXT:  parallel worker

Hmm.  There's only one place in the source code that emits that message
text:

        /*
         * Ask the free page manager for a run of pages.  This should always
         * succeed, since both get_best_segment and make_new_segment should
         * only return a non-NULL pointer if it actually contains enough
         * contiguous freespace.  If it does fail, something in our backend
         * private state is out of whack, so use FATAL to kill the process.
         */
        if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
            elog(FATAL,
                 "dsa_allocate could not find %zu free pages", npages);

Now maybe that comment is being unreasonably optimistic, but it sure
appears that this is supposed to be a can't-happen case, in which case
you've found a bug.

cc'ing the DSA authors for comment.

            regards, tom lane


Re: dsa_allocate() failure

From
Thomas Munro
Date:
On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rick Otten <rottenwindfish@gmail.com> writes:
>> I'm wondering if there is anything I can tune in my PG 10.1 database to
>> avoid these errors:
>
>> $  psql -f failing_query.sql
>> psql:failing_query.sql:46: ERROR:  dsa_allocate could not find 7 free pages
>> CONTEXT:  parallel worker
>
> Hmm.  There's only one place in the source code that emits that message
> text:
>
>         /*
>          * Ask the free page manager for a run of pages.  This should always
>          * succeed, since both get_best_segment and make_new_segment should
>          * only return a non-NULL pointer if it actually contains enough
>          * contiguous freespace.  If it does fail, something in our backend
>          * private state is out of whack, so use FATAL to kill the process.
>          */
>         if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
>             elog(FATAL,
>                  "dsa_allocate could not find %zu free pages", npages);
>
> Now maybe that comment is being unreasonably optimistic, but it sure
> appears that this is supposed to be a can't-happen case, in which case
> you've found a bug.

This is probably the bug fixed here:

https://www.postgresql.org/message-id/E1eQzIl-0004wM-K3%40gemulon.postgresql.org

That was back patched, so 10.2 will contain the fix.  The bug was not
in dsa.c itself, but in the parallel query code that mixed up DSA
areas, corrupting them.  The problem comes up when the query plan has
multiple Gather nodes (and a particular execution pattern) -- is that
the case here, in the EXPLAIN output?  That seems plausible given the
description of a 50-branch UNION.  The only workaround until 10.2
would be to reduce max_parallel_workers_per_gather to 0 to prevent
parallelism completely for this query.
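(A quick way to check for that condition - a sketch, assuming the failing statement can be run standalone - is to count the Gather / Gather Merge lines in the text plan:)

    -- More than one Gather node in a single plan is the pattern described above:
    EXPLAIN (COSTS OFF)
    SELECT ... ;  -- the failing query from failing_query.sql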

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Rick Otten
Date:
If I do a "set max_parallel_workers_per_gather=0;" before I run the query in that session, it runs just fine.
If I set it to 2, the query dies with the dsa_allocate error.

I'll use that as a workaround until 10.2 comes out.  Thanks!  I have something that will help.
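(For the record, the session-level workaround is just this - a sketch; failing_query.sql is the file from upthread:)

    -- Disable parallel plans for this session only, run the query, then restore:
    SET max_parallel_workers_per_gather = 0;
    \i failing_query.sql
    RESET max_parallel_workers_per_gather;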


On Mon, Jan 29, 2018 at 3:52 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Rick Otten <rottenwindfish@gmail.com> writes:
>> I'm wondering if there is anything I can tune in my PG 10.1 database to
>> avoid these errors:
>
>> $  psql -f failing_query.sql
>> psql:failing_query.sql:46: ERROR:  dsa_allocate could not find 7 free pages
>> CONTEXT:  parallel worker
>
> Hmm.  There's only one place in the source code that emits that message
> text:
>
>         /*
>          * Ask the free page manager for a run of pages.  This should always
>          * succeed, since both get_best_segment and make_new_segment should
>          * only return a non-NULL pointer if it actually contains enough
>          * contiguous freespace.  If it does fail, something in our backend
>          * private state is out of whack, so use FATAL to kill the process.
>          */
>         if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
>             elog(FATAL,
>                  "dsa_allocate could not find %zu free pages", npages);
>
> Now maybe that comment is being unreasonably optimistic, but it sure
> appears that this is supposed to be a can't-happen case, in which case
> you've found a bug.

This is probably the bug fixed here:

https://www.postgresql.org/message-id/E1eQzIl-0004wM-K3%40gemulon.postgresql.org

That was back patched, so 10.2 will contain the fix.  The bug was not
in dsa.c itself, but in the parallel query code that mixed up DSA
areas, corrupting them.  The problem comes up when the query plan has
multiple Gather nodes (and a particular execution pattern) -- is that
the case here, in the EXPLAIN output?  That seems plausible given the
description of a 50-branch UNION.  The only workaround until 10.2
would be to reduce max_parallel_workers_per_gather to 0 to prevent
parallelism completely for this query.

--
Thomas Munro
http://www.enterprisedb.com

Re: dsa_allocate() failure

From
Sand Stone
Date:
>>dsa_allocate could not find 7 free pages
I just saw this error message again on all of my worker nodes (I am
using the Citus 7.4 release). The PG core is my own build of
release_10_stable (10.4) out of GitHub, on Ubuntu.

What's the best way to debug this? I am running pre-production tests
for the next few days, so I could gather info if necessary (I cannot
pinpoint a query to repro this yet, as we have 10K queries running
concurrently).




On Mon, Jan 29, 2018 at 1:35 PM, Rick Otten <rottenwindfish@gmail.com> wrote:
> If I do a "set max_parallel_workers_per_gather=0;" before I run the query in
> that session, it runs just fine.
> If I set it to 2, the query dies with the dsa_allocate error.
>
> I'll use that as a work around until 10.2 comes out.  Thanks!  I have
> something that will help.
>
>
> On Mon, Jan 29, 2018 at 3:52 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
>>
>> On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> > Rick Otten <rottenwindfish@gmail.com> writes:
>> >> I'm wondering if there is anything I can tune in my PG 10.1 database to
>> >> avoid these errors:
>> >
>> >> $  psql -f failing_query.sql
>> >> psql:failing_query.sql:46: ERROR:  dsa_allocate could not find 7 free
>> >> pages
>> >> CONTEXT:  parallel worker
>> >
>> > Hmm.  There's only one place in the source code that emits that message
>> > text:
>> >
>> >         /*
>> >          * Ask the free page manager for a run of pages.  This should
>> > always
>> >          * succeed, since both get_best_segment and make_new_segment
>> > should
>> >          * only return a non-NULL pointer if it actually contains enough
>> >          * contiguous freespace.  If it does fail, something in our
>> > backend
>> >          * private state is out of whack, so use FATAL to kill the
>> > process.
>> >          */
>> >         if (!FreePageManagerGet(segment_map->fpm, npages, &first_page))
>> >             elog(FATAL,
>> >                  "dsa_allocate could not find %zu free pages", npages);
>> >
>> > Now maybe that comment is being unreasonably optimistic, but it sure
>> > appears that this is supposed to be a can't-happen case, in which case
>> > you've found a bug.
>>
>> This is probably the bug fixed here:
>>
>>
>> https://www.postgresql.org/message-id/E1eQzIl-0004wM-K3%40gemulon.postgresql.org
>>
>> That was back patched, so 10.2 will contain the fix.  The bug was not
>> in dsa.c itself, but in the parallel query code that mixed up DSA
>> areas, corrupting them.  The problem comes up when the query plan has
>> multiple Gather nodes (and a particular execution pattern) -- is that
>> the case here, in the EXPLAIN output?  That seems plausible given the
>> description of a 50-branch UNION.  The only workaround until 10.2
>> would be to reduce max_parallel_workers_per_gather to 0 to prevent
>> parallelism completely for this query.
>>
>> --
>> Thomas Munro
>> http://www.enterprisedb.com
>
>


Re: dsa_allocate() failure

From
Thomas Munro
Date:
On Wed, May 23, 2018 at 4:10 PM, Sand Stone <sand.m.stone@gmail.com> wrote:
>>>dsa_allocate could not find 7 free pages
> I just saw this error message again on all of my worker nodes (I am using
> Citus 7.4 rel). The PG core is my own build of release_10_stable
> (10.4) out of GitHub on Ubuntu.

At which commit ID?

All of your worker nodes... so this happened at the same time or at
different times?  I don't know much about Citus -- do you mean that
these were separate PostgreSQL clusters, and they were all running the
same query and they all crashed like this?

> What's the best way to debug this? I am running pre-production tests
> for the next few days, so I could gather info. if necessary (I cannot
> pinpoint a query to repro this yet, as we have 10K queries running
> concurrently).

Any chance of an EXPLAIN plan for the query that crashed like this?
Do you know if it's using multiple Gather[Merge] nodes and parallel
bitmap heap scans?  Was it a regular backend process or a parallel
worker process (or a Citus worker process, if that is a thing?) that
raised the error?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Sand Stone
Date:
>> At which commit ID?
83fcc615020647268bb129cbf86f7661feee6412 (5/6)

>> do you mean that these were separate PostgreSQL clusters, and they were all running the same query and they all crashed like this?

A few worker nodes; a table is hash partitioned by "aTable.did" by
Citus, and further partitioned by PG10 by time range on field "ts". As
far as I could tell, Citus just does a query rewrite and executes the
same type of queries on all nodes.

>>so this happened at the same time or at different times?
At the same time. The queries are simple count and sum queries; here
is the relevant part of the log from one of the worker nodes:
2018-05-23 01:24:01.492 UTC [130536] ERROR:  dsa_allocate could not
find 7 free pages
2018-05-23 01:24:01.492 UTC [130536] CONTEXT:  parallel worker
STATEMENT:  COPY (SELECT count(1) AS count, sum(worker_column_1) AS
sum FROM (SELECT subquery.avg AS worker_column_1 FROM (SELECT
aTable.did, avg((aTable.sum OPERATOR(pg_catalog./)
(aTable.count)::double precision)) AS avg FROM public.aTable_102117
aTable WHERE ((aTable.ts OPERATOR(pg_catalog.>=) '2018-04-25
00:00:00+00'::timestamp with time zone) AND (aTable.ts
OPERATOR(pg_catalog.<=) '2018-04-30 00:00:00+00'::timestamp with time
zone) AND (aTable.v OPERATOR(pg_catalog.=) 12345)) GROUP BY
aTable.did) subquery) worker_subquery) TO STDOUT WITH (FORMAT binary)


>> a parallel worker process
I think this is more of a PG10 parallel bg worker issue; I don't think
it's Citus, which just lets each worker PG server do its own planning.

I will try to do more experiments to see if there is any specific
query that causes the parallel query execution to fail. As far as I
can tell, the level of concurrency triggered this issue: when tens of
queries like the one shown are executing on the worker nodes, the PG10
core may or may not spawn more bg workers, depending on the stats.

Thanks for your time!





On Tue, May 22, 2018 at 9:44 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> On Wed, May 23, 2018 at 4:10 PM, Sand Stone <sand.m.stone@gmail.com> wrote:
>>>>dsa_allocate could not find 7 free pages
>> I just this error message again on all of my worker nodes (I am using
>> Citus 7.4 rel). The PG core is my own build of release_10_stable
>> (10.4) out of GitHub on Ubuntu.
>
> At which commit ID?
>
> All of your worker nodes... so this happened at the same time or at
> different times?  I don't know much about Citus -- do you mean that
> these were separate PostgreSQL clusters, and they were all running the
> same query and they all crashed like this?
>
>> What's the best way to debug this? I am running pre-production tests
>> for the next few days, so I could gather info. if necessary (I cannot
>> pinpoint a query to repro this yet, as we have 10K queries running
>> concurrently).
>
> Any chance of an EXPLAIN plan for the query that crashed like this?
> Do you know if it's using multiple Gather[Merge] nodes and parallel
> bitmap heap scans?  Was it a regular backend process or a parallel
> worker process (or a Citus worker process, if that is a thing?) that
> raised the error?
>
> --
> Thomas Munro
> http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Sand Stone
Date:
Just as a follow-up: I tried the parallel execution again (in a stress
test environment). Now the crash seems gone. I will keep an eye on
this for the next few weeks.

My theory is that the Citus cluster created and shut down a lot of TCP
connections between coordinator and workers. If running on untuned
Linux machines, the TCP ports might run out.

Of course, I am using "newer" PG10 bits and Citus 7.5 this time.
On Wed, May 23, 2018 at 7:06 AM Sand Stone <sand.m.stone@gmail.com> wrote:
>
> >> At which commit ID?
> 83fcc615020647268bb129cbf86f7661feee6412 (5/6)
>
> >> do you mean that these were separate PostgreSQL clusters, and they were all running the same query and they all crashed like this?
>
> A few worker nodes; a table is hash partitioned by "aTable.did" by
> Citus, and further partitioned by PG10 by time range on field "ts". As
> far as I could tell, Citus just does a query rewrite and executes the
> same type of queries on all nodes.
>
> >>so this happened at the same time or at different times?
> At the same time. The queries are simple count and sum queries, here
> is the relevant part from one of the worker nodes:
> 2018-05-23 01:24:01.492 UTC [130536] ERROR:  dsa_allocate could not
> find 7 free pages
> 2018-05-23 01:24:01.492 UTC [130536] CONTEXT:  parallel worker
> STATEMENT:  COPY (SELECT count(1) AS count, sum(worker_column_1) AS
> sum FROM (SELECT subquery.avg AS worker_column_1 FROM (SELECT
> aTable.did, avg((aTable.sum OPERATOR(pg_catalog./)
> (aTable.count)::double precision)) AS avg FROM public.aTable_102117
> aTable WHERE ((aTable.ts OPERATOR(pg_catalog.>=) '2018-04-25
> 00:00:00+00'::timestamp with time zone) AND (aTable.ts
> OPERATOR(pg_catalog.<=) '2018-04-30 00:00:00+00'::timestamp with time
> zone) AND (aTable.v OPERATOR(pg_catalog.=) 12345)) GROUP BY
> aTable.did) subquery) worker_subquery) TO STDOUT WITH (FORMAT binary)
>
>
> >> a parallel worker process
> I think this is more of a PG10 parallel bg worker issue; I don't think
> it's Citus, which just lets each worker PG server do its own planning.
>
> I will try to do more experiments to see if there is any specific
> query that causes the parallel query execution to fail. As far as I
> can tell, the level of concurrency triggered this issue: when tens of
> queries like the one shown are executing on the worker nodes, the PG10
> core may or may not spawn more bg workers, depending on the stats.
>
> Thanks for your time!
>
>
>
>
>
> On Tue, May 22, 2018 at 9:44 PM, Thomas Munro
> <thomas.munro@enterprisedb.com> wrote:
> > On Wed, May 23, 2018 at 4:10 PM, Sand Stone <sand.m.stone@gmail.com> wrote:
> >>>>dsa_allocate could not find 7 free pages
> >> I just saw this error message again on all of my worker nodes (I am using
> >> Citus 7.4 rel). The PG core is my own build of release_10_stable
> >> (10.4) out of GitHub on Ubuntu.
> >
> > At which commit ID?
> >
> > All of your worker nodes... so this happened at the same time or at
> > different times?  I don't know much about Citus -- do you mean that
> > these were separate PostgreSQL clusters, and they were all running the
> > same query and they all crashed like this?
> >
> >> What's the best way to debug this? I am running pre-production tests
> >> for the next few days, so I could gather info. if necessary (I cannot
> >> pinpoint a query to repro this yet, as we have 10K queries running
> >> concurrently).
> >
> > Any chance of an EXPLAIN plan for the query that crashed like this?
> > Do you know if it's using multiple Gather[Merge] nodes and parallel
> > bitmap heap scans?  Was it a regular backend process or a parallel
> > worker process (or a Citus worker process, if that is a thing?) that
> > raised the error?
> >
> > --
> > Thomas Munro
> > http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Thomas Munro
Date:
On Thu, Aug 16, 2018 at 8:32 AM, Sand Stone <sand.m.stone@gmail.com> wrote:
> Just as a follow up. I tried the parallel execution again (in a stress
> test environment). Now the crash seems gone. I will keep an eye on
> this for the next few weeks.

Thanks for the report.  That's great news, but it'd be good to
understand why it was happening.

> My theory is that the Citus cluster created and shut down a lot of TCP
> connections between coordinator and workers. If running on untuned
> Linux machines, the TCP ports might run out.

I'm not sure how that's relevant, unless perhaps it causes executor
nodes to be invoked in a strange sequence that commit fd7c0fa7 didn't
fix?  I wonder if there could be something different about the control
flow with custom scans, or something about the way Citus worker nodes
invoke plan fragments, or some error path that I failed to consider...
It's a clue that all of your worker nodes reliably crashed at the same
time on the same/similar queries (presumably distributed query
fragments for different shards), making it seem more like a
common-or-garden bug rather than some kind of timing-based heisenbug.
If you ever manage to reproduce it, an explain plan and a back trace
would be very useful.

> Of course, I am using "newer" PG10 bits and Citus7.5 this time.

Hmm.  There weren't any relevant commits to REL_10_STABLE that I can
think of.  And (with the proviso that I know next to nothing about
Citus) I just cloned https://github.com/citusdata/citus.git and
skimmed through "git diff origin/release-7.4..origin/release-7.5", and
nothing is jumping out at me.  Can you still see the problem with
Citus 7.4?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Sand Stone
Date:
> Can you still see the problem with Citus 7.4?
Hi, Thomas. I actually went back to the cluster with Citus 7.4 and
PG 10.4, and modified the parallel param. So far, I haven't seen any
server crash.

The main difference between the runs that crashed and those that
didn't is the set of Linux TCP timeout parameters (tuned to release
the ports faster). Unfortunately, I cannot "undo" the Linux params and
run the stress tests anymore, as this is a multi-million $ cluster and
people are doing more useful things on it. I will keep an eye on any
parallel execution issue.


On Wed, Aug 15, 2018 at 3:43 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
>
> On Thu, Aug 16, 2018 at 8:32 AM, Sand Stone <sand.m.stone@gmail.com> wrote:
> > Just as a follow up. I tried the parallel execution again (in a stress
> > test environment). Now the crash seems gone. I will keep an eye on
> > this for the next few weeks.
>
> Thanks for the report.  That's great news, but it'd be good to
> understand why it was happening.
>
> > My theory is that the Citus cluster created and shut down a lot of TCP
> > connections between coordinator and workers. If running on untuned
> > Linux machines, the TCP ports might run out.
>
> I'm not sure how that's relevant, unless perhaps it causes executor
> nodes to be invoked in a strange sequence that commit fd7c0fa7 didn't
> fix?  I wonder if there could be something different about the control
> flow with custom scans, or something about the way Citus worker nodes
> invoke plan fragments, or some error path that I failed to consider...
> It's a clue that all of your worker nodes reliably crashed at the same
> time on the same/similar queries (presumably distributed query
> fragments for different shards), making it seem more like a
> common-or-garden bug rather than some kind of timing-based heisenbug.
> If you ever manage to reproduce it, an explain plan and a back trace
> would be very useful.
>
> > Of course, I am using "newer" PG10 bits and Citus7.5 this time.
>
> Hmm.  There weren't any relevant commits to REL_10_STABLE that I can
> think of.  And (with the proviso that I know next to nothing about
> Citus) I just cloned https://github.com/citusdata/citus.git and
> skimmed through "git diff origin/release-7.4..origin/release-7.5", and
> nothing is jumping out at me.  Can you still see the problem with
> Citus 7.4?
>
> --
> Thomas Munro
> http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Sand Stone
Date:
I attached a query (and its query plan) that caused the crash: "dsa_allocate could not find 13 free pages" on one of the worker nodes.  I anonymised the query text a bit.  Interestingly, this time only one (always the same one) of the nodes is crashing.  Since this is a production environment, I cannot get the stack trace.  Once I turned off parallel execution for this node, the whole query finished just fine.  The attached parallel query plan is from one of the nodes that did not crash; hopefully the same plan would have been executed on the crashed node.  In theory, every worker node has the same bits and very similar data.

===
psql (10.4)
\dx
                       List of installed extensions
      Name      | Version |   Schema   |            Description            
----------------+---------+------------+-----------------------------------
 citus          | 7.4-3   | pg_catalog | Citus distributed database
 hll            | 2.10    | public     | type for storing hyperloglog data
 plpgsql        | 1.0     | pg_catalog | PL/pgSQL procedural language


On Sat, Aug 25, 2018 at 7:46 AM Sand Stone <sand.m.stone@gmail.com> wrote:
> Can you still see the problem with Citus 7.4?
Hi, Thomas. I actually went back to the cluster with Citus 7.4 and
PG 10.4, and modified the parallel param. So far, I haven't seen any
server crash.

The main difference between the runs that crashed and those that
didn't is the set of Linux TCP timeout parameters (tuned to release
the ports faster). Unfortunately, I cannot "undo" the Linux params and
run the stress tests anymore, as this is a multi-million $ cluster and
people are doing more useful things on it. I will keep an eye on any
parallel execution issue.


On Wed, Aug 15, 2018 at 3:43 PM Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
>
> On Thu, Aug 16, 2018 at 8:32 AM, Sand Stone <sand.m.stone@gmail.com> wrote:
> > Just as a follow up. I tried the parallel execution again (in a stress
> > test environment). Now the crash seems gone. I will keep an eye on
> > this for the next few weeks.
>
> Thanks for the report.  That's great news, but it'd be good to
> understand why it was happening.
>
> > My theory is that the Citus cluster created and shut down a lot of TCP
> > connections between coordinator and workers. If running on untuned
> > Linux machines, the TCP ports might run out.
>
> I'm not sure how that's relevant, unless perhaps it causes executor
> nodes to be invoked in a strange sequence that commit fd7c0fa7 didn't
> fix?  I wonder if there could be something different about the control
> flow with custom scans, or something about the way Citus worker nodes
> invoke plan fragments, or some error path that I failed to consider...
> It's a clue that all of your worker nodes reliably crashed at the same
> time on the same/similar queries (presumably distributed query
> fragments for different shards), making it seem more like a
> common-or-garden bug rather than some kind of timing-based heisenbug.
> If you ever manage to reproduce it, an explain plan and a back trace
> would be very useful.
>
> > Of course, I am using "newer" PG10 bits and Citus7.5 this time.
>
> Hmm.  There weren't any relevant commits to REL_10_STABLE that I can
> think of.  And (with the proviso that I know next to nothing about
> Citus) I just cloned https://github.com/citusdata/citus.git and
> skimmed through "git diff origin/release-7.4..origin/release-7.5", and
> nothing is jumping out at me.  Can you still see the problem with
> Citus 7.4?
>
> --
> Thomas Munro
> http://www.enterprisedb.com
Attachment

Re: dsa_allocate() failure

From
Thomas Munro
Date:
On Wed, Aug 29, 2018 at 5:48 PM Sand Stone <sand.m.stone@gmail.com> wrote:
> I attached a query (and its query plan) that caused the crash: "dsa_allocate could not find 13 free pages" on one of
> the worker nodes. I anonymised the query text a bit.  Interestingly, this time only one (same one) of the nodes is
> crashing. Since this is a production environment, I cannot get the stack trace. Once turned off parallel execution for
> this node. The whole query finished just fine. So the parallel query plan is from one of the nodes not crashed,
> hopefully the same plan would have been executed on the crashed node. In theory, every worker node has the same bits,
> and very similar data.

I wonder if this was a different symptom of the problem fixed here:

https://www.postgresql.org/message-id/flat/194c0706-c65b-7d81-ab32-2c248c3e2344%402ndquadrant.com

Can you still reproduce it on current master, REL_11_STABLE or REL_10_STABLE?

--
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Jakub Glapa
Date:
Hi, I'm also experiencing the problem: dsa_allocate could not find 7 free pages CONTEXT: parallel worker

I'm running: PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit

query (select statement over parent table with many partitions) and its plan:
select ...
from fa
where c_id in (<ID_LIST>) and
datetime >= '2018/01/01'
and ((dims ? 'p' and dims ? 'mcp')
or (datasource in ('FA', 'GA')))
and not datasource = 'm'
GROUP BY datasource, dims ->'ct', dims ->'mcp', dims -> 'p', dims -> 'sp':
Finalize GroupAggregate  (cost=31514757.77..31519357.77 rows=40000 width=223)
   Group Key: fa.datasource, ((fa.dims -> 'ct'::text)), ((fa.dims -> 'mcp'::text)), ((fa.dims -> 'p'::text)), ((fa.dims -> 'sp'::text))
   ->  Sort  (cost=31514757.77..31515057.77 rows=120000 width=223)
         Sort Key: fa.datasource, ((fa.dims -> 'ct'::text)), ((fa.dims -> 'mcp'::text)), ((fa.dims -> 'p'::text)), ((fa.dims -> 'sp'::text))
         ->  Gather  (cost=31491634.17..31504634.17 rows=120000 width=223)
               Workers Planned: 3
               ->  Partial HashAggregate  (cost=31490634.17..31491634.17 rows=40000 width=223)
                     Group Key: fa.datasource, (fa.dims -> 'ct'::text), (fa.dims -> 'mcp'::text), (fa.dims -> 'p'::text), (fa.dims -> 'sp'::text)
                     ->  Result  (cost=0.00..31364713.39 rows=5596479 width=175)
                           ->  Append  (cost=0.00..31252783.81 rows=5596479 width=659)
                                 ->  Parallel Seq Scan on fa  (cost=0.00..0.00 rows=1 width=580)
                                       Filter: ((datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone) AND ((datasource)::text <> 'M'::text) AND (((dims ? 'p'::text) AND (dims ? 'mcp'::text)) OR ((datasource)::text =
ANY ('{"FA","GA"}'::text[]))) AND (c_id = ANY ('{<ID_LIST>}'::bigint[])))
                                 ->  Parallel Bitmap Heap Scan on fa_10  (cost=1226.36..53641.49 rows=1 width=1290)
                                       Recheck Cond: (datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone)
                                       Filter: (((datasource)::text <> 'M'::text) AND (((dims ? 'p'::text) AND (dims ? 'mcp'::text)) OR ((datasource)::text = ANY ('<ID_LIST>'::bigint[])))
                                       ->  Bitmap Index Scan on fa_10_rangestart  (cost=0.00..1226.36 rows=32259 width=0)
                                             Index Cond: (datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone)
                                 ->  Parallel Seq Scan on fa_105  (cost=0.00..11.99 rows=1 width=580)
                                       Filter: ((datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone) AND ((datasource)::text <> 'M'::text) AND (((dims ? 'p'::text) AND (dims ? 'mcp'::text)) OR ((datasource)::text =
ANY ('{"FA","GA"}'::text[]))) AND (c_id = ANY ('<ID_LIST>'::bigint[])))
                                 ->  Parallel Seq Scan on fa_106  (cost=0.00..11.99 rows=1 width=580)
                                       Filter: ((datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone) AND ((datasource)::text <> 'M'::text) AND (((dims ? 'p'::text) AND (dims ? 'mcp'::text)) OR ((datasource)::text =
ANY ('{"FA","GA"}'::text[]))) AND (c_id = ANY ('<ID_LIST>..........


--
regards,
Jakub Glapa

Re: dsa_allocate() failure

From
Jakub Glapa
Date:
Looks like my email didn't match the right thread: https://www.postgresql.org/message-id/flat/CAMAYy4%2Bw3NTBM5JLWFi8twhWK4%3Dk_5L4nV5%2BbYDSPu8r4b97Zg%40mail.gmail.com
Any chance to get some feedback on this?
--
regards,
Jakub Glapa


On Tue, Nov 13, 2018 at 2:08 PM Jakub Glapa <jakub.glapa@gmail.com> wrote:
Hi, I'm also experiencing the problem: dsa_allocate could not find 7 free pages CONTEXT: parallel worker

I'm running: PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit

query (select statement over parent table with many partitions) and its plan:
select ... 
from fa 
where c_id in (<ID_LIST>) and 
datetime >= '2018/01/01' 
and ((dims ? 'p' and dims ? 'mcp') 
or (datasource in ('FA', 'GA')))
and not datasource = 'm' 
GROUP BY datasource, dims ->'ct', dims ->'mcp', dims -> 'p', dims -> 'sp':
Finalize GroupAggregate  (cost=31514757.77..31519357.77 rows=40000 width=223)
   Group Key: fa.datasource, ((fa.dims -> 'ct'::text)), ((fa.dims -> 'mcp'::text)), ((fa.dims -> 'p'::text)), ((fa.dims -> 'sp'::text))
   ->  Sort  (cost=31514757.77..31515057.77 rows=120000 width=223)
         Sort Key: fa.datasource, ((fa.dims -> 'ct'::text)), ((fa.dims -> 'mcp'::text)), ((fa.dims -> 'p'::text)), ((fa.dims -> 'sp'::text))
         ->  Gather  (cost=31491634.17..31504634.17 rows=120000 width=223)
               Workers Planned: 3
               ->  Partial HashAggregate  (cost=31490634.17..31491634.17 rows=40000 width=223)
                     Group Key: fa.datasource, (fa.dims -> 'ct'::text), (fa.dims -> 'mcp'::text), (fa.dims -> 'p'::text), (fa.dims -> 'sp'::text)
                     ->  Result  (cost=0.00..31364713.39 rows=5596479 width=175)
                           ->  Append  (cost=0.00..31252783.81 rows=5596479 width=659)
                                 ->  Parallel Seq Scan on fa  (cost=0.00..0.00 rows=1 width=580)
                                       Filter: ((datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone) AND ((datasource)::text <> 'M'::text) AND (((dims ? 'p'::text) AND (dims ? 'mcp'::text)) OR ((datasource)::text =
ANY ('{"FA","GA"}'::text[]))) AND (c_id = ANY ('{<ID_LIST>}'::bigint[])))
                                 ->  Parallel Bitmap Heap Scan on fa_10  (cost=1226.36..53641.49 rows=1 width=1290)
                                       Recheck Cond: (datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone)
                                       Filter: (((datasource)::text <> 'M'::text) AND (((dims ? 'p'::text) AND (dims ? 'mcp'::text)) OR ((datasource)::text = ANY ('<ID_LIST>'::bigint[])))
                                       ->  Bitmap Index Scan on fa_10_rangestart  (cost=0.00..1226.36 rows=32259 width=0)
                                             Index Cond: (datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone)
                                 ->  Parallel Seq Scan on fa_105  (cost=0.00..11.99 rows=1 width=580)
                                       Filter: ((datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone) AND ((datasource)::text <> 'M'::text) AND (((dims ? 'p'::text) AND (dims ? 'mcp'::text)) OR ((datasource)::text =
ANY ('{"FA","GA"}'::text[]))) AND (c_id = ANY ('<ID_LIST>'::bigint[])))
                                 ->  Parallel Seq Scan on fa_106  (cost=0.00..11.99 rows=1 width=580)
                                       Filter: ((datetime >= '2018-01-01 00:00:00+01'::timestamp with time zone) AND ((datasource)::text <> 'M'::text) AND (((dims ? 'p'::text) AND (dims ? 'mcp'::text)) OR ((datasource)::text =
ANY ('{"FA","GA"}'::text[]))) AND (c_id = ANY ('<ID_LIST>..........


--
regards,
Jakub Glapa

Re: dsa_allocate() failure

From
Justin Pryzby
Date:
On Wed, Nov 21, 2018 at 03:26:42PM +0100, Jakub Glapa wrote:
> Looks like my email didn't match the right thread:
> https://www.postgresql.org/message-id/flat/CAMAYy4%2Bw3NTBM5JLWFi8twhWK4%3Dk_5L4nV5%2BbYDSPu8r4b97Zg%40mail.gmail.com
> Any chance to get some feedback on this?

In the related thread, it looks like Thomas backpatched a fix to v10, and so I
guess this should be resolved in 10.6, which was released a couple weeks ago.
https://www.postgresql.org/message-id/CAEepm%3D0QxoUSkFqYbvmxi2eNvvU6BkqH6fTOu4oOzc1MRAT4Dw%40mail.gmail.com

Could you upgrade and check ?

38763d67784c6563d08dbea5c9f913fa174779b8 in master

|commit ba20d392584cdecc2808fe936448d127f43f2c07
|Author: Thomas Munro <tmunro@postgresql.org>
|Date:   Thu Sep 20 15:52:39 2018 +1200
|
|    Fix segment_bins corruption in dsa.c.

Justin


Re: dsa_allocate() failure

From
Jakub Glapa
Date:
Hi Justin, I've upgraded to 10.6 but the error still shows up:

psql db@host as user => select version();
                                                                   version                                                                   
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
(1 row)

Time: 110.512 ms

psql db@host as user  => select <COLUMNS> from fa where client_id in (<IDS>) and datetime >= '2018/01/01' and ((dims ? 'p' and dimensions ? 'mcp') or (datasource in ('FA', 'GA'))) and not datasource = 'M' GROUP BY datasource, dims ->'ct', dimensions ->'mct', dims -> 'p', dims -> 'sp';
ERROR:  XX000: dsa_allocate could not find 7 free pages
CONTEXT:  parallel worker
LOCATION:  dsa_allocate_extended, dsa.c:729
Time: 131400.831 ms (02:11.401)

the above is executed with max_parallel_workers=8.
If I set max_parallel_workers=0 I also get the error, and my connection is closed (but the server is alive):

psql db@host as user => set max_parallel_workers=0;
SET
Time: 89.542 ms
psql db@host as user => SELECT <QUERY>;
FATAL:  XX000: dsa_allocate could not find 7 free pages
LOCATION:  dsa_allocate_extended, dsa.c:729
SSL connection has been closed unexpectedly
The connection to the server was lost. Attempting reset: Succeeded.
Time: 200390.466 ms (03:20.390)



--
regards,
Jakub Glapa


On Thu, Nov 22, 2018 at 5:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Wed, Nov 21, 2018 at 03:26:42PM +0100, Jakub Glapa wrote:
> Looks like my email didn't match the right thread:
> https://www.postgresql.org/message-id/flat/CAMAYy4%2Bw3NTBM5JLWFi8twhWK4%3Dk_5L4nV5%2BbYDSPu8r4b97Zg%40mail.gmail.com
> Any chance to get some feedback on this?

In the related thread, it looks like Thomas backpatched a fix to v10, and so I
guess this should be resolved in 10.6, which was released a couple weeks ago.
https://www.postgresql.org/message-id/CAEepm%3D0QxoUSkFqYbvmxi2eNvvU6BkqH6fTOu4oOzc1MRAT4Dw%40mail.gmail.com

Could you upgrade and check ?

38763d67784c6563d08dbea5c9f913fa174779b8 in master

|commit ba20d392584cdecc2808fe936448d127f43f2c07
|Author: Thomas Munro <tmunro@postgresql.org>
|Date:   Thu Sep 20 15:52:39 2018 +1200
|
|    Fix segment_bins corruption in dsa.c.

Justin

Re: dsa_allocate() failure

From
Justin Pryzby
Date:
On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote:
> Hi Justin, I've upgraded to 10.6 but the error still shows up:
> 
> If I set it to max_parallel_workers=0 I also get and my connection is being
> closed (but the server is alive):
> 
> psql db@host as user => set max_parallel_workers=0;

Can you show the plan (explain without analyze) for the nonparallel case?

Also, it looks like the server crashed in that case (even if it restarted
itself quickly).  Can you confirm ?

For example: dmesg |tail might show "postmaster[8582]: segfault [...]" or
similar.  And other clients would've been disconnected.  (For example, you'd
get an error in another, previously-connected session the next time you run:
SELECT 1).

In any case, could you try to find a minimal way to reproduce the problem ?  I
mean, is the dataset and query small and something you can publish, or can you
reproduce with data generated from (for example) generate_series() ?
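(A synthetic-data attempt might look like the sketch below - all names are invented, not taken from the real schema:)

    -- Hypothetical repro scaffold: enough rows that the planner considers a
    -- parallel plan.
    CREATE TABLE repro (id bigint, datetime timestamptz, payload text);
    INSERT INTO repro
    SELECT g,
           now() - (g % 365) * interval '1 day',
           md5(g::text)
    FROM generate_series(1, 1000000) AS g;
    ANALYZE repro;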

Thanks,
Justin


Re: dsa_allocate() failure

From
Jakub Glapa
Date:
So, the issue occurs only on the production db and right now I cannot reproduce it.
I had a look at dmesg and indeed I see something like:


--
regards,
pozdrawiam,
Jakub Glapa


On Fri, Nov 23, 2018 at 5:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote:
> Hi Justin, I've upgraded to 10.6 but the error still shows up:
>
> If I set it to max_parallel_workers=0 I also get and my connection is being
> closed (but the server is alive):
>
> psql db@host as user => set max_parallel_workers=0;

Can you show the plan (explain without analyze) for the nonparallel case?

Also, it looks like the server crashed in that case (even if it restarted
itself quickly).  Can you confirm ?

For example: dmesg |tail might show "postmaster[8582]: segfault [...]" or
similar.  And other clients would've been disconnected.  (For example, you'd
get an error in another, previously-connected session the next time you run:
SELECT 1).

In any case, could you try to find a minimal way to reproduce the problem ?  I
mean, is the dataset and query small and something you can publish, or can you
reproduce with data generated from (for example) generate_series() ?

Thanks,
Justin

Re: dsa_allocate() failure

From
Jakub Glapa
Date:
sorry, the message was sent out too early.

So, the issue occurs only on the production db and right now I cannot reproduce it.
I had a look at dmesg and indeed I see something like:

postgres[30667]: segfault at 0 ip 0000557834264b16 sp 00007ffc2ce1e030 error 4 in postgres[557833db7000+6d5000]

and AFAIR other sessions I had opened at that time were indeed disconnected.

When it comes to the execution plan for max_parallel_workers=0, there is no real difference.
I guess max_parallel_workers has no effect here and max_parallel_workers_per_gather should have been used.
Why it caused a server crash is unknown right now.
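(For context, the two settings behave differently - a sketch of the distinction, using only standard GUCs:)

    -- Setting this to 0 stops the planner from producing parallel plans at
    -- all, so no Gather node and no DSA area is created:
    SET max_parallel_workers_per_gather = 0;

    -- This one only caps how many background workers may launch cluster-wide;
    -- a parallel plan can still be chosen, and the leader still runs it
    -- through the parallel machinery (which matches the FATAL seen above):
    SET max_parallel_workers = 0;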

I cannot really give a reproducible recipe.
My case is that I have a parent table with ~300 partitions.
And I initiate a select on ~100 of them with select [...] from fa where client_id in (<IDS>) and [filters].
I know this is not efficient. Every partition has several indexes and this query acquires a lot of locks... even for relations not used in the query.
PG11 should have a better partition pruning mechanism, but I'm not there yet to upgrade.
Some of the partitions have millions of rows.

I'll keep observing; maybe I'll find a pattern for when this occurs.
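(One way to see those locks - a sketch using the standard pg_locks view, run in the same transaction as the query:)

    -- Count the locks the current backend holds, grouped by lock mode:
    SELECT mode, count(*)
    FROM pg_locks
    WHERE pid = pg_backend_pid()
    GROUP BY mode
    ORDER BY count(*) DESC;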


--
regards,
pozdrawiam,
Jakub Glapa


On Mon, Nov 26, 2018 at 4:26 PM Jakub Glapa <jakub.glapa@gmail.com> wrote:
So, the issue occurs only on the production db and right now I cannot reproduce it.
I had a look at dmesg and indeed I see something like:


--
regards,
Jakub Glapa


On Fri, Nov 23, 2018 at 5:10 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote:
> Hi Justin, I've upgraded to 10.6 but the error still shows up:
>
> If I set it to max_parallel_workers=0 I also get and my connection is being
> closed (but the server is alive):
>
> psql db@host as user => set max_parallel_workers=0;

Can you show the plan (explain without analyze) for the nonparallel case?

Also, it looks like the server crashed in that case (even if it restarted
itself quickly).  Can you confirm ?

For example: dmesg |tail might show "postmaster[8582]: segfault [...]" or
similar.  And other clients would've been disconnected.  (For example, you'd
get an error in another, previously-connected session the next time you run:
SELECT 1).

In any case, could you try to find a minimal way to reproduce the problem ?  I
mean, is the dataset and query small and something you can publish, or can you
reproduce with data generated from (for example) generate_series() ?

Thanks,
Justin

Re: dsa_allocate() failure

From
Justin Pryzby
Date:
Hi, thanks for following through.

On Mon, Nov 26, 2018 at 04:38:35PM +0100, Jakub Glapa wrote:
> I had a look at dmesg and indeed I see something like:
> 
> postgres[30667]: segfault at 0 ip 0000557834264b16 sp 00007ffc2ce1e030
> error 4 in postgres[557833db7000+6d5000]

That's useful; I think "at 0" means a null pointer was dereferenced.

Can you check /var/log/messages (or ./syslog or similar) and verify the
timestamp matches the time of the last crash (and not an unrelated crash) ?

The logs might also indicate if the process dumped a core file anywhere. 

I don't know what distribution/OS you're using, but it might be good to install
abrt (RHEL) or apport (ubuntu) or other mechanism to save coredumps, or to
manually configure /proc/sys/kernel/core_pattern.

On centos, I usually set:
/etc/abrt/abrt-action-save-package-data.conf
OpenGPGCheck = no

Also, it might be good to install debug symbols, in case you do find a core
dump now or get one later.

On centos: yum install postgresql10-debuginfo or debuginfo-install postgresql10-server
Make sure the debug symbols exactly match the server version.

Justin


Re: dsa_allocate() failure

From
Jakub Glapa
Date:
Justin thanks for the information! 
I'm running Ubuntu 16.04. 
I'll try to prepare for the next crash. 
Couldn't find anything this time.


--
regards,
Jakub Glapa


On Mon, Nov 26, 2018 at 4:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Hi, thanks for following through.

On Mon, Nov 26, 2018 at 04:38:35PM +0100, Jakub Glapa wrote:
> I had a look at dmesg and indeed I see something like:
>
> postgres[30667]: segfault at 0 ip 0000557834264b16 sp 00007ffc2ce1e030
> error 4 in postgres[557833db7000+6d5000]

That's useful; I think "at 0" means a null pointer was dereferenced.

Can you check /var/log/messages (or ./syslog or similar) and verify the
timestamp matches the time of the last crash (and not an unrelated crash) ?

The logs might also indicate if the process dumped a core file anywhere.

I don't know what distribution/OS you're using, but it might be good to install
abrt (RHEL) or apport (ubuntu) or other mechanism to save coredumps, or to
manually configure /proc/sys/kernel/core_pattern.

On centos, I usually set:
/etc/abrt/abrt-action-save-package-data.conf
OpenGPGCheck = no

Also, it might be good to install debug symbols, in case you do find a core
dump now or get one later.

On centos: yum install postgresql10-debuginfo or debuginfo-install postgresql10-server
Make sure the debug symbols exactly match the server version.

Justin

Re: dsa_allocate() failure

From
Alvaro Herrera
Date:
On 2018-Nov-26, Jakub Glapa wrote:

> Justin thanks for the information!
> I'm running Ubuntu 16.04.
> I'll try to prepare for the next crash.
> Couldn't find anything this time.

As I recall, the apport stuff in Ubuntu is terrible ... I've seen it
take 40 minutes to write the crash dump to disk, during which the
database was "down".  I don't know why it is so slow (it's a rather
silly python script that apparently processes the core dump one byte at
a time, and you can imagine that with a few gigabytes of shared memory
that takes a while).  Anyway my recommendation was to *remove* that
stuff from the server and make sure the core file is saved by normal
means.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


RE: dsa_allocate() failure

From
Arne Roland
Date:
Hello,

I'm not sure whether this is connected at all, but I'm facing the same error with a generated query on postgres 10.6.
It works with parallel query disabled and gives "dsa_allocate could not find 7 free pages" otherwise.

I've attached the query and strace. The table is partitioned on (o, date). It doesn't depend on the precise lists I'm
using, while it obviously does depend on the fact that the optimizer chooses a parallel query.

Regards
Arne Roland

-----Original Message-----
From: Thomas Munro <thomas.munro@enterprisedb.com> 
Sent: Friday, October 5, 2018 4:17 AM
To: Sand Stone <sand.m.stone@gmail.com>
Cc: Rick Otten <rottenwindfish@gmail.com>; Tom Lane <tgl@sss.pgh.pa.us>; pgsql-performance@lists.postgresql.org; Robert Haas <robertmhaas@gmail.com>
Subject: Re: dsa_allocate() failure

On Wed, Aug 29, 2018 at 5:48 PM Sand Stone <sand.m.stone@gmail.com> wrote:
> I attached a query (and its query plan) that caused the crash: "dsa_allocate could not find 13 free pages" on one of
> the worker nodes. I anonymised the query text a bit.  Interestingly, this time only one (same one) of the nodes is
> crashing. Since this is a production environment, I cannot get the stack trace. Once turned off parallel execution for
> this node. The whole query finished just fine. So the parallel query plan is from one of the nodes not crashed,
> hopefully the same plan would have been executed on the crashed node. In theory, every worker node has the same bits,
> and very similar data.

I wonder if this was a different symptom of the problem fixed here:

https://www.postgresql.org/message-id/flat/194c0706-c65b-7d81-ab32-2c248c3e2344%402ndquadrant.com

Can you still reproduce it on current master, REL_11_STABLE or REL_10_STABLE?

-- 
Thomas Munro
http://www.enterprisedb.com





Attachment

RE: dsa_allocate() failure

From
Arne Roland
Date:
Hello,

does anybody have any idea what goes wrong here? Is there some additional information that could be helpful?

All the best
Arne Roland

Re: dsa_allocate() failure

From
Thomas Munro
Date:
On Tue, Jan 29, 2019 at 2:50 AM Arne Roland <A.Roland@index.de> wrote:
> does anybody have any idea what goes wrong here? Is there some additional information that could be helpful?

Hi Arne,

This seems to be a bug; that error should not be reached.  I wonder if
it is a different manifestation of the bug reported as #15585 (ie some
type of rare corruption).  Are you able to reproduce this
consistently?  Can you please show the query plan?

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Fabio Isabettini
Date:
Hello,
we are facing a similar issue on a Production system using PostgreSQL 10.6:

org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: EXCEPTION on getstatistics_media ; ID: uidatareader.
run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free pages


The query reads tables remotely (via pl/proxy) containing a lot of data (up to millions of rows for each table/node); a remote "group by" returns only a few hundred rows from each "slave" node to the caller "master" node.
The tables are partitioned using the INHERITANCE method, which we have been using for years with no issue. All tables have the same column structure and number, about 300 columns. There are no joins in the query, only a variable set of partitions depending on the date range.
The "REMOTE FATAL" refers to the pl/proxy that runs on 2 different slaves, [a0] and [a1], nodes with identical configuration and database structure, but it seems to fail only on node [a1].
When we get the error, reducing the date range (and therefore the quantity of data read) makes it disappear; the same happens if we set max_parallel_workers_per_gather = 0.
Obviously we cannot force the user to use short periods of time to avoid the error, so we have disabled the parallel query feature for the time being.
It is difficult to reproduce the issue because the user does not always get the error; furthermore, re-running the same query at different moments/days usually works. It is kind of weird.
We would like not to stop the Production system and upgrade it to PG11. And even then, would this guarantee a permanent fix?
Any suggestion?
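(For what it's worth, disabling parallel query cluster-wide does not require a restart - a sketch, assuming superuser access; "report_user" is a hypothetical role name:)

    -- Takes effect on reload, no restart needed:
    ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
    SELECT pg_reload_conf();
    -- Or scope it to a single application role instead:
    ALTER ROLE report_user SET max_parallel_workers_per_gather = 0;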


Regards,
Fabio Isabettini
Voipfuture (Germany)



The failing node [a1] configuration:

OS: CentOS 7, kernel 3.10.0-862.11.6.el7.x86_64
Postgres: postgres-10.5-862.11.6.1
RAM: 256 GB (The main server containing the master node and [a0] node, the slave that has no issue, has 384 GB of RAM)
CPU cores: 32

shared_buffers = 64GB
max_worker_processes = 32
max_parallel_workers_per_gather = 8
max_parallel_workers = 32


On 28. Jan 2019, at 19:56:01, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Tue, Jan 29, 2019 at 2:50 AM Arne Roland <A.Roland@index.de> wrote:
does anybody have any idea what goes wrong here? Is there some additional information that could be helpful?

Hi Arne,

This seems to be a bug; that error should not be reached.  I wonder if
it is a different manifestation of the bug reported as #15585 (ie some
type of rare corruption).  Are you able to reproduce this
consistently?  Can you please show the query plan?

--
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Thomas Munro
Date:
On Tue, Jan 29, 2019 at 10:32 PM Fabio Isabettini
<fisabettini@voipfuture.com> wrote:
>  we are facing a similar issue on a Production system using a Postgresql 10.6:
>
> org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: EXCEPTION on getstatistics_media ; ID: uidatareader.
> run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free pages

> We would like not to stop the Production system and upgrade it to PG11. And even then, would this guarantee a permanent fix?
> Any suggestion?

Hi Fabio,

Thanks for your report.  Could you please also show the query plan
that runs on the "remote" node (where the error occurred)?

There is no indication that upgrading to PG11 would help here.  It
seems we have an undiagnosed bug (in 10 and 11), and so far no one has
been able to reproduce it at will.  I personally have chewed a lot of
CPU time on several machines trying various plan shapes and not seen
this or the possibly related symptom from bug #15585 even once.  But
we have about three reports of each of the two symptoms.  One reporter
wrote to me off-list to say that they'd seen #15585 twice, the second
time by running the same query in a tight loop for 8 hours, and then
not seen it again in the past 3 weeks.  Clearly there is an issue
needing a fix here, but I don't yet know what it is.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: dsa_allocate() failure

From
Fabio Isabettini
Date:
Hi Thomas,
it is a Production system and we don't have permanent access to it.
Also, having the auto_explain feature always on is not an option in production.
I will ask the customer to notify us asap when the error presents itself, so we can connect immediately and try to get a query plan.
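(A lighter-weight option than having it always on: auto_explain can be loaded in just one session - a sketch using the module's standard settings:)

    -- Affects only this session; the rest of the system is untouched:
    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;  -- log the plan of every statement here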

Regards

Fabio Isabettini
www.voipfuture.com

> On 30. Jan 2019, at 04:13:14, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>
> On Tue, Jan 29, 2019 at 10:32 PM Fabio Isabettini
> <fisabettini@voipfuture.com> wrote:
>> we are facing a similar issue on a Production system using a Postgresql 10.6:
>>
>> org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: EXCEPTION on getstatistics_media ; ID: uidatareader.
>> run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free pages
>
>> We would like not to stop the Production system and upgrade it to PG11. And even then, would this guarantee a permanent fix?
>> Any suggestion?
>
> Hi Fabio,
>
> Thanks for your report.  Could you please also show the query plan
> that runs on the "remote" node (where the error occurred)?
>
> There is no indication that upgrading to PG11 would help here.  It
> seems we have an undiagnosed bug (in 10 and 11), and so far no one has
> been able to reproduce it at will.  I personally have chewed a lot of
> CPU time on several machines trying various plan shapes and not seen
> this or the possibly related symptom from bug #15585 even once.  But
> we have about three reports of each of the two symptoms.  One reporter
> wrote to me off-list to say that they'd seen #15585 twice, the second
> time by running the same query in a tight loop for 8 hours, and then
> not seen it again in the past 3 weeks.  Clearly there is an issue needing
> a fix here, but I don't yet know what it is.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>




RE: dsa_allocate() failure

From
Arne Roland
Date:
Hi Thomas,

this is reproducible, though it's highly sensitive to the change of plans (i.e. the precise queries that break change with every new analyze). Disabling parallel query seems to solve the problem (as expected).

At some point even the simple query
select count(*) from test_tab where (o = '0' and date >= '30.01.2019'::date - interval '14 days' or o = '1' and date >= '30.01.2019'::date - interval '14 days') and coalesce(fid,fallback) >= 6 and coalesce(fid,fallback) <= 6

was reported to fail (with the same error) at the live database, but I wasn't able to obtain a plan, since it works again with the current live data (maybe autoanalyze is at fault here).

The table test_tab has roughly 70 children that inherit from it. The children and the corresponding indexes should be named like '%part%'.


I attached a query with a plan that fails on my test database.

I don't want to rule out the possibility that it could be related to #15585; at least both issues seem to be related to Parallel Bitmap and inheritance/partitioned tables. But the error occurs relatively quickly in my case, and every one of my processes (the children and the master) fails with 'FATAL:  dsa_allocate could not find 7 free pages'.

Regards
Arne




Attachment

Re: dsa_allocate() failure

From
Justin Pryzby
Date:
On Thu, Jan 31, 2019 at 06:19:54PM +0000, Arne Roland wrote:
> this is reproducible, though it's highly sensitive to the change of plans (i.e. the precise queries that break change with every new analyze). Disabling parallel query seems to solve the problem (as expected).
>
> At some point even the simple query
> select count(*) from test_tab where (o = '0' and date >= '30.01.2019'::date - interval '14 days' or o = '1' and date >= '30.01.2019'::date - interval '14 days') and coalesce(fid,fallback) >= 6 and coalesce(fid,fallback) <= 6
>
> was reported to fail (with the same error) at the live database, but I wasn't able to obtain a plan, since it works again with the current live data (maybe autoanalyze is at fault here).
>
> The table test_tab has roughly 70 children that inherit from it. The children and the corresponding indexes should be named like '%part%'.
> 
> I attached a query with a plan that fails on my test database.

Thanks - note that previously Thomas said:

On Mon, Dec 03, 2018 at 11:45:00AM +1300, Thomas Munro wrote:
> On Sat, Dec 1, 2018 at 9:46 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >                         elog(FATAL,
> >                                  "dsa_allocate could not find %zu free pages", npages);
> > +                       abort();
> 
> If anyone can reproduce this problem with a debugger, it'd be
> interesting to see the output of dsa_dump(area), and
> FreePageManagerDump(segment_map->fpm).  This error condition means

Are you able to cause the error in a test/devel/non-production environment to
run under a debugger, or could you compile with "abort();" after that elog() to
save a core file?

Justin


Re: dsa_allocate() faliure

From
Jakub Glapa
Date:
Hi Thomas,
I was one of the reporters in early Dec last year.
I somehow dropped the ball and forgot about the issue.
Anyhow, I upgraded the clusters to pg11.1 and nothing changed. I also have a rule to core dump, but a segfault does not happen while this is occurring.
I see the error showing up every night on 2 different servers. But it's a bit of a heisenbug, because if I go there now it won't be reproducible.
It was suggested by Justin Pryzby that I recompile the pg src with his patch that would cause a core dump.
But I don't feel comfortable doing this, especially if I would have to run it with prod data.
My question is: can I do anything like increasing the logging level or enabling some additional options?
It's a production server, but I'm willing to sacrifice a bit of its performance if that would help.


--
regards,
pozdrawiam,
Jakub Glapa


On Wed, Jan 30, 2019 at 4:13 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 29, 2019 at 10:32 PM Fabio Isabettini
<fisabettini@voipfuture.com> wrote:
>  we are facing a similar issue on a Production system using PostgreSQL 10.6:
>
> org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: EXCEPTION on getstatistics_media ; ID: uidatareader.
> run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free pages

> We would prefer not to stop the Production system to upgrade it to PG11. And even then, would this guarantee a permanent fix?
> Any suggestion?

Hi Fabio,

Thanks for your report.  Could you please also show the query plan
that runs on the "remote" node (where the error occurred)?

There is no indication that upgrading to PG11 would help here.  It
seems we have an undiagnosed bug (in 10 and 11), and so far no one has
been able to reproduce it at will.  I personally have chewed a lot of
CPU time on several machines trying various plan shapes and not seen
this or the possibly related symptom from bug #15585 even once.  But
we have about three reports of each of the two symptoms.  One reporter
wrote to me off-list to say that they'd seen #15585 twice, the second
time by running the same query in a tight loop for 8 hours, and then
not seen it again in the past 3 weeks.  Clearly there is an issue needing
a fix here, but I don't yet know what it is.

--
Thomas Munro
http://www.enterprisedb.com

Re: dsa_allocate() faliure

From
Thomas Munro
Date:
On Mon, Feb 4, 2019 at 6:52 PM Jakub Glapa <jakub.glapa@gmail.com> wrote:
> I see the error showing up every night on 2 different servers. But it's a bit of a heisenbug, because if I go there now it won't be reproducible.
 

Huh.  Ok well that's a lot more frequent than I thought.  Is it always
the same query?  Any chance you can get the plan?  Are there more
things going on on the server, like perhaps concurrent parallel
queries?

> It was suggested by Justin Pryzby that I recompile the pg src with his patch that would cause a core dump.

Small correction to Justin's suggestion: don't abort() after
elog(ERROR, ...), it'll never be reached.

> But I don't feel comfortable doing this, especially if I would have to run it with prod data.
> My question is: can I do anything like increasing the logging level or enabling some additional options?
> It's a production server, but I'm willing to sacrifice a bit of its performance if that would help.

If you're able to run a throwaway copy of your production database on
another system that you don't have to worry about crashing, you could
just replace ERROR with PANIC and run a high-speed loop of the query
that crashed in production, or something.  This might at least tell us
whether it's reaching that condition via something dereferencing a
dsa_pointer or something manipulating the segment lists while
allocating/freeing.
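
Just to illustrate the kind of loop I mean, it could be driven from psql itself; the query below is only a stand-in for whichever statement fails on your side:

    -- rerun the suspect query every second until the error appears
    select count(*) from test_tab where coalesce(fid, fallback) between 6 and 6;
    \watch 1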

In my own 100% unsuccessful attempts to reproduce this I was mostly
running the same query (based on my guess at what ingredients are
needed), but perhaps it requires a particular allocation pattern that
will require more randomness to reach... hmm.

-- 
Thomas Munro
http://www.enterprisedb.com


RE: dsa_allocate() faliure

From
Arne Roland
Date:
It's definitely quite a complex pattern. The query I sent you last time was minimal with respect to predicates (so removing any single one of the predicates turned it into a working query).
 
> Huh.  Ok well that's a lot more frequent than I thought.  Is it always the same query?  Any chance you can get the plan?  Are there more things going on on the server, like perhaps concurrent parallel queries?
 
I had this bug occurring while I was the only one working on the server. I checked: there was just one transaction with a snapshot at all, and it was an autovacuum busy with a totally unrelated relation my colleague was working on.
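
For reference, snapshot-holding backends can be spotted with something along these lines (backends holding a snapshot show a non-null backend_xmin in pg_stat_activity):

    select pid, state, backend_xmin, query
    from pg_stat_activity
    where backend_xmin is not null;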
 

The bug is indeed behaving like a ghost.
One child relation needed a few new rows to test a particular application a colleague of mine was working on. The insert triggered an autoanalyze and the explain output changed slightly:
 
Besides row and cost estimates, the change is that the line

Recheck Cond: (((COALESCE((fid)::bigint, fallback) ) >= 1) AND ((COALESCE((fid)::bigint, fallback) ) <= 1) AND (gid && '{853078,853080,853082}'::integer[]))

is now

Recheck Cond: ((gid && '{853078,853080,853082}'::integer[]) AND ((COALESCE((fid)::bigint, fallback) ) >= 1) AND ((COALESCE((fid)::bigint, fallback) ) <= 1))
 
and the error vanished.

I could try to hunt down another query by assembling seemingly random queries. I don't see a very clear pattern in the queries aborting with this error on our production servers. I'm not surprised that this bug is hard to chase on production servers. They usually are quite lively.
 

>If you're able to run a throwaway copy of your production database on another system that you don't have to worry about crashing, you could just replace ERROR with PANIC and run a high-speed loop of the query that crashed in production, or something.  This might at least tell us whether it's reaching that condition via something dereferencing a dsa_pointer or something manipulating the segment lists while allocating/freeing.
 

I could take a backup and restore the relevant tables on a throwaway system. You are just suggesting to replace line 728

    elog(FATAL,
         "dsa_allocate could not find %zu free pages", npages);

by

    elog(PANIC,
         "dsa_allocate could not find %zu free pages", npages);

correct? Just for my understanding: why would the shutdown of the whole instance create more helpful logging?

All the best
Arne

Re: dsa_allocate() faliure

From
Justin Pryzby
Date:
On Mon, Feb 04, 2019 at 08:31:47PM +0000, Arne Roland wrote:
> I could take a backup and restore the relevant tables on a throwaway system. You are just suggesting to replace line 728
>
>     elog(FATAL,
>          "dsa_allocate could not find %zu free pages", npages);
>
> by
>
>     elog(PANIC,
>          "dsa_allocate could not find %zu free pages", npages);
>
> correct? Just for my understanding: why would the shutdown of the whole instance create more helpful logging?

You'd also start the server with pg_ctl -c, which would allow it to dump core; the core
file could then be inspected with GDB to show a backtrace and other internals, which up
to now nobody (including myself) has been able to provide.

Justin


Re: dsa_allocate() faliure

From
Justin Pryzby
Date:
Hi,

On Mon, Nov 26, 2018 at 09:52:07AM -0600, Justin Pryzby wrote:
> Hi, thanks for following through.
> 
> On Mon, Nov 26, 2018 at 04:38:35PM +0100, Jakub Glapa wrote:
> > I had a look at dmesg and indeed I see something like:
> > 
> > postgres[30667]: segfault at 0 ip 0000557834264b16 sp 00007ffc2ce1e030
> > error 4 in postgres[557833db7000+6d5000]
> 
> That's useful, I think "at 0" means a null pointer dereferenced.

Thomas fixed several bugs in DSA, which will be in the next releases, postgres 10.8
and 11.3.

However that doesn't explain the segfault you saw, and I don't see anything
which looks relevant that has changed since 10.5.

If you still see that using the latest minor release (10.7), please try to
capture a core file and send a backtrace with a new thread on pgsql-hackers.

Thanks,
Justin


Re: dsa_allocate() faliure

From
Jakub Glapa
Date:
Hi, I just checked dmesg.
The segfault I wrote about is the only one I see, dated Nov 24 last year.
Since then no other segfaults have happened, although dsa_allocate failures happen daily.
I'll report if anything occurs.
I have the core dumping setup in place.



--
regards,
pozdrawiam,
Jakub Glapa


On Sun, Feb 17, 2019 at 11:21 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
Hi,

On Mon, Nov 26, 2018 at 09:52:07AM -0600, Justin Pryzby wrote:
> Hi, thanks for following through.
>
> On Mon, Nov 26, 2018 at 04:38:35PM +0100, Jakub Glapa wrote:
> > I had a look at dmesg and indeed I see something like:
> >
> > postgres[30667]: segfault at 0 ip 0000557834264b16 sp 00007ffc2ce1e030
> > error 4 in postgres[557833db7000+6d5000]
>
> That's useful, I think "at 0" means a null pointer dereferenced.

Thomas fixed several bugs in DSA, which will be in the next releases, postgres 10.8
and 11.3.

However that doesn't explain the segfault you saw, and I don't see anything
which looks relevant that has changed since 10.5.

If you still see that using the latest minor release (10.7), please try to
capture a core file and send a backtrace with a new thread on pgsql-hackers.

Thanks,
Justin