Re: dsa_allocate() faliure - Mailing list pgsql-performance

From Jakub Glapa
Subject Re: dsa_allocate() faliure
Msg-id CAJk1zg28tqx2021D0j-RqFtbLe+SPj4JKdmnc+K2aJZTUYk3eQ@mail.gmail.com
In response to Re: dsa_allocate() faliure  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: dsa_allocate() faliure
List pgsql-performance

Hi Thomas,
I was one of the reporters in early December last year.
I somehow dropped the ball and forgot about the issue.
Anyhow, I upgraded the clusters to PostgreSQL 11.1 and nothing changed. I also have a rule in place to dump core, but no segfault happens while this is occurring.
I see the error showing up every night on 2 different servers. But it's a bit of a heisenbug, because if I go there now it won't be reproducible.
Justin Pryzby suggested that I recompile PostgreSQL from source with his patch, which would cause a coredump.
But I don't feel comfortable doing this, especially if I would have to run it with prod data.
My question is: can I do anything else, such as increasing the logging level or enabling some additional options?
It's a production server, but I'm willing to sacrifice a bit of its performance if that would help.


--
regards,
pozdrawiam,
Jakub Glapa


On Wed, Jan 30, 2019 at 4:13 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Tue, Jan 29, 2019 at 10:32 PM Fabio Isabettini
<fisabettini@voipfuture.com> wrote:
> we are facing a similar issue on a production system running PostgreSQL 10.6:
>
> org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: EXCEPTION on getstatistics_media ; ID: uidatareader.
> run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free pages

> We would prefer not to stop the production system to upgrade it to PG11. And even if we did, would this guarantee a permanent fix?
> Any suggestion?

Hi Fabio,

Thanks for your report.  Could you please also show the query plan
that runs on the "remote" node (where the error occurred)?

There is no indication that upgrading to PG11 would help here.  It
seems we have an undiagnosed bug (in 10 and 11), and so far no one has
been able to reproduce it at will.  I personally have chewed a lot of
CPU time on several machines trying various plan shapes and not seen
this or the possibly related symptom from bug #15585 even once.  But
we have about three reports of each of the two symptoms.  One reporter
wrote to me off-list to say that they'd seen #15585 twice, the second
time by running the same query in a tight loop for 8 hours, and then
not seen it again in the past 3 weeks.  Clearly there is issue needing
a fix here, but I don't yet know what it is.

--
Thomas Munro
http://www.enterprisedb.com
