Re: How to debug a connection that's "active" but hanging? - Mailing list pgsql-general

From Jurrie Overgoor
Subject Re: How to debug a connection that's "active" but hanging?
Date
Msg-id 1b783c37-0712-8b21-f1e3-3ac1f8c4f096@jurr.org
Whole thread Raw
In response to Re: How to debug a connection that's "active" but hanging?  (Vijaykumar Jain <vijaykumarjain.github@gmail.com>)
Responses Re: How to debug a connection that's "active" but hanging?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
On 12-07-2021 20:56, Vijaykumar Jain wrote:
On Mon, 12 Jul 2021 at 23:16, Tom Lane <tgl@sss.pgh.pa.us> wrote:

The backtraces you captured look like the query is not "hung", it's
just computing away.


He mentioned earlier that the query was hung as 'active' for 8 hours and on.

incase this is due to bad plan,
@Jurrie Overgoor  is it also possible for you to run manually 

`vacuumdb  -a -v`  from the terminal, each time before you run your test suite for some runs, do you still get the same issue?

I have a feeling repeated runs may have caused a lot of bloat on some tables which might have not been reclaimed by autovacuum runs.


I configured Jenkins to run that command prior to executing the tests. I got 5 successful runs, no hanging queries. Then I reverted and ran again. The first and second run were ok; the third run hung again. So your hunch might be right.


On 12-07-2021 19:46, Tom Lane wrote:
You might need
to investigate by altering your application to capture "EXPLAIN ..."
output just before the troublesome query, so you can see if it gets
a different plan in the slow cases.


Then I tried this. The query plans are indeed not consistent.

Most of the time the first line of the query plan is: Unique  (cost=4892.35..4892.35 rows=1 width=64) [1]

I have seen other costs: 5818.30, 6350.85 and 6514.73. They all complete correctly. [2], [3], [4]

The plan that leaves the query hanging in the 'active' state starts with: Unique  (cost=241.81..241.82 rows=1 width=64) [5]

That's clearly much lower than the rest. So I suspect the planner making a 'wrong' guess there, causing a bad plan, and a long time to execute. For reference, the executed query is [6].

Now, where to go from here? Is this considered a bug in PostgreSQL, or am I misusing the database engine by doing DROP DATABASE and CREATE DATABASE over and over again? I must say that I never saw this behavior on PostgreSQL 9.6, so in that regard it might be considered a bug.....?

What can I do to get to the bottom of this? Should I export the content of some metadata tables prior to executing the hanging query? Should I `vacuumdb -a -v` prior to logging the EXPLAIN for the hanging query?

With kind regards,

Jurrie


[1] https://jurr.org/PostgreSQL_13_hanging_query/normal.txt
[2] https://jurr.org/PostgreSQL_13_hanging_query/alt1.txt
[3] https://jurr.org/PostgreSQL_13_hanging_query/alt3.txt
[4] https://jurr.org/PostgreSQL_13_hanging_query/alt2.txt
[5] https://jurr.org/PostgreSQL_13_hanging_query/hang.txt
[6] https://jurr.org/PostgreSQL_13_hanging_query/query.txt


pgsql-general by date:

Previous
From: Vijaykumar Jain
Date:
Subject: Re: How to debug a connection that's "active" but hanging?
Next
From: Tom Lane
Date:
Subject: Re: How to debug a connection that's "active" but hanging?