Re: [GENERAL] EXPLAIN command just hangs... - Mailing list pgsql-general

From Rhhh Lin
Subject Re: [GENERAL] EXPLAIN command just hangs...
Date
Msg-id DB6PR1001MB11417225813B47418F39E456AE520@DB6PR1001MB1141.EURPRD10.PROD.OUTLOOK.COM
Whole thread Raw
In response to Re: [GENERAL] EXPLAIN command just hangs...  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: [GENERAL] EXPLAIN command just hangs...
List pgsql-general

Thanks Justin,


Yeah, I'd agree it would be much easier to show you direct command output but I'm prohibited due to the security and sensitive nature of my work environment. 


A few observations I had earlier this evening. 

I created a CTAS copy of the table in order to mirror it from its original in terms of constraints, indexes etc. I was able to query this copy table (using my "problematic" query) in under 1 second (Actually even before I had applied indexes). So I was also able to check the explain plan and based on the index makeup, the access path should be using the appropriate index correctly. So I was somewhat pleased that the issue was not a query optimisation difficulty as such. 


I also dug deeper into vacuum, and discovered that even though a vacuum appeared to be completing, a verbose vacuum was actually erring out at the point of truncation at the end and not shrinking back the table to reclaim space. It could not obtain an exclusive lock to do this...


So I decided to intervene and terminate some active sessions (Which were active and not waiting) which were all querying this table with a similar query, by using pg_backend_terminate(pid). Some of these sessions were in an active state for a few days executing this query. However, the termination of the PIDs [The command succeeded and returned 't']has not closed down the sessions. They are still present, and still active and still executing. Those current sessions are for whatever reason completely "hung", locking out exclusive access to the table (from vacuum) and cannot be killed. I have not yet resorted to trying to kill the process PID's from the OS perspective. Instead, I will try to schedule an outage over the next few days, restart the server, implement a vacuum execution and see where I am at then. 


Regards,

Ruan 


From: Justin Pryzby <pryzby@telsasoft.com>
Sent: 03 November 2017 21:33
To: Rhhh Lin
Cc: Tom Lane; pgsql-general@postgresql.org
Subject: Re: [GENERAL] EXPLAIN <query> command just hangs...
 
On Fri, Nov 03, 2017 at 09:12:02PM +0000, Rhhh Lin wrote:
> I checked for dead tuples against that particular table initially as I have seen performance problems before in a related 'busy' environment which needed its frequency of vacuuming to be increased. So I have a query to check for table with dead tuples and this table is not showing any.
>
> I also came across the suggestion that bloat might be an issue on the database and how to identify and address it and it does not appear to be evident here also, so thats my thinking as to why these are not factors.

I'd be helpful if you'd paste the commands+output as you run them "\dt+, \di+,
ps, vacuum, dead tuples, etc"

> I have vacuumed. I have not reindexed as it is a prod environment and I see that... "REINDEX locks out writes but not reads of the index's parent table.", so I may have to arrange this to avoid any interruptions (Although currently, accessing this table seems completely problematic anyway!).

Perhaps you could look into pg_repack?  Note that by default it will kill
longrunning transaction if it needs in order to (briefly) obtain a
super-exclusive lock.

> The table is 691MB and the composite index(PK) is 723 MB.

It'd be useful to see the pg_stat_user_tables.* and pg_class.reltuples and
relpages for that table.  Also output from VACUUM VERBOSE or autovacuum logs,
if you have them (but note that vacuum does different work every time it's
re-run).

> My thinking now is I may need to export this data out to a staging area whereby I can attempt to "play" with it without any repercussions...

I imagine that maybe this is related to the pattern of activity on that table
(specifically around the extremes values of its indexed columns).  So it'll be
hard to reproduce, and dumping and reloading the table (or just reindexing it
without reloading it at all) will probably temporarily improve or resolve the
issue.

You could try *adding* a new index on the timestamp column alone (CREATE INDEX
CONCURRENTLY).  Re-creating the index might conceivably be the solution in
the end, and it's what pg_repack does behind the scenes.

Justin

pgsql-general by date:

Previous
From: Rob Sargent
Date:
Subject: Re: [GENERAL] gin index trouble
Next
From: Rhhh Lin
Date:
Subject: Re: [GENERAL]