Hanging query on a fresh restart - Mailing list pgsql-performance

From Jason Jho
Subject Hanging query on a fresh restart
Date
Msg-id CACLReKuBuHU85p38fYoF_h_jMKeeFewOXvS7zSGV+fHWz4ENrQ@mail.gmail.com
Whole thread Raw
Responses Re: Hanging query on a fresh restart  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-performance
Hi,

We using Postgres 9.3.10 on Amazon RDS and running into some strange behavior that has been tough to track down and debug (partially due to the limited admin access from RDS).

We're running a read-only query that normally takes ~10-15 min., but also runs concurrently with several other intensive queries (these queries themselves, finish).  

On one particular day, this query hung for many hours and even while we killed pids for running queries and any locks granted, the query would never return.  Also no hints of blocking processes.  After some digging through some I/O metrics, we didn't see any memory issues or unusual spikes that would lead us to believe that we're running low on resources.

There is 1 caveat, however:  there was a different schema that contained a day-old copy of data that isn't normally present when the hang started to occur.  However, since these are completely different schema namespaces with no crossovers in the queries themselves, I don't see how this is relevant.


 1) We ended up doing a full reboot of the RDS instance and ran the query again, this time, no other queries are running off of a fresh boot-up (no competing locks or transactions).  The query continued to hang.

 2) We then ran pg_dump to snapshot the current data and did a full pg_restore (after dropping all schemas) of an older dataset where we knew this query would run successfully.  As expected, the query ran fine.

 3) We then dropped all schemas again and pg_restored the previous dataset that was causing the query to hang, and then to my surprise, the query ran just fine.  No hangs.  

We thought this might be possibly due to some internal vacuuming, but this is unlikely since there are no real concurrent reads or updates happening.  Auto-vacuum is also on with default settings.

What is the most confusing part in all of this is why a DROP SCHEMA CASCADE and a fresh pg_restore would somehow fix the problem.  Even a fresh reboot didn't fix it.

Any ideas??

pgsql-performance by date:

Previous
From: Igor Neyman
Date:
Subject: Re: Slow 3 Table Join with v bad row estimate
Next
From: David Osborne
Date:
Subject: Re: Slow 3 Table Join with v bad row estimate