On 11/10/15 3:42 PM, Jason Jho wrote:
> On one particular day, this query hung for many hours and even while we
> killed pids for running queries and any locks granted, the query would
> never return. Also no hints of blocking processes. After some digging
> through some I/O metrics, we didn't see any memory issues or unusual
> spikes that would lead us to believe that we're running low on resources.
Did IO stats indicate IO was happening? Did you see a pegged CPU running
the query?
> There is 1 caveat, however: there was a different schema that contained
> a day-old copy of data that isn't normally present when the hang started
> to occur. However, since these are completely different schema
> namespaces with no crossovers in the queries themselves, I don't see how
> this is relevant.
If search_path wasn't what you thought it was you could have easily been
running against the wrong set of tables.
> We thought this might be possibly due to some internal vacuuming, but
> this is unlikely since there are no real concurrent reads or updates
> happening. Auto-vacuum is also on with default settings.
There are other reasons why autovacuum could kick in, notably to prevent
transaction ID wraparound.
> What is the most confusing part in all of this is why a DROP SCHEMA
> CASCADE and a fresh pg_restore would somehow fix the problem. Even a
> fresh reboot didn't fix it.
Without more info we're stuck guessing. You might try submitting a
ticket with amazon, especially if you can reproduce this.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com