Re: Postgres Query Plan Live Lock - Mailing list pgsql-performance

From Pweaver (Paul Weaver)
Subject Re: Postgres Query Plan Live Lock
Msg-id CAFTGa=kU2aTRQVdW2Wwg4XUYxzU0_7LCWEYfhGBAm3n5+7Yyyw@mail.gmail.com
In response to Postgres Query Plan Live Lock (Jeff Janes <jeff.janes@gmail.com>)

On Wed, Feb 5, 2014 at 9:52 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Monday, February 3, 2014, Pweaver (Paul Weaver) <pweaver@panjiva.com> wrote:
We have been running into a (live lock?) issue on our production Postgres instance causing queries referencing a particular table to become extremely slow and our application to lock up.

This tends to occur on a particular table that gets a lot of queries against it after a large number of deletes. When this happens, the following symptoms occur when queries referencing that table are run (even if we stop the deleting):

What do you mean by "stop the deleting"?  Are you pausing the delete but without either committing or rolling back the transaction, but just holding it open?  Are you stopping it cleanly, between transactions?

We are repeatedly running delete commands, each in its own transaction. We stop issuing new deletes and let the in-flight ones finish cleanly.
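For concreteness, the delete pattern described above looks roughly like this sketch (psycopg2-style API; the table name, column name, and batch size are made up for illustration). Because each batch commits in its own transaction, stopping the loop leaves no open transaction holding locks on the table:

```python
# Sketch of batched deletes, each batch in its own short transaction.
# Table/column names and the batch size are illustrative only.

def batches(ids, size):
    """Split a list of row ids into fixed-size chunks."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def delete_in_batches(conn, ids, size=1000):
    # Each iteration commits on its own, so interrupting the loop
    # between iterations stops the deleting cleanly.
    for chunk in batches(ids, size):
        with conn.cursor() as cur:
            cur.execute(
                "DELETE FROM table_name WHERE id = ANY(%s)", (chunk,)
            )
        conn.commit()
```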

Also, how many queries are happening concurrently?  Perhaps you need a connection pooler.
Usually between 1 and 20. When it gets locked up, closer to 100-200.
We should add a connection pooler. Could the number of active queries on the table be causing the issue?
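A pooler such as PgBouncer in transaction mode would cap the number of concurrent server connections while still letting many clients connect (excess clients queue). A minimal configuration sketch, with illustrative (not tuned) values:

```ini
; pgbouncer.ini sketch -- database name and pool sizes are illustrative
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
pool_mode = transaction
default_pool_size = 20     ; server connections per database/user pair
max_client_conn = 600      ; clients beyond the pool size wait in a queue
```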

Is the CPU time user time or system time?  What kernel version do you have?
Real time. Kernel version 3.2.0-26.


SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete
EXPLAIN SELECT * FROM table_name LIMIT 10;  -- takes ~45 seconds to complete the explain query, the query plan looks reasonable

This sounds like the problem we heard quite a bit about recently, where processes spend a lot of time fighting over the proclock while they try to check the commit status of open transactions. But I don't see how deletes could trigger that behavior. If the delete has not committed, the tuples are still visible and the LIMIT 10 is quickly satisfied. If the delete has committed, the tuples quickly get hinted, and so the next query along should be faster.

I also don't see why the explain would be slow. A similar problem was tracked down to digging through in-doubt tuples while trying to use an index to find the true min or max when estimating the cost of a merge join. But I don't think a simple table query should lead to that, unless table_name is a view. And I don't see how deletes, rather than uncommitted inserts, could trigger it either.
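One quick way to rule out the view possibility, and to clear dead tuples after the bulk deletes, is something along these lines in psql (requires a live server, so this is a sketch rather than something to run blindly):

```sql
-- Is table_name really a plain table? relkind 'r' = table, 'v' = view.
SELECT relkind FROM pg_class WHERE relname = 'table_name';

-- After large deletes, a vacuum removes dead row versions and sets
-- hint bits, so later scans don't have to re-check commit status.
VACUUM (VERBOSE, ANALYZE) table_name;
```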


 max_connections              | 600                                      | configuration file

That is quite extreme.  If a temporary load spike (like from the deletes and the hinting needed after them) slows down the select queries and you start more and more of them, soon you could tip the system over into kernel scheduler insanity with high system time.  Once in this mode, it will stay there until the incoming stream of queries stops and the existing ones clear out.  But, if that is what is occurring, I don't know why queries on other tables would still be fast.
We probably want a connection pooler anyway, but in this particular case, the load average is fairly low on the machine running Postgres.

Cheers,

Jeff



--
Thank You,
