Re: Trying to track down weird query stalls - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Trying to track down weird query stalls
Date
Msg-id dcc563d10903301238u135d82bpf498c2d40ac31977@mail.gmail.com
Whole thread Raw
In response to Re: Trying to track down weird query stalls  (dan@sidhe.org)
List pgsql-performance
On Mon, Mar 30, 2009 at 12:42 PM,  <dan@sidhe.org> wrote:
>> On Mon, Mar 30, 2009 at 1:50 PM,  <dan@sidhe.org> wrote:
>>> I'm running a 64-bit build of Postgres 8.3.5 on AIX 5.3, and have a
>>> really
>>> strange, annoying transient problem with one particular query stalling.
>>>
>>> The symptom here is that when this query is made with X or more records
>>> in
>>> a temp table involved in the join (where X is constant when the problem
>>> manifests, but is different between manifestations) the query takes
>>> about
>>> 20 minutes. When the query is made with X-1 records it takes less than a
>>> second. It's just this one query -- for everything else the system's
>>> nice
>>> and snappy. The machine load's never above 9 (it's a 32 CPU box) and
>>> hasn't had less than 60G of free system memory on it.
>>>
>>> An EXPLAIN ANALYZE of the two queries (with X-1 and X records) is even
>>> more bizarre. Not only are the two query plans identical (save trivial
>>> differences because of the record count differences) but the explain
>>> EXPLAIN ANALYZE total runtimes are near-identical -- the fast case
>>> showed
>>> 259ms, the slow case 265ms.
>>
>> log_min_duration_statement is a good place to start, but it sounds
>> like the query plan you're getting when you test it by hand isn't the
>> same one that's actually executing, so I'm not sure how far that's
>> going to get you.  contrib/auto_explain sounds like it would be just
>> the thing, but unfortunately that's an 8.4 feature which hasn't been
>> released yet.  I'm not sure whether it could be built and run against
>> 8.3.
>
> I'm not executing any of the EXPLAINs by hand, because I didn't want to
> have to worry about typos or filling in temp tables with test data. Inside
> the app the SQL for the problematic query's stored in a variable -- when
> the task runs with debugging enabled it first executes the query with
> EXPLAIN ANALYZE prepended and dumps the output, then it executes the query
> itself. It's possible something's going wrong in that, but the code's
> pretty simple.
>
> Arguably in this case the actual query should run faster than the EXPLAIN
> ANALYZE version, since the cache is hot. (Though that'd only likely shave
> a few dozen ms off the runtime)

Joining a lot of tables together?  Could be GEQO kicking in.

pgsql-performance by date:

Previous
From: dan@sidhe.org
Date:
Subject: Re: Trying to track down weird query stalls
Next
From: Scott Marlowe
Date:
Subject: Re: Trying to track down weird query stalls