Re: Some queries starting to hang - Mailing list pgsql-performance

From Scott Marlowe
Subject Re: Some queries starting to hang
Date
Msg-id 1149627741.25526.224.camel@state.g2switchworks.com
Whole thread Raw
In response to Re: Some queries starting to hang  ("Jim C. Nasby" <jnasby@pervasive.com>)
Responses Re: Some queries starting to hang  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Tue, 2006-06-06 at 15:51, Jim C. Nasby wrote:
> On Tue, Jun 06, 2006 at 12:54:27PM -0500, Scott Marlowe wrote:
> > On Tue, 2006-06-06 at 12:50, Craig A. James wrote:
> > > Tom Lane wrote:
> > > >>The idea I just had was: why do we need EXPLAIN ANALYZE to run to
> > > >>completion? In severe cases like this thread, we might be able to
> > > >>discover the root cause by a *partial* execution of the plan, as long as
> > > >>it was properly instrumented. That way, the OP might have been able to
> > > >>discover the root cause himself...
> > > >
> > > >
> > > > I don't think that helps, as it just replaces one uncertainty by
> > > > another: how far did the EXPLAIN really get towards completion of the
> > > > plan?  You still don't have any hard data.
> > >
> > > But at least you have some data, which is better than no data.  Even knowing that the plan got stuck on a
particularnode of the query plan could be vital information.  For a query that never finishes, you can't even find out
whereit's getting stuck. 
> > >
> > > That's why Simon's proposal might help in some particularly difficult situations.
> >
> > Hmmmmm.  I wonder if it be hard to have explain analyze have a timeout
> > per node qualifier?  Something that said if it takes more than x
> > milliseconds for a node to kill the explain analyze and list the up to
> > the nasty node that's using all the time up?
> >
> > That would be extremely useful.
>
> Maybe, maybe not. It would be very easy for this to croak on the first
> sort it hits. I suspect the original proposal of aborting once a
> rowcount estimate proves to be way off is a better idea.
>
> For the record, I also think being able to get a current snapshot is
> great, too.

I can see value in both.

Just because the row count is right doesn't mean it won't take a
fortnight of processing.  :)

The problem with the row count estimate being off from the real thing is
you only get it AFTER the set is retrieved for that node.

The cost of aborting on the first sort is minimal.  You just turn up the
number for the timeout and run it again.  1 minute or so wasted.

The cost of not aborting on the first sort is that you may never see
what the part of the plan is that's killing your query, since you never
get the actual plan.

pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Some queries starting to hang
Next
From: Tom Lane
Date:
Subject: Re: Some queries starting to hang