Thread: EXPLAIN ANALYZE

EXPLAIN ANALYZE

From
"Simon Riggs"
Date:
My understanding from 8.2 development was that EXPLAIN ANALYZE had been
altered so that if you issued a particular keystroke while it was
executing you'd get a partial results-so-far version of the EXPLAIN.

AFAICS that wasn't implemented. Or at least I can't find it?

The problem raised in 8.2dev was that EXPLAIN ANALYZE takes forever. It
still does, which means analyzing queries that run for too long is
almost impossible. Those are the ones we really need to analyze...

Thoughts?

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: EXPLAIN ANALYZE

From
Martijn van Oosterhout
Date:
On Fri, Dec 08, 2006 at 10:36:12AM +0000, Simon Riggs wrote:
> My understanding from 8.2 development was that EXPLAIN ANALYZE had been
> altered so that if you issued a particular keystroke while it was
> executing you'd get a partial results-so-far version of the EXPLAIN.

Not as I recall. There were some attempts to reduce the overhead, but
that's about it.

There were some attempts to get an estimate of how far the query had
progressed, but that's something else.

Not quite sure what you mean.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: EXPLAIN ANALYZE

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> My understanding from 8.2 development was that EXPLAIN ANALYZE had been
> altered so that if you issued a particular keystroke while it was
> executing you'd get a partial results-so-far version of the EXPLAIN.

Ways to get partial results from EXPLAIN were speculated about, but
AFAIR no specific design was agreed to, much less implemented.
        regards, tom lane


Re: EXPLAIN ANALYZE

From
"Simon Riggs"
Date:
On Fri, 2006-12-08 at 10:42 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > My understanding from 8.2 development was that EXPLAIN ANALYZE had been
> > altered so that if you issued a particular keystroke while it was
> > executing you'd get a partial results-so-far version of the EXPLAIN.
> 
> Ways to get partial results from EXPLAIN were speculated about, but
> AFAIR no specific design was agreed to, much less implemented.

Well, I'd like a way of making EXPLAIN ANALYZE return something useful
within a reasonable amount of time. We can define that as the amount of
time that the user considers is their goal for the query. Having an EA
that goes on forever and ever doesn't really help anyone diagnose
problems with long running queries.

With that requirement in mind, how about something simple like:

explain_analyze_timeout = n

When timeout is reached the EA returns results as if it had finished
executing normally, though with some additional text to make it very
clear that the displayed results are partial execution only. Implemented
over the top of statement_timeout, but without throwing an error.

I'm not very sure about that idea, but it meets the requirement.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: EXPLAIN ANALYZE

From
Neil Conway
Date:
On Sun, 2006-12-10 at 13:00 +0000, Simon Riggs wrote:
> With that requirement in mind, how about something simple like:
> 
> explain_analyze_timeout = n
> 
> When timeout is reached the EA returns results as if it had finished
> executing normally, though with some additional text to make it very
> clear that the displayed results are partial execution only.

Sounds like a bit of a kludge, IMHO. EXPLAIN ANALYZE isn't designed to
provide online feedback about the execution of a query, which I think is
the real feature here. There has been prior discussion about how to do
online progress indication for queries in general[1] -- I think some
sort of "online EXPLAIN ANALYZE" would be a natural part of such a
feature.

-Neil

[1] http://developer.postgresql.org/index.php/Query_progress_indication




Re: EXPLAIN ANALYZE

From
"Simon Riggs"
Date:
On Sun, 2006-12-10 at 09:28 -0500, Neil Conway wrote:
> On Sun, 2006-12-10 at 13:00 +0000, Simon Riggs wrote:
> > With that requirement in mind, how about something simple like:
> > 
> > explain_analyze_timeout = n
> > 
> > When timeout is reached the EA returns results as if it had finished
> > executing normally, though with some additional text to make it very
> > clear that the displayed results are partial execution only.
> 
> Sounds like a bit of a kludge, IMHO. 

Without a doubt. I was hoping for some further inspiration...

How about any of these?
 EXPLAIN [ ANALYZE [TIME LIMIT n]] [ VERBOSE ] statement
 EXPLAIN [ ANALYZE [SHOW STATISTICS AFTER n]] [ VERBOSE ] statement

Neither of which need new keywords.

> EXPLAIN ANALYZE isn't designed to
> provide online feedback about the execution of a query, which I think is
> the real feature here. There has been prior discussion about how to do
> online progress indication for queries in general[1] -- I think some
> sort of "online EXPLAIN ANALYZE" would be a natural part of such a
> feature.
> 
> -Neil
> 
> [1] http://developer.postgresql.org/index.php/Query_progress_indication

Well, that all looks good and can see I'd want all of those things.

The EA case is pretty straightforward though; we don't really need
regular feedback as much as *any* feedback. So, I'm seeing it as a very
simple case of the overall requirement. It's also a special case in that
the output from a partial EA needs to be the same shape as a normal EA.

Query progress data would likely be a different shape, no? You'd need
some estimates of eventual time based upon a re-evaluation of the
original estimates based upon things learned so far during execution.
i.e. at start we thought this loop would be called X times (planner
estimate), so far its been called Y times (EA actual) and based upon
that we now think it will be called Z times. So I'm thinking of
enhancing EA to provide partial results rather than go for the Full
Monty just yet.

BTW, can anybody edit the Wiki? I didn't realise it existed.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: EXPLAIN ANALYZE

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> The EA case is pretty straightforward though;

Well, no its not, as you'll recall if you re-read the prior discussions.
The killer problem is that it's unclear whether the early termination of
the query represents an error condition or not.  If it's not an error
then you've got a serious problem for non-SELECT queries (which EA
actually executes, remember) --- you'll have allowed an incompletely
executed update to become committed, which is as good a definition of
"data corruption" as I can come up with offhand.  On the other hand,
if it is an error then delivering some results along with the error
requires serious contortion of the FE/BE protocol, libpq's response to
errors, etc.  To say nothing of what it might take to do it inside the
backend, which generally does not like doing anything interesting in an
already-aborted transaction.

We might be able to finesse the protocol problem by teaching EA to
respond to query cancel by emitting the data-so-far as a NOTICE (like it
used to do many moons ago), rather than a standard query result, then
allowing the query to error out.  However this'd be fairly unfriendly
for client-side tools that are expecting a query result.
        regards, tom lane


Re: EXPLAIN ANALYZE

From
Peter Eisentraut
Date:
Simon Riggs wrote:
> Well, I'd like a way of making EXPLAIN ANALYZE return something
> useful within a reasonable amount of time. We can define that as the
> amount of time that the user considers is their goal for the query.

What sort of "useful" results would you expect to be able to see from 
such an aborted EXPLAIN ANALYZE?  I cannot quite imagine what 
instructive value a partially executed plan output would have.  It's 
not like we can somehow ensure executing an equal proportion of each 
plan node or something.  Do you have a specific case in mind?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: EXPLAIN ANALYZE

From
Mark Kirkwood
Date:
Tom Lane wrote:

> 
> We might be able to finesse the protocol problem by teaching EA to
> respond to query cancel by emitting the data-so-far as a NOTICE (like it
> used to do many moons ago), rather than a standard query result, then
> allowing the query to error out.  However this'd be fairly unfriendly
> for client-side tools that are expecting a query result.
> 

wouldn't the above be ok - given that the primary use for this EA 
extension is troublesome query debugging anyway ?

Cheers

Mark


Re: EXPLAIN ANALYZE

From
"Simon Riggs"
Date:
On Sun, 2006-12-10 at 18:09 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > The EA case is pretty straightforward though;
> 
> Well, no its not, as you'll recall if you re-read the prior discussions.
> The killer problem is that it's unclear whether the early termination of
> the query represents an error condition or not.  If it's not an error
> then you've got a serious problem for non-SELECT queries (which EA
> actually executes, remember) 

I was expecting to take the non-ERROR route, realising all of the
problems you mention make the issue so costly to solve that way. Most
big queries are SELECTs without any updates and its those we care about
here. Anybody doing huge updates needs other assistance, IMHO.

Intermediate results are always better than none at all. I do understand
what a partial execution would look like - frequently it is the
preparatory stages that slow a query down - costly sorts, underestimated
hash joins etc. Other times it is loop underestimation, which can
usually be seen fairly quickly.

I foresaw that it would be possible to enforce EA as a read-only
transaction, or throw an error (with no output) when issued in
time-limited form. Maybe that isn't possible.

Yes, I am looking for a fix that can be accomplished without major work
and/or change. This topic is a pain, but not such a priority feature.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: EXPLAIN ANALYZE

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> We might be able to finesse the protocol problem by teaching EA to
> respond to query cancel by emitting the data-so-far as a NOTICE (like it
> used to do many moons ago), rather than a standard query result, then
> allowing the query to error out.  However this'd be fairly unfriendly
> for client-side tools that are expecting a query result.

What I suggested was introducing a new FE/BE message type for analyze query
plans. Then clients that recognize it can use it to display the query plan
without interfering with the query results. Clients that don't know what to do
with it would have to just ignore it.

Then we could introduce as many ways of triggering these messages as we like.
A GUC to trigger one every n seconds, a FE/BE message like QueryCancel, say,
QueryProbe which triggers one when the user presses a button in pgadmin or C-t
(SIGINFO) in psql, etc.

I was thinking that it should be more structured than the current block of
text that clients receive. I had in mind to make it equivalent to a PGResult
so the various bits of data would be in different named columns. This would
let GUI clients like pgadmin interpret the results more effectively and make
it easier for us to add data without worrying about information overload on
the user's side.

And the query would keep operating. Canceling the query and statement_timeout
would both be entirely orthogonal to requesting analyze results.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: EXPLAIN ANALYZE

From
Richard Huxton
Date:
Simon Riggs wrote:
> Intermediate results are always better than none at all. I do understand
> what a partial execution would look like - frequently it is the
> preparatory stages that slow a query down - costly sorts, underestimated
> hash joins etc. Other times it is loop underestimation, which can
> usually be seen fairly quickly.

Surely all you're interested in is where the actual plan differs from 
the expected plan? Could you not just have a mode that issues NOTICEs 
when expected/actual number of rows differ by more than a set amount? 
You'd probably want two NOTICEs - one when the threshold is exceeded, 
one when the node completes.

--   Richard Huxton  Archonet Ltd


Re: EXPLAIN ANALYZE

From
"Simon Riggs"
Date:
On Mon, 2006-12-11 at 11:00 +0000, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> 
> > We might be able to finesse the protocol problem by teaching EA to
> > respond to query cancel by emitting the data-so-far as a NOTICE (like it
> > used to do many moons ago), rather than a standard query result, then
> > allowing the query to error out.  However this'd be fairly unfriendly
> > for client-side tools that are expecting a query result.
> 
> What I suggested was introducing a new FE/BE message type for analyze query
> plans. Then clients that recognize it can use it to display the query plan
> without interfering with the query results. Clients that don't know what to do
> with it would have to just ignore it.
> 
> Then we could introduce as many ways of triggering these messages as we like.
> A GUC to trigger one every n seconds, a FE/BE message like QueryCancel, say,
> QueryProbe which triggers one when the user presses a button in pgadmin or C-t
> (SIGINFO) in psql, etc.
> 
> I was thinking that it should be more structured than the current block of
> text that clients receive. I had in mind to make it equivalent to a PGResult
> so the various bits of data would be in different named columns. This would
> let GUI clients like pgadmin interpret the results more effectively and make
> it easier for us to add data without worrying about information overload on
> the user's side.
> 
> And the query would keep operating. Canceling the query and statement_timeout
> would both be entirely orthogonal to requesting analyze results.

I like the idea, but its more work than I really wanted to get into
right now.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: EXPLAIN ANALYZE

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Mon, 2006-12-11 at 11:00 +0000, Gregory Stark wrote:
>> What I suggested was introducing a new FE/BE message type for analyze query
>> plans.

> I like the idea, but its more work than I really wanted to get into
> right now.

Yeah ... a protocol change is *painful*, especially if you really want
clients to behave in a significantly new way.
        regards, tom lane


Re: EXPLAIN ANALYZE

From
Neil Conway
Date:
Tom Lane wrote:
> Yeah ... a protocol change is *painful*, especially if you really want
> clients to behave in a significantly new way.

A backward-incompatible protocol change is painful, sure, but ISTM we 
could implement what Greg describes as a straightforward extension to 
the V3 protocol. Then the backend could just avoid sending the query 
progress information to < V4 protocol clients.

-Neil



Re: EXPLAIN ANALYZE

From
Neil Conway
Date:
Simon Riggs wrote:
> I like the idea, but its more work than I really wanted to get into
> right now.

Well, from another point of view: do we need this feature so urgently 
that there is not enough time to do it properly? IMHO, no.

-Neil


Re: EXPLAIN ANALYZE

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Tom Lane wrote:
>> Yeah ... a protocol change is *painful*, especially if you really want
>> clients to behave in a significantly new way.

> A backward-incompatible protocol change is painful, sure, but ISTM we 
> could implement what Greg describes as a straightforward extension to 
> the V3 protocol. Then the backend could just avoid sending the query 
> progress information to < V4 protocol clients.

You're dodging the point though.  If you want the new message type to do
anything useful in V4 clients, you still have to define an API for
libpq, update psql, try to figure out what the heck JDBC and ODBC are
going to do with it, etc etc.  All doable, but it's a lot more work than
just a quick hack in the backend.
        regards, tom lane


Re: EXPLAIN ANALYZE

From
Bruce Momjian
Date:
Richard Huxton wrote:
> Simon Riggs wrote:
> > Intermediate results are always better than none at all. I do understand
> > what a partial execution would look like - frequently it is the
> > preparatory stages that slow a query down - costly sorts, underestimated
> > hash joins etc. Other times it is loop underestimation, which can
> > usually be seen fairly quickly.
> 
> Surely all you're interested in is where the actual plan differs from 
> the expected plan? Could you not just have a mode that issues NOTICEs 
> when expected/actual number of rows differ by more than a set amount? 
> You'd probably want two NOTICEs - one when the threshold is exceeded, 
> one when the node completes.

Right, we already have a TODO:
* Have EXPLAIN ANALYZE highlight poor optimizer estimates

I was thinking we could issue NOTICE when the estimates differed from
the actual by a specified percentage, and that NOTICE could be issued
while the query is still processing, assuming the stage completes before
the query does.  This seems much easier than doing protocol changes. 
TODO updated:
* Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and  actual row counts differ by a specified
percentage

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: EXPLAIN ANALYZE

From
Neil Conway
Date:
On Tue, 2006-12-12 at 17:30 -0500, Bruce Momjian wrote: 
>     * Have EXPLAIN ANALYZE highlight poor optimizer estimates

> TODO updated:
> 
>     * Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and
>       actual row counts differ by a specified percentage

I don't think this is an improvement. The old wording describes a broad
set of possible improvements. Your new text describes one way of
implementing a subset of the former TODO wording.

-Neil




Re: EXPLAIN ANALYZE

From
Bruce Momjian
Date:
Neil Conway wrote:
> On Tue, 2006-12-12 at 17:30 -0500, Bruce Momjian wrote: 
> >     * Have EXPLAIN ANALYZE highlight poor optimizer estimates
> 
> > TODO updated:
> > 
> >     * Have EXPLAIN ANALYZE issue NOTICE messages when the estimated and
> >       actual row counts differ by a specified percentage
> 
> I don't think this is an improvement. The old wording describes a broad
> set of possible improvements. Your new text describes one way of
> implementing a subset of the former TODO wording.

Well, we can still do a broader implementation if we want it.  Do you
have any suggestions?  Basically, the more specific, the more likely we
will get someone to do it, and  we can always add more details.

--  Bruce Momjian   bruce@momjian.us EnterpriseDB    http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: EXPLAIN ANALYZE

From
"Jim C. Nasby"
Date:
On Mon, Dec 11, 2006 at 12:24:12AM +0100, Peter Eisentraut wrote:
> Simon Riggs wrote:
> > Well, I'd like a way of making EXPLAIN ANALYZE return something
> > useful within a reasonable amount of time. We can define that as the
> > amount of time that the user considers is their goal for the query.
> 
> What sort of "useful" results would you expect to be able to see from 
> such an aborted EXPLAIN ANALYZE?  I cannot quite imagine what 
> instructive value a partially executed plan output would have.  It's 
> not like we can somehow ensure executing an equal proportion of each 
> plan node or something.  Do you have a specific case in mind?

The query is most likely to get canceled while it is working on whatever
node in the plan is the bottleneck, and it's likely going to be easy to
spot since nodes above it wouldn't have gotten much done.
-- 
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)


Re: EXPLAIN ANALYZE

From
Joshua Reich
Date:
Thumbs up on this from a lurker.

I recall a previous post about some sort of "progress bar" hack that 
would show you where in a plan a currently executing query was at. Has 
any work been done on this?

Josh Reich


Jim C. Nasby wrote:
> On Mon, Dec 11, 2006 at 12:24:12AM +0100, Peter Eisentraut wrote:
>   
>> Simon Riggs wrote:
>>     
>>> Well, I'd like a way of making EXPLAIN ANALYZE return something
>>> useful within a reasonable amount of time. We can define that as the
>>> amount of time that the user considers is their goal for the query.
>>>       
>> What sort of "useful" results would you expect to be able to see from 
>> such an aborted EXPLAIN ANALYZE?  I cannot quite imagine what 
>> instructive value a partially executed plan output would have.  It's 
>> not like we can somehow ensure executing an equal proportion of each 
>> plan node or something.  Do you have a specific case in mind?
>>     
>
> The query is most likely to get canceled while it is working on whatever
> node in the plan is the bottleneck, and it's likely going to be easy to
> spot since nodes above it wouldn't have gotten much done.
>