Ending EXPLAIN ANALYZE early (was Re: That EXPLAIN ANALYZE patch still needs work) - Mailing list pgsql-hackers

From Tom Lane
Subject Ending EXPLAIN ANALYZE early (was Re: That EXPLAIN ANALYZE patch still needs work)
Date
Msg-id 29265.1149789093@sss.pgh.pa.us
Whole thread Raw
In response to Re: That EXPLAIN ANALYZE patch still needs work  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Ending EXPLAIN ANALYZE early (was Re: That EXPLAIN ANALYZE patch still needs work)  (Gregory Stark <gsstark@mit.edu>)
Re: Ending EXPLAIN ANALYZE early (was Re: That EXPLAIN ANALYZE patch still needs work)  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Ending EXPLAIN ANALYZE early (was Re: That EXPLAIN  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
I wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
>> A full EXPLAIN ANALYZE is always desirable - we agree on that. The
>> question is what we do when one is not available.

> The least bad alternative I've heard is to let EXPLAIN ANALYZE print
> out stats-so-far if the query is canceled by control-C or statement
> timeout.  The objection to this is you may mistake startup transients
> for full query behavior ... but at least the numbers will be good as
> far as they go.

I thought some more about this, and it seems doable but tricky (ie,
there are many wrong ways to do it).  Here are my conclusions:

We can't just use the normal QueryCancel logic that throws an
elog(ERROR) from someplace down inside the query.  This would leave
the backend in an unclean state from which we could only certainly
recover by doing AbortTransaction.  And once we've aborted the
transaction we can't do catalog accesses, which gets in the way of
producing the EXPLAIN printout.

Running the test query inside a subtransaction would fix that,
but aborting the subtransaction would throw away the executor state,
including the Instrumentation nodes we need.

So it seems we need a way to stop the query without raising an error
per se.  What I'm thinking is that while EXPLAIN ANALYZE is running,
timeout or SIGINT should not set QueryCancelPending, but instead
set a separate flag "ExplainCancelPending", which we then test in
ExecProcNode(), say
if (node->instrument){
+        if (ExplainCancelPending)
+            return NULL;    InstrStartNode(node->instrument);}

There might be one or two other places to check it, but basically
we'd only notice the flag in very limited circumstances where
it's definitely safe to force early termination of ExecutorRun.

When control comes back to explain.c, we just print the results as
normal (but probably adding a line explicitly noting that the query
was abandoned before completion).  Note that we won't have any "running"
Instrumentation nodes to contend with, since the change doesn't cause
nodes to drop out after they've started timing.  So the data is good,
just incomplete.

After we've printed the results, we have a bit of a problem: if
ExplainCancelPending is set, we now need to abort the transaction.  It
would not do at all to allow an incompletely executed UPDATE to commit.
I experimented with throwing an elog() at the bottom of ExplainQuery()
after end_tup_output(), but this does not work: psql only prints the
error and not the data, because libpq throws away the query result
upon seeing the error.  We could probably hack psql to print the results
before noting the error, but I'm concerned about making a change that
would change the behavior for other error-at-end-of-statement cases.
Also, it's not clear what might have to happen to get non-psql clients
to play along.

It seems like the best solution is to establish a subtransaction around
the entire EXPLAIN command (not just the test query), which we can abort
after we've printed the results.

A possible objection to this is that running a query inside a
subtransaction might have different/worse performance than running it
at top level.  I don't recall any severe bottlenecks of that kind but
that doesn't mean there aren't any (Alvaro, any comments?)

Note that this would mean that ending an EXPLAIN ANALYZE early, via
either control-C or statement timeout, would be treated as a non-error
situation from the point of view of the outer transaction.  This bothers
me a bit, because in fact the effects if any of the tested query would
have been rolled back.  Not sure we have any choice though.  If we
expose the error then we'll have problems with clients not showing the
EXPLAIN results.

Thoughts?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Greg Stark
Date:
Subject: Re: ADD/DROP INHERITS
Next
From: "Jim C. Nasby"
Date:
Subject: Re: [PATCHES] drop if exists remainder