Thread: EXPLAIN, utility statement parameters, and recent plpgsql changes

EXPLAIN, utility statement parameters, and recent plpgsql changes

From
Tom Lane
Date:
Pavel pointed out here
http://archives.postgresql.org/pgsql-hackers/2010-01/msg01233.php
that it no longer works to reference plpgsql variables in EXPLAIN
statements in plpgsql.  I dug into this a bit, and the code is trying
to do it but it doesn't quite work.

The issue centers around the behavior of the ParamListInfo data
structure, which was originally intended to carry only values for a
fixed set of $1 .. $n parameters (as in PREPARE/EXECUTE for instance).
This is the structure that carries plpgsql values into a command that's
executed as a cursor.  To support the recent changes in plgsql parsing,
I extended that struct to also carry parser hook functions.  The idea is
that while doing parse analysis of a statement, the parser hook
functions could capture references to plpgsql variables and turn them
into Params, which would then reference the data area of the
ParamListInfo struct at runtime.

This works well enough for regular DML statements, but it falls down for
EXPLAIN which is a utility statement, because *parse analysis of utility
statements doesn't do anything*.  EXPLAIN actually does the parse
analysis of its contained statement at the beginning of execution.
And that is too late, in the scenario Pavel exhibited.  Why is it too
late?  Because SPI_cursor_open_internal() intentionally "freezes" the
ParamListInfo struct after doing initial parsing: what it copies into
the cursor portal is just a static list of data values without the
parser hooks (see copyParamList).  This is really necessary because the
execution of the portal could outlive the function that created the
cursor, so we can't safely execute its parsing hooks anymore.

So what to do about it?  I can see two basic avenues towards a solution:

1. Change things so that copyParamList copies enough state into the
cursor portal so that we can still run the plpgsql parsing hooks during
cursor execution.  In the worst case this would imply copying *all*
local variables and parameters of the plpgsql function into the cursor
portal, plus a lot of names, types, etc.  We could perhaps optimize
things enough to only copy the values actually referenced, but it still
seems like possibly a rather nasty performance hit.  And it'd affect not
only explicit cursors, but every plpgsql for-over-rows construct,
because those are cursors internally.

2. Redesign EXPLAIN so that it parses the contained query in the initial
parsing step; it wouldn't be a simple utility command anymore but a
hybrid much like DECLARE CURSOR.  I think this would not be very messy.
The main objection to it is that it doesn't scale to solve the problem
for other types of utility statements.  Now we don't support parameters
in other types of utility statements anyway, but it's something we'd
like to do someday probably.

(Of course there are also 3. "Sorry, we're not going to support
variables in EXPLAIN anymore" and 4. Revert all those parsing fixes
in plpgsql, but I rejected these solutions out of hand.)

I'm kind of leaning to #2, particularly given that we don't have time
to expend a great deal of work on this for 8.5.  But I wonder if anyone
has any comments or alternative ideas.
        regards, tom lane


Re: EXPLAIN, utility statement parameters, and recent plpgsql changes

From
Pavel Stehule
Date:
2010/1/14 Tom Lane <tgl@sss.pgh.pa.us>:
> Pavel pointed out here
> http://archives.postgresql.org/pgsql-hackers/2010-01/msg01233.php
> that it no longer works to reference plpgsql variables in EXPLAIN
> statements in plpgsql.  I dug into this a bit, and the code is trying
> to do it but it doesn't quite work.
>
> The issue centers around the behavior of the ParamListInfo data
> structure, which was originally intended to carry only values for a
> fixed set of $1 .. $n parameters (as in PREPARE/EXECUTE for instance).
> This is the structure that carries plpgsql values into a command that's
> executed as a cursor.  To support the recent changes in plgsql parsing,
> I extended that struct to also carry parser hook functions.  The idea is
> that while doing parse analysis of a statement, the parser hook
> functions could capture references to plpgsql variables and turn them
> into Params, which would then reference the data area of the
> ParamListInfo struct at runtime.
>
> This works well enough for regular DML statements, but it falls down for
> EXPLAIN which is a utility statement, because *parse analysis of utility
> statements doesn't do anything*.  EXPLAIN actually does the parse
> analysis of its contained statement at the beginning of execution.
> And that is too late, in the scenario Pavel exhibited.  Why is it too
> late?  Because SPI_cursor_open_internal() intentionally "freezes" the
> ParamListInfo struct after doing initial parsing: what it copies into
> the cursor portal is just a static list of data values without the
> parser hooks (see copyParamList).  This is really necessary because the
> execution of the portal could outlive the function that created the
> cursor, so we can't safely execute its parsing hooks anymore.
>
> So what to do about it?  I can see two basic avenues towards a solution:
>
> 1. Change things so that copyParamList copies enough state into the
> cursor portal so that we can still run the plpgsql parsing hooks during
> cursor execution.  In the worst case this would imply copying *all*
> local variables and parameters of the plpgsql function into the cursor
> portal, plus a lot of names, types, etc.  We could perhaps optimize
> things enough to only copy the values actually referenced, but it still
> seems like possibly a rather nasty performance hit.  And it'd affect not
> only explicit cursors, but every plpgsql for-over-rows construct,
> because those are cursors internally.
>
> 2. Redesign EXPLAIN so that it parses the contained query in the initial
> parsing step; it wouldn't be a simple utility command anymore but a
> hybrid much like DECLARE CURSOR.  I think this would not be very messy.
> The main objection to it is that it doesn't scale to solve the problem
> for other types of utility statements.  Now we don't support parameters
> in other types of utility statements anyway, but it's something we'd
> like to do someday probably.

+1

Pavel
>
> (Of course there are also 3. "Sorry, we're not going to support
> variables in EXPLAIN anymore" and 4. Revert all those parsing fixes
> in plpgsql, but I rejected these solutions out of hand.)
>
> I'm kind of leaning to #2, particularly given that we don't have time
> to expend a great deal of work on this for 8.5.  But I wonder if anyone
> has any comments or alternative ideas.
>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: EXPLAIN, utility statement parameters, and recent plpgsql changes

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:
> This works well enough for regular DML statements, but it falls down for
> EXPLAIN which is a utility statement, because *parse analysis of utility
> statements doesn't do anything*.  EXPLAIN actually does the parse
> analysis of its contained statement at the beginning of execution.
> And that is too late, in the scenario Pavel exhibited.  Why is it too
> late?  Because SPI_cursor_open_internal() intentionally "freezes" the
> ParamListInfo struct after doing initial parsing: what it copies into
> the cursor portal is just a static list of data values without the
> parser hooks (see copyParamList).

Would it make any sense for this function to get to call the hook in the
case a utility statement is being processed?

Regards,
-- 
dim


Re: EXPLAIN, utility statement parameters, and recent plpgsql changes

From
Tom Lane
Date:
Dimitri Fontaine <dfontaine@hi-media.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> This works well enough for regular DML statements, but it falls down for
>> EXPLAIN which is a utility statement, because *parse analysis of utility
>> statements doesn't do anything*.  EXPLAIN actually does the parse
>> analysis of its contained statement at the beginning of execution.
>> And that is too late, in the scenario Pavel exhibited.  Why is it too
>> late?  Because SPI_cursor_open_internal() intentionally "freezes" the
>> ParamListInfo struct after doing initial parsing: what it copies into
>> the cursor portal is just a static list of data values without the
>> parser hooks (see copyParamList).

> Would it make any sense for this function to get to call the hook in the
> case a utility statement is being processed?

Well, the point of the hook is to change the results of parse
transformation, so just calling it doesn't do much --- you have to apply
the whole parse analysis process, *and keep the resulting tree*.
        regards, tom lane


Re: EXPLAIN, utility statement parameters, and recent plpgsql changes

From
Tom Lane
Date:
I wrote:
> 2. Redesign EXPLAIN so that it parses the contained query in the initial
> parsing step; it wouldn't be a simple utility command anymore but a
> hybrid much like DECLARE CURSOR.  I think this would not be very messy.
> The main objection to it is that it doesn't scale to solve the problem
> for other types of utility statements.  Now we don't support parameters
> in other types of utility statements anyway, but it's something we'd
> like to do someday probably.

I've been looking some more at this.  The analogy to DECLARE CURSOR
isn't as good as I thought: we can't use a transformed representation
similar to DECLARE CURSOR's (namely, a Query with some extra stuff in
its utilityStmt field) because EXPLAIN can take non-SELECT queries,
which could be rewritten into multiple Query trees by the action of
rules.  So it seems the transformed representation would have to be
an ExplainStmt with a list of Queries underneath it.

The reason for the rule that utility statements aren't affected by parse
analysis is that parse analysis of regular queries takes locks on the
referenced tables, and we must keep hold of those locks to be sure that
the transformed tree still reflects database reality.  At the time we
made that rule it seemed too messy to consider doing anything similar
for utility statements.  However, now the locking considerations have
been centralized in plancache.c, which knows about re-taking locks on
a possibly stale cached plan.  So the price of doing parse analysis of
EXPLAIN's target statement during the normal parse analysis phase is
just going to be some adjustments in plancache.c so that it knows to
look underneath an ExplainStmt for queries representing additional locks
to re-take.  This is a little bit ugly, but not really any worse than
what it knows already about the representation of parsed queries.

So I conclude that the "it doesn't scale" argument isn't as strong
as it seemed.  In principle, to support parameters in other utility
statements, we'll have the same type of changes to make: * transform the expressions that might reference parameters
during  the normal parse analysis phase * teach plancache.c about finding lock dependencies in these   expressions
 
That seems fairly reasonable.
        regards, tom lane


Re: EXPLAIN, utility statement parameters, and recent plpgsql changes

From
Dimitri Fontaine
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Dimitri Fontaine <dfontaine@hi-media.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> writes:
>>> This works well enough for regular DML statements, but it falls down for
>>> EXPLAIN which is a utility statement, because *parse analysis of utility
>>> statements doesn't do anything*.  EXPLAIN actually does the parse
>>> analysis of its contained statement at the beginning of execution.
>>> And that is too late, in the scenario Pavel exhibited.  Why is it too
>>> late?  Because SPI_cursor_open_internal() intentionally "freezes" the
>>> ParamListInfo struct after doing initial parsing: what it copies into
>>> the cursor portal is just a static list of data values without the
>>> parser hooks (see copyParamList).
>
>> Would it make any sense for this function to get to call the hook in the
>> case a utility statement is being processed?
>
> Well, the point of the hook is to change the results of parse
> transformation, so just calling it doesn't do much --- you have to apply
> the whole parse analysis process, *and keep the resulting tree*.

Could that be done in the function, in the phase you call "doing initial
parsing"?
-- 
dim


Re: EXPLAIN, utility statement parameters, and recent plpgsql changes

From
Tom Lane
Date:
Dimitri Fontaine <dfontaine@hi-media.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Well, the point of the hook is to change the results of parse
>> transformation, so just calling it doesn't do much --- you have to apply
>> the whole parse analysis process, *and keep the resulting tree*.

> Could that be done in the function, in the phase you call "doing initial
> parsing"?

Within copyParamList?  Seems like rather an abuse of the design ... it's
just supposed to copy the ParamListInfo, not editorialize on the entire
content of the portal.
        regards, tom lane