Thread: EXPLAIN, utility statement parameters, and recent plpgsql changes
Pavel pointed out here http://archives.postgresql.org/pgsql-hackers/2010-01/msg01233.php that it no longer works to reference plpgsql variables in EXPLAIN statements in plpgsql. I dug into this a bit, and the code is trying to do it but it doesn't quite work. The issue centers around the behavior of the ParamListInfo data structure, which was originally intended to carry only values for a fixed set of $1 .. $n parameters (as in PREPARE/EXECUTE for instance). This is the structure that carries plpgsql values into a command that's executed as a cursor. To support the recent changes in plgsql parsing, I extended that struct to also carry parser hook functions. The idea is that while doing parse analysis of a statement, the parser hook functions could capture references to plpgsql variables and turn them into Params, which would then reference the data area of the ParamListInfo struct at runtime. This works well enough for regular DML statements, but it falls down for EXPLAIN which is a utility statement, because *parse analysis of utility statements doesn't do anything*. EXPLAIN actually does the parse analysis of its contained statement at the beginning of execution. And that is too late, in the scenario Pavel exhibited. Why is it too late? Because SPI_cursor_open_internal() intentionally "freezes" the ParamListInfo struct after doing initial parsing: what it copies into the cursor portal is just a static list of data values without the parser hooks (see copyParamList). This is really necessary because the execution of the portal could outlive the function that created the cursor, so we can't safely execute its parsing hooks anymore. So what to do about it? I can see two basic avenues towards a solution: 1. Change things so that copyParamList copies enough state into the cursor portal so that we can still run the plpgsql parsing hooks during cursor execution. In the worst case this would imply copying *all* local variables and parameters of the plpgsql function into the cursor portal, plus a lot of names, types, etc. We could perhaps optimize things enough to only copy the values actually referenced, but it still seems like possibly a rather nasty performance hit. And it'd affect not only explicit cursors, but every plpgsql for-over-rows construct, because those are cursors internally. 2. Redesign EXPLAIN so that it parses the contained query in the initial parsing step; it wouldn't be a simple utility command anymore but a hybrid much like DECLARE CURSOR. I think this would not be very messy. The main objection to it is that it doesn't scale to solve the problem for other types of utility statements. Now we don't support parameters in other types of utility statements anyway, but it's something we'd like to do someday probably. (Of course there are also 3. "Sorry, we're not going to support variables in EXPLAIN anymore" and 4. Revert all those parsing fixes in plpgsql, but I rejected these solutions out of hand.) I'm kind of leaning to #2, particularly given that we don't have time to expend a great deal of work on this for 8.5. But I wonder if anyone has any comments or alternative ideas. regards, tom lane
2010/1/14 Tom Lane <tgl@sss.pgh.pa.us>: > Pavel pointed out here > http://archives.postgresql.org/pgsql-hackers/2010-01/msg01233.php > that it no longer works to reference plpgsql variables in EXPLAIN > statements in plpgsql. I dug into this a bit, and the code is trying > to do it but it doesn't quite work. > > The issue centers around the behavior of the ParamListInfo data > structure, which was originally intended to carry only values for a > fixed set of $1 .. $n parameters (as in PREPARE/EXECUTE for instance). > This is the structure that carries plpgsql values into a command that's > executed as a cursor. To support the recent changes in plgsql parsing, > I extended that struct to also carry parser hook functions. The idea is > that while doing parse analysis of a statement, the parser hook > functions could capture references to plpgsql variables and turn them > into Params, which would then reference the data area of the > ParamListInfo struct at runtime. > > This works well enough for regular DML statements, but it falls down for > EXPLAIN which is a utility statement, because *parse analysis of utility > statements doesn't do anything*. EXPLAIN actually does the parse > analysis of its contained statement at the beginning of execution. > And that is too late, in the scenario Pavel exhibited. Why is it too > late? Because SPI_cursor_open_internal() intentionally "freezes" the > ParamListInfo struct after doing initial parsing: what it copies into > the cursor portal is just a static list of data values without the > parser hooks (see copyParamList). This is really necessary because the > execution of the portal could outlive the function that created the > cursor, so we can't safely execute its parsing hooks anymore. > > So what to do about it? I can see two basic avenues towards a solution: > > 1. Change things so that copyParamList copies enough state into the > cursor portal so that we can still run the plpgsql parsing hooks during > cursor execution. In the worst case this would imply copying *all* > local variables and parameters of the plpgsql function into the cursor > portal, plus a lot of names, types, etc. We could perhaps optimize > things enough to only copy the values actually referenced, but it still > seems like possibly a rather nasty performance hit. And it'd affect not > only explicit cursors, but every plpgsql for-over-rows construct, > because those are cursors internally. > > 2. Redesign EXPLAIN so that it parses the contained query in the initial > parsing step; it wouldn't be a simple utility command anymore but a > hybrid much like DECLARE CURSOR. I think this would not be very messy. > The main objection to it is that it doesn't scale to solve the problem > for other types of utility statements. Now we don't support parameters > in other types of utility statements anyway, but it's something we'd > like to do someday probably. +1 Pavel > > (Of course there are also 3. "Sorry, we're not going to support > variables in EXPLAIN anymore" and 4. Revert all those parsing fixes > in plpgsql, but I rejected these solutions out of hand.) > > I'm kind of leaning to #2, particularly given that we don't have time > to expend a great deal of work on this for 8.5. But I wonder if anyone > has any comments or alternative ideas. > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Tom Lane <tgl@sss.pgh.pa.us> writes: > This works well enough for regular DML statements, but it falls down for > EXPLAIN which is a utility statement, because *parse analysis of utility > statements doesn't do anything*. EXPLAIN actually does the parse > analysis of its contained statement at the beginning of execution. > And that is too late, in the scenario Pavel exhibited. Why is it too > late? Because SPI_cursor_open_internal() intentionally "freezes" the > ParamListInfo struct after doing initial parsing: what it copies into > the cursor portal is just a static list of data values without the > parser hooks (see copyParamList). Would it make any sense for this function to get to call the hook in the case a utility statement is being processed? Regards, -- dim
Dimitri Fontaine <dfontaine@hi-media.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> This works well enough for regular DML statements, but it falls down for >> EXPLAIN which is a utility statement, because *parse analysis of utility >> statements doesn't do anything*. EXPLAIN actually does the parse >> analysis of its contained statement at the beginning of execution. >> And that is too late, in the scenario Pavel exhibited. Why is it too >> late? Because SPI_cursor_open_internal() intentionally "freezes" the >> ParamListInfo struct after doing initial parsing: what it copies into >> the cursor portal is just a static list of data values without the >> parser hooks (see copyParamList). > Would it make any sense for this function to get to call the hook in the > case a utility statement is being processed? Well, the point of the hook is to change the results of parse transformation, so just calling it doesn't do much --- you have to apply the whole parse analysis process, *and keep the resulting tree*. regards, tom lane
I wrote: > 2. Redesign EXPLAIN so that it parses the contained query in the initial > parsing step; it wouldn't be a simple utility command anymore but a > hybrid much like DECLARE CURSOR. I think this would not be very messy. > The main objection to it is that it doesn't scale to solve the problem > for other types of utility statements. Now we don't support parameters > in other types of utility statements anyway, but it's something we'd > like to do someday probably. I've been looking some more at this. The analogy to DECLARE CURSOR isn't as good as I thought: we can't use a transformed representation similar to DECLARE CURSOR's (namely, a Query with some extra stuff in its utilityStmt field) because EXPLAIN can take non-SELECT queries, which could be rewritten into multiple Query trees by the action of rules. So it seems the transformed representation would have to be an ExplainStmt with a list of Queries underneath it. The reason for the rule that utility statements aren't affected by parse analysis is that parse analysis of regular queries takes locks on the referenced tables, and we must keep hold of those locks to be sure that the transformed tree still reflects database reality. At the time we made that rule it seemed too messy to consider doing anything similar for utility statements. However, now the locking considerations have been centralized in plancache.c, which knows about re-taking locks on a possibly stale cached plan. So the price of doing parse analysis of EXPLAIN's target statement during the normal parse analysis phase is just going to be some adjustments in plancache.c so that it knows to look underneath an ExplainStmt for queries representing additional locks to re-take. This is a little bit ugly, but not really any worse than what it knows already about the representation of parsed queries. So I conclude that the "it doesn't scale" argument isn't as strong as it seemed. In principle, to support parameters in other utility statements, we'll have the same type of changes to make: * transform the expressions that might reference parameters during the normal parse analysis phase * teach plancache.c about finding lock dependencies in these expressions That seems fairly reasonable. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Dimitri Fontaine <dfontaine@hi-media.com> writes: >> Tom Lane <tgl@sss.pgh.pa.us> writes: >>> This works well enough for regular DML statements, but it falls down for >>> EXPLAIN which is a utility statement, because *parse analysis of utility >>> statements doesn't do anything*. EXPLAIN actually does the parse >>> analysis of its contained statement at the beginning of execution. >>> And that is too late, in the scenario Pavel exhibited. Why is it too >>> late? Because SPI_cursor_open_internal() intentionally "freezes" the >>> ParamListInfo struct after doing initial parsing: what it copies into >>> the cursor portal is just a static list of data values without the >>> parser hooks (see copyParamList). > >> Would it make any sense for this function to get to call the hook in the >> case a utility statement is being processed? > > Well, the point of the hook is to change the results of parse > transformation, so just calling it doesn't do much --- you have to apply > the whole parse analysis process, *and keep the resulting tree*. Could that be done in the function, in the phase you call "doing initial parsing"? -- dim
Dimitri Fontaine <dfontaine@hi-media.com> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> Well, the point of the hook is to change the results of parse >> transformation, so just calling it doesn't do much --- you have to apply >> the whole parse analysis process, *and keep the resulting tree*. > Could that be done in the function, in the phase you call "doing initial > parsing"? Within copyParamList? Seems like rather an abuse of the design ... it's just supposed to copy the ParamListInfo, not editorialize on the entire content of the portal. regards, tom lane