read-only planner input - Mailing list pgsql-hackers

From: Neil Conway
Subject: read-only planner input
Date:
Msg-id: 423A5E17.9080506@samurai.com
Responses: Re: read-only planner input
           Re: read-only planner input
List: pgsql-hackers

I've been taking a look at how to stop the planner from scribbling on 
its input. This is my first modification of any significance to the 
planner, so don't hesitate to tell me what I've gotten wrong :)

I think the planner makes two kinds of modifications to the input Query:
(a) it rewrites the Query to improve planning, and (b) it uses the Query
as a convenient place to store planner working state. Some examples of the former include
transforming IN clauses to joins, transforming simple FROM-clause 
subselects into joins, preprocessing expressions, and so forth. Examples 
of the latter are mostly the fields marked "internal to planner" in the
Query struct definition.
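To make the two kinds concrete, here is a minimal, self-contained C sketch of a planner that scribbles on its input. ToyQuery, its fields, and toy_plan_destructive() are hypothetical stand-ins, not PostgreSQL's actual Query definition or planner code; they only show (a) an in-place rewrite and (b) working state stashed in the input.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the Query node.
 * Field names are illustrative only, not the real PostgreSQL fields. */
typedef struct ToyQuery
{
    /* Input as produced by the parser/rewriter. */
    int  num_in_clauses;        /* IN (subselect) clauses still present */
    int  num_joins;             /* joins in the jointree */

    /* Kind (b): "internal to planner" working state. */
    bool planner_scratch_set;
} ToyQuery;

/* A planner that modifies its input in both ways described above. */
static int
toy_plan_destructive(ToyQuery *query)
{
    /* Kind (a): rewrite IN clauses into joins, in place. */
    query->num_joins += query->num_in_clauses;
    query->num_in_clauses = 0;

    /* Kind (b): stash working state in the input itself. */
    query->planner_scratch_set = true;

    return query->num_joins;    /* number of joins the "plan" ends up with */
}
```

After one call, the caller's ToyQuery no longer matches what the parser and rewriter produced, which is what makes it unsafe to hand the same Query to the planner a second time.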

(b) should be pretty easy to solve; we can create a per-Query PlanState 
struct that contains this information, as well as holding a pointer to 
the Query (and perhaps the in-progress Plan tree).
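A minimal sketch of such a per-Query struct for (b), assuming simplified stand-in Query and Plan types rather than the real PostgreSQL nodes. The PlanState name comes from the proposal; its fields and the make_plan_state() and build_base_rels() helpers are hypothetical. Holding the input as a `const Query *` lets the compiler reject any planner step that tries to write to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the real PostgreSQL definitions. */
typedef struct Query { int num_rtes; } Query;
typedef struct Plan  { double total_cost; } Plan;

/* Per-Query planner state: working state lives here, not in the Query. */
typedef struct PlanState
{
    const Query *query;       /* read-only planner input */
    Plan        *plan;        /* in-progress plan tree, if any */
    int          rels_built;  /* example of planner working state */
} PlanState;

static PlanState *
make_plan_state(const Query *query)
{
    /* calloc zeroes plan and rels_built */
    PlanState *ps = calloc(1, sizeof(PlanState));

    assert(ps != NULL);
    ps->query = query;
    return ps;
}

/* Example planner step: it reads the Query and updates only the PlanState. */
static void
build_base_rels(PlanState *ps)
{
    ps->rels_built = ps->query->num_rtes;
}
```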

I'm still trying to figure out how to handle (a). Perhaps we can create 
an additional plan node that always sits at the top of the plan tree. 
This would hold data derived from the input Query. A lot of the
code that implements (a) is actually already applicative in nature, but 
any code that modifies a Query destructively would need to be changed. 
In other words, rather than

    parse->jointree = pull_up_subqueries(parse, parse->jointree);

we'd have:

    top_plan_node->jointree = pull_up_subqueries(plan_state, parse->jointree);

(Possibly passing PlanState rather than `parse', which is a Query, if 
needed. The example is also somewhat simplified.)
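The applicative style for (a) can be sketched on a toy jointree: the pull-up function only reads the Query's FROM list and returns a freshly built, flattened copy, which is stored in the top-of-tree node while the input stays unchanged. FromItem, TopPlanNode, and toy_pull_up_subqueries() are hypothetical; unlike the real pull_up_subqueries(), this version flattens every subquery without checking whether it is simple, and it omits the PlanState argument.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical toy jointree: a singly linked list of FROM items.
 * An ITEM_SUBQUERY item carries its own FROM list in 'sub'. */
typedef enum { ITEM_REL, ITEM_SUBQUERY } ItemKind;

typedef struct FromItem
{
    ItemKind         kind;
    int              relid;   /* meaningful for ITEM_REL */
    struct FromItem *sub;     /* meaningful for ITEM_SUBQUERY */
    struct FromItem *next;
} FromItem;

typedef struct Query       { FromItem *jointree; } Query;        /* read-only input */
typedef struct TopPlanNode { FromItem *jointree; } TopPlanNode;  /* derived data */

static FromItem *
make_item(ItemKind kind, int relid, FromItem *sub, FromItem *next)
{
    FromItem *item = malloc(sizeof(FromItem));

    assert(item != NULL);
    item->kind = kind;
    item->relid = relid;
    item->sub = sub;
    item->next = next;
    return item;
}

static int
list_length(const FromItem *list)
{
    int n = 0;

    for (; list != NULL; list = list->next)
        n++;
    return n;
}

/*
 * Applicative pull-up: builds and returns a NEW list in which each subquery
 * is replaced by its (recursively flattened) relations. The input list is
 * only read, so the same Query can be planned again later. Allocations are
 * not freed in this sketch.
 */
static FromItem *
toy_pull_up_subqueries(const FromItem *list)
{
    FromItem  *head = NULL;
    FromItem **tail = &head;

    for (const FromItem *item = list; item != NULL; item = item->next)
    {
        if (item->kind == ITEM_REL)
            *tail = make_item(ITEM_REL, item->relid, NULL, NULL);  /* copy */
        else
            *tail = toy_pull_up_subqueries(item->sub);  /* splice in copy */

        while (*tail != NULL)          /* advance to the new end of list */
            tail = &(*tail)->next;
    }
    return head;
}
```

For `FROM r1, (SELECT ... FROM r2, r3) s`, the TopPlanNode's list ends up holding r1, r2 and r3, while the Query's own list still holds r1 followed by the subquery. The price is the extra copying; in the backend the copies would presumably be palloc'd in the planner's memory context.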

BTW, I wonder whether it would be possible to move some preprocessing 
from the early stages of the planner to a "preprocessing" phase that 
would run after the rewriter but before the planner proper. The 
preprocessor would maintain the essential properties of the input Query, 
but it wouldn't need to be re-run when the query is replanned due to a 
modification to a dependent database object. For example, the decision 
about whether to pull up a subquery could be done once and not redone in
subsequent invocations of the planner on the same Query. On the other 
hand, I don't know how much preprocessing could be rearranged like this,
and since replanning ought to be relatively rare, it may not be worth
spending a whole lot of time trying to optimize it...
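The preprocess-once idea might look roughly like this, assuming a hypothetical PreprocessedQuery wrapper that is built after the rewriter and kept alongside the Query. preprocess_query(), plan_query(), and the catalog_version parameter (standing in for "a dependent database object changed") are illustrative names only, and the plan is reduced to an int.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in, not the real PostgreSQL Query. */
typedef struct Query
{
    int num_subqueries;       /* FROM-clause subselects after rewriting */
} Query;

/* Result of the proposed preprocessing phase, kept between replans. */
typedef struct PreprocessedQuery
{
    const Query *query;
    bool         done;
    int          subqueries_pulled_up;  /* decision made once, then reused */
} PreprocessedQuery;

static int preprocess_runs = 0;   /* counts preprocessing passes, for the example */

static void
preprocess_query(PreprocessedQuery *pq)
{
    if (pq->done)
        return;
    preprocess_runs++;
    /* e.g. decide once which subqueries to pull up (all of them, here) */
    pq->subqueries_pulled_up = pq->query->num_subqueries;
    pq->done = true;
}

/* The planner proper; called again whenever a dependent object changes. */
static int
plan_query(PreprocessedQuery *pq, int catalog_version)
{
    preprocess_query(pq);
    /* Toy "plan" depending on both the preprocessing and the catalog state. */
    return pq->subqueries_pulled_up * 10 + catalog_version;
}
```

Replanning after a catalog change reruns only plan_query(); whether that saving matters depends on how often replanning actually happens.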

Comments welcome.

-Neil


