On Thu, Aug 12, 2010 at 11:15 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm not exactly following this.  My guess is that the breakeven point
>> is going to be pretty low because I think Param nodes are pretty
>> cheap.
>
> If you have any significant number of executions of the expression, then
> of course converting it to an initplan is a win.  What I'm worried about
> is where you have just a small number (like maybe only one, if it gets
> converted to an indexqual for instance).  Then the added expense of
> setting up the initplan isn't going to be repaid.  As long as the
> expression is pretty expensive, the percentage overhead of wrapping it
> in an initplan will probably be tolerable anyway, but I'm not sure where
> the threshold of "pretty expensive" is for that.
Oh, I see. It seems a lot more elegant if we can start by determining
whether the expression is in a context where it's likely to be
executed more than once.
*thinks*
I wonder if we could make this decision mostly based on node types.
Maybe it's reasonable to say that if we're doing, say, a Seq Scan, we
always assume it's going to get evaluated more than once. Yeah, the
table could have <2 rows in it, but mostly it won't, and I don't know
that it's wise to optimize for that case even if we *think* that's
what the statistics are telling us. Similarly for a Function Scan,
CTE Scan, Worktable Scan, etc. We *could* look at the row estimates,
but I bet it isn't necessary. On the other hand, for an index qual,
conversion is probably pointless, since the qual will only be
evaluated once per scan anyway. I guess the hard case is a filter
qual on an index scan... it's not too clear to me what the right
thing to do is in that case.
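
Something like this, maybe (completely untested, and
quals_likely_rechecked() is a name I just made up, not anything that
exists today):

    #include "postgres.h"
    #include "nodes/nodes.h"        /* NodeTag, T_SeqScan, etc. */

    /*
     * Would quals attached to this plan node type likely be evaluated
     * more than once?  (Hypothetical helper for the heuristic above.)
     */
    static bool
    quals_likely_rechecked(NodeTag plantype)
    {
        switch (plantype)
        {
            case T_SeqScan:
            case T_FunctionScan:
            case T_CteScan:
            case T_WorkTableScan:
                /* Assume many rows; don't bother with row estimates. */
                return true;
            case T_IndexScan:
                /*
                 * Index quals are only set up once per scan; a filter
                 * qual on an index scan is the unclear case.
                 */
                return false;
            default:
                return false;
        }
    }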
>> Well, that's certainly a good place to start, but I was thinking that
>> it would be nice to optimize things like this:
>
>> SELECT * FROM foo WHERE somecolumn = somefunc();
>
>> This is OK if we choose a straight index scan, but it's probably very
>> much worth optimizing if we end up doing anything else.  If that's too
>> hairy, then maybe not, but it's not obvious to me why it would be
>> expensive.
>
> Because you have to look at every subexpression of every subexpression
> to figure out if it's (a) stable and (b) expensive.  Each of those
> checks is un-cheap in itself, and if you blindly apply them at every
> node of an expression tree the cost will be exponential.
I think you'd need some kind of expression tree walker that builds up
a list of maximal stable subexpression trees. It would be nice to
figure this out at some point in the process where we already have to
check volatility anyway.
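
Hand-waving, the walker might look something like this (untested
sketch; is_stable_expr() is a made-up helper, and per your point
about cost it would need to compute stability bottom-up rather than
re-walking every subtree):

    #include "postgres.h"
    #include "nodes/nodeFuncs.h"    /* expression_tree_walker() */
    #include "nodes/pg_list.h"      /* List, lappend() */

    typedef struct
    {
        List   *stable_exprs;       /* maximal stable subtrees found */
    } StableExprContext;

    static bool
    collect_stable_subexprs(Node *node, StableExprContext *context)
    {
        if (node == NULL)
            return false;

        if (is_stable_expr(node))   /* hypothetical: no volatile funcs, no Vars */
        {
            /*
             * The whole subtree is stable: record it and don't recurse,
             * so only maximal subtrees end up in the list.
             */
            context->stable_exprs = lappend(context->stable_exprs, node);
            return false;
        }

        /* Otherwise look for stable pieces further down. */
        return expression_tree_walker(node, collect_stable_subexprs,
                                      (void *) context);
    }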
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company