Thread: Early evaluation of constant expresions (with PATCH)

Early evaluation of constant expresions (with PATCH)

From
Bernard Frankpitt
Date:
Hi All, 

Summary of the problem:
I have another patch for a problem that I have run across, I describe
the problem, the solution I propose, and I have included the patch as
an attachment.

The problem is that given two queries of the form

SELECT <target list> FROM <range table>           WHERE <var> <op> <function> (<const>);

and

SELECT <target list> FROM <range table>           WHERE <var> <op> <const>;

and a usable index on the table attribute corresponds to <var>,
PostgreSQL will process the queries in different ways.  Where the
second query will use the index, the <function> in the first index
will fool the optimizer into missing the index even though it is
applicable.


Example:
------------------------------------------------------------------------
Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.5.1 on i686-pc-linux-gnu, compiled by gcc 2.7.2.3]
  type \? for help on slash commands  type \q to quit  type \g or terminate with semicolon to execute queryYou are
currentlyconnected to the database: bernie
 


bernie=> \i ../pgsql6_6/test.sql
CREATE TABLE t1 (a1 int4);
CREATE

CREATE INDEX t1_idx ON t1 USING btree (a1);
CREATE

CREATE FUNCTION sqr (int4) RETURNS int4          AS 'SELECT ($1)*($1)'          LANGUAGE 'sql';
CREATE

INSERT INTO t1 VALUES (1);
INSERT 188236 1
"  "  "  "  "  "  "   "

INSERT INTO t1 VALUES (10);
INSERT 188245 1

-- Select with predicate of the form <var> <op> <const>

SELECT * FROM t1 WHERE t1.a1=5;
a1
--5
(1 row)


EXPLAIN SELECT * FROM t1 WHERE t1.a1=5;
NOTICE:  QUERY PLAN:

Index Scan using t1_idx on t1  (cost=2.05 rows=1 width=4)

EXPLAIN

-- Select with predicate of the form <var> <op> <function> (<const>)

SELECT * FROM t1 WHERE t1.a1 = sqr(2);
a1
--4
(1 row)


EXPLAIN SELECT * FROM t1 WHERE t1.a1 = sqr(2);
NOTICE:  QUERY PLAN:

Seq Scan on t1  (cost=43.00 rows=100 width=4)

EXPLAIN
EOF

-------------------------------------------------------------------------
Cause: 

The cause of the problem is in the match_clause_to_indexkey() In
optimizer/path/indxpath.c. The comment before the function says
*        To match, the clause:**        (1a) for a restriction clause: must be in the form (indexkey
op const)*                 or (const op indexkey), or

So the routine that matches a restriction clause to an index is not
smart enough to realize that a function of constant arguments is
really a constant.


Solution: 

The solution that I propose is to include code in the optimizer that
picks functions with constant arguments out of a qualification
clause, and evaluates them.  If sub-expression tree in a qual node
only consists of functions, operators, boolean nodes, and constants,
then it should evaluate to a constant node.  It would be possible to
scan for such subtrees in the parser without evaluating them, but it
seems to me that doing the evaluation early is an optimization, and a
simplification of the planner and executor trees.

I have implemented the solution by adding a tree mutator called
eval_const_expr_mutator() to optimizer/util/clauses.c.  This mutator
calls ExecEvalExpr() from executor/execQual.c to do the actual
evaluation.  The ExpressionContext argument to ExecEvalExpr() is
assigned a null pointer.  This hack works, because the tree that is
being evaluated contains only constant nodes, The only code called by
ExecEvalExpr() that needs the econtext is code that resolves parameter
and variable nodes.  The eval_const_expr_mutator() uses the fields in
the fcache structure that ExecEvalExpr() creates to construct the 
Const node that it returns.

I don't know how you all feel about mixing code from the executor and
the planner in this way, but it if you accept early evaluation of
constant functions in the planner, then there has to be some common
functionality between the two sections.  I would be happy to hear
suggestions for a better way to abstract the interface to the
evaluation code so that the executor and planner see a common neutral
interface.

Finally, there is the question of where in the planner should the
early evaluation occur.  It is not obvious to me where the best point
is, I chose to put it in
plan/initsplan.c:add_restrict_and_join_to_rels().  The function
add_restrict_and_join_to_rels() loops through the list of qual
clauses, and adds the join and restriction information to the
RelOptInfo nodes for the realtions that participate in the qual clauses.

The function becomes:

void
add_restrict_and_join_to_rels(Query *root, List *clauses)
{       List       *clause;
       foreach(clause, clauses)         {               clause = eval_const_expr_in_quals(clause)
add_restrict_and_join_to_rel(root,(Node*)
 
lfirst(clause));         }
}

This choice means that evaluation is performed right before the call
to make_one_rel() in planmain.c:subplanner(). 

Results:

With the patch the second SELECT statement in the example becomes

------------------------------------------------------------------------
bernie=> SELECT * FROM t1 WHERE t1.a1 = sqr(2);
a1
--4
(1 row)

bernie=> 
bernie=> EXPLAIN SELECT * FROM t1 WHERE t1.a1 = sqr(2);
NOTICE:  QUERY PLAN:

Index Scan using t1_idx on t1  (cost=2.50 rows=10 width=4)

EXPLAIN
------------------------------------------------------------------------

That's a long explanation for a small patch, but hacking this stuff is
a little like walking on hot stones --- you want to be sure that you
are doing it right before you get burnt.

Bernie Frankpitt


-------------------------------------------------------------------------

Patch attached:

Note: I pgindent'ed both the .orig and the new files before  making the 
patch

Re: [HACKERS] Early evaluation of constant expresions (with PATCH)

From
Tom Lane
Date:
Bernard Frankpitt <frankpit@pop.dn.net> writes:
> The solution that I propose is to include code in the optimizer that
> picks functions with constant arguments out of a qualification
> clause, and evaluates them.

This is something I had on my own to-do list, and I'm glad to see
someone beat me to it.  But you've only done half the job: you
should also be folding operators with constant arguments.

Also, you need to be wary of functions like now() and random().
There probably isn't any other way to handle these than to add a
column to pg_proc flagging functions that can't be constant-folded.

> Finally, there is the question of where in the planner should the
> early evaluation occur.  It is not obvious to me where the best point
> is, I chose to put it in
> plan/initsplan.c:add_restrict_and_join_to_rels().

I believe it would be best to do it considerably earlier, specifically,
before cnfify().  It might even be worth running the code twice,
once before and once after cnfify.

Also, probably we should apply it to the targetlist as well as the qual.
        regards, tom lane


Re: [HACKERS] Early evaluation of constant expresions (with PATCH)

From
Bruce Momjian
Date:
> Bernard Frankpitt <frankpit@pop.dn.net> writes:
> > The solution that I propose is to include code in the optimizer that
> > picks functions with constant arguments out of a qualification
> > clause, and evaluates them.
> 
> This is something I had on my own to-do list, and I'm glad to see
> someone beat me to it.  But you've only done half the job: you
> should also be folding operators with constant arguments.
> 
> Also, you need to be wary of functions like now() and random().
> There probably isn't any other way to handle these than to add a
> column to pg_proc flagging functions that can't be constant-folded.

Already there, pg_proc.proiscachable.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Early evaluation of constant expresions (with PATCH)

From
frankpit@pop.dn.net
Date:
Tom Lane wrote:
> 
> Bernard Frankpitt <frankpit@pop.dn.net> writes:
> > The solution that I propose is to include code in the optimizer that
> > picks functions with constant arguments out of a qualification
> > clause, and evaluates them.
> 
> This is something I had on my own to-do list, and I'm glad to see
> someone beat me to it.  But you've only done half the job: you
> should also be folding operators with constant arguments.
> 
> Also, you need to be wary of functions like now() and random().
> There probably isn't any other way to handle these than to add a
> column to pg_proc flagging functions that can't be constant-folded.
>

I actually do the operators as well, and also boolean operators (which
are handled by special Expr nodes).

I puzzled over case of now() for a while but I don't think that it
raises a problem.

For queries like

SELECT * FROM t WHERE t.a < now();

Early evaluation seems quite reasonable. Now means a fixed time close to 
the time the backend received the query. It seems to me that all the
now() calls in a query should return values pretty close to each other.
A query like

SELECT * FROM t1 t2 WHERE t1.a < now() AND t2.a < now();

will have two values of now that are very close, since the evaluations
both happen in the planner.  People who expect the two now() calls to
give exactly the same value in this case are expecting too much, queries
like this should be rewritten

SELECT * FROM t1 t2 WHERE t1.a < now() AND t1.a = t2.a;

In fact istm that the correct way to handle now() would be to have a
value that is constant over a transation, and comensurate with the
numbering of tids.

I don't think that random() is a problem at all. It gets called once
each time it is written in the query string.  That is certainly a
reasonable interpretation of its meaning.  

> > is, I chose to put it in
> > plan/initsplan.c:add_restrict_and_join_to_rels().
> 
> I believe it would be best to do it considerably earlier, specifically,
> before cnfify().  It might even be worth running the code twice,
> once before and once after cnfify.
> 
> Also, probably we should apply it to the targetlist as well as the qual.
> 

Yes, close to cnfify might be a better place.  I only did it for the
quals because I don't think I understand the other parts of the plan
trees well enough.  The function is quite easy to use though, it acts as
a filter on connected subtrees that consist of List nodes and all Expr
nodes other than
SUBPLAN_EXPR nodes.  Because of the recursive way that qual plans are
built, subplans are still optimized.

Another factor about positioning of the filter that I was uncertain
about was time expense.  Is the time taken by multiple tree walks in the
planner
very significant in the overall scheme of things?

Bernie


Re: [HACKERS] Early evaluation of constant expresions (with PATCH)

From
Tom Lane
Date:
frankpit@pop.dn.net writes:
> Tom Lane wrote:
>> This is something I had on my own to-do list, and I'm glad to see
>> someone beat me to it.  But you've only done half the job: you
>> should also be folding operators with constant arguments.

> I actually do the operators as well, and also boolean operators (which
> are handled by special Expr nodes).

(Hangs head...)  Yup.  That's what I get for opining after a fast
late-night scan of a patch.  Relying on ExecEvalExpr is a good hack ---
the patch is much smaller than I would've guessed.  (Actually, now
that I look at it, it looks like the functions rather than the
operators are missing the necessary preinitialization.  Perhaps at
the place where you chose to put this in, setFcache has already
been done?)

There are additional smarts that could/should be put in, though.
In particular, I think we should be smarter about AND and OR clauses.
If *any* of the inputs to an AND are a constant FALSE, you can collapse
the node and not bother computing the other subexpressions; likewise
a constant TRUE input to an OR allows short-circuiting.  (I have seen
queries, primarily machine-generated ones, where this would be an
enormous win.)  Contrariwise, constant TRUE/FALSE inputs can simply be
dropped, and the AND or OR operator eliminated if only one nonconstant
input remains.  This is the reason why I think there is an interaction
with cnfify(): it rearranges the AND/OR structure of the tree and might
expose --- or hide --- opportunities of this kind.  (BTW, it might be a
good idea to do the first pass of cnfify, namely AND/OR flattening,
before trying to apply this simplification.)

Also, most operators and functions can be collapsed to NULL if any of
their inputs are NULL, although I don't think we can risk making that
optimization without adding a flag to pg_proc that tells us if it is OK.

>> Also, you need to be wary of functions like now() and random().
>> There probably isn't any other way to handle these than to add a
>> column to pg_proc flagging functions that can't be constant-folded.

> I puzzled over case of now() for a while but I don't think that it
> raises a problem.

No, you can't just define the problem away by saying that whatever
behavior is convenient to implement is acceptable.  It's true that
now() is not really a problem, because it's defined to yield the
start time of the current transaction, and therefore is effectively
a constant *within any one transaction*.  But random() *is* a problem,
and user-defined functions could be a problem.  SQL functions probably
shouldn't be folded either (not quite sure of that).

Bruce points out in another reply that the proiscachable field of
pg_proc is intended for exactly this purpose.  It hasn't been kept
up carefully because no extant code uses it, and in fact hardly any
of the standard entries in pg_proc are marked iscachable, which is
obviously silly.  But we could certainly go through pg_proc and set
the flag on everything except the danger items.

> Another factor about positioning of the filter that I was uncertain
> about was time expense.  Is the time taken by multiple tree walks in
> the planner very significant in the overall scheme of things?

I don't think you need to worry about anything that has cost linear in
the size of the expression tree.  Up till a couple weeks ago we had some
code in cnfify() that used space and time exponential in the size of the
tree :-( ... now it's down to O(N^2) which is still a bottleneck for
complex query expressions, but O(N) is not to be worried about.  Like
I said, I wouldn't object to running this code twice on a qual.

There are some upstream places where it would be nice too --- for
example, coercion of DEFAULT expressions would be best handled by
sticking a type-conversion function atop the given parsetree and then
seeing if this code would simplify it.  We'll definitely need to make
use of proiscachable to make that safe, however.
        regards, tom lane


Re: [HACKERS] Early evaluation of constant expresions (with PATCH)

From
Bernard Frankpitt
Date:
Tom Lane wrote:
> 
>  .... (Actually, now
> that I look at it, it looks like the functions rather than the
> operators are missing the necessary preinitialization.  Perhaps at
> the place where you chose to put this in, setFcache has already
> been done?)
>

The functions work because the funcid field in the Func node is already
filled in, and the EvalQual code uses this field to generate the Fcache.
In the case of Oper node there are two fields, one for the pg_operator
Oid,
and one for the pg_proc Oid. The pg_operator oid is already filled in,
but the pg_proc oid isn't.  The EvalQual code wants the pg_proc oid, so
I provide it in the patch before I do the evaluation.

> 
> There are additional smarts that could/should be put in, though. ....
> 
>     { Many good suggestions here } 
>
> .... without adding a flag to pg_proc that tells us if it is OK.
> 

All points well taken.  I don't have time to do this thoroughly right
now,
but I will get back to it.  My original ( needed-for-project-at-hand )
motivation for this was to get index scans to work with expressions that
evaluate to constants.  I can see that I am about to learn quite a bit
more about parsing and planning. 

> > I puzzled over case of now() for a while but I don't think that it
> > raises a problem.
> 
> No, you can't just define the problem away by saying that whatever
> behavior is convenient to implement is acceptable.

Oh darn! -- I've spent too many years studying mathematics

> and user-defined functions could be a problem.  SQL functions probably
> shouldn't be folded either (not quite sure of that).
> 
> Bruce points out in another reply that the proiscachable field of
> pg_proc is intended for exactly this purpose.  

Perhaps adding another option to create function is in order here. I
know how to do that already.  Seriously, there are some interesting
semantic issues here, especially if the database were being used as the
basis for a large dynamic stochastic model.


Bernie


Re: [HACKERS] Early evaluation of constant expresions (with PATCH)

From
Thomas Lockhart
Date:
You da man Bernard! I've been wanting to do this for a while also, but
hadn't taken the time to puzzle through it.

> In fact istm that the correct way to handle now() would be to have a
> value that is constant over a transation, and comensurate with the
> numbering of tids.

That is the current behavior of now(); Postgres has a time function
which returns the time of the current transaction and now() (and every
other form of date/time 'now') uses that.

When doing this pre-evaluation in the parse tree, is it possible that
the transaction time is not yet set? So perhaps 'now' and now() would
have problems here. Remember that "datetime 'now'" also resembles a
constant but has the same behavior as "now()", so can't really be
considered a true constant either.

> I don't think that random() is a problem at all. It gets called once
> each time it is written in the query string.  That is certainly a
> reasonable interpretation of its meaning.

If we use the "is cachable" flag for procedures (*and* for constants
on types!) then it would be possible to have random() behave as
expected- returning unique values for every invocation and unique
values into each field of a single-line query/insert.

I hope we haven't put you off here, and I'd suggest that we
incorporate your patches as they stand now, then work on the next step
later (assuming that it isn't something you have time or interest
for). But if you can see doing this next step already, we'd love to
have it included in the first patch. What do you think?

Thanks for the work. This is a neat area to get improvements for...
                     - Thomas

-- 
Thomas Lockhart                lockhart@alumni.caltech.edu
South Pasadena, California


Re: [HACKERS] Early evaluation of constant expresions (with PATCH)

From
Tom Lane
Date:
Bernard Frankpitt <frankpit@pop.dn.net> writes:
>> .... (Actually, now
>> that I look at it, it looks like the functions rather than the
>> operators are missing the necessary preinitialization.  Perhaps at
>> the place where you chose to put this in, setFcache has already
>> been done?)

> The functions work because the funcid field in the Func node is already
> filled in, and the EvalQual code uses this field to generate the
> Fcache.

Oh, OK.  Cool.  For some reason I was thinking that the planner
was supposed to generate the fcache entry somewhere along the line.

> All points well taken.  I don't have time to do this thoroughly right
> now, but I will get back to it.

OK, or I will work on it if I get to it before you do.  As Thomas
remarked, you've provided a great starting point --- thanks!

I know I already have one patch from you that I promised to integrate,
but I'm up to my ass in buffer refcount bugs :-(.  As soon as I can come
up for air, I will stick in this code, though I think I will call it
from somewhere near cnfify per prior discussion.

>> Bruce points out in another reply that the proiscachable field of
>> pg_proc is intended for exactly this purpose.  

> Perhaps adding another option to create function is in order here.

Actually, create function *has* an iscachable option which sets that
field.  According to glimpse, it's the only code in the system that
knows the field exists :-(
        regards, tom lane