Early evaluation of constant expresions (with PATCH) - Mailing list pgsql-hackers
From | Bernard Frankpitt |
---|---|
Subject | Early evaluation of constant expresions (with PATCH) |
Date | |
Msg-id | 37E7E7F8.55FE1A8B@pop.dn.net Whole thread Raw |
Responses |
Re: [HACKERS] Early evaluation of constant expresions (with PATCH)
|
List | pgsql-hackers |
Hi All, Summary of the problem: I have another patch for a problem that I have run across, I describe the problem, the solution I propose, and I have included the patch as an attachment. The problem is that given two queries of the form SELECT <target list> FROM <range table> WHERE <var> <op> <function> (<const>); and SELECT <target list> FROM <range table> WHERE <var> <op> <const>; and a usable index on the table attribute corresponds to <var>, PostgreSQL will process the queries in different ways. Where the second query will use the index, the <function> in the first index will fool the optimizer into missing the index even though it is applicable. Example: ------------------------------------------------------------------------ Welcome to the POSTGRESQL interactive sql monitor: Please read the file COPYRIGHT for copyright terms of POSTGRESQL [PostgreSQL 6.5.1 on i686-pc-linux-gnu, compiled by gcc 2.7.2.3] type \? for help on slash commands type \q to quit type \g or terminate with semicolon to execute queryYou are currentlyconnected to the database: bernie bernie=> \i ../pgsql6_6/test.sql CREATE TABLE t1 (a1 int4); CREATE CREATE INDEX t1_idx ON t1 USING btree (a1); CREATE CREATE FUNCTION sqr (int4) RETURNS int4 AS 'SELECT ($1)*($1)' LANGUAGE 'sql'; CREATE INSERT INTO t1 VALUES (1); INSERT 188236 1 " " " " " " " " INSERT INTO t1 VALUES (10); INSERT 188245 1 -- Select with predicate of the form <var> <op> <const> SELECT * FROM t1 WHERE t1.a1=5; a1 --5 (1 row) EXPLAIN SELECT * FROM t1 WHERE t1.a1=5; NOTICE: QUERY PLAN: Index Scan using t1_idx on t1 (cost=2.05 rows=1 width=4) EXPLAIN -- Select with predicate of the form <var> <op> <function> (<const>) SELECT * FROM t1 WHERE t1.a1 = sqr(2); a1 --4 (1 row) EXPLAIN SELECT * FROM t1 WHERE t1.a1 = sqr(2); NOTICE: QUERY PLAN: Seq Scan on t1 (cost=43.00 rows=100 width=4) EXPLAIN EOF ------------------------------------------------------------------------- Cause: The cause of the problem is in the match_clause_to_indexkey() In optimizer/path/indxpath.c. The comment before the function says * To match, the clause:** (1a) for a restriction clause: must be in the form (indexkey op const)* or (const op indexkey), or So the routine that matches a restriction clause to an index is not smart enough to realize that a function of constant arguments is really a constant. Solution: The solution that I propose is to include code in the optimizer that picks functions with constant arguments out of a qualification clause, and evaluates them. If sub-expression tree in a qual node only consists of functions, operators, boolean nodes, and constants, then it should evaluate to a constant node. It would be possible to scan for such subtrees in the parser without evaluating them, but it seems to me that doing the evaluation early is an optimization, and a simplification of the planner and executor trees. I have implemented the solution by adding a tree mutator called eval_const_expr_mutator() to optimizer/util/clauses.c. This mutator calls ExecEvalExpr() from executor/execQual.c to do the actual evaluation. The ExpressionContext argument to ExecEvalExpr() is assigned a null pointer. This hack works, because the tree that is being evaluated contains only constant nodes, The only code called by ExecEvalExpr() that needs the econtext is code that resolves parameter and variable nodes. The eval_const_expr_mutator() uses the fields in the fcache structure that ExecEvalExpr() creates to construct the Const node that it returns. I don't know how you all feel about mixing code from the executor and the planner in this way, but it if you accept early evaluation of constant functions in the planner, then there has to be some common functionality between the two sections. I would be happy to hear suggestions for a better way to abstract the interface to the evaluation code so that the executor and planner see a common neutral interface. Finally, there is the question of where in the planner should the early evaluation occur. It is not obvious to me where the best point is, I chose to put it in plan/initsplan.c:add_restrict_and_join_to_rels(). The function add_restrict_and_join_to_rels() loops through the list of qual clauses, and adds the join and restriction information to the RelOptInfo nodes for the realtions that participate in the qual clauses. The function becomes: void add_restrict_and_join_to_rels(Query *root, List *clauses) { List *clause; foreach(clause, clauses) { clause = eval_const_expr_in_quals(clause) add_restrict_and_join_to_rel(root,(Node*) lfirst(clause)); } } This choice means that evaluation is performed right before the call to make_one_rel() in planmain.c:subplanner(). Results: With the patch the second SELECT statement in the example becomes ------------------------------------------------------------------------ bernie=> SELECT * FROM t1 WHERE t1.a1 = sqr(2); a1 --4 (1 row) bernie=> bernie=> EXPLAIN SELECT * FROM t1 WHERE t1.a1 = sqr(2); NOTICE: QUERY PLAN: Index Scan using t1_idx on t1 (cost=2.50 rows=10 width=4) EXPLAIN ------------------------------------------------------------------------ That's a long explanation for a small patch, but hacking this stuff is a little like walking on hot stones --- you want to be sure that you are doing it right before you get burnt. Bernie Frankpitt ------------------------------------------------------------------------- Patch attached: Note: I pgindent'ed both the .orig and the new files before making the patch
pgsql-hackers by date: