Using functions as filters in queries - Mailing list pgsql-admin

From Chris Mungall
Subject Using functions as filters in queries
Date
Msg-id Pine.LNX.4.33.0303101155270.20377-100000@heartbroken.lbl.gov
Whole thread Raw
Responses Re: Using functions as filters in queries
List pgsql-admin
I have a problem that can be reduced to this equivalent but simpler
problem:

Case 1:

CREATE TABLE t (
    n int,
    x varchar(32)
);
CREATE INDEX ti ON t(n);
CREATE INDEX tx ON t(x);
<insert 100k rows of data, random words into x>
VACUUM ANALYZE;
EXPLAIN ANALYZE select * from t where n=5 AND x like 'a%';

 Index Scan using ti on t  (cost=0.00..3.02 rows=1 width=15) (actual
time=0.12..0.12 rows=0 loops=1)
   Index Cond: (n = 5)
   Filter: (x ~~ 'a%'::text)
 Total runtime: 0.16 msec

This is perfect - ti is used for indexing.

Case 2:

Now I want to replace the "n=5" clause with a function:

CREATE FUNCTION f(t, int) RETURNS bool AS
 'SELECT $1.n = $2'
LANGUAGE 'sql';

EXPLAIN ANALYZE select * from t where f(t, 5) AND x like 'a%';

 Seq Scan on t  (cost=0.00..1161.25 rows=436 width=15) (actual
time=265.04..265.04 rows=0 loops=1)
   Filter: (f(t.*, 5) AND (x ~~ 'a%'::text))
 Total runtime: 265.08 msec

If I set enable_seqscan=0, I get this:

 Seq Scan on t  (cost=0.00..1161.25 rows=436 width=15) (actual
time=262.45..262.45 rows=0 loops=1)
   Filter: (f(t.*, 5) AND (x ~~ 'a%'::text))
 Total runtime: 262.49 msec

Same thing.

Now looking at the two cases (without and with function) I can see that
they are equivalent, but Pg treats them differently. I guess it is failing
to distiguish between two cases - if the function has no FROM clause, then
it is a simple case of variable substitution into the original WHERE
clause. If it does include a FROM clause, it's not so simple and
optimisation is hard, so the function becomes the filter.

I notice that internal functions (eg @ on boxes/points) are not treated
this way. is there any way to get my function treated like an internal
function, or is not as simple as that?

You could argue that my function is pointless and I could simply do the
replacement in the application layer that calls the SQL. This is true, but
with my full example I would like to hide some aspects of the physical
layer behind a nice SQL/function logical layer.


pgsql-admin by date:

Previous
From: byron@cc.gatech.edu (Byron A Jeff)
Date:
Subject: Re: Largest filesize under Linux
Next
From: kaustin@advance.net (girlyDBA)
Date:
Subject: keys allowed in child which do not exist in parent table when restoring using pg_restore