Thread: Full text search, SQL functions, and the planner

Full text search, SQL functions, and the planner

From
Marc Dahn
Date:
Hi,

To maintain an externally defined interface, I'd like to define a
function hasword(haystack, needle) that (essentially) returns 1 when
to_tsvector(haystack) @@ to_tsquery(needle), 0 otherwise.

I've tried

    CREATE OR REPLACE FUNCTION ivo_hasword(haystack TEXT, needle TEXT)
    RETURNS INTEGER AS $func$
        SELECT CASE WHEN to_tsvector('english', $1) @@ plainto_tsquery($2)
            THEN 1
            ELSE 0
        END
    $func$ LANGUAGE SQL STABLE;

It seems the planner inlines the function body, as intended:

# explain select * from rr.resource where 1=ivo_hasword(res_title, 'optical');
...
 Seq Scan on resource  (cost=0.00..2269.72 rows=69 width=924)
   Filter: (1 = CASE WHEN (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text)) THEN 1
ELSE0 END) 

-- but it doesn't use the index on res_title, as it could, as
exhibited by this equivalent query:

# explain select * from rr.resource where to_tsvector('english'::regconfig, res_title) @@
plainto_tsquery('optical'::text);
...
 Bitmap Heap Scan on resource  (cost=21.96..731.76 rows=252 width=924)
   Recheck Cond: (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text))
   ->  Bitmap Index Scan on resource_res_title  (cost=0.00..21.89 rows=252 width=0)
         Index Cond: (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text))

Indeed, the index isn't used either when I'm writing the 1 = CASE WHEN
statement into the query directly (this is on postgres 9.1).

Sure enough, if I define the function like this:

CREATE OR REPLACE FUNCTION bool_hasword(haystack TEXT, needle TEXT)
    RETURNS BOOLEAN AS $func$
        SELECT to_tsvector('english', $1) @@ plainto_tsquery($2)
    $func$ LANGUAGE SQL STABLE;

(i.e., returning a boolean instead of the lousy integer), the index
is used.

So -- is there a way to let the planner look through the CASE?  Or
some way of turning the boolean to an integer that the planner can
see through?  The CAST(... AS INTEGER) that fortunately would even
produce the right numbers appears to be opaque to the planner.


Anticipating the sane and logical answer: The boolean function
doesn't really help me; this is about handing through that function
directly to ADQL (http://www.ivoa.net/documents/latest/ADQL.html)
as a user defined function, and those user defined functions cannot
be boolean-valued.

Cheers,

           Markus




Re: Full text search, SQL functions, and the planner

From
Tom Lane
Date:
Marc Dahn <dahn@tfiu.de> writes:
> So -- is there a way to let the planner look through the CASE?

No.  It would actually be wrong to simplify the expression in the way
you're imagining, since "1 = case when bool_expr then 1 else 0 end"
does not give the same result as the plain bool_expr if the latter
yields null.

If you're sufficiently desperate, though, you might consider some hack
like this:

regression=# CREATE OR REPLACE FUNCTION b_hasword(haystack TEXT, needle TEXT)
RETURNS boolean AS $func$
SELECT to_tsvector('english', $1) @@ plainto_tsquery($2)
$func$ LANGUAGE SQL STABLE;
CREATE FUNCTION
regression=# create function inteqbool(int,bool) returns bool as
regression-# $$select $1::bool = $2 $$ language sql stable;
CREATE FUNCTION
regression=# CREATE OPERATOR = (procedure = inteqbool, leftarg=int, rightarg=bool);
CREATE OPERATOR
regression=# explain select * from resource where 1=b_hasword(res_title, 'optical');
                                               QUERY PLAN
--------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on resource  (cost=4.20..14.38 rows=7 width=32)
   Recheck Cond: (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text))
   ->  Bitmap Index Scan on resource_to_tsvector_idx  (cost=0.00..4.20 rows=7 width=0)
         Index Cond: (to_tsvector('english'::regconfig, res_title) @@ plainto_tsquery('optical'::text))
(4 rows)

which relies on the fact that the planner *will* simplify "boolexpr = true"
to just "boolexpr", so that after a couple of rounds of inlining and
simplification we get to just the bare @@ expression.  But aside from
the time spent doing that, this approach could have unpleasant side
effects in the form of causing "int = bool" expressions to be accepted
generally, thus masking errors.  Might be better to fix your client-side
code.

            regards, tom lane