ts_rewrite aggregate API seems mighty ugly - Mailing list pgsql-hackers

From Tom Lane
Subject ts_rewrite aggregate API seems mighty ugly
Date
Msg-id 14669.1192674607@sss.pgh.pa.us
Whole thread Raw
Responses Re: ts_rewrite aggregate API seems mighty ugly
List pgsql-hackers
While working on the documentation it's finally sunk into me what
ts_rewrite is all about, and I'm not very happy.  The simple
three-argument form is fine, it's basically a tsquery-specific
version of replace():

regression=# select ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery);      ts_rewrite        
-------------------------'b' & ( 'foo' | 'bar' )
(1 row)

(BTW, would it be possible to get it to not randomly rearrange the order
of the clauses?  That makes it quite confusing to understand what it's
doing, and I don't offhand see a reason for that to happen.)

And the two-argument form is okay, though a bit hard to wrap your head
around --- the second argument is the text of a SQL query that returns
successive second and third arguments for successive ts_rewrite
operations on the first argument.  For instance,

regression=# create table ts_subst(target tsquery, subst tsquery);
CREATE TABLE
regression=# insert into ts_subst values('a', 'foo|bar');
INSERT 0 1
regression=# insert into ts_subst values('b', 'baz');
INSERT 0 1
regression=# select ts_rewrite('a & b'::tsquery, 'select target, subst from ts_subst');       ts_rewrite         
---------------------------'baz' & ( 'foo' | 'bar' )
(1 row)

The point of this of course is to be able to apply many different
substitutions stored in a table.

However, the aggregate form seems just plain weird:

regression=# select ts_rewrite(array['a & b'::tsquery, target, subst]) from ts_subst;       ts_rewrite         
---------------------------'baz' & ( 'baz' | 'foo' )
(1 row)

The assumption here is that target and substitute come from successive
rows of the table, and again we're trying to map an original tsquery
through a bunch of different replace() substitutions.  The array[] bit
is ugly and inefficient.  I suppose it's a holdover from days when
aggregates could only take one argument.  Now that we have
multiple-argument aggregates it would seem to make more sense to express
the thing as a three-argument aggregate, except that then we'd have to
give it a different name to distinguish it from the basic three-argument
non-aggregate function.

Another thing that I find ugly about this is that the 'original'
tsquery is meaningful only on the first accumulation cycle; after that
it's ignored, but the second and third array elements do have meaning.
That's inconsistent.

But the killer problem is that the aggregate form actually does the
Wrong Thing if there are no rows to aggregate over --- it'll return
NULL, not the original tsquery as you'd want.  This is not too
improbable if you follow the documentation's recommendation to filter
the rows using an @> operator:

regression=# select ts_rewrite(array['a & b'::tsquery, target, subst]) from ts_subst where 'a & b'::tsquery @> target;
    ts_rewrite         
 
---------------------------'baz' & ( 'bar' | 'foo' )
(1 row)

regression=# select ts_rewrite(array['quux'::tsquery, target, subst]) from ts_subst where 'quux'::tsquery @>
target;ts_rewrite
 
------------
(1 row)

So it seems to me this is just wrong, and the only safe way to do it is
to use the two-argument form, with the selection from the table
happening as a SPI query.

Since we're already committed to an initdb for beta2, it's not quite too
late to reconsider the API here.  My feeling at the moment is that we
should just drop the aggregate form of ts_rewrite; it does nothing you
can't do better with the two-argument form, and it is just ugly to boot.
Also, if anyone does come up with a not-so-ugly design later, we can
always add things in 8.4 or beyond; but once it's in core it's going
to be a very hard sell to take it out.

Comments?
        regards, tom lane


pgsql-hackers by date:

Previous
From: Kenneth Marshall
Date:
Subject: Re: Hash index todo list item
Next
From: Tom Lane
Date:
Subject: ts_rewrite bug?