ts_rewrite aggregate API seems mighty ugly - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | ts_rewrite aggregate API seems mighty ugly |
Date | |
Msg-id | 14669.1192674607@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: ts_rewrite aggregate API seems mighty ugly
|
List | pgsql-hackers |
While working on the documentation it's finally sunk into me what ts_rewrite is all about, and I'm not very happy. The simple three-argument form is fine, it's basically a tsquery-specific version of replace(): regression=# select ts_rewrite('a & b'::tsquery, 'a'::tsquery, 'foo|bar'::tsquery); ts_rewrite -------------------------'b' & ( 'foo' | 'bar' ) (1 row) (BTW, would it be possible to get it to not randomly rearrange the order of the clauses? That makes it quite confusing to understand what it's doing, and I don't offhand see a reason for that to happen.) And the two-argument form is okay, though a bit hard to wrap your head around --- the second argument is the text of a SQL query that returns successive second and third arguments for successive ts_rewrite operations on the first argument. For instance, regression=# create table ts_subst(target tsquery, subst tsquery); CREATE TABLE regression=# insert into ts_subst values('a', 'foo|bar'); INSERT 0 1 regression=# insert into ts_subst values('b', 'baz'); INSERT 0 1 regression=# select ts_rewrite('a & b'::tsquery, 'select target, subst from ts_subst'); ts_rewrite ---------------------------'baz' & ( 'foo' | 'bar' ) (1 row) The point of this of course is to be able to apply many different substitutions stored in a table. However, the aggregate form seems just plain weird: regression=# select ts_rewrite(array['a & b'::tsquery, target, subst]) from ts_subst; ts_rewrite ---------------------------'baz' & ( 'baz' | 'foo' ) (1 row) The assumption here is that target and substitute come from successive rows of the table, and again we're trying to map an original tsquery through a bunch of different replace() substitutions. The array[] bit is ugly and inefficient. I suppose it's a holdover from days when aggregates could only take one argument. Now that we have multiple-argument aggregates it would seem to make more sense to express the thing as a three-argument aggregate, except that then we'd have to give it a different name to distinguish it from the basic three-argument non-aggregate function. Another thing that I find ugly about this is that the 'original' tsquery is meaningful only on the first accumulation cycle; after that it's ignored, but the second and third array elements do have meaning. That's inconsistent. But the killer problem is that the aggregate form actually does the Wrong Thing if there are no rows to aggregate over --- it'll return NULL, not the original tsquery as you'd want. This is not too improbable if you follow the documentation's recommendation to filter the rows using an @> operator: regression=# select ts_rewrite(array['a & b'::tsquery, target, subst]) from ts_subst where 'a & b'::tsquery @> target; ts_rewrite ---------------------------'baz' & ( 'bar' | 'foo' ) (1 row) regression=# select ts_rewrite(array['quux'::tsquery, target, subst]) from ts_subst where 'quux'::tsquery @> target;ts_rewrite ------------ (1 row) So it seems to me this is just wrong, and the only safe way to do it is to use the two-argument form, with the selection from the table happening as a SPI query. Since we're already committed to an initdb for beta2, it's not quite too late to reconsider the API here. My feeling at the moment is that we should just drop the aggregate form of ts_rewrite; it does nothing you can't do better with the two-argument form, and it is just ugly to boot. Also, if anyone does come up with a not-so-ugly design later, we can always add things in 8.4 or beyond; but once it's in core it's going to be a very hard sell to take it out. Comments? regards, tom lane
pgsql-hackers by date: