Thread: Aggregating tsqueries
Hello,
I have a table of tree nodes with a tsquery column. To get a subtree's tsquery, I need to OR all of its nodes' tsqueries together.
I defined a custom aggregate using tsquery_or:
CREATE AGGREGATE tsquery_or_agg (tsquery)
(
sfunc = tsquery_or,
stype = tsquery
);
I defined a custom aggregate using tsquery_or:
CREATE AGGREGATE tsquery_or_agg (tsquery)
(
sfunc = tsquery_or,
stype = tsquery
);
but I've found that
tsquery_or_agg(query)
is about a hundred times slower than this:
('(' || string_agg(query::text, ')|(') || ')')::tsquery
tsquery_or_agg(query)
is about a hundred times slower than this:
('(' || string_agg(query::text, ')|(') || ')')::tsquery
That works perfectly so I'm happy to continue doing it, but I'm curious to know why the difference is so great and if anything can be done about it?
Cheers,
Alex
On 09/17/2014 07:56 AM, Alexander Hill wrote: > Hello, > > I have a table of tree nodes with a tsquery column. To get a subtree's > tsquery, I need to OR all of its nodes' tsqueries together. > > I defined a custom aggregate using tsquery_or: > > CREATE AGGREGATE tsquery_or_agg (tsquery) > ( > sfunc = tsquery_or, > stype = tsquery > ); > > but I've found that > > tsquery_or_agg(query) > > is about a hundred times slower than this: > > ('(' || string_agg(query::text, ')|(') || ')')::tsquery > > That works perfectly so I'm happy to continue doing it, but I'm curious to > know why the difference is so great and if anything can be done about it? string_agg's state transition function uses a buffer that's expanded as needed. At every step, the next string is appended to the buffer. Your custom aggregate is less efficient, because it constructs a new tsquery object at every step. In every step, a new tsquery object is allocated and the old result and the next source tsquery are copied to it. That's much more expensive. If you're not shy of writing C code, you could write a more efficient version of tsquery_or_agg too, using a similar technique. - Heikki