Thread: Aggregating tsqueries

Aggregating tsqueries

From
Alexander Hill
Date:
Hello,

I have a table of tree nodes with a tsquery column. To get a subtree's tsquery, I need to OR all of its nodes' tsqueries together.

I defined a custom aggregate using tsquery_or:

    CREATE AGGREGATE tsquery_or_agg (tsquery)
    (
        sfunc = tsquery_or,
        stype = tsquery
    );

but I've found that

    tsquery_or_agg(query)

is about a hundred times slower than this:

    ('(' || string_agg(query::text, ')|(') || ')')::tsquery

That works perfectly so I'm happy to continue doing it, but I'm curious to know why the difference is so great and if anything can be done about it?

Cheers,
Alex

Re: Aggregating tsqueries

From
Heikki Linnakangas
Date:
On 09/17/2014 07:56 AM, Alexander Hill wrote:
> Hello,
>
> I have a table of tree nodes with a tsquery column. To get a subtree's
> tsquery, I need to OR all of its nodes' tsqueries together.
>
> I defined a custom aggregate using tsquery_or:
>
>      CREATE AGGREGATE tsquery_or_agg (tsquery)
>      (
>          sfunc = tsquery_or,
>          stype = tsquery
>      );
>
> but I've found that
>
>      tsquery_or_agg(query)
>
> is about a hundred times slower than this:
>
>      ('(' || string_agg(query::text, ')|(') || ')')::tsquery
>
> That works perfectly so I'm happy to continue doing it, but I'm curious to
> know why the difference is so great and if anything can be done about it?

string_agg's state transition function uses a buffer that's expanded as
needed. At every step, the next string is appended to the buffer. Your
custom aggregate is less efficient, because it constructs a new tsquery
object at every step. In every step, a new tsquery object is allocated
and the old result and the next source tsquery are copied to it. That's
much more expensive.

If you're not shy of writing C code, you could write a more efficient
version of tsquery_or_agg too, using a similar technique.

- Heikki