Thread: a little fix for text search

a little fix for text search

From
Oleg Bartunov
Date:
Hi there !

I don't know when exactly it was improved, but following notice in
https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-HEADLINE
is currently not needed.

ts_headline uses the original document, not a tsvector summary, so it can be slow and should be used with care. A typical mistake is to call ts_headline for every matching document when only ten documents are to be shown. SQL subqueries can help; here is an example:

Regards,
Oleg

Re: a little fix for text search

From
Tom Lane
Date:
Oleg Bartunov <obartunov@gmail.com> writes:
> Hi there !
> I don't know when exactly it was improved, but following notice in
> https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-HEADLINE
> is currently not needed.

> ts_headline uses the original document, not a tsvector summary, so it can
> be slow and should be used with care. A typical mistake is to call
> ts_headline for every matching document when only ten documents are to be
> shown. SQL subqueries can help; here is an example:

I don't see why that stopped being appropriate?  The point is that it
takes a raw text input which has to be re-parsed; that's still true
AFAICS.

            regards, tom lane


Re: a little fix for text search

From
Oleg Bartunov
Date:


On Sat, Nov 12, 2016 at 11:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oleg Bartunov <obartunov@gmail.com> writes:
> > Hi there !
> > I don't know when exactly it was improved, but following notice in
> > https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-HEADLINE
> > is currently not needed.
>
> > ts_headline uses the original document, not a tsvector summary, so it can
> > be slow and should be used with care. A typical mistake is to call
> > ts_headline for every matching document when only ten documents are to be
> > shown. SQL subqueries can help; here is an example:
>
> I don't see why that stopped being appropriate?  The point is that it
> takes a raw text input which has to be re-parsed; that's still true
> AFAICS.

I mean that in the past we recommended using a subselect to avoid extra ts_headline() calls. Now, at least in 9.6, that advice is obsolete: both SQL queries call ts_headline() exactly 5 times.

select ts_headline(body, to_tsquery('supernovae & x-ray')),
       ts_rank(fts, to_tsquery('supernovae & x-ray')) as rank
from apod
where fts @@ to_tsquery('supernovae & x-ray')
order by rank desc limit 5;

explain (analyze, costs off)
select ts_headline(body, to_tsquery('supernovae & x-ray')), rank
from (
  select body, ts_rank(fts, to_tsquery('supernovae & x-ray')) as rank
  from apod
  where fts @@ to_tsquery('supernovae & x-ray')
  order by rank desc limit 5
) as foo;






Re: a little fix for text search

From
Tom Lane
Date:
Oleg Bartunov <obartunov@gmail.com> writes:
> On Sat, Nov 12, 2016 at 11:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't see why that stopped being appropriate?  The point is that it
>> takes a raw text input which has to be re-parsed; that's still true
>> AFAICS.

> I mean that in the past we recommended using a subselect to avoid extra
> ts_headline() calls. Now, at least in 9.6, that advice is obsolete: both
> SQL queries call ts_headline() exactly 5 times.

Oh, I see your point: commit 9118d03a8 fixed the planner so you don't get
extra evaluations of ts_headline() in this example.  I think it's probably
still appropriate to warn that ts_headline() is expensive, but yes, the
specific example is obsolete.

            regards, tom lane
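
One way to check the number of ts_headline() evaluations discussed in this thread is to count them with a wrapper function. The following is only a sketch: counted_headline() and the headline_calls sequence are hypothetical names invented for illustration, and the apod table with its body and fts columns is assumed to be the one used in the queries earlier in the thread. Note that the wrapper is declared volatile, which plain ts_headline() is not, so the planner may not treat the two identically; take the count as an indication rather than proof.

```sql
-- Hypothetical helper, not part of PostgreSQL: wraps ts_headline() and
-- advances a sequence once per call so the evaluations can be counted.
create sequence headline_calls;

create function counted_headline(doc text, q tsquery) returns text
language plpgsql volatile as $$
begin
  perform nextval('headline_calls');
  return ts_headline(doc, q);
end;
$$;

-- Single-level form from the thread, with the wrapper swapped in.
select counted_headline(body, to_tsquery('supernovae & x-ray')),
       ts_rank(fts, to_tsquery('supernovae & x-ray')) as rank
from apod
where fts @@ to_tsquery('supernovae & x-ray')
order by rank desc limit 5;

-- Wrapper calls made so far in this session
-- (raises an error if the wrapper was never called).
select currval('headline_calls');

-- Reset the counter before repeating the test with the subselect form.
select setval('headline_calls', 1, false);
```

Running the same steps with the subselect form of the query and comparing the two counts shows whether the subselect still saves any calls on a given server version.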