Thread: a little fix for text search
Hi there!
I don't know when exactly it was improved, but the following notice in
https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-HEADLINE
is currently not needed:
  "ts_headline uses the original document, not a tsvector summary, so it can
  be slow and should be used with care. A typical mistake is to call
  ts_headline for every matching document when only ten documents are to be
  shown. SQL subqueries can help; here is an example:"

Regards,
Oleg
Oleg Bartunov <obartunov@gmail.com> writes:
> Hi there!
> I don't know when exactly it was improved, but the following notice in
> https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-HEADLINE
> is currently not needed.
> ts_headline uses the original document, not a tsvector summary, so it can
> be slow and should be used with care. A typical mistake is to call
> ts_headline for every matching document when only ten documents are to be
> shown. SQL subqueries can help; here is an example:

I don't see why that stopped being appropriate? The point is that it
takes a raw text input which has to be re-parsed; that's still true
AFAICS.

regards, tom lane
On Sat, Nov 12, 2016 at 11:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Oleg Bartunov <obartunov@gmail.com> writes:
> Hi there!
> I don't know when exactly it was improved, but the following notice in
> https://www.postgresql.org/docs/current/static/textsearch-controls.html#TEXTSEARCH-HEADLINE
> is currently not needed.
> ts_headline uses the original document, not a tsvector summary, so it can
> be slow and should be used with care. A typical mistake is to call
> ts_headline for every matching document when only ten documents are to be
> shown. SQL subqueries can help; here is an example:
I don't see why that stopped being appropriate? The point is that it
takes a raw text input which has to be re-parsed; that's still true
AFAICS.
I mean that in the past we recommended using a subselect to avoid extra ts_headline() calls, which is now obsolete, at least in 9.6: these two SQL queries both call ts_headline() exactly 5 times.
select ts_headline(body, to_tsquery('supernovae & x-ray')),
       ts_rank(fts, to_tsquery('supernovae & x-ray')) as rank
from apod
where fts @@ to_tsquery('supernovae & x-ray')
order by rank desc limit 5;

explain (analyze, costs off)
select ts_headline(body, to_tsquery('supernovae & x-ray')), rank
from (
    select body, ts_rank(fts, to_tsquery('supernovae & x-ray')) as rank
    from apod
    where fts @@ to_tsquery('supernovae & x-ray')
    order by rank desc limit 5
) as foo;
Oleg Bartunov <obartunov@gmail.com> writes:
> On Sat, Nov 12, 2016 at 11:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I don't see why that stopped being appropriate? The point is that it
>> takes a raw text input which has to be re-parsed; that's still true
>> AFAICS.
> I mean that in the past we recommended using a subselect to avoid extra
> ts_headline() calls, which is now obsolete, at least in 9.6: these two SQL
> queries both call ts_headline() exactly 5 times.

Oh, I see your point: commit 9118d03a8 fixed the planner so you don't get
extra evaluations of ts_headline() in this example. I think it's probably
still appropriate to warn that ts_headline() is expensive, but yes, the
specific example is obsolete.

regards, tom lane