Re: [GENERAL] ts_headline - Mailing list pgsql-patches

From Bruce Momjian
Subject Re: [GENERAL] ts_headline
Msg-id 200803040319.m243JsU23168@momjian.us
List pgsql-patches
I have applied the attached documentation patch to show ts_headline()
using a configuration name.

---------------------------------------------------------------------------

Oleg Bartunov wrote:
> On Sat, 23 Feb 2008, Stephen Davies wrote:
>
> > As it turns out, all I needed was in the doco but the key element - the first
> > config arg to ts_headline - was not in any of the examples so I missed it.
>
> Aha. The original examples were based on the default
> configuration; the concept was later changed, but the examples were not
> updated.
>
> >
> > Would it be possible for ts_headline to work with the pre-parsed tsvector?
>
> It's impossible; Richard already explained the reasons to you.
>
> >
> > I see references to future plans for phrase searching in ts. Is there a date
> > for this?
>
> Not yet. The problem is mostly algebraic :) Simple 'exact search' is doable, but
> we need something more, since we support boolean operators,
> pluggable dictionaries (which could produce several lexemes, for example),
> and document structure (lexeme weights). So we need to define a consistent
> algebra for text in order to get predictable results. This is quite a complex
> task that requires a lot of dedicated time, which we don't have.
>
> >
> > Cheers and thanks,
> > Stephen
> > Davies
> >
> >
> > On Friday 22 February 2008 22:54, Oleg Bartunov wrote:
> >> On Fri, 22 Feb 2008, Stephen Davies wrote:
> >>> Hmmmm!
> >>> I think I now understand the ts position better, thank you.
> >>>
> >>> Part of my problem has been that I am used to the functionality of Open
> >>> Text's LCS (aka BASIS) product which handles text differently.
> >>>
> >>> It includes the position (and context) information in the index and does
> >>> "remember" how the text was parsed so does not need to reparse to insert
> >>> hit navigation tags nor need pointers as to how to parse queries. (It
> >>> also supports phrase searching.)
> >>>
> >>> Now that I have a better understanding of ts, I think I will be able to
> >>> make it do at least most of what I hoped for.
> >>
> >> I'm wondering if it was not described in the text search documentation :)
> >>
> >>> Thank you again for your help with this.
> >>>
> >>> Cheers,
> >>> Stephen Davies
> >>>
> >>> On Friday 22 February 2008 20:45, Richard Huxton wrote:
> >>>> Stephen Davies wrote:
> >>>>> Unfortunately, my link to the box with the test database is down due to
> >>>>> lack of maintenance by our local telco (Telstra) but I think that I
> >>>>> also missed the optional config arg to ts_headline.
> >>>>>
> >>>>> The lack of link also means that I cannot confirm your findings but
> >>>>> your logic looks good.
> >>>>
> >>>> Looks like ALTER DATABASE SET default_text_search_config='english' is
> >>>> what you need.
> >>>>
> >>>>> It begs the question, however, as to why ts_headline needs to reparse
> >>>>> the raw text.
> >>>>
> >>>> It needs to line up tsvector lexemes with actual characters in the text.
> >>>> The tsvector is missing punctuation, any stopwords (the, it, a) as well
> >>>> as being stemmed (if your dictionary does that).
> >>>>
> >>>> Also, it's looking for a short span of words that provide the best
> >>>> match. That might not be a complete match of course, and is different to
> >>>> how you'd normally look to use a tsvector.
> >>>>
> >>>>> At least in my case, I am using a trigger to parse the combination of
> >>>>> Title and Abstract to a tsvector field in the table row (as suggested
> >>>>> in 12.2.2 and 12.4.3 in the doco) so that the tsvector is already
> >>>>> available to ts_headline.
> >>>>>
> >>>>> If ts_headline had the ability to use that pre-parsed tsvector, my
> >>>>> problem would never have arisen - and the performance of ts_headline
> >>>>> would be improved.
> >>>>
> >>>> Maybe. It would still have to parse the text to some degree though, just
> >>>> to get the original words & punctuation into the headline.
> >>
> >>      Regards,
> >>          Oleg
> >> _____________________________________________________________
> >> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> >> Sternberg Astronomical Institute, Moscow University, Russia
> >> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> >> phone: +007(495)939-16-83, +007(495)939-23-83
> >
> >
>
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/textsearch.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/textsearch.sgml,v
retrieving revision 1.40
diff -c -c -r1.40 textsearch.sgml
*** doc/src/sgml/textsearch.sgml    13 Dec 2007 06:32:47 -0000    1.40
--- doc/src/sgml/textsearch.sgml    4 Mar 2008 02:55:17 -0000
***************
*** 1102,1108 ****
      For example:

  <programlisting>
! SELECT ts_headline('The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.', to_tsquery('query & similarity'));
--- 1102,1108 ----
      For example:

  <programlisting>
! SELECT ts_headline('english', 'The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.', to_tsquery('query & similarity'));
***************
*** 1112,1118 ****
   and return them in order of their <b>similarity</b> to the
   <b>query</b>.

! SELECT ts_headline('The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.',
--- 1112,1118 ----
   and return them in order of their <b>similarity</b> to the
   <b>query</b>.

! SELECT ts_headline('english', 'The most common type of search
  is to find all documents containing given query terms
  and return them in order of their similarity to the
  query.',
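
For readers following the thread, the points above can be tried directly in psql. The statements below are only a sketch, assuming a PostgreSQL 8.3 or later server with the built-in 'english' configuration; the database name mydb is hypothetical and not taken from the thread. They show the explicit-configuration form that the patch documents, the alternative of setting a default configuration, and the tsvector for the same text, which can be compared with Richard's description of what a tsvector leaves out.

```sql
-- 1. Explicit configuration name as the first argument
--    (the form shown in the patched documentation).
SELECT ts_headline('english',
    'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.',
    to_tsquery('english', 'query & similarity'));

-- 2. Alternatively, set a default configuration and omit the argument.
--    SET affects only the current session.
SET default_text_search_config = 'pg_catalog.english';

--    A per-database default would look like this
--    (mydb is a placeholder database name):
-- ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';

SELECT ts_headline(
    'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.'::text,
    to_tsquery('query & similarity'));

-- 3. Inspect the tsvector for the same text. It holds only lexemes and
--    their positions, so the original words and punctuation needed for a
--    headline cannot be read back from it.
SELECT to_tsvector('english',
    'The most common type of search
is to find all documents containing given query terms
and return them in order of their similarity to the
query.');
```

Comparing the output of the third statement with the raw text illustrates why ts_headline works from the document itself rather than from a stored tsvector column.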
