Re: Fragments in tsearch2 headline - Mailing list pgsql-general

From Sushant Sinha
Subject Re: Fragments in tsearch2 headline
Date
Msg-id 9fb559330710301011n77ef2544n4ef73dfce3177ac4@mail.gmail.com
In response to Re: Fragments in tsearch2 headline  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Fragments in tsearch2 headline  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
This is a nice idea and seems easy to implement. I will try to write it
up and send a patch to the mailing list.
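For reference, the behaviour Catalin asks for further down (a couple of words of context around each match, with excerpts joined by "...") can be sketched in Python. This is only an illustration under stated assumptions, not tsearch2 code: words are split on whitespace and compared literally (no parser, stemming or tsquery operators), and fragment_headline, context and max_matches are hypothetical names. With two words of context and the first two matches, it reproduces the output he expected for '2 & abc'.

```python
from typing import List, Set


def fragment_headline(text: str, terms: Set[str], context: int = 2,
                      max_matches: int = 2, start_sel: str = "<b>",
                      stop_sel: str = "</b>", delimiter: str = " ... ") -> str:
    """Hypothetical fragment headline: `context` words around each match.

    Assumption: whitespace tokenization and literal word comparison; this is
    not how tsearch2 parses or stems text.
    """
    words = text.split()
    # Positions of the first `max_matches` words that match a query term.
    hits = [i for i, w in enumerate(words) if w in terms][:max_matches]

    # One [lo, hi] window per hit; windows that overlap or touch are merged.
    windows: List[List[int]] = []
    for i in hits:
        lo, hi = max(0, i - context), min(len(words) - 1, i + context)
        if windows and lo <= windows[-1][1] + 1:
            windows[-1][1] = max(windows[-1][1], hi)
        else:
            windows.append([lo, hi])

    # Highlight every matching word inside a window, then join the excerpts.
    fragments = []
    for lo, hi in windows:
        parts = [start_sel + w + stop_sel if w in terms else w
                 for w in words[lo:hi + 1]]
        fragments.append(" ".join(parts))
    return delimiter.join(fragments)


print(fragment_headline("1 2 3 4 5 3 4 abc x y z 2 3", {"2", "abc"}))
# 1 <b>2</b> 3 4 ... 3 4 <b>abc</b> x y
```

Merging windows that touch keeps nearby matches in a single excerpt instead of repeating words on both sides of a "..." separator; with max_matches=3 the trailing "2" is folded into the second excerpt.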

I was also working on adding support for phrase search. Currently, to
check for a phrase you have to match against the entire document. It
would be better if a filter like are_words_consecutive(tsvector *t,
tsquery *q) could be added to reduce the number of matching documents
before we actually do the phrase search. Do you think this would
improve the performance of phrase search? If so, I would like to write
this function and send a patch.
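A rough Python model of the proposed check, assuming a tsvector can be treated as a mapping from lexeme to its sorted word positions and the phrase as an ordered list of lexemes. Both are simplifications: real tsvector/tsquery structures, stop-word gaps and position limits are ignored, and word_positions is a hypothetical stand-in for to_tsvector. Because the check only looks at stored positions, a document for which it returns False could be skipped without re-parsing its text, while True still needs the full phrase check.

```python
from typing import Dict, List


def word_positions(text: str) -> Dict[str, List[int]]:
    """Hypothetical stand-in for to_tsvector(): word -> 1-based positions."""
    positions: Dict[str, List[int]] = {}
    for pos, word in enumerate(text.split(), start=1):
        positions.setdefault(word, []).append(pos)
    return positions


def are_words_consecutive(tsvector: Dict[str, List[int]],
                          phrase: List[str]) -> bool:
    """Prefilter: do the phrase words appear at positions p, p+1, p+2, ...?"""
    if not phrase:
        return True
    if any(word not in tsvector for word in phrase):
        return False  # a missing lexeme rules the document out immediately
    # Position sets for the words after the first, for O(1) lookups.
    following = [set(tsvector[word]) for word in phrase[1:]]
    for start in tsvector[phrase[0]]:
        if all(start + k + 1 in positions
               for k, positions in enumerate(following)):
            return True
    return False


doc = word_positions("the quick brown fox jumps over the lazy dog")
print(are_words_consecutive(doc, ["quick", "brown", "fox"]))  # True
print(are_words_consecutive(doc, ["brown", "quick"]))         # False
```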

-Sushant.

On 10/30/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Tue, 30 Oct 2007, Catalin Marinas wrote:
>
> > On 30/10/2007, Richard Huxton <dev@archonet.com> wrote:
> >> Oleg Bartunov wrote:
> >>> Catalin,
> >>>
> > >>> what do you need? What's wrong with this?
> >>>
> > >>> postgres=# select ts_headline('1 2 3 4 5 3 4 abc abc 2 3 xyz', '2'::tsquery, 'StartSel=...,StopSel=...');
> >>>                 ts_headline
> >>> -------------------------------------------
> >>>  1 ...2... 3 4 5 3 4 abc abc ...2... 3 xyz
> >>
> > >> I think he wants something like: "1 2 3 ... abc 2 3 ..."
> >>
> >> A few characters of context around each match and then ... between. Kind
> >> of like grep -C.
> >
> > That's pretty much correct (with the difference that I'd like context
> > of words rather than lines as in "grep" and StartSel=<b>,
> > StopSel=</b>).
> >
> > Since the text I want a headline for might be pretty long (tens of
> > lines), I'd like to only show the excerpts around the matching words.
> > Similar to the above example:
> >
> > select ts_headline('1 2 3 4 5 3 4 abc x y z 2 3', '2 & abc'::tsquery);
> >
> > should give:
> >
> > '1 <b>2</b> 3 4 ... 3 4 <b>abc</b> x y'
> >
> > Currently, if you limit the maximum number of words so that 'abc' is
> > too far away, only the first match is highlighted.
>
> OK, then you have to formalize many things: how long the excerpts should be,
> how many excerpts to show, etc. In tsearch2 we have a get_covers() function,
> which produces all the excerpts, like:
>
> =# select get_covers(to_tsvector('1 2 3 4 5 3 4 abc x y z 2 3'), '2&3'::tsquery);
>                     get_covers
> ------------------------------------------------
>   1 {1 2 3 }1 4 5 {2 3 4 abc x y z {3 2 }2 3 }3
> (1 row)
>
> Once you formalize your requirements, you can look at it, adapt it to your
> needs (and share it with people). I think it could be a nice contrib module.
>
>
> >
> > Many search engines (including Google) show headlines this way. I think
> > Lucene can do this as well, but I've never used it, so I can't be
> > sure.
> >
> >
>
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
>
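The {n ... }n markers in the get_covers() output quoted above appear to bracket covers: minimal stretches of the document that contain every term of the '2&3' query. The Python sketch below computes such minimal covers for a plain AND query. It is an illustration, not the tsearch2 implementation: words are split on whitespace and numbered from 1 the way tsvector positions are, and and_covers is a hypothetical name. For this example it returns the same three covers, positions 2-3, 6-12 and 12-13.

```python
from typing import Dict, Iterable, List, Tuple


def and_covers(text: str, query_terms: Iterable[str]) -> List[Tuple[int, int]]:
    """Minimal [start, end] position ranges containing every query term."""
    terms = set(query_terms)
    positions: Dict[str, List[int]] = {}
    for pos, word in enumerate(text.split(), start=1):
        if word in terms:
            positions.setdefault(word, []).append(pos)
    if len(positions) < len(terms):
        return []  # some term never occurs, so there is no cover

    # Every occurrence of a query term, in document order.
    events = sorted((pos, word) for word, plist in positions.items()
                    for pos in plist)
    last_seen: Dict[str, int] = {}
    covers: List[Tuple[int, int]] = []
    for end, word in events:
        last_seen[word] = end
        if len(last_seen) < len(terms):
            continue  # not every term has appeared yet
        start = min(last_seen.values())
        # `start` never decreases as `end` grows, so [start, end] is minimal
        # unless an earlier cover already begins at the same position.
        if not covers or start > covers[-1][0]:
            covers.append((start, end))
    return covers


print(and_covers("1 2 3 4 5 3 4 abc x y z 2 3", ["2", "3"]))
# [(2, 3), (6, 12), (12, 13)]
```

Covers can be long: for '2 & abc' the same function gives (2, 8) and (8, 12), which is one reason the excerpt length has to be formalized before covers can be turned into short headline fragments.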
