Re: [GENERAL] Fragments in tsearch2 headline - Mailing list pgsql-hackers

From Sushant Sinha
Subject Re: [GENERAL] Fragments in tsearch2 headline
Date
Msg-id 1216249733.5842.5.camel@dragflick
In response to Re: [GENERAL] Fragments in tsearch2 headline  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: [GENERAL] Fragments in tsearch2 headline
List pgsql-hackers
I will add test queries and their results for the corner cases in a
separate file. The one thing I am unsure about is how headline generation
should behave when query items contain words shorter than ShortWord. My
guess is that the ShortWord parameter should be ignored for such words,
but let me know if you think otherwise.
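
To make the question concrete, here is a minimal Python sketch (not the
patch's C code) of the "ignore ShortWord for query words" option. The
function name trim_short_words and the "length <= ShortWord" comparison
are assumptions made for illustration only.

```python
def trim_short_words(words, query, short_word):
    """Drop short words from both ends of a headline, but never a query word.

    Assumption: a word is "short" when len(word) <= short_word.
    """
    def droppable(word):
        return len(word) <= short_word and word not in query

    lo, hi = 0, len(words) - 1
    # Walk in from the left until a long word or a query word is hit.
    while lo <= hi and droppable(words[lo]):
        lo += 1
    # Same from the right.
    while hi >= lo and droppable(words[hi]):
        hi -= 1
    return words[lo:hi + 1]


# Query words "1" and "3" are shorter than ShortWord=3 but are kept.
print(trim_short_words("a of 1 big cat 3 to".split(), {"1", "3"}, 3))
# -> ['1', 'big', 'cat', '3']
```

Without the query-word exemption, "1" and "3" would be trimmed as well
and the headline would lose exactly the words the user searched for.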

-Sushant.
On Thu, 2008-07-17 at 02:53 +0400, Oleg Bartunov wrote:
> Sushant,
> 
> first, please provide simple test queries that demonstrate correct behavior
> in the corner cases. This will help reviewers test your patch and
> help you make sure your new version is ok. For example:
> 
> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);
>                       ts_headline
> ------------------------------------------------------
>   <b>1</b> 2 <b>3</b> 4 5 <b>1</b> 2 <b>3</b> <b>1</b>
> 
> This select breaks your code:
> 
> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery,'maxfragments=2');
>   ts_headline
> --------------
>   ...  2 ...
> 
> and so on ....
> 
> 
> Oleg
> On Tue, 15 Jul 2008, Sushant Sinha wrote:
> 
> > Attached a new patch that:
> >
> > 1. fixes the previous bug
> > 2. better handles the case when the cover size is greater than MaxWords.
> > Basically it divides a cover greater than MaxWords into fragments of
> > MaxWords, resizes each such fragment so that each end of the fragment
> > contains a query word, and then evaluates the best fragments based on the
> > number of query words in each fragment. In case of a tie it picks the
> > smaller fragment. This allows more query words to be shown with multiple
> > fragments in case a single cover is larger than MaxWords.
> >
> > Resizing a fragment so that each end is a query word provides room
> > for stretching both sides of the fragment. This (hopefully) better presents
> > the context in which query words appear in the document. If a cover is
> > smaller than MaxWords then the cover is treated as a fragment.
> >
> > Let me know if you have any more suggestions or anything is not clear.
> >
> > I have not yet added the regression tests. The regression test suite seemed
> > to be only ensuring that the function works. How many tests should I be
> > adding? Is there any other place that I need to add different test cases for
> > the function?
> >
> > -Sushant.
> >
> >
> >> Nice. But it will be good to resolve the following issues:
> >> 1) Patch contains mistakes, I didn't investigate or carefully read it. Get
> >> http://www.sai.msu.su/~megera/postgres/fts/apod.dump.gz and
> >> load it into the db.
> >>
> >> Queries
> >> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >> from apod where to_tsvector(body) @@ plainto_tsquery('black hole');
> >>
> >> and
> >>
> >> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >> from apod;
> >>
> >> crash postgresql :(
> >>
> >> 2) pls, include in your patch documentation and regression tests.
> >>
> >>
> >>> Another change that I was thinking:
> >>>
> >>> Right now if cover size > max_words then I just cut the trailing words.
> >>> Instead I was thinking that we should split the cover into more
> >>> fragments such that each fragment contains a few query words. Then each
> >>> fragment will not contain all query words but will show more occurrences
> >>> of query words in the headline. I would  like to know what your opinion
> >>> on this is.
> >>>
> >>
> >> Agreed.
> >>
> >>
> >> --
> >> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
> >>                                                   WWW:
> >> http://www.sigaev.ru/
> >>
> >
> 
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
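
The fragment selection described in the quoted 15 Jul message (divide a
large cover into pieces of MaxWords, shrink each piece so it starts and
ends on a query word, rank by query-word count, break ties by size) can
be sketched in Python as follows. This is only an illustration under
stated assumptions, not the patch's C implementation: the function name
pick_fragments is hypothetical, words are compared as plain strings, and
"number of query words" is taken to mean occurrences rather than
distinct words.

```python
def pick_fragments(words, query, max_words, max_fragments):
    """Return (first, last) word indexes of the chosen fragments.

    Assumptions: `words` is one cover already split into words, and
    `query` is a set of plain query words.
    """
    candidates = []
    # Cut the cover into consecutive pieces of at most max_words words.
    for start in range(0, len(words), max_words):
        end = min(start + max_words, len(words))  # exclusive
        hits = [i for i in range(start, end) if words[i] in query]
        if not hits:
            continue  # a piece without query words is not a candidate
        # Shrink the piece so that both ends are query words.
        candidates.append((hits[0], hits[-1], len(hits)))

    # More query words first; on a tie prefer the smaller fragment.
    candidates.sort(key=lambda f: (-f[2], f[1] - f[0]))
    best = sorted(candidates[:max_fragments])  # back to document order
    return [(first, last) for first, last, _ in best]


doc = "1 2 3 4 5 1 2 3 1".split()
print(pick_fragments(doc, {"1", "3"}, max_words=4, max_fragments=2))
# -> [(0, 2), (5, 7)]
```

With the document from Oleg's example and MaxWords=4, the sketch selects
the word ranges "1 2 3" (indexes 0 to 2) and "1 2 3" (indexes 5 to 7).
Each shrunken fragment is no longer than MaxWords, which leaves room to
stretch it on both sides for context.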


