Re: [GENERAL] Fragments in tsearch2 headline - Mailing list pgsql-hackers
From | Sushant Sinha |
---|---|
Subject | Re: [GENERAL] Fragments in tsearch2 headline |
Date | |
Msg-id | 1216249733.5842.5.camel@dragflick |
In response to | Re: [GENERAL] Fragments in tsearch2 headline (Oleg Bartunov <oleg@sai.msu.su>) |
Responses | Re: [GENERAL] Fragments in tsearch2 headline |
List | pgsql-hackers |
I will add test queries and their results for the corner cases in a
separate file. I guess the only thing I am confused about is what should
be the behavior of headline generation when Query items have words of
size less than ShortWord. I guess the answer is to ignore the ShortWord
parameter, but let me know if the answer is any different.

-Sushant.

On Thu, 2008-07-17 at 02:53 +0400, Oleg Bartunov wrote:
> Sushant,
>
> first, please, provide simple test queries, which demonstrate the right work
> in the corner cases. This will help reviewers to test your patch and
> help you to make sure your new version is ok. For example:
>
> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);
>                      ts_headline
> ------------------------------------------------------
>  <b>1</b> 2 <b>3</b> 4 5 <b>1</b> 2 <b>3</b> <b>1</b>
>
> This select breaks your code:
>
> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery,'maxfragments=2');
>  ts_headline
> --------------
>  ... 2 ...
>
> and so on ....
>
>
> Oleg
> On Tue, 15 Jul 2008, Sushant Sinha wrote:
>
> > Attached a new patch that:
> >
> > 1. fixes previous bug
> > 2. better handles the case when cover size is greater than the MaxWords.
> > Basically it divides a cover greater than MaxWords into fragments of
> > MaxWords, resizes each such fragment so that each end of the fragment
> > contains a query word and then evaluates best fragments based on number of
> > query words in each fragment. In case of tie it picks up the smaller
> > fragment. This allows more query words to be shown with multiple fragments
> > in case a single cover is larger than the MaxWords.
> >
> > The resizing of a fragment such that each end is a query word provides room
> > for stretching both sides of the fragment. This (hopefully) better presents
> > the context in which query words appear in the document. If a cover is
> > smaller than MaxWords then the cover is treated as a fragment.
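As a rough illustration of the fragment selection described in the quoted message above, here is a minimal Python sketch. It is not the C code from the patch: the function names `split_cover` and `pick_fragments` are hypothetical, query-word hits are counted per occurrence (an assumption), and ShortWord handling and the later stretching of each fragment are left out.

```python
def split_cover(words, query, start, end, max_words):
    """Split the cover words[start..end] (inclusive) into candidate fragments.

    Hypothetical helper for illustration only. Returns a list of
    (first_index, last_index) pairs, both inclusive.
    """
    # A cover that already fits within max_words is used as one fragment.
    if end - start + 1 <= max_words:
        return [(start, end)]

    fragments = []
    # Cut the oversized cover into consecutive chunks of at most max_words.
    for chunk_start in range(start, end + 1, max_words):
        chunk_end = min(chunk_start + max_words - 1, end)
        # Shrink the chunk so that both of its ends are query words.
        hits = [i for i in range(chunk_start, chunk_end + 1)
                if words[i] in query]
        if hits:  # chunks without any query word are dropped
            fragments.append((hits[0], hits[-1]))
    return fragments


def pick_fragments(words, query, fragments, max_fragments):
    """Keep the best max_fragments fragments, returned in document order."""
    def rank(fragment):
        first, last = fragment
        # Assumption: every occurrence of a query word counts as one hit.
        n_hits = sum(1 for i in range(first, last + 1) if words[i] in query)
        # More hits rank first; on a tie the shorter fragment wins.
        return (-n_hits, last - first + 1)

    best = sorted(fragments, key=rank)[:max_fragments]
    return sorted(best)
```

With Oleg's test document '1 2 3 4 5 1 2 3 1', the query words 1 and 3, and an illustrative max_words of 4, `split_cover` over the whole document yields (0, 2), (5, 7) and (8, 8), and `pick_fragments` with max_fragments=2 keeps (0, 2) and (5, 7).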
> >
> > Let me know if you have any more suggestions or anything is not clear.
> >
> > I have not yet added the regression tests. The regression test suite seemed
> > to be only ensuring that the function works. How many tests should I be
> > adding? Is there any other place that I need to add different test cases for
> > the function?
> >
> > -Sushant.
> >
> >
> > Nice. But it will be good to resolve the following issues:
> >> 1) Patch contains mistakes, I didn't investigate or carefully read it. Get
> >> http://www.sai.msu.su/~megera/postgres/fts/apod.dump.gz and load it into a db.
> >>
> >> Queries
> >> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >> from apod where to_tsvector(body) @@ plainto_tsquery('black hole');
> >>
> >> and
> >>
> >> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
> >> from apod;
> >>
> >> crash postgresql :(
> >>
> >> 2) pls, include in your patch documentation and regression tests.
> >>
> >>
> >>> Another change that I was thinking:
> >>>
> >>> Right now if cover size > max_words then I just cut the trailing words.
> >>> Instead I was thinking that we should split the cover into more
> >>> fragments such that each fragment contains a few query words. Then each
> >>> fragment will not contain all query words but will show more occurrences
> >>> of query words in the headline. I would like to know what your opinion
> >>> on this is.
> >>>
> >>
> >> Agreed.
> >>
> >>
> >> --
> >> Teodor Sigaev E-mail: teodor@sigaev.ru
> >> WWW:
> >> http://www.sigaev.ru/
> >>
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83