Re: Fragments in tsearch2 headline - Mailing list pgsql-general
From | Sushant Sinha |
---|---|
Subject | Re: Fragments in tsearch2 headline |
Date | |
Msg-id | 9fb559330710301011n77ef2544n4ef73dfce3177ac4@mail.gmail.com |
In response to | Re: Fragments in tsearch2 headline (Oleg Bartunov <oleg@sai.msu.su>) |
Responses | Re: Fragments in tsearch2 headline (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
This is a nice idea and seems easy to implement. I will try to write it up and send a patch to the mailing list.

I was also working on adding support for phrase search. Currently, to check for a phrase you have to match against the entire document. It would be better if a filter like are_words_consecutive(tsvector *t, tsquery *q) could be added to reduce the number of matching documents before we actually do the phrase search.

Do you think this will improve the performance of phrase search? If so, I would like to write this function and send a patch.

-Sushant.

On 10/30/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
> On Tue, 30 Oct 2007, Catalin Marinas wrote:
>
> > On 30/10/2007, Richard Huxton <dev@archonet.com> wrote:
> >> Oleg Bartunov wrote:
> >>> Catalin,
> >>>
> >>> what is your need ? What's wrong with this ?
> >>>
> >>> postgres=# select ts_headline('1 2 3 4 5 3 4 abc abc 2 3
> >>> xyz','2'::tsquery, 'StartSel=...,StopSel=...')
> >>> ;
> >>> ts_headline
> >>> -------------------------------------------
> >>> 1 ...2... 3 4 5 3 4 abc abc ...2... 3 xyz
> >>
> >> I think he want's something like: "1 2 3 ... abc 2 3 ..."
> >>
> >> A few characters of context around each match and then ... between. Kind
> >> of like grep -C.
> >
> > That's pretty much correct (with the difference that I'd like context
> > of words rather than lines as in "grep" and StartSel=<b>,
> > StopSel=</b>).
> >
> > Since the text I want a headline for might be pretty long (tens of
> > lines), I'd like to only show the excerpts around the matching words.
> > Similar to the above example:
> >
> > select ts_headline('1 2 3 4 5 3 4 abc x y z 2 3', '2 & abc'::tsquery);
> >
> > should give:
> >
> > '1 <b>2</b> 3 4 ... 3 4 <b>abc</b> x y'
> >
> > Currently, if you limit the maximum words so that 'abc' is too far, it
> > only highlights the first match.
>
> ok, then you have to formalize many things - how long should be excerpts,
> how much excerpts to show, etc. In tsearch2 we have get_covers() function,
> which produces all excerpts like:
>
> =# select get_covers(to_tsvector('1 2 3 4 5 3 4 abc x y z 2 3'),
> '2&3'::tsquery);
> get_covers
> ------------------------------------------------
> 1 {1 2 3 }1 4 5 {2 3 4 abc x y z {3 2 }2 3 }3
> (1 row)
>
> Once you formalize your requirements, you can look on it and adapt to your
> needs (and share with people). I think it could be nice contrib module.
>
>
> >
> > Many of the search engines (including google) show the headline this
> > way. I think Lucene can do this as well but I've never used it to be
> > sure.
> >
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
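The two ideas discussed in this message can be illustrated outside the database with a small Python sketch. It is not tsearch2 code and not the proposed patch. The helper build_positions is a hypothetical stand-in for a tsvector (plain words mapped to 1-based positions, with no stemming or stop-word handling), are_words_consecutive borrows only its name from the prototype above and takes a word list instead of a tsquery, and fragment_headline with its context parameter is an assumed way of formalizing "a few words around each match".

```python
from typing import Dict, List, Set


def build_positions(words: List[str]) -> Dict[str, Set[int]]:
    """Toy stand-in for a tsvector: map each word to its 1-based positions."""
    positions: Dict[str, Set[int]] = {}
    for pos, word in enumerate(words, start=1):
        positions.setdefault(word, set()).add(pos)
    return positions


def are_words_consecutive(positions: Dict[str, Set[int]], phrase: List[str]) -> bool:
    """Return True if the phrase words occur at adjacent positions, in order.

    Only position data is consulted, so a document can be rejected
    before its full text is scanned for the phrase.
    """
    if not phrase:
        return False
    for start in positions.get(phrase[0], set()):
        # The k-th phrase word must sit exactly k positions after the first.
        if all(start + k in positions.get(word, set())
               for k, word in enumerate(phrase)):
            return True
    return False


def fragment_headline(words: List[str], query: Set[str], context: int = 2,
                      start_sel: str = "<b>", stop_sel: str = "</b>",
                      sep: str = "...") -> str:
    """Show `context` words around every match, joining excerpts with `sep`."""
    ranges: List[List[int]] = []
    for i, word in enumerate(words):
        if word not in query:
            continue
        lo, hi = max(0, i - context), min(len(words) - 1, i + context)
        if ranges and lo <= ranges[-1][1] + 1:
            # Window overlaps or touches the previous excerpt: extend it.
            ranges[-1][1] = max(ranges[-1][1], hi)
        else:
            ranges.append([lo, hi])
    excerpts = [
        " ".join(f"{start_sel}{w}{stop_sel}" if w in query else w
                 for w in words[lo:hi + 1])
        for lo, hi in ranges
    ]
    return f" {sep} ".join(excerpts)
```

With the sample text from the thread, '1 2 3 4 5 3 4 abc x y z 2 3', the query words 2 and abc, and context=2, this sketch produces '1 <b>2</b> 3 4 ... 3 4 <b>abc</b> x y z <b>2</b> 3'. That is longer than the output Catalin asked for because the sketch shows every match and applies no limit on words or on the number of excerpts, which are exactly the things Oleg says still need to be formalized.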