Re: ts_headline - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: ts_headline |
Date | |
Msg-id | Pine.LNX.4.64.0802221523340.31180@sn.sai.msu.ru |
In response to | Re: ts_headline (Stephen Davies <scldad@sdc.com.au>) |
Responses | Re: ts_headline |
List | pgsql-general |
On Fri, 22 Feb 2008, Stephen Davies wrote:

> Hmmmm!
> I think I now understand the ts position better, thank you.
>
> Part of my problem has been that I am used to the functionality of Open Text's
> LCS (aka BASIS) product which handles text differently.
>
> It includes the position (and context) information in the index and does
> "remember" how the text was parsed so does not need to reparse to insert hit
> navigation tags nor need pointers as to how to parse queries. (It also
> supports phrase searching.)
>
> Now that I have a better understanding of ts, I think I will be able to make
> it do at least most of what I hoped for.

I'm wondering if it was not described in the text search documentation :)

>
> Thank you again for your help with this.
>
> Cheers,
> Stephen Davies
>
> On Friday 22 February 2008 20:45, Richard Huxton wrote:
>> Stephen Davies wrote:
>>> Unfortunately, my link to the box with the test database is down due to
>>> lack of maintenance by our local telco (Telstra) but I think that I also
>>> missed the optional config arg to ts_headline.
>>>
>>> The lack of link also means that I cannot confirm your findings but your
>>> logic looks good.
>>
>> Looks like ALTER DATABASE SET default_text_config='english' is what you
>> need.
>>
>>> It begs the question, however, as to why ts-headline needs to reparse the
>>> raw text.
>>
>> It needs to line up tsvector lexemes with actual characters in the text.
>> The tsvector is missing punctuation, any stopwords (the, it, a) as well
>> as being stemmed (if your dictionary does that).
>>
>> Also, it's looking for a short span of words that provide the best
>> match. That might not be a complete match of course, and is different to
>> how you'd normally look to use a tsvector.
>>
>>> At least in my case, I am using a trigger to parse the combination of
>>> Title and Abstract to a ts_vector field in the table row (as suggested in
>>> 12.2.2 and 12.4.3 in the doco) so that the ts_vector is already available
>>> to ts_headline.
>>>
>>> If ts_headline had the ability to use that pre-parsed ts_vector, my
>>> problem would never have arisen - and the performance of ts_headline
>>> would be improved.
>>
>> Maybe. It would still have to parse the text to some degree though, just
>> to get the original words & punctuation into the headline.
>
>

	Regards,
		Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
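
To make the point in the quoted exchange concrete (a stored tsvector has lost punctuation, stopwords and the original word forms, so a headline has to be built from the raw text), here is a small toy model in Python. It is only a sketch and not PostgreSQL's parser or ts_headline: the stopword list, the suffix-stripping "stemmer", and the names to_toy_tsvector and toy_headline are all made up for illustration.

```python
import re

# Hypothetical stopword list; real text search configurations use their own.
STOPWORDS = {"the", "it", "a", "of", "to", "is"}


def stem(word):
    """Crude suffix stripping, standing in for a real stemming dictionary."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def to_toy_tsvector(text):
    """Map each lexeme to its 1-based word positions.

    Stopwords still consume a position but are not stored, and
    punctuation and original spellings are discarded entirely.
    """
    vector = {}
    for pos, token in enumerate(re.findall(r"[A-Za-z]+", text), start=1):
        word = token.lower()
        if word in STOPWORDS:
            continue
        vector.setdefault(stem(word), []).append(pos)
    return vector


def toy_headline(text, query):
    """Reparse the raw text and wrap words matching the query in <b> tags."""
    targets = {stem(q.lower()) for q in query.split()}

    def mark(match):
        word = match.group(0)
        return f"<b>{word}</b>" if stem(word.lower()) in targets else word

    return re.sub(r"[A-Za-z]+", mark, text)


text = "The cats sat; it is raining."
print(to_toy_tsvector(text))
print(toy_headline(text, "rain cat"))
```

The toy vector records that "cat" occurred at word position 2, but not that the text said "cats", that a semicolon followed "sat", or where "it" and "is" stood, so the highlighted sentence cannot be rebuilt from it alone; the second function has to walk the original string again. On a separate point, the setting spelled default_text_config in the quoted message is named default_text_search_config in released PostgreSQL versions.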