Re: ts_headline - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: ts_headline
Date
Msg-id Pine.LNX.4.64.0802221523340.31180@sn.sai.msu.ru
In response to Re: ts_headline  (Stephen Davies <scldad@sdc.com.au>)
Responses Re: ts_headline
List pgsql-general
On Fri, 22 Feb 2008, Stephen Davies wrote:

> Hmmmm!
> I think I now understand the ts position better, thank you.
>
> Part of my problem has been that I am used to the functionality of Open Text's
> LCS (aka BASIS) product which handles text differently.
>
> It includes the position (and context) information in the index and does
> "remember" how the text was parsed so does not need to reparse to insert hit
> navigation tags nor need pointers as to how to parse queries. (It also
> supports phrase searching.)
>
> Now that I have a better understanding of ts, I think I will be able to make
> it do at least most of what I hoped for.

I wonder whether this is not already described in the text search documentation :)
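
For illustration, here is a small sketch of what a tsvector keeps and what it drops. The sample sentence is made up, and the 'english' configuration is assumed to be installed (it ships with the built-in text search). The first query returns stemmed lexemes with word positions but no stopwords or punctuation, which is why the second one has to be handed the original text again.

```sql
-- Stored form: normalized lexemes plus positions only.
SELECT to_tsvector('english', 'The cats sat on the mat, purring.');

-- Highlighting needs the raw text, because the original words and
-- punctuation cannot be rebuilt from the tsvector.
SELECT ts_headline('english',
                   'The cats sat on the mat, purring.',
                   to_tsquery('english', 'cat'));
```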


>
> Thank you again for your help with this.
>
> Cheers,
> Stephen Davies
>
> On Friday 22 February 2008 20:45, Richard Huxton wrote:
>> Stephen Davies wrote:
>>> Unfortunately, my link to the box with the test database is down due to
>>> lack of maintenance by our local telco (Telstra) but I think that I also
>>> missed the optional config arg to ts_headline.
>>>
>>> The lack of link also means that I cannot confirm your findings but your
>>> logic looks good.
>>
>> Looks like ALTER DATABASE ... SET default_text_search_config = 'english'
>> is what you need.
>>
>>> It begs the question, however, as to why ts_headline needs to reparse the
>>> raw text.
>>
>> It needs to line up tsvector lexemes with actual characters in the text.
>> The tsvector is missing punctuation, any stopwords (the, it, a) as well
>> as being stemmed (if your dictionary does that).
>>
>> Also, it's looking for a short span of words that provide the best
>> match. That might not be a complete match of course, and is different to
>> how you'd normally look to use a tsvector.
>>
>>> At least in my case, I am using a trigger to parse the combination of
>>> Title and Abstract to a tsvector field in the table row (as suggested in
>>> 12.2.2 and 12.4.3 in the doco) so that the tsvector is already available
>>> to ts_headline.
>>>
>>> If ts_headline had the ability to use that pre-parsed tsvector, my
>>> problem would never have arisen - and the performance of ts_headline
>>> would be improved.
>>
>> Maybe. It would still have to parse the text to some degree though, just
>> to get the original words & punctuation into the headline.
>
>
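
To make the two options from the quoted discussion concrete, a minimal sketch follows. The database name (mydb), the table (papers) and its columns (title, abstract, tsv) are placeholders of mine, not names from this thread, and the query terms are arbitrary.

```sql
-- Option 1: set the default configuration for the database.
-- It takes effect in sessions started after the change.
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.english';

-- Option 2: pass the configuration explicitly, so ts_headline parses
-- the raw text the same way the stored tsvector and the query were built.
SELECT ts_headline('english',
                   coalesce(title, '') || ' ' || coalesce(abstract, ''),
                   to_tsquery('english', 'search & term'))
FROM papers
WHERE tsv @@ to_tsquery('english', 'search & term');
```

The stored tsv column is used only for matching in the WHERE clause; ts_headline still receives the original text, so it is cheapest to call it only on the rows you actually display.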

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
