Re: [GENERAL] Fragments in tsearch2 headline - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: [GENERAL] Fragments in tsearch2 headline
Msg-id 483BD4CB.3030006@sigaev.ru
In response to Re: [GENERAL] Fragments in tsearch2 headline  (Sushant Sinha <sushant354@gmail.com>)
Responses Re: [GENERAL] Fragments in tsearch2 headline  (Sushant Sinha <sushant354@gmail.com>)
List pgsql-hackers
Hi!

> 1. Why is hlparsetext used to parse the document rather than the
> parsetext function? Since words to be included in the headline will be
> marked afterwards, it seems more reasonable to just use the parsetext
> function.
> The main difference I see is the use of hlfinditem and marking whether
> some word is repeated.
hlparsetext preserves every kind of lexeme, including non-indexed ones, spaces,
etc.; parsetext doesn't.
hlparsetext also preserves the original form of lexemes; parsetext doesn't.

> 
> The reason this is important is that hlparsetext does not seem to be
> storing word positions which parsetext does. The word positions are
> important for generating headline with fragments.
Not needed: hlparsetext preserves the whole text, so a word's position is simply
its index in the array.
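
To illustrate the point, here is a minimal sketch in C. DemoToken,
demo_word_index and demo_rebuild are hypothetical names made up for this
example; they are a simplified stand-in, not the real HeadlineParsedText
layout. Because every token (words and spaces alike) is kept in its original
form and order, the array index already serves as the position, and the
original text can be rebuilt from the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry of the parsed headline
 * array. This is NOT the real PostgreSQL struct. */
typedef struct
{
    const char *text;    /* original form of the token, unmodified */
    int         is_word; /* 1 for a word, 0 for spaces or punctuation */
} DemoToken;

/* Return the array index of the n-th word (0-based), or -1 if there is
 * no such word. No separate position field is stored anywhere. */
int
demo_word_index(const DemoToken *tokens, size_t ntokens, size_t n)
{
    size_t seen = 0;

    assert(tokens != NULL);
    for (size_t i = 0; i < ntokens; i++)
    {
        if (!tokens[i].is_word)
            continue;
        if (seen == n)
            return (int) i;
        seen++;
    }
    return -1;
}

/* Rebuild the original text by concatenating every token in order.
 * Output is truncated if it does not fit into outsz bytes. */
void
demo_rebuild(const DemoToken *tokens, size_t ntokens, char *out, size_t outsz)
{
    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < ntokens; i++)
        strncat(out, tokens[i].text, outsz - strlen(out) - 1);
}
```

A fragment would then just be a range of indexes into this array, which is
why no extra position bookkeeping should be required.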

> 
> 2.
>> I would prefer the signature ts_headline( [regconfig,] text, tsquery
>> [,text] ), and the function should accept 'NumFragments=>N' for the default
>> parser. Other parsers may use other options.
> 
> Does this mean we want a unified function ts_headline and we trigger the
> fragments if NumFragments is specified? 

The trigger should be inside the parser-specific function
(pg_ts_parser.prsheadline). Other parsers might not recognize that option.
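
As a rough sketch of what "trigger inside the parser-specific function" could
look like, the hypothetical helper below scans the options text for
NumFragments and skips anything it does not know. demo_num_fragments is not a
PostgreSQL function, and the comma-separated "Name=Value" syntax with
case-sensitive names is an assumption made for this example only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not a PostgreSQL API: scan a comma-separated
 * "Name=Value" option string and return the value of NumFragments,
 * or 0 if it is absent or not positive. Unknown options are skipped. */
int
demo_num_fragments(const char *options)
{
    const char *key = "NumFragments=";
    size_t      keylen = strlen(key);
    const char *p = options;

    assert(options != NULL);
    while (*p != '\0')
    {
        /* skip separators between options */
        while (*p == ' ' || *p == ',')
            p++;

        if (strncmp(p, key, keylen) == 0)
        {
            long v = strtol(p + keylen, NULL, 10);

            return v > 0 ? (int) v : 0;
        }

        /* not our option: advance to the next comma or the end */
        while (*p != '\0' && *p != ',')
            p++;
    }
    return 0;
}
```

The headline function of the default parser would switch to fragment mode
only when this returns a value greater than zero; a different parser is free
to ignore the option entirely.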

> It seems that introducing a new
> function which can take configuration OID, or name is complex as there
> are so many functions handling these issues in wparser.c.
No, of course not: ts_headline takes care of finding the configuration and
calling the correct parser.

> 
> If this is true then we need to just  add marking of headline words in
> prsd_headline. Otherwise we will need another prsd_headline_with_covers
> function.
Yeah, pg_ts_parser.prsheadline should mark the lexemes too. It can even change
the array in HeadlineParsedText.

> 
> 3. In many cases people may already have TSVector for a given document
> (for search operation). Would it be faster to pass TSVector to headline
> function when compared to computing TSVector each time? If that is the
> case then should we have an option to pass TSVector to headline
> function?
As I mentioned above, a tsvector doesn't contain all the information about the
text.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 

