Hi!
> 1. Why is hlparsetext used to parse the document rather than the
> parsetext function? Since words to be included in the headline will be
> marked afterwords, it seems more reasonable to just use the parsetext
> function.
> The main difference I see is the use of hlfinditem and marking whether
> some word is repeated.
hlparsetext preserves any kind of lexeme - not indexed, spaces etc. parsetext
doesn't.
hlparsetext preserves original form of lexemes. parsetext doesn't.
>
> The reason this is important is that hlparsetext does not seem to be
> storing word positions which parsetext does. The word positions are
> important for generating headline with fragments.
Doesn't needed - hlparsetext preserves the whole text, so, position is a number
of array.
>
> 2.
>> I would prefer the signature ts_headline( [regconfig,] text, tsquery
>> [,text] )and function should accept 'NumFragments=>N' for default
>> parser. Another parsers may use another options.
>
> Does this mean we want a unified function ts_headline and we trigger the
> fragments if NumFragments is specified?
Trigger should be inside parser-specific function (pg_ts_parser.prsheadline).
Another parsers might not recognize that option.
> It seems that introducing a new
> function which can take configuration OID, or name is complex as there
> are so many functions handling these issues in wparser.c.
No, of course - ts_headline takes care about finding configuration and calling
correct parser.
>
> If this is true then we need to just add marking of headline words in
> prsd_headline. Otherwise we will need another prsd_headline_with_covers
> function.
Yeah, pg_ts_parser.prsheadline should mark the lexemes to. It even can change
an array of HeadlineParsedText.
>
> 3. In many cases people may already have TSVector for a given document
> (for search operation). Would it be faster to pass TSVector to headline
> function when compared to computing TSVector each time? If that is the
> case then should we have an option to pass TSVector to headline
> function?
As I mentioned above, tsvector doesn;t contain whole information about text.
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/