> I have attached a new patch with respect to the current cvs head. This
> produces headline in a document for a given query. Basically it
> identifies fragments of text that contain the query and displays them.
The new variant is much better, but...
> HeadlineParsedText contains an array of actual words but not
> information about the norms. We need an indexed position vector for each
> norm so that we can quickly evaluate a number of possible fragments.
> Something that tsvector provides.
Why do you need to store norms? The only purpose of norms is to identify words
from the query, but that is already done by hlfinditem, which sets
HeadlineWordEntry->item to the corresponding QueryOperand in the tsquery.
Look, the headline function is already rather expensive, and your patch adds a
lot of extra work, at least in memory usage. And if the user calls it with
NumFragments=0, that work is unneeded.
> This approach does not change any other interface and fits nicely with
> the overall framework.
Yeah, it's a really big step forward. Thank you. You are very close to a
commit, except: did you look at the hlCover() function, which produces a cover
from the original HeadlineParsedText representation? Is there any reason not to
use it?
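
To illustrate the idea of finding a cover directly on the parsed word array, without building a tsvector first, here is a minimal sketch. It is not the real hlCover(): SketchWord, sketch_cover and MAX_OPERANDS are hypothetical names, the item field is simplified to an operand index (the real structure holds a pointer), and the query is assumed to be a plain AND of all operands.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_OPERANDS 32         /* hypothetical limit for this sketch */

/*
 * Hypothetical stand-in for a parsed headline word: item is the index of
 * the matching query operand, or -1 when the word is not in the query.
 */
typedef struct
{
    const char *word;
    int         item;
} SketchWord;

/*
 * Find the shortest window [*p, *q] lying at or after 'start' that
 * contains each of the 'noperands' query operands at least once.
 * Returns false when no such window exists.
 */
static bool
sketch_cover(const SketchWord *words, int nwords, int noperands,
             int start, int *p, int *q)
{
    int counts[MAX_OPERANDS] = {0};
    int distinct = 0;
    int left = start;
    int best_len = -1;

    assert(start >= 0);
    assert(noperands > 0 && noperands <= MAX_OPERANDS);

    for (int right = start; right < nwords; right++)
    {
        int it = words[right].item;

        assert(it < noperands);
        if (it >= 0 && counts[it]++ == 0)
            distinct++;

        /* shrink from the left while the window still covers the query */
        while (distinct == noperands)
        {
            if (best_len < 0 || right - left + 1 < best_len)
            {
                best_len = right - left + 1;
                *p = left;
                *q = right;
            }
            it = words[left].item;
            if (it >= 0 && --counts[it] == 0)
                distinct--;
            left++;
        }
    }
    return best_len > 0;
}
```

The only per-word information the scan needs is which query operand a word matched, which is what the item field already records.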
>
> The norms are converted into tsvector and a number of covers are
> generated. The best covers are then chosen to be in the headline. The
> covers are separated using a hardcoded coversep. Let me know if you want
> to expose this as an option.
>
> Covers that overlap with already chosen covers are excluded.
>
> Some options like ShortWord and MinWords are not taken care of right
> now. MaxWords are used as maxcoversize. Let me know if you would like to
> see other options for fragment generation as well.
ShortWord, MinWords and MaxWords should keep their meaning, but apply to each
fragment, not to the whole headline.
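
As a rough sketch of what "per fragment" could mean, the function below fits one cover to the three limits. fit_fragment and the wordlen array are hypothetical, and the exact policy (clip from the right, grow on both sides, avoid ending on a short word) is only an assumption for illustration, not what the patch or mark_hl_words does.

```c
#include <assert.h>

/*
 * Adjust the fragment [*p, *q] (word indexes into wordlen[]) so that it
 * respects min_words, max_words and shortword on its own.
 */
static void
fit_fragment(const int *wordlen, int nwords,
             int min_words, int max_words, int shortword,
             int *p, int *q)
{
    assert(nwords > 0 && 0 <= *p && *p <= *q && *q < nwords);
    assert(0 < min_words && min_words <= max_words);

    /* too long: keep only the first max_words words */
    if (*q - *p + 1 > max_words)
        *q = *p + max_words - 1;

    /* too short: grow to the right and to the left in turn */
    while (*q - *p + 1 < min_words && (*q + 1 < nwords || *p > 0))
    {
        if (*q + 1 < nwords)
            (*q)++;
        if (*q - *p + 1 < min_words && *p > 0)
            (*p)--;
    }

    /* avoid ending on a short word while there is room under max_words */
    while (wordlen[*q] <= shortword && *q + 1 < nwords &&
           *q - *p + 1 < max_words)
        (*q)++;
}
```

Calling this once per chosen cover keeps the options independent of how many fragments end up in the headline.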
>
> Let me know any more changes you would like to see.
    if (num_fragments == 0)
        /* call the default headline generator */
        mark_hl_words(prs, query, highlight, shortword, min_words, max_words);
    else
        mark_hl_fragments(prs, query, highlight, num_fragments, max_words);
I suppose the condition should be num_fragments < 2?
--
Teodor Sigaev E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/