Re: Fragments in tsearch2 headline - Mailing list pgsql-general

From Teodor Sigaev
Subject Re: Fragments in tsearch2 headline
Msg-id 4837921C.8000905@sigaev.ru
In response to Re: Fragments in tsearch2 headline  ("Pierre-Yves Strub" <pierre.yves.strub@gmail.com>)
List pgsql-general
[moved to -hackers, because the discussion is about implementation details]

> I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1
> (http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)
Thank you.

1 > diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c
contrib/tsearch2 is now only a compatibility layer for old applications, which
don't know about new features. So this part isn't needed.

2 Compiling the function (ts_headline_with_fragments) into core but exposing it
only through the contrib module looks very odd: the new feature could then be
used only via the compatibility layer for old releases :)

3 headline_with_fragments() is hardcoded to use the default parser, but what
happens when the configuration uses another parser? For example, one for the
Japanese language.

4 I would prefer the signature ts_headline( [regconfig,] text, tsquery [,text] ),
and the function should accept 'NumFragments=>N' for the default parser. Other
parsers may use other options.

5 It just doesn't work correctly, because the new code doesn't take the
parser-specific lexeme types into account:
contrib_regression=# select headline_with_fragments('english', 'wow asd-wow
wow', 'asd', '');
      headline_with_fragments
----------------------------------
  ...wow asd-wow<b>asd</b>-wow wow
(1 row)


So, I am inclined to use the existing framework/infrastructure, although it may
be subject to change.

Some description of the existing flow:
1 ts_headline determines the correct parser to use
2 it calls hlparsetext to split the text into a structure suitable for both
goals: finding the best fragment(s) and concatenating those fragment(s) back
into a text representation
3 it calls the parser-specific method prsheadline, which works on the preparsed
text (the parsing was done in hlparsetext). The method should mark the needed
words/parts/lexemes etc.
4 ts_headline glues the fragments into text and returns it.

We need a parser's headline method because only the parser knows everything
about its lexemes.
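
To make the four-step flow described above concrete, here is a minimal
standalone sketch in C. It is not the PostgreSQL code: ToyToken, ToyParser,
toy_parse, toy_prsheadline, toy_glue and toy_headline are all hypothetical
names invented for illustration, and the toy parser only distinguishes
alphanumeric runs from everything else. The point it tries to show is that the
driver never inspects tokens itself; it delegates both splitting and marking to
whichever parser it was given.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a preparsed headline token (not PostgreSQL's struct). */
typedef struct {
    const char *start;    /* points into the original text */
    size_t      len;
    int         is_word;  /* token type as reported by the toy parser */
    int         selected; /* set by the parser's headline method */
} ToyToken;

/* Hypothetical parser interface: the parser owns both splitting and marking. */
typedef struct {
    size_t (*parse)(const char *text, ToyToken *toks, size_t max);
    void   (*headline)(ToyToken *toks, size_t n, const char *query);
} ToyParser;

#define TOY_MAX_TOKENS 64

/* Step 2: split the text, keeping separators so it can be glued back later.
 * Input beyond TOY_MAX_TOKENS tokens is silently dropped in this sketch. */
static size_t toy_parse(const char *text, ToyToken *toks, size_t max)
{
    size_t n = 0;
    const char *p = text;

    while (*p && n < max) {
        int word = isalnum((unsigned char) *p) != 0;
        const char *q = p;

        while (*q && (isalnum((unsigned char) *q) != 0) == word)
            q++;
        toks[n].start = p;
        toks[n].len = (size_t) (q - p);
        toks[n].is_word = word;
        toks[n].selected = 0;
        n++;
        p = q;
    }
    return n;
}

/* Step 3: the parser-specific method marks tokens, using its own token types. */
static void toy_prsheadline(ToyToken *toks, size_t n, const char *query)
{
    size_t qlen = strlen(query);

    for (size_t i = 0; i < n; i++)
        if (toks[i].is_word && toks[i].len == qlen &&
            strncmp(toks[i].start, query, qlen) == 0)
            toks[i].selected = 1;
}

/* Step 4: glue tokens back into text, wrapping marked ones in <b>...</b>.
 * Returns 0 on success, -1 if the output buffer is too small. */
static int toy_glue(const ToyToken *toks, size_t n, char *out, size_t out_size)
{
    size_t used = 0;

    assert(out_size > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        const char *open = toks[i].selected ? "<b>" : "";
        const char *close = toks[i].selected ? "</b>" : "";
        int w = snprintf(out + used, out_size - used, "%s%.*s%s",
                         open, (int) toks[i].len, toks[i].start, close);

        if (w < 0 || (size_t) w >= out_size - used)
            return -1;
        used += (size_t) w;
    }
    return 0;
}

static const ToyParser toy_default_parser = { toy_parse, toy_prsheadline };

/* Step 1 is the caller's choice of parser; the driver just runs steps 2 to 4. */
int toy_headline(const ToyParser *prs, const char *text, const char *query,
                 char *out, size_t out_size)
{
    ToyToken toks[TOY_MAX_TOKENS];
    size_t n = prs->parse(text, toks, TOY_MAX_TOKENS);

    prs->headline(toks, n, query);
    return toy_glue(toks, n, out, out_size);
}
```

Calling toy_headline(&toy_default_parser, "wow asd-wow wow", "asd", buf,
sizeof buf) with a 64-byte buf yields "wow <b>asd</b>-wow wow". The toy parser
has no compound token for hyphenated words, so it cannot reproduce the
duplication shown in item 5; a real parser that emits both a compound token and
its parts is exactly the case where only the parser's own headline method knows
which tokens to keep.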


--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

