Re: [GENERAL] Fragments in tsearch2 headline - Mailing list pgsql-hackers
From | Sushant Sinha |
---|---|
Subject | Re: [GENERAL] Fragments in tsearch2 headline |
Date | |
Msg-id | 1211663937.7293.26.camel@dragflick |
In response to | Re: [GENERAL] Fragments in tsearch2 headline (Teodor Sigaev <teodor@sigaev.ru>) |
Responses | Re: [GENERAL] Fragments in tsearch2 headline; Re: [GENERAL] Fragments in tsearch2 headline |
List | pgsql-hackers |
Now I understand the code much better. A few more questions on headline generation that I could not answer from the code:

1. Why is hlparsetext used to parse the document rather than the parsetext function? Since the words to be included in the headline are marked afterwards, it seems more reasonable to just use parsetext. The main difference I see is that hlparsetext uses hlfinditem and marks whether a word is repeated. This matters because hlparsetext does not seem to store word positions, which parsetext does, and word positions are important for generating a headline with fragments.

2.
> I would prefer the signature ts_headline( [regconfig,] text, tsquery
> [,text] ) and function should accept 'NumFragments=>N' for default
> parser. Another parsers may use another options.

Does this mean we want a single, unified ts_headline function that produces fragments when NumFragments is specified? Introducing a new function that can take a configuration OID or name seems complex, since wparser.c has so many functions handling those variants. If a unified function is what you mean, then we just need to add the marking of headline words to prsd_headline. Otherwise we will need another function, prsd_headline_with_covers.

3. In many cases people may already have a TSVector for a given document (built for searching). Would it be faster to pass that TSVector to the headline function than to compute it again each time? If so, should we have an option to pass a TSVector to the headline function?

-Sushant.

On Sat, 2008-05-24 at 07:57 +0400, Teodor Sigaev wrote:
> [moved to -hackers, because talk is about implementation details]
>
> > I've ported the patch of Sushant Sinha for fragmented headlines to pg8.3.1
> > (http://archives.postgresql.org/pgsql-general/2007-11/msg00508.php)
> Thank you.
>
> 1
> diff -Nrub postgresql-8.3.1-orig/contrib/tsearch2/tsearch2.c
> now contrib/tsearch2 is compatibility layer for old applications - they don't
> know about new features. So, this part isn't needed.
>
> 2 solution to compile function (ts_headline_with_fragments) into core, but
> using it only from contrib module looks very odd. So, new feature can be used
> only with compatibility layer for old release :)
>
> 3 headline_with_fragments() is hardcoded to use default parser, but what will be
> in case when configuration uses another parser? For example, for japanese language.
>
> 4 I would prefer the signature ts_headline( [regconfig,] text, tsquery [,text] )
> and function should accept 'NumFragments=>N' for default parser. Another parsers
> may use another options.
>
> 5 it just doesn't work correctly, because new code doesn't care of parser
> specific type of lexemes.
> contrib_regression=# select headline_with_fragments('english', 'wow asd-wow
> wow', 'asd', '');
>      headline_with_fragments
> ----------------------------------
>  ...wow asd-wow<b>asd</b>-wow wow
> (1 row)
>
>
> So, I incline to use existing framework/infrastructure although it may be a
> subject to change.
>
> Some description:
> 1 ts_headline defines a correct parser to use
> 2 it calls hlparsetext to split text into structure suitable for both goals:
> find the best fragment(s) and concatenate that fragment(s) back to the text
> representation
> 3 it calls parser specific method prsheadline which works with preparsed text
> (parse was done in hlparsetext). Method should mark a needed
> words/parts/lexemes etc.
> 4 ts_headline glues fragments into text and returns that.
>
> We need a parser's headline method because only parser knows all about its lexemes.
>
>
> --
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                    WWW: http://www.sigaev.ru/
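Steps 2 to 4 of the flow Teodor describes, combined with Sushant's point that word positions are what make multi-fragment headlines possible, can be illustrated with a small standalone C sketch. It is not PostgreSQL code and uses no PostgreSQL API: `HlToken`, `split_text()`, `mark_fragments()`, `glue_headline()` and `toy_headline()` are hypothetical names. Step 1 (choosing the parser) is omitted, the tokenizer simply splits on runs of alphanumeric characters, and matching is a case-insensitive string comparison rather than lexeme matching against a tsquery. Fragment selection is an assumed greedy rule: up to `max_frags` non-overlapping windows of `radius` words on either side of each query hit, chosen by word position.

```c
/* Hypothetical sketch of a split / mark / glue headline pipeline;
 * not PostgreSQL source and not the actual prsd_headline algorithm. */
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TOKENS 256 /* hypothetical fixed capacity for this sketch */

typedef struct {
    const char *start; /* points into the original text (no copy) */
    size_t len;        /* token length in bytes */
    int is_word;       /* 1 = alphanumeric run, 0 = separator run */
    int pos;           /* word position 0, 1, 2, ...; -1 for separators */
    int matched;       /* word equals one of the query words */
    int selected;      /* token belongs to a chosen fragment */
} HlToken;

/* Step 2: split the text once into alternating word/separator tokens and
 * number the words, so later steps can work with word positions. */
static int split_text(const char *text, HlToken *toks, int max_toks)
{
    int n = 0, pos = 0;
    const char *p = text;
    while (*p != '\0') {
        int word = isalnum((unsigned char) *p) != 0;
        const char *q = p;
        while (*q != '\0' && (isalnum((unsigned char) *q) != 0) == word)
            q++;
        assert(n < max_toks);
        toks[n] = (HlToken) {p, (size_t) (q - p), word, word ? pos++ : -1, 0, 0};
        n++;
        p = q;
    }
    return n;
}

/* Case-insensitive comparison of a word token with a query word; a crude
 * stand-in for matching lexemes against a tsquery. */
static int token_equals(const HlToken *t, const char *w)
{
    if (strlen(w) != t->len)
        return 0;
    for (size_t i = 0; i < t->len; i++)
        if (tolower((unsigned char) t->start[i]) != tolower((unsigned char) w[i]))
            return 0;
    return 1;
}

/* Step 3: flag query hits, then greedily select up to max_frags windows of
 * word positions [hit - radius, hit + radius] that do not overlap. */
static void mark_fragments(HlToken *toks, int n, const char **query,
                           int nquery, int max_frags, int radius)
{
    int frags = 0, last_end = -1; /* last word position already used */
    for (int i = 0; i < n; i++)
        for (int j = 0; j < nquery; j++)
            if (toks[i].is_word && token_equals(&toks[i], query[j]))
                toks[i].matched = 1;
    for (int i = 0; i < n && frags < max_frags; i++) {
        if (!toks[i].matched || toks[i].pos <= last_end)
            continue; /* not a hit, or already inside an earlier fragment */
        int lo = toks[i].pos - radius, hi = toks[i].pos + radius;
        if (lo <= last_end)
            lo = last_end + 1;
        for (int k = 0; k < n; k++) {
            if (toks[k].is_word)
                toks[k].selected |= (toks[k].pos >= lo && toks[k].pos <= hi);
            else if (k > 0 && k + 1 < n && toks[k - 1].pos >= lo && toks[k + 1].pos <= hi)
                toks[k].selected = 1; /* separator between two fragment words */
        }
        last_end = hi;
        frags++;
    }
}

/* Appends len bytes of s to out, keeping it NUL-terminated. */
static void append(char *out, size_t outsz, size_t *used, const char *s, size_t len)
{
    assert(*used + len < outsz); /* sketch: the caller must pass a large buffer */
    memcpy(out + *used, s, len);
    *used += len;
    out[*used] = '\0';
}

/* Step 4: glue the selected tokens back into text, wrapping hits in <b></b>
 * and inserting " ... " wherever a run of selected tokens is interrupted. */
static void glue_headline(const HlToken *toks, int n, char *out, size_t outsz)
{
    size_t used = 0;
    int in_run = 0;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (!toks[i].selected) { in_run = 0; continue; }
        if (!in_run && used > 0)
            append(out, outsz, &used, " ... ", 5);
        in_run = 1;
        if (toks[i].matched)
            append(out, outsz, &used, "<b>", 3);
        append(out, outsz, &used, toks[i].start, toks[i].len);
        if (toks[i].matched)
            append(out, outsz, &used, "</b>", 4);
    }
}

/* Hypothetical driver running steps 2-4; returns out for convenience. */
static const char *toy_headline(const char *text, const char **query, int nquery,
                                int max_frags, int radius, char *out, size_t outsz)
{
    HlToken toks[MAX_TOKENS];
    int n = split_text(text, toks, MAX_TOKENS);
    mark_fragments(toks, n, query, nquery, max_frags, radius);
    glue_headline(toks, n, out, outsz);
    return out;
}
```

For example, with the query word `fox`, `max_frags = 2` and `radius = 1`, the text `one two three fox four five six seven eight nine fox ten` yields `three <b>fox</b> four ... nine <b>fox</b> ten`; with `max_frags = 1` only the first fragment is kept. Because the text is split once and glued back from the same tokens, each piece of the input appears at most once in the output (`'wow asd-wow wow'` with query `asd` gives `wow <b>asd</b>-wow`). The toy tokenizer still knows nothing about compound words such as `asd-wow`, which is the kind of knowledge Teodor argues belongs in the parser's own headline method.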