Thread: Changes in /contrib/fulltextindex

Changes in /contrib/fulltextindex

From
"Florian Helmberger"
Date:
Hi.

I have made some changes and improvements to the fulltextindex trigger
(contrib/fulltextindex). As these changes affect, among other things, the
parameter list, I would like to ask the maintainer for his thoughts before
I submit a patch. And here comes the problem: there is no explicit
maintainer listed in the accompanying docs. According to the cvs log, the
last changes were made by Bruce Momjian.

Maybe the original author is subscribed to this list and can get in touch
with me, because I'm kind of new to contributing patches to an open source
project.

The changes made include:

+ Changed the word-splitting behaviour from checking via isalpha to
  using a list of delimiters, as isalpha is a pain when used with
  data containing German umlauts, etc. ATM this list contains:

  " ,;.:-_#/*+~^°!?\"\\§$%&()[]{}=<>|0123456789\n\r\t@µ"

+ If the field to be indexed hasn't changed, the indexing will
  not be done. This way unnecessary reindexing of fields not
  affected by an update can be omitted.

+ There is a new field 'word' of type BOOL in the index table.
  Using this field, it is possible to do 'full word' and
  'substring' searches.

+ The text is no longer lowercased before it is written into
  the index table. This way it is possible to do case-sensitive
  and case-insensitive (via a functional index using lower) searches.

+ Added functionality to prevent indexing of duplicate words
  (this is one item of the todo list in fti.c) using a hash
  table. As this comes with a significant loss of performance
  and depends on the indexed data, it can be turned on or off
  via a newly introduced parameter.
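
To illustrate the delimiter-based splitting and the optional duplicate
suppression, here is a minimal sketch in C. It is not the patch itself.
The names count_index_words, seen_before, hash_word, FTI_DELIMS and
SEEN_SLOTS are made up for this example. The delimiter set is only the
ASCII part of the list above (assuming the remaining characters would need
separate handling in a multibyte encoding), and the fixed-size
open-addressing table merely stands in for whatever hash table is used in
practice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ASCII subset of the delimiter list (assumption: the non-ASCII
 * characters are left out because strspn() compares single bytes). */
static const char FTI_DELIMS[] =
    " ,;.:-_#/*+~^!?\"\\$%&()[]{}=<>|0123456789\n\r\t@";

#define SEEN_SLOTS 1024         /* hypothetical capacity, power of two */

typedef struct
{
    const char *word;           /* points into the text being split */
    size_t      len;
} seen_entry;

/* FNV-1a style hash over len bytes. */
static size_t
hash_word(const char *w, size_t len)
{
    size_t h = 2166136261u;

    for (size_t i = 0; i < len; i++)
    {
        h ^= (unsigned char) w[i];
        h *= 16777619u;
    }
    return h;
}

/* Return true if the word was recorded earlier; otherwise record it. */
static bool
seen_before(seen_entry *seen, const char *w, size_t len)
{
    size_t i = hash_word(w, len) & (SEEN_SLOTS - 1);

    for (size_t probes = 0; probes < SEEN_SLOTS; probes++)
    {
        if (seen[i].word == NULL)
        {
            seen[i].word = w;
            seen[i].len = len;
            return false;
        }
        if (seen[i].len == len && memcmp(seen[i].word, w, len) == 0)
            return true;
        i = (i + 1) & (SEEN_SLOTS - 1);     /* linear probing */
    }
    return false;               /* table full: treat the word as new */
}

/* Count the words that would be written to the index table.  With
 * dedupe set, a word that occurs several times is counted once. */
static size_t
count_index_words(const char *text, bool dedupe)
{
    seen_entry seen[SEEN_SLOTS] = {{NULL, 0}};
    size_t     count = 0;

    assert(text != NULL);
    while (*text != '\0')
    {
        size_t len;

        text += strspn(text, FTI_DELIMS);   /* skip a run of delimiters */
        len = strcspn(text, FTI_DELIMS);    /* length of the next word */
        if (len == 0)
            break;
        if (!dedupe || !seen_before(seen, text, len))
            count++;
        text += len;
    }
    return count;
}
```

Because the comparison is bytewise and nothing is lowercased, "The" and
"the" count as two different words here, which matches the case-sensitive
storage described above.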

-Florian

--
"Computers are useless. They can only give you answers."
  -- Pablo Picasso.




Re: Changes in /contrib/fulltextindex

From
"Christopher Kings-Lynne"
Date:
Hi Florian,

> I have made some changes and improvements to the fulltextindex trigger
> (contrib/fulltextindex). As these changes affect, among other things, the
> parameter list, I would like to ask the maintainer for his thoughts before
> I submit a patch. And here comes the problem: there is no explicit
> maintainer listed in the accompanying docs. According to the cvs log, the
> last changes were made by Bruce Momjian.

The most recent patches were submitted by me, so I guess you could call me
the de facto "maintainer".

> Maybe the original author is subscribed to this list and can get in touch
> with me, because I'm kind of new to contributing patches to an open source
> project.

Cool, reply to me personally if you need technical help or want me to
review your patch, and use the list to discuss things that affect the
project...

> The changes made include:
>
> + Changed the word-splitting behaviour from checking via isalpha to
>   using a list of delimiters, as isalpha is a pain when used with
>   data containing German umlauts, etc. ATM this list contains:
>
>   " ,;.:-_#/*+~^°!?\"\\§$%&()[]{}=<>|0123456789\n\r\t@µ"

Good idea.  Is there a locale-aware version of isalpha anywhere?

> + If the field to be indexed hasn't changed, the indexing will
>   not be done. This way unnecessary reindexing of fields not
>   affected by an update can be omitted.

Fantastic!  That was on my list of things to do!

> + There is a new field 'word' of type BOOL in the index table.
>   Using this field, it is possible to do 'full word' and
>   'substring' searches.

Hehe - that was another idea I had as well.  Breaks back compatibility.

> + The text is no longer lowercased before it is written into
>   the index table. This way it is possible to do case-sensitive
>   and case-insensitive (via a functional index using lower) searches.

ok

> + Added functionality to prevent indexing of duplicate words
>   (this is one item of the todo list in fti.c) using a hash
>   table. As this comes with a significant loss of performance
>   and depends on the indexed data, it can be turned on or off
>   via a newly introduced parameter.

ok.

OK Florian,  can you please send me your new contrib/fulltextindex directory
tarred up?

List:  what should we do about the backward compatibility problem?

Chris




Re: Changes in /contrib/fulltextindex

From
Bruce Momjian
Date:
Christopher Kings-Lynne wrote:
> > + There is a new field 'word' of type BOOL in the index table.
> >   Using this field, it is possible to do 'full word' and
> >   'substring' searches.
>
> Hehe - that was another idea I had as well.  Breaks back compatibility.
> ok.
>
> OK Florian,  can you please send me your new contrib/fulltextindex directory
> tarred up?

Or a 'diff -c' against 7.2.X.

> List:  what should we do about the backward compatibility problem?

I don't see a problem with backward compatibility here.  It is contrib,
and the README has to explain that they need to reinstall.  I will
mention that in 7.3 HISTORY.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026



Re: Changes in /contrib/fulltextindex

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
>> + There is a new field 'word' of type BOOL in the index table.
>> Using this field, it is possible to do 'full word' and
>> 'substring' searches.

> Hehe - that was another idea I had as well.  Breaks back compatibility.

> List:  what should we do about the backward compatibility problem?

I'd be inclined to grin and bear it, if it only extends to needing to
rebuild the supporting index.  Changing the name of the function that
an application needs to call will be a much bigger pain than that, no?

            regards, tom lane