Thread: Full Text Indexing

Full Text Indexing

From
"Christopher Kings-Lynne"
Date:
Hi,

I'm playing around with the Full Text Indexing module, and I notice that
it's case-sensitive.  This seems to be pretty useless to me - especially for
my application.  I wonder if there'd be any objections to me modifying it to
be case-insensitive.  Or at least be configurable either way...

Also, the fti.pl that comes with the contrib seems to be using an outdated
version of CPAN's Pg.pm.

The Perl script currently does stuff in a procedural way:

ie. print(PQErrorMessage($conn))

Where it seems to need to be:
   print($conn->errorMessage).

I'm not sure if I'm missing something here, but I could also update it to
use the new interface.

Regards,

Chris



Re: Full Text Indexing

From
Tom Lane
Date:
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> I'm playing around with the Full Text Indexing module, and I notice that
> it's case-sensitive.  This seems to be pretty useless to me - especially for
> my application.  I wonder if there'd be any objections to me modifying it to
> be case-insensitive.  Or at least be configurable either way...

Seems like a good idea, but make it configurable.

> Also, the fti.pl that comes with the contrib seems to be using an outdated
> version of CPAN's Pg.pm.

It hasn't been touched in awhile, so feel free to update it.  BTW,
someone ought to look at bringing src/interfaces/perl5 into sync with
the CPAN version, too.  Or possibly we should stop distributing that
altogether, if the CPAN copy is being maintained?
        regards, tom lane


Re: Full Text Indexing

From
"Christopher Kings-Lynne"
Date:
> "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> > I'm playing around with the Full Text Indexing module, and I notice that
> > it's case-sensitive.  This seems to be pretty useless to me -
> especially for
> > my application.  I wonder if there'd be any objections to me
> modifying it to
> > be case-insensitive.  Or at least be configurable either way...
>
> Seems like a good idea, but make it configurable.

I actually came up with another way of solving the problem.

The FTI table has two columns: (string, id).  The code needs to do two
things; delete all strings for an id, and join to the main table based on
the id.

The docs for FTI recommend indexing (string, id).  This is poor as the
delete based on id does a sequential scan, although the join seems to be
able to use the index (as long was you have a where string ~ '^something').

I indexed as follows:

-- Functional index that lets us do case-insensitivity without hacking
fti.so
CREATE INDEX fti_string_idx ON fti_table(lower(string));

-- Index on id to allow fast deletes
CREATE INDEX fti_id_idx ON fti_table(id);

That seems to be a good solution to me - it allows case-insensitivity, fast
deletion and fast joining.

> > Also, the fti.pl that comes with the contrib seems to be using
> an outdated
> > version of CPAN's Pg.pm.
>
> It hasn't been touched in awhile, so feel free to update it.  BTW,
> someone ought to look at bringing src/interfaces/perl5 into sync with
> the CPAN version, too.  Or possibly we should stop distributing that
> altogether, if the CPAN copy is being maintained?

I'll have a look someday maybe, but I'll try to get this
harder-than-expected ADD CONSTRAINT UNIQUE/PRIMARY patch off my hands first.

Chris



Re: Full Text Indexing

From
"Christopher Kings-Lynne"
Date:
Doh - sorry about these hideously late posts.  I think my mail queue has
been clogged up for a while!

Chris

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Christopher
> Kings-Lynne
> Sent: Monday, 23 July 2001 10:34 AM
> To: Tom Lane
> Cc: Hackers
> Subject: Re: [HACKERS] Full Text Indexing
>
>
> > "Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> > > I'm playing around with the Full Text Indexing module, and I
> notice that
> > > it's case-sensitive.  This seems to be pretty useless to me -
> > especially for
> > > my application.  I wonder if there'd be any objections to me
> > modifying it to
> > > be case-insensitive.  Or at least be configurable either way...
> >
> > Seems like a good idea, but make it configurable.
>
> I actually came up with another way of solving the problem.
>
> The FTI table has two columns: (string, id).  The code needs to do two
> things; delete all strings for an id, and join to the main table based on
> the id.
>
> The docs for FTI recommend indexing (string, id).  This is poor as the
> delete based on id does a sequential scan, although the join seems to be
> able to use the index (as long was you have a where string ~
> '^something').
>
> I indexed as follows:
>
> -- Functional index that lets us do case-insensitivity without hacking
> fti.so
> CREATE INDEX fti_string_idx ON fti_table(lower(string));
>
> -- Index on id to allow fast deletes
> CREATE INDEX fti_id_idx ON fti_table(id);
>
> That seems to be a good solution to me - it allows
> case-insensitivity, fast
> deletion and fast joining.
>
> > > Also, the fti.pl that comes with the contrib seems to be using
> > an outdated
> > > version of CPAN's Pg.pm.
> >
> > It hasn't been touched in awhile, so feel free to update it.  BTW,
> > someone ought to look at bringing src/interfaces/perl5 into sync with
> > the CPAN version, too.  Or possibly we should stop distributing that
> > altogether, if the CPAN copy is being maintained?
>
> I'll have a look someday maybe, but I'll try to get this
> harder-than-expected ADD CONSTRAINT UNIQUE/PRIMARY patch off my
> hands first.
>
> Chris
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>