Re: full text indexing - Mailing list pgsql-general

From Mitch Vincent
Subject Re: full text indexing
Date
Msg-id 016301c028c2$39325420$0200000a@doot
Whole thread Raw
In response to full text indexing  ("Poul L. Christiansen" <poulc@cs.auc.dk>)
List pgsql-general
> Hi
>
> In my PostgreSQL database I have a lot of newspaper articles (size:
> 100mb now, growing beyond 1gb within few months).
> I wan't to use full text indexing so that users can search the articles
> with a keyword and have the results in less than one second.
>
> How do I accomplish that?

It's complicated :-)

> Does PostgreSQL have this feature?

Nope.

> Which 3rd party indexing tools are available that easily interoperate
> with PostgreSQL?

There is some code in contrib called FTI (Full Text Index - no doubt).. I
re-wrote it for my uses but ended up not using because I was doing so many
sorts and joins, it made a scan (even an index scan) of a table with 3
million rows in it very slow. However id you were just searching the keyword
table itself it was very, very fast.

The FTI trigger in the contrib breaks the words down to 2 letter bits (for
substring searching) -- mine doesn't, it only indexes whole words without
duplicates and looks at a list of words not to index (words like a, an, the,
anything else you want -- I think it has the 300 most used English words in
there already)

It's drawback is speed, it does take a few seconds to index on INSERT (and
UPDATE) -- that's if your text fields are 30ish k (very close to the max PG
can store)..

I would say you're going to run into the 32k limit pretty quick with
newspaper articles -- I index resumes and I've run into it many times (32k
of text really isn't all that much)..

Anyway, I'll try and get that trigger together that I did and send it to the
PG guys to see if it's worthy of being added to contrib -- I'll send you a
copy in private if you'd like. Note: I think there have been more people to
re-write that trigger, I haven't seen anything else though..

Good luck!

<End of long-winded response>

-Mitch



pgsql-general by date:

Previous
From: Titus Brown
Date:
Subject: Re: full text indexing
Next
From: Gunnar R|nning
Date:
Subject: Re: full text indexing