Re: [GENERAL] Text Databases - Mailing list pgsql-general

From The Hermit Hacker
Subject Re: [GENERAL] Text Databases
Date
Msg-id Pine.BSF.4.05.9812030133590.4737-100000@thelab.hub.org
Whole thread Raw
In response to Text Databases  (Nicholas Humfrey <nhumfrey@poboxes.com>)
List pgsql-general
On Wed, 2 Dec 1998, Nicholas Humfrey wrote:

> Hi,
>
> Here is a very general question:
>
> How good is PostgreSQL at hosting nearly pure text based databases ? Is it
> possible to search mutilple cols for the same keyword ? How well would it
> handle 40,000+ rows with say 15 columns each with about 20 words in each
> cell ?

    See the 'fulltextindex' directory under the contrib directory.
Part of the README follows...

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


An attempt at some sort of Full Text Indexing for PostgreSQL.

The included software is an attempt to add some sort of Full Text Indexing
support to PostgreSQL. I mean by this that we can ask questions like:

        Give me all rows that have 'still' and 'nash' in the 'artist'
field.

Ofcourse we can write this as:

        select * from cds where artist ~* 'stills' and artist ~* 'nash';

But this does not use any indices, and therefore, if your database
gets very large, it will not have very high performance (the above query
requires at least one sequential scan, it probably takes 2 due to the
self-join).

The approach used by this add-on is to define a trigger on the table and
column you want to do this queries on. On every insert in the table, it
takes the value in the specified column, breaks the text in this column
up into pieces, and stores all sub-strings into another table, together
with a reference to the row in the original table that contained this
sub-string (it uses the oid of that row).

By now creating an index over the 'fti-table', we can search for
substrings that occur in the original table. By making a join between
the fti-table and the orig-table, we can get the actual rows we want
(this can also be done by using subselects, and maybe there're other
ways too).



pgsql-general by date:

Previous
From: The Hermit Hacker
Date:
Subject: Re: [GENERAL] A question about some Linux's libs ( I hope ) !
Next
From: Sze Yuen Wong
Date:
Subject: Y2K REady?