Re: Text Search - Mailing list pgsql-novice

From D. Duccini
Subject Re: Text Search
Date
Msg-id Pine.GSO.4.03.10108110903450.7337-100000@ra.bpsi.net
Whole thread Raw
In response to Text Search  (Greg Elisara <elisara@world-net.co.nz>)
List pgsql-novice
There a several ways to implement this:

1. you can just use the built in regex operator "~*" to hit the table.
this is one way to handle mixed case searches, the other is is use some of
the functions such as "lower()" and substring()

but this is a somewhat nasty approach requiring big iron since you have to
touch every record to find a single match

2. you can do post/processing on your table and build a seperate index of
just the "important" words -- i say "important" because its up to you to
build a "stop-list" of words (usually things like a/and/the/for/with etc)
and make insertions into a two-column table of word and "id" that refers
to the record with the info

this will be faster than approach #1 at the cost of disk space (disposable
these days) -- (the postgresql INDEXES at last look were not handling
HIGH-VOLUME DUPLICATES very elegantly, but thats okay, it will eventually)

3. you can treat your text as a "blob" (binary large object) problem, and
store your text in either seperate files, or seperate file "system"

this allows you the freedom to let the reviews grow as large as you'd like
-- and store multiple formats very easily -- example, maintain a URL to
the text information in your table, and then use some other web searching
"engine" to index the text data -- like HTDIG.  then all you need to do is
bury META data into the pages or self-referential URI that feeds back into
your database .cgi stuff

just some random thoughts to get you thinking :)

-duck


On Sat, 11 Aug 2001, Greg Elisara wrote:

> Hi folks. I have a product table (music cds) that stores a review in one
> of the columns (datatype - text). The copy for the review can hold
> anywhere between 100-1500 words. The table holds a little over 200, 000
> rows.
>
> I want to set up a way to search through the table using keyword
> searches on the reviews. What is the best way to approach this? Any help
> would be welcome.
>
> Regards. Greg.
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>


-----------------------------------------------------------------------------
david@backpack.com            BackPack Software, Inc.        www.backpack.com
+1 651.645.7550 voice       "Life is an Adventure.
+1 651.645.9798 fax            Don't forget your BackPack!"
-----------------------------------------------------------------------------


pgsql-novice by date:

Previous
From: Andrew McMillan
Date:
Subject: Re: getting postgres to listen to tcp/ip on debian.
Next
From: Paul McGarry
Date:
Subject: Re: Text Search