Re: OK, does anyone have any better ideas? - Mailing list pgsql-hackers

From mlw
Subject Re: OK, does anyone have any better ideas?
Date
Msg-id 3A31882E.A44E9C06@mohawksoft.com
Whole thread Raw
In response to OK, does anyone have any better ideas?  (mlw <markw@mohawksoft.com>)
Responses Re: OK, does anyone have any better ideas?  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
Tom Lane wrote:
> 
> mlw <markw@mohawksoft.com> writes:
> > I have a working version of a text search engine. I want to make it work
> > for Postgres (I will be releasing it GPL). It can literally find the
> > occurrence of a string of words within 5 million records in a few
> > milliseconds.
> 
> Where are the records coming from?  Are they inside the database?
> (If not, why do you care about integrating this with Postgres?)
> 
> It seems like the right way to integrate this sort of functionality
> is to turn it into a kind of index, so that you can do
> 
>         SELECT * FROM mytable WHERE keyfield ~~~ 'search string';
> 
> where ~~~ is the name of some operator that is associated with the
> index.  The temporary-table approach you are taking seems inherently
> klugy, and would still be awkward even if we had functions returning
> recordsets...

Oh! Another method I tried and just could not get working was returning
an array of integers. I as thinking about "select * from table where
key_field in ( textsearch('bla bla') ), but I haven't been able to get
that to work, and as per a previous post and belatedly reading a FAQ,
this would probably still force a full table scan.

Another method I thought about was having a table with some maximum
number of zero initialized records, and trying something like: 

create table temp_table as select * from ts_template limit
textsearch('bla bla', 10);

select filltable(temp_table, 10);

select * from table where key_field = temp_table.key;

As you can see, all of these ideas are heinous hacks, there has to be a
better way. Surely someone has a better idea.

-- 
http://www.mohawksoft.com


pgsql-hackers by date:

Previous
From: Chih-Chang Hsieh
Date:
Subject: Re: A mb problem in PostgreSQL
Next
From: "Mikheev, Vadim"
Date:
Subject: RE: 7.0.3(nofsync) vs 7.1