Re: OK, does anyone have any better ideas? - Mailing list pgsql-hackers

From Oleg Bartunov
Subject Re: OK, does anyone have any better ideas?
Date
Msg-id Pine.GSO.3.96.SK.1001209114358.4174j-100000@ra
Whole thread Raw
In response to Re: OK, does anyone have any better ideas?  (mlw <markw@mohawksoft.com>)
Responses ...
List pgsql-hackers
We need multi-key B-tree like index for such problem.
Our full text search engine is based on arrays and we need to find quickly
is some number exists in array - some kind of index over int array.
We're currently testing GiST approach and seems will have some conclusions
soon. I think multi-key B-tree like index would be better in my
opinion, but this requires to much work and knowledge of postgres's internals.
Yesterday I read about UBTree, seems like it's good for index and query
sets. Currently postgres has no set specific methods.
Regards,
    Oleg
On Fri, 8 Dec 2000, mlw wrote:

> Date: Fri, 08 Dec 2000 20:17:34 -0500
> From: mlw <markw@mohawksoft.com>
> To: Tom Lane <tgl@sss.pgh.pa.us>
> Cc: Hackers List <pgsql-hackers@postgresql.org>
> Subject: Re: [HACKERS] OK, does anyone have any better ideas?
> 
> Tom Lane wrote:
> > 
> > mlw <markw@mohawksoft.com> writes:
> > > I have a working version of a text search engine. I want to make it work
> > > for Postgres (I will be releasing it GPL). It can literally find the
> > > occurrence of a string of words within 5 million records in a few
> > > milliseconds.
> > 
> > Where are the records coming from?  Are they inside the database?
> > (If not, why do you care about integrating this with Postgres?)
> > 
> > It seems like the right way to integrate this sort of functionality
> > is to turn it into a kind of index, so that you can do
> > 
> >         SELECT * FROM mytable WHERE keyfield ~~~ 'search string';
> > 
> > where ~~~ is the name of some operator that is associated with the
> > index.  The temporary-table approach you are taking seems inherently
> > klugy, and would still be awkward even if we had functions returning
> > recordsets...
> 
> Oh! Another method I tried and just could not get working was returning
> an array of integers. I as thinking about "select * from table where
> key_field in ( textsearch('bla bla') ), but I haven't been able to get
> that to work, and as per a previous post and belatedly reading a FAQ,
> this would probably still force a full table scan.
> 
> Another method I thought about was having a table with some maximum
> number of zero initialized records, and trying something like: 
> 
> create table temp_table as select * from ts_template limit
> textsearch('bla bla', 10);
> 
> select filltable(temp_table, 10);
> 
> select * from table where key_field = temp_table.key;
> 
> As you can see, all of these ideas are heinous hacks, there has to be a
> better way. Surely someone has a better idea.
> 
> -- 
> http://www.mohawksoft.com
> 

_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



pgsql-hackers by date:

Previous
From: Philip Warner
Date:
Subject: Re: 7.0.3(nofsync) vs 7.1
Next
From: Oleg Bartunov
Date:
Subject: ...