Re: OK, does anyone have any better ideas? - Mailing list pgsql-hackers
From | mlw |
---|---|
Subject | Re: OK, does anyone have any better ideas? |
Date | |
Msg-id | 3A318143.ECB0A019@mohawksoft.com Whole thread Raw |
In response to | OK, does anyone have any better ideas? (mlw <markw@mohawksoft.com>) |
Responses |
Re: OK, does anyone have any better ideas?
Re: OK, does anyone have any better ideas? |
List | pgsql-hackers |
Tom Lane wrote: > > mlw <markw@mohawksoft.com> writes: > > I have a working version of a text search engine. I want to make it work > > for Postgres (I will be releasing it GPL). It can literally find the > > occurrence of a string of words within 5 million records in a few > > milliseconds. > > Where are the records coming from? Are they inside the database? > (If not, why do you care about integrating this with Postgres?) > > It seems like the right way to integrate this sort of functionality > is to turn it into a kind of index, so that you can do > > SELECT * FROM mytable WHERE keyfield ~~~ 'search string'; > > where ~~~ is the name of some operator that is associated with the > index. The temporary-table approach you are taking seems inherently > klugy, and would still be awkward even if we had functions returning > recordsets... OK, I get the misunderstanding, you are absolutely right it is VERY kludgy. It is sort of like a bitmap index, but it is more of a search engine. I actually have it working on a commercial website. You run a program periodically (cron job?) that executes a query, the query is then parsed and an index of words, keys, ranks and phrase meta-data is created. You also specify which fields in the query should be indexed and which field will be the "key." (It is not ACID if I understand what they term means.) The data for the text search need not even be in the database, as long as the "key" being indexed is. Then you call search with a string, such as "the long and winding road" or "software OR hardware AND engineer NOT sales." A few milliseconds later, a list of key/rank pairs are produced. This is FAR faster than the '~~~' operator because it never does a full table scan. It is assumed that the "key" field specified is properly indexed. If I had a way of getting the key/rank result pair deeper into Postgres, it would be an amazing platform to make some serious high speed search applications. Think about a million resumes' online and searchable with an arbitrary text string to get a list of candidates, powered by Postgres, handling 100 queries a second. Right now, the way I have it working is PHP makes the search call and then executes a query with the first result (highest rank) and returns the data. If I could get the key/rank pair into postgres as a table or multiple searches into postgres as a set of tables, then you could do some amazing queries really really fast. Still, you said that "select foo from bar where key = textsearch('bla bla',..)" could not be done, and my previous example was the only other way I have been able to even prototype my idea. -- http://www.mohawksoft.com
pgsql-hackers by date: