Thread: Boolean text, with phrase ranking, search under Postgres

Boolean text, with phrase ranking, search under Postgres

From
markw
Date:
I am working on a GPL version of a boolean text search engine for
PostgreSQL.

How it works:

You run a program which executes a query and builds a set of external
indexes.

Then you run a daemon process which does the processing of the text
query.

In postgres, you create a temporary table of results, call textsearch
which populates the table,
lastly, you join with the results table. The code looks like this:

>>>>>>>>>>>>>
--
-- Create a temporary table for search results
--
create temp table search_result (key integer, rank integer);
--
-- Call search daemon to populate table
--
select textsearch('performer2{ waitresses } song { i know what boys like
}');
--
-- Join result table with data tables
--
select title, song, performer2, rank from zsong, ztitles, search_result
        where search_result.key = zsong.trackid and
        zsong.muzenbr = ztitles.muzenbr
        order by search_result.rank desc;
--
-- Finished with result table, drop it.
--
drop table search_result ;
<<<<<<<<<<<<<<<<

he textsearch function looks like:
>>>>>>>>>>>>>>>>>>>>>>>>
create function textsearch(varchar) returns integer as
        '
        DECLARE
                handle  integer;
                count   integer;
                pos     integer;
        BEGIN
                handle = search_exec( \'localhost\', $1);

                count = search_count(handle);

                for pos in 0 .. count-1 loop
                        insert into search_result(key, rank)
                        values (search_key(handle,pos),
search_rank(handle,pos));
                end loop;

                return search_done(handle);

        END;
' language 'plpgsql';


What I would like to do is create the result table in the function
and/or accept a table name as a parameter. I can't seem to do this,
perhaps I am missing something trivial.

Is there a way to create a table from within 'c' and return it?  I am
not a postgres pro, I am a C/C++ guy and do not know the ins and outs of
Postgres, and it should be a lot easier to make something more eficient.
Any insign would be appreciated.


I too am interested in full text indexing under PostgreSQL.

Moreover, I have run into the same problem as you (in a different project
though): how do you return more than 1 value from a C function? I'd like to
return "rows", but, if at all possible, NOT to create a table. How about a
cursor? Perhaps one could then do something such as

insert into new_table select textsearch('bla...');

Does anyone know if it's possible?

> -----Original Message-----
> From: markw <markw@mohawksoft.com>
> Sent: Monday, October 16, 2000 3:33 PM
> To: pgsql-announce@postgresql.org,pgsql-general@postgresql.org
> Subject: [ANNOUNCE] Boolean text, with phrase ranking, search under
> Postgres
>
>
> I am working on a GPL version of a boolean text search engine for
> PostgreSQL.
>
> How it works:
>
> You run a program which executes a query and builds a set of external
> indexes.
>
> Then you run a daemon process which does the processing of the text
> query.
>
> In postgres, you create a temporary table of results, call textsearch
> which populates the table,
> lastly, you join with the results table. The code looks like this:
>
> >>>>>>>>>>>>>
> --
> -- Create a temporary table for search results
> --
> create temp table search_result (key integer, rank integer);
> --
> -- Call search daemon to populate table
> --
> select textsearch('performer2{ waitresses } song { i know what boys like
> }');
... (cut) ...
> What I would like to do is create the result table in the function
> and/or accept a table name as a parameter. I can't seem to do this,
> perhaps I am missing something trivial.
>
> Is there a way to create a table from within 'c' and return it?  I am
> not a postgres pro, I am a C/C++ guy and do not know the ins and outs of
> Postgres, and it should be a lot easier to make something more eficient.
> Any insign would be appreciated.


"Edmar Wiggers" <edmar@brasmap.com> writes:
> Moreover, I have run into the same problem as you (in a different project
> though): how do you return more than 1 value from a C function?

In current releases, a C function cannot return a set; the expression
evaluator has a hard-wired notion that only SQL-language functions can
return sets.

This is fixed in current development sources (7.1-to-be).  How to do it
might even be documented by the time 7.1 gets out, though it's not yet
:-(

            regards, tom lane

Edmar Wiggers wrote:
>
> I too am interested in full text indexing under PostgreSQL.

As I have described it, does it sound like something you would use? It
is designed more like a search engine. It will do a full text / boolean
search with phrase ranking in about 10~40 ms depending on the number of
terms, memory and CPU speed.

I need to return or, as it is implemented, fill a table with the index
keys as entries in a table to preserve rank.

BTW: it also does a modified metaphone parsing on text.

>
> Moreover, I have run into the same problem as you (in a different project
> though): how do you return more than 1 value from a C function? I'd like to
> return "rows", but, if at all possible, NOT to create a table. How about a
> cursor? Perhaps one could then do something such as
>
> insert into new_table select textsearch('bla...');

I would like this API best:

create temp table search_results as select textsearch('bla bla');

If I can do this, it will be very cool.

>
> Does anyone know if it's possible?
>
> > -----Original Message-----
> > From: markw <markw@mohawksoft.com>
> > Sent: Monday, October 16, 2000 3:33 PM
> > To: pgsql-announce@postgresql.org,pgsql-general@postgresql.org
> > Subject: [ANNOUNCE] Boolean text, with phrase ranking, search under
> > Postgres
> >
> >
> > I am working on a GPL version of a boolean text search engine for
> > PostgreSQL.
> >
> > How it works:
> >
> > You run a program which executes a query and builds a set of external
> > indexes.
> >
> > Then you run a daemon process which does the processing of the text
> > query.
> >
> > In postgres, you create a temporary table of results, call textsearch
> > which populates the table,
> > lastly, you join with the results table. The code looks like this:
> >
> > >>>>>>>>>>>>>
> > --
> > -- Create a temporary table for search results
> > --
> > create temp table search_result (key integer, rank integer);
> > --
> > -- Call search daemon to populate table
> > --
> > select textsearch('performer2{ waitresses } song { i know what boys like
> > }');
> ... (cut) ...
> > What I would like to do is create the result table in the function
> > and/or accept a table name as a parameter. I can't seem to do this,
> > perhaps I am missing something trivial.
> >
> > Is there a way to create a table from within 'c' and return it?  I am
> > not a postgres pro, I am a C/C++ guy and do not know the ins and outs of
> > Postgres, and it should be a lot easier to make something more eficient.
> > Any insign would be appreciated.

--
http://www.mohawksoft.com

> Edmar Wiggers wrote:
> >
> > I too am interested in full text indexing under PostgreSQL.
>
> As I have described it, does it sound like something you would use? It
> is designed more like a search engine. It will do a full text / boolean
> search with phrase ranking in about 10~40 ms depending on the number of
> terms, memory and CPU speed.

Does this actually exist or is it still in the design phase of development?

-Mitch