Re: Full text Indexing -out of contrib and into main.. - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Full text Indexing -out of contrib and into main..
Date
Msg-id 3A237A97.D10C9E20@tm.ee
Whole thread Raw
In response to Full text Indexing -out of contrib and into main..  ("John Huttley" <John@mwk.co.nz>)
List pgsql-hackers
john huttley wrote:
> 
> > I believe that it is appropriate for contrib/ because it is a good demo
> > of FTI-like capabilities. But nothing more, yet. For at least a couple
> > of reasons:
> >
> > 1) It generates the "index" as a table, not a PostgreSQL index or
> > index-like thing.
> >
> > 2) It has a hardcoded list of non-indexed words. This should come from a
> > table, to allow it to be tuned to the application requirements.
> >
> > Comments?
> >
> >                      - Thomas
> >
> 
> In general..
> a) Considering that I was coding up the same thing with triggers and such,
> things could only get better.

AFAIK, the one in contrib _is_ the same thing coded up with triggers and
such ;)

> b) Check out MSSQL 7's  capabilities and weep.

BTW, have you studied MSSQL enough to tell me if it has a
separate/standalone 
(as a process) fti engine or just another index type.

I have been contemplating about implementing FTI for postgres for some
time and my 
current plan would be to implement a out-of-process fti engine (API +
sample 
implementation, in the spirit of PostgreSQLs extensibility) that could
postpone 
the actual indexing but still help with queries even for not yet fully
indexed stuff.

Will probably need some choreography but essential for high performance.

You generally don't want to wait for all index entries of an inverted
index to be saved.

Also the thing should be more general than the one in contrib , being
able to index 
both fields and full records and support functional indexes.


Is there a way to make PostgresQL optimiser aware of the
selectivity/cost of function, 
so that it can do the right thing for a query like

SELECT * FROM ARTICLES WHERE ADATE BETWEEN YESTERDAY AND TOMORROW   AND ARTICLES.FTI_MATCHES('(CAT & DOG) !
PRESIDENT')

It would be almost automatic if functions could return sets and then be
used like

SELECT * FROM ARTICLE WHERE ADATE BETWEEN YESTERDAY AND TOMORROW   AND ARTICLE_ID = ARTICLE.FTI_MATCHING_IDS('(CAT &
DOG)! PRESIDENT')
 

and somehow the optimiser would know that it can join on the returned
ids but this 
is probably not the case ;)

> c) It would be a start. One its in the tree, it gets used more, gets
> improved..

But, it is not a _real_ full text index, just a postgresql sample
application that 
implements a full text index using an sql database.

----------
Hannu


pgsql-hackers by date:

Previous
From: "He weiping"
Date:
Subject: is it a bug?
Next
From: Zeugswetter Andreas SB
Date:
Subject: AW: 8192 BLCKSZ ?