Thread: Full text Indexing -out of contrib and into main..

Full text Indexing -out of contrib and into main..

From
"John Huttley"
Date:
>
> Maybe asking 'Why isn't the contrib full-text-indexer not in the main
> tree?' would be more productive on that front.

Well, yes. Why isn't it?

Full text indexing should be just as much a feature as any other key feature in
PG.
With the advent of unlimited file and record lengths in 7.1, this would be a good
time to
include it.

FTI is particularly useful in the context of web content engines.


> How did you attempt to build it under the RPM install?  I assume you had
> the postgresql-devel package installed, and the include paths set
> properly.... Of course, most of what is in contrib assumes a full source
> tree is lying around (argh)....ie, it's wanting to include
> Makefile.global in its Makefile.  And, on the RPM dist, Makefile.global
> isn't (yet) packaged.


Yes, I have the devel RPM, but FTI  couldn't find its include files. Or its
libraries. Or something.

Building from a source tree has always been better for me. The catch is mixing
that with the RPMS,
which put things in unholy locations.
Its very hard (for me at least) to update an RPM version with a version compiled
from the source.

I recently updated my PG system from 6.5.3 to 7.0.3, still RPMs,  (another fun
job) and have not tried to compile FTI
subsequently.

However if I tried hard enough I'm sure I could fix it. For the moment, I'm on
another job so I'm not worrying.

Regards

John




Re: Full text Indexing -out of contrib and into main..

From
Don Baccus
Date:
At 02:51 PM 11/28/00 +1300, John Huttley wrote:
>>
>> Maybe asking 'Why isn't the contrib full-text-indexer not in the main
>> tree?' would be more productive on that front.
>
>Well, yes. Why isn't it?
>
>Full text indexing should be just as much a feature as any other key feature in
>PG.
>With the advent of unlimited file and record lengths in 7.1, this would be a good
>time to
>include it.
>
>FTI is particularly useful in the context of web content engines.

Well ... it's pretty inadequate, actually.  That might be one reason it's only
in contrib.



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: Full text Indexing -out of contrib and into main..

From
Lamar Owen
Date:
John Huttley wrote:
> > Maybe asking 'Why isn't the contrib full-text-indexer not in the main
> > tree?' would be more productive on that front.
> Well, yes. Why isn't it?

I'm hoping to see the answer to that one myself, as that is outside my
scope currently. I just RPMize things...  Although, I didn't intend for
my statement to appear as harsh as it does -- for that I apologize.

> Yes, I have the devel RPM, but FTI  couldn't find its include files. Or its
> libraries. Or something.

Makefile.global.  Tried it here. The RPMset hasn't heretofore needed
Makefile.global.  I may package that, amongst other stuff necessary to
build certain things in the devel package -- once I find out how to go
about doing it.

> Building from a source tree has always been better for me. The catch is mixing
> that with the RPMS,
> which put things in unholy locations.
> Its very hard (for me at least) to update an RPM version with a version compiled
> from the source.

I recommend completely removing the RPM version and installing from
source rather than trying to upgrade from an RPM distribution to the
from-source distribution.  Or just install the next RPM version.  Let
RPM work the headaches for you.  If you want to run from a 'from-source'
build, then nix the RPMset altogether and don't worry about it
afterward.

Although I am going to consider a pre-built set of contribs -- most
notably, geodistance is likely to find its way into an RPM in the
future.  I just haven't decided whether to split out to individual
contribs or to just make a single 'postgresql-contrib' subpackage.  I'm
open to suggestions.

You can, however, install a source-tree preconfigured and built for the
RPM modifications by installing the _source_ RPM, and then issuing, as
root, 'rpm -bi postgresql.spec' from within the /usr/src/redhat/SPECS
dir.  You will then have a source tree primed for building whatever in
/usr/src/redhat/BUILD/postgresql-x.y.z (where x.y.z is the version, of
course). You will need python-devel installed in order to do that,
however.

HTH.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


Re: Full text Indexing -out of contrib and into main..

From
Bruce Momjian
Date:
> >Well, yes. Why isn't it?
> >
> >Full text indexing should be just as much a feature as any other key feature in
> >PG.
> >With the advent of unlimited file and record lengths in 7.1, this would be a good
> >time to
> >include it.
> >
> >FTI is particularly useful in the context of web content engines.
> 
> Well ... it's pretty inadequate, actually.  That might be one reason it's only
> in contrib.

OK, can someone collect suggestions, add the code, and integrate it for
7.1?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Full text Indexing -out of contrib and into main..

From
The Hermit Hacker
Date:
On Mon, 27 Nov 2000, Bruce Momjian wrote:

> > >Well, yes. Why isn't it?
> > >
> > >Full text indexing should be just as much a feature as any other key feature in
> > >PG.
> > >With the advent of unlimited file and record lengths in 7.1, this would be a good
> > >time to
> > >include it.
> > >
> > >FTI is particularly useful in the context of web content engines.
> > 
> > Well ... it's pretty inadequate, actually.  That might be one reason it's only
> > in contrib.
> 
> OK, can someone collect suggestions, add the code, and integrate it for
> 7.1?

too late in cycle ... 




Re: Full text Indexing -out of contrib and into main..

From
"Mitch Vincent"
Date:
    I modified the FTI trigger for my own use a while ago (indexes whole
words, eliminates duplicate a few other things) -- I'm not sure if it would
do anyone any good but you're welcome to it. To whom should I send it?

-Mitch

----- Original Message -----
From: "The Hermit Hacker" <scrappy@hub.org>
To: "Bruce Momjian" <pgman@candle.pha.pa.us>
Cc: "Don Baccus" <dhogaza@pacifier.com>; "John Huttley" <John@mwk.co.nz>;
<pgsql-hackers@postgresql.org>
Sent: Monday, November 27, 2000 7:06 PM
Subject: Re: [HACKERS] Full text Indexing -out of contrib and into main..


> On Mon, 27 Nov 2000, Bruce Momjian wrote:
>
> > > >Well, yes. Why isn't it?
> > > >
> > > >Full text indexing should be just as much a feature as any other key
feature in
> > > >PG.
> > > >With the advent of unlimited file and record lengths in 7.1, this
would be a good
> > > >time to
> > > >include it.
> > > >
> > > >FTI is particularly useful in the context of web content engines.
> > >
> > > Well ... it's pretty inadequate, actually.  That might be one reason
it's only
> > > in contrib.
> >
> > OK, can someone collect suggestions, add the code, and integrate it for
> > 7.1?
>
> too late in cycle ...
>
>
>



Re: Full text Indexing -out of contrib and into main..

From
Don Baccus
Date:
At 11:06 PM 11/27/00 -0400, The Hermit Hacker wrote:
>On Mon, 27 Nov 2000, Bruce Momjian wrote:

>> OK, can someone collect suggestions, add the code, and integrate it for
>> 7.1?
>
>too late in cycle ... 

Yes...



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: Full text Indexing -out of contrib and into main..

From
Thomas Lockhart
Date:
> > > Maybe asking 'Why isn't the contrib full-text-indexer not in the main
> > > tree?' would be more productive on that front.
> > Well, yes. Why isn't it?

I believe that it is appropriate for contrib/ because it is a good demo
of FTI-like capabilities. But nothing more, yet. For at least a couple
of reasons:

1) It generates the "index" as a table, not a PostgreSQL index or
index-like thing.

2) It has a hardcoded list of non-indexed words. This should come from a
table, to allow it to be tuned to the application requirements.

Comments?
                    - Thomas


Re: Full text Indexing -out of contrib and into main..

From
Bruce Momjian
Date:
[ Charset ISO-8859-1 unsupported, converting... ]
>     I modified the FTI trigger for my own use a while ago (indexes whole
> words, eliminates duplicate a few other things) -- I'm not sure if it would
> do anyone any good but you're welcome to it. To whom should I send it?

Is full-word optional or mandatory?  It has to be an option.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Full text Indexing -out of contrib and into main..

From
"john huttley"
Date:
> > OK, can someone collect suggestions, add the code, and integrate it for
> > 7.1?
> 
> too late in cycle ... 


How about first thing for 7.2 then? While it lies in limbo,
its never going to get the attention it deserves.

Regards




Re: Full text Indexing -out of contrib and into main..

From
"john huttley"
Date:
> I believe that it is appropriate for contrib/ because it is a good demo
> of FTI-like capabilities. But nothing more, yet. For at least a couple
> of reasons:
>
> 1) It generates the "index" as a table, not a PostgreSQL index or
> index-like thing.
>
> 2) It has a hardcoded list of non-indexed words. This should come from a
> table, to allow it to be tuned to the application requirements.
>
> Comments?
>
>                      - Thomas
>

In general..
a) Considering that I was coding up the same thing with triggers and such,
things could only get better.

b) Check out MSSQL 7's  capabilities and weep.

c) It would be a start. One its in the tree, it gets used more, gets
improved..

It would be a while yet before 7.2 starts, plenty of time then to develop
it further.

Regards

John





Re: Full text Indexing -out of contrib and into main..

From
Hannu Krosing
Date:
john huttley wrote:
> 
> > I believe that it is appropriate for contrib/ because it is a good demo
> > of FTI-like capabilities. But nothing more, yet. For at least a couple
> > of reasons:
> >
> > 1) It generates the "index" as a table, not a PostgreSQL index or
> > index-like thing.
> >
> > 2) It has a hardcoded list of non-indexed words. This should come from a
> > table, to allow it to be tuned to the application requirements.
> >
> > Comments?
> >
> >                      - Thomas
> >
> 
> In general..
> a) Considering that I was coding up the same thing with triggers and such,
> things could only get better.

AFAIK, the one in contrib _is_ the same thing coded up with triggers and
such ;)

> b) Check out MSSQL 7's  capabilities and weep.

BTW, have you studied MSSQL enough to tell me if it has a
separate/standalone 
(as a process) fti engine or just another index type.

I have been contemplating about implementing FTI for postgres for some
time and my 
current plan would be to implement a out-of-process fti engine (API +
sample 
implementation, in the spirit of PostgreSQLs extensibility) that could
postpone 
the actual indexing but still help with queries even for not yet fully
indexed stuff.

Will probably need some choreography but essential for high performance.

You generally don't want to wait for all index entries of an inverted
index to be saved.

Also the thing should be more general than the one in contrib , being
able to index 
both fields and full records and support functional indexes.


Is there a way to make PostgresQL optimiser aware of the
selectivity/cost of function, 
so that it can do the right thing for a query like

SELECT * FROM ARTICLES WHERE ADATE BETWEEN YESTERDAY AND TOMORROW   AND ARTICLES.FTI_MATCHES('(CAT & DOG) !
PRESIDENT')

It would be almost automatic if functions could return sets and then be
used like

SELECT * FROM ARTICLE WHERE ADATE BETWEEN YESTERDAY AND TOMORROW   AND ARTICLE_ID = ARTICLE.FTI_MATCHING_IDS('(CAT &
DOG)! PRESIDENT')
 

and somehow the optimiser would know that it can join on the returned
ids but this 
is probably not the case ;)

> c) It would be a start. One its in the tree, it gets used more, gets
> improved..

But, it is not a _real_ full text index, just a postgresql sample
application that 
implements a full text index using an sql database.

----------
Hannu


RE: Full text Indexing -out of contrib and into main..

From
Magnus Hagander
Date:
> > b) Check out MSSQL 7's  capabilities and weep.
> 
> BTW, have you studied MSSQL enough to tell me if it has a
> separate/standalone 
> (as a process) fti engine or just another index type.
It is standalone - separate process, data is stored in separate files (not
in db).

In SQL Server 7.0, you also have to manually update the index. Just updating
the values in the table does *NOT* update the index. (Can be scheduled, of
course, but not live)
In SQL Server 2000 the index can be auto-updated when rows change, but it's
not default.


//Magnus