Thread: Full text Indexing -out of contrib and into main..
> > Maybe asking 'Why isn't the contrib full-text-indexer not in the main > tree?' would be more productive on that front. Well, yes. Why isn't it? Full text indexing should be just as much a feature as any other key feature in PG. With the advent of unlimited file and record lengths in 7.1, this would be a good time to include it. FTI is particularly useful in the context of web content engines. > How did you attempt to build it under the RPM install? I assume you had > the postgresql-devel package installed, and the include paths set > properly.... Of course, most of what is in contrib assumes a full source > tree is lying around (argh)....ie, it's wanting to include > Makefile.global in its Makefile. And, on the RPM dist, Makefile.global > isn't (yet) packaged. Yes, I have the devel RPM, but FTI couldn't find its include files. Or its libraries. Or something. Building from a source tree has always been better for me. The catch is mixing that with the RPMS, which put things in unholy locations. Its very hard (for me at least) to update an RPM version with a version compiled from the source. I recently updated my PG system from 6.5.3 to 7.0.3, still RPMs, (another fun job) and have not tried to compile FTI subsequently. However if I tried hard enough I'm sure I could fix it. For the moment, I'm on another job so I'm not worrying. Regards John
At 02:51 PM 11/28/00 +1300, John Huttley wrote: >> >> Maybe asking 'Why isn't the contrib full-text-indexer not in the main >> tree?' would be more productive on that front. > >Well, yes. Why isn't it? > >Full text indexing should be just as much a feature as any other key feature in >PG. >With the advent of unlimited file and record lengths in 7.1, this would be a good >time to >include it. > >FTI is particularly useful in the context of web content engines. Well ... it's pretty inadequate, actually. That might be one reason it's only in contrib. - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
John Huttley wrote: > > Maybe asking 'Why isn't the contrib full-text-indexer not in the main > > tree?' would be more productive on that front. > Well, yes. Why isn't it? I'm hoping to see the answer to that one myself, as that is outside my scope currently. I just RPMize things... Although, I didn't intend for my statement to appear as harsh as it does -- for that I apologize. > Yes, I have the devel RPM, but FTI couldn't find its include files. Or its > libraries. Or something. Makefile.global. Tried it here. The RPMset hasn't heretofore needed Makefile.global. I may package that, amongst other stuff necessary to build certain things in the devel package -- once I find out how to go about doing it. > Building from a source tree has always been better for me. The catch is mixing > that with the RPMS, > which put things in unholy locations. > Its very hard (for me at least) to update an RPM version with a version compiled > from the source. I recommend completely removing the RPM version and installing from source rather than trying to upgrade from an RPM distribution to the from-source distribution. Or just install the next RPM version. Let RPM work the headaches for you. If you want to run from a 'from-source' build, then nix the RPMset altogether and don't worry about it afterward. Although I am going to consider a pre-built set of contribs -- most notably, geodistance is likely to find its way into an RPM in the future. I just haven't decided whether to split out to individual contribs or to just make a single 'postgresql-contrib' subpackage. I'm open to suggestions. You can, however, install a source-tree preconfigured and built for the RPM modifications by installing the _source_ RPM, and then issuing, as root, 'rpm -bi postgresql.spec' from within the /usr/src/redhat/SPECS dir. You will then have a source tree primed for building whatever in /usr/src/redhat/BUILD/postgresql-x.y.z (where x.y.z is the version, of course). You will need python-devel installed in order to do that, however. HTH. -- Lamar Owen WGCR Internet Radio 1 Peter 4:11
> >Well, yes. Why isn't it? > > > >Full text indexing should be just as much a feature as any other key feature in > >PG. > >With the advent of unlimited file and record lengths in 7.1, this would be a good > >time to > >include it. > > > >FTI is particularly useful in the context of web content engines. > > Well ... it's pretty inadequate, actually. That might be one reason it's only > in contrib. OK, can someone collect suggestions, add the code, and integrate it for 7.1? -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On Mon, 27 Nov 2000, Bruce Momjian wrote: > > >Well, yes. Why isn't it? > > > > > >Full text indexing should be just as much a feature as any other key feature in > > >PG. > > >With the advent of unlimited file and record lengths in 7.1, this would be a good > > >time to > > >include it. > > > > > >FTI is particularly useful in the context of web content engines. > > > > Well ... it's pretty inadequate, actually. That might be one reason it's only > > in contrib. > > OK, can someone collect suggestions, add the code, and integrate it for > 7.1? too late in cycle ...
I modified the FTI trigger for my own use a while ago (indexes whole words, eliminates duplicate a few other things) -- I'm not sure if it would do anyone any good but you're welcome to it. To whom should I send it? -Mitch ----- Original Message ----- From: "The Hermit Hacker" <scrappy@hub.org> To: "Bruce Momjian" <pgman@candle.pha.pa.us> Cc: "Don Baccus" <dhogaza@pacifier.com>; "John Huttley" <John@mwk.co.nz>; <pgsql-hackers@postgresql.org> Sent: Monday, November 27, 2000 7:06 PM Subject: Re: [HACKERS] Full text Indexing -out of contrib and into main.. > On Mon, 27 Nov 2000, Bruce Momjian wrote: > > > > >Well, yes. Why isn't it? > > > > > > > >Full text indexing should be just as much a feature as any other key feature in > > > >PG. > > > >With the advent of unlimited file and record lengths in 7.1, this would be a good > > > >time to > > > >include it. > > > > > > > >FTI is particularly useful in the context of web content engines. > > > > > > Well ... it's pretty inadequate, actually. That might be one reason it's only > > > in contrib. > > > > OK, can someone collect suggestions, add the code, and integrate it for > > 7.1? > > too late in cycle ... > > >
At 11:06 PM 11/27/00 -0400, The Hermit Hacker wrote: >On Mon, 27 Nov 2000, Bruce Momjian wrote: >> OK, can someone collect suggestions, add the code, and integrate it for >> 7.1? > >too late in cycle ... Yes... - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
> > > Maybe asking 'Why isn't the contrib full-text-indexer not in the main > > > tree?' would be more productive on that front. > > Well, yes. Why isn't it? I believe that it is appropriate for contrib/ because it is a good demo of FTI-like capabilities. But nothing more, yet. For at least a couple of reasons: 1) It generates the "index" as a table, not a PostgreSQL index or index-like thing. 2) It has a hardcoded list of non-indexed words. This should come from a table, to allow it to be tuned to the application requirements. Comments? - Thomas
[ Charset ISO-8859-1 unsupported, converting... ] > I modified the FTI trigger for my own use a while ago (indexes whole > words, eliminates duplicate a few other things) -- I'm not sure if it would > do anyone any good but you're welcome to it. To whom should I send it? Is full-word optional or mandatory? It has to be an option. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
> > OK, can someone collect suggestions, add the code, and integrate it for > > 7.1? > > too late in cycle ... How about first thing for 7.2 then? While it lies in limbo, its never going to get the attention it deserves. Regards
> I believe that it is appropriate for contrib/ because it is a good demo > of FTI-like capabilities. But nothing more, yet. For at least a couple > of reasons: > > 1) It generates the "index" as a table, not a PostgreSQL index or > index-like thing. > > 2) It has a hardcoded list of non-indexed words. This should come from a > table, to allow it to be tuned to the application requirements. > > Comments? > > - Thomas > In general.. a) Considering that I was coding up the same thing with triggers and such, things could only get better. b) Check out MSSQL 7's capabilities and weep. c) It would be a start. One its in the tree, it gets used more, gets improved.. It would be a while yet before 7.2 starts, plenty of time then to develop it further. Regards John
john huttley wrote: > > > I believe that it is appropriate for contrib/ because it is a good demo > > of FTI-like capabilities. But nothing more, yet. For at least a couple > > of reasons: > > > > 1) It generates the "index" as a table, not a PostgreSQL index or > > index-like thing. > > > > 2) It has a hardcoded list of non-indexed words. This should come from a > > table, to allow it to be tuned to the application requirements. > > > > Comments? > > > > - Thomas > > > > In general.. > a) Considering that I was coding up the same thing with triggers and such, > things could only get better. AFAIK, the one in contrib _is_ the same thing coded up with triggers and such ;) > b) Check out MSSQL 7's capabilities and weep. BTW, have you studied MSSQL enough to tell me if it has a separate/standalone (as a process) fti engine or just another index type. I have been contemplating about implementing FTI for postgres for some time and my current plan would be to implement a out-of-process fti engine (API + sample implementation, in the spirit of PostgreSQLs extensibility) that could postpone the actual indexing but still help with queries even for not yet fully indexed stuff. Will probably need some choreography but essential for high performance. You generally don't want to wait for all index entries of an inverted index to be saved. Also the thing should be more general than the one in contrib , being able to index both fields and full records and support functional indexes. Is there a way to make PostgresQL optimiser aware of the selectivity/cost of function, so that it can do the right thing for a query like SELECT * FROM ARTICLES WHERE ADATE BETWEEN YESTERDAY AND TOMORROW AND ARTICLES.FTI_MATCHES('(CAT & DOG) ! PRESIDENT') It would be almost automatic if functions could return sets and then be used like SELECT * FROM ARTICLE WHERE ADATE BETWEEN YESTERDAY AND TOMORROW AND ARTICLE_ID = ARTICLE.FTI_MATCHING_IDS('(CAT & DOG)! PRESIDENT') and somehow the optimiser would know that it can join on the returned ids but this is probably not the case ;) > c) It would be a start. One its in the tree, it gets used more, gets > improved.. But, it is not a _real_ full text index, just a postgresql sample application that implements a full text index using an sql database. ---------- Hannu
> > b) Check out MSSQL 7's capabilities and weep. > > BTW, have you studied MSSQL enough to tell me if it has a > separate/standalone > (as a process) fti engine or just another index type. It is standalone - separate process, data is stored in separate files (not in db). In SQL Server 7.0, you also have to manually update the index. Just updating the values in the table does *NOT* update the index. (Can be scheduled, of course, but not live) In SQL Server 2000 the index can be auto-updated when rows change, but it's not default. //Magnus