Thread: First Step to Major Use: Integrated Full-Text Search
Hi, I've been using Postgres for 3 years now, and as far as I'm concerned, it blows all other open source (and many commercial) database systems out of the water. I'm kind of upset at its lack of support by web hosts, open source projects, and the web developer community-at-large. I'm putting together an e-Commerce application framework specifically designed for web programmers and developers (more flexibility/easier high-end customization at the expense of being useful to non-professionals) and I designed it originally with Postgres (though it does have a data abstraction layer). I realized that the full-text search support for Postgres would be sketchy and most users might opt for MySQL instead since (A) their host doesn't support Tsearch2 or OpenFTS (i.e., the user needs special rights to install custom functions on Pgsql that most hosts won't allow) or (B) they're great developers... but not great Sysadmins or DbAdmins and messing with the database server is out-of-their-league. Now, Tsearch2 (which I use primarily) beats MySQL's native search out of the water (or at least the last version that I used, which didn't return any results if the total number of results was below a certain threshold). It's very powerful, customizeable, and relatively easy to use - or at least easy enough for any developer familiar with SQL. I think a major step that the development community behind Pgsql could take to advance the project would be to take Tsearch2 or OpenFTS and make one of them (I prefer the Tsearch2 architecture, but then I haven't used OpenFTS in a while) native to Postgres rather than a contrib. What do you think? C. Filipe Medeiros http://www.nutritionalcenter.com/
Am Montag, 5. Dezember 2005 20:09 schrieb C. Filipe Medeiros: > I think a major step that the development community behind Pgsql > could take to advance the project would be to take Tsearch2 or OpenFTS > and make one of them (I prefer the Tsearch2 architecture, but then I > haven't used OpenFTS in a while) native to Postgres rather than a contrib. This is certainly a completely misguided solution to your problem. Imagine what software packages like Linux, Perl, or Apache would look like today if the response to every "my hoster has not installed module X" had been "let's put X in the core distribution". One could certainly argue about the merits of full-text search in particular, but that will only postpone your real problem until next week.
Peter Eisentraut wrote: > Am Montag, 5. Dezember 2005 20:09 schrieb C. Filipe Medeiros: > >> I think a major step that the development community behind Pgsql >> could take to advance the project would be to take Tsearch2 or OpenFTS >> and make one of them (I prefer the Tsearch2 architecture, but then I >> haven't used OpenFTS in a while) native to Postgres rather than a contrib. >> > > This is certainly a completely misguided solution to your problem. Imagine > what software packages like Linux, Perl, or Apache would look like today if > the response to every "my hoster has not installed module X" had been "let's > put X in the core distribution". One could certainly argue about the merits > of full-text search in particular, but that will only postpone your real > problem until next week. > Not to mention that Tsearch2 is part of PostgreSQL. If your hoster doesn't realize the benifits, change hosting companies ;)... Or ask them to install postgresql-contrib. Joshua D. Drake > ---------------------------(end of broadcast)--------------------------- > TIP 1: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly >
Hi there,
All good points, however I want to point out this isn't my personal issue, since I host my own websites. My only problem is that if I license my software out to people, and they have the option of using Postgres or MySQL, they're going to support MySQL because more hosts support MySQL than those who support Postgres WITH a search module enabled. I don't have a problem installing Tsearch2 myself, but I'm not going to responsible for doing it for every single one of my licensees.
I'll break my argument FOR full-text integration in the following few points:
Development-wise may be another issue entirely.
- Filipe
Joshua D. Drake wrote:
All good points, however I want to point out this isn't my personal issue, since I host my own websites. My only problem is that if I license my software out to people, and they have the option of using Postgres or MySQL, they're going to support MySQL because more hosts support MySQL than those who support Postgres WITH a search module enabled. I don't have a problem installing Tsearch2 myself, but I'm not going to responsible for doing it for every single one of my licensees.
I'll break my argument FOR full-text integration in the following few points:
- Full-text search has widespread appeal, and in my use of Postgres (Postgres for websites) the need for full-text database search is practically universal. However, it's making less and less sense for me to support Postgres at all because so many of my licensees can't run it with their existing hosts (who run Postgres, but won't support a search module). Full text search in the native distribution would almost certainly bolster Postgres advocacy.
- MySQL supports full-text searching (not to mention commercial-brand dbs like SQL Server 2005). Now that MySQL supports transactions as well, there's less reason than ever for web developers to build their applications with Postgres support. Full-text search in Postgres (which as a contrib module is already more functional than MySQL's integrated search) could make a huge difference in database choice. Why be left behind when everyone else is providing this functionality to their users?
- Full-text search is just another, sophisticated way of querying a database. I don't see why this can't be integrated and made to be efficient like you would with any other common query.
- It may an issue of efficiency vs. bulk, but really it's at least partially an issue of efficiency vs. widespread appeal. It's true that you want to keep the core of any application as powerful as possible with as few unnecessary widgets as possible - but this is something that I think Postgres needs to be viable for consideration in larger licensed web applications.
Development-wise may be another issue entirely.
- Filipe
Joshua D. Drake wrote:
Peter Eisentraut wrote:Am Montag, 5. Dezember 2005 20:09 schrieb C. Filipe Medeiros:Not to mention that Tsearch2 is part of PostgreSQL. If your hoster doesn't realize the benifits, change
I think a major step that the development community behind Pgsql
could take to advance the project would be to take Tsearch2 or OpenFTS
and make one of them (I prefer the Tsearch2 architecture, but then I
haven't used OpenFTS in a while) native to Postgres rather than a contrib.
This is certainly a completely misguided solution to your problem. Imagine what software packages like Linux, Perl, or Apache would look like today if the response to every "my hoster has not installed module X" had been "let's put X in the core distribution". One could certainly argue about the merits of full-text search in particular, but that will only postpone your real problem until next week.
hosting companies ;)... Or ask them to install postgresql-contrib.
Joshua D. Drake---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
> I think a major step that the development community behind Pgsql > could take to advance the project would be to take Tsearch2 or OpenFTS > and make one of them (I prefer the Tsearch2 architecture, but then I > haven't used OpenFTS in a while) native to Postgres rather than a contrib. I'm in favour, once tsearch2 supports all our backend encodings and perhaps is an inverted index :) Chris
> 2. MySQL supports full-text searching (not to mention > commercial-brand dbs like SQL Server 2005). Now that MySQL > supports transactions as well, there's less reason than ever for > web developers to build their applications with Postgres support. > Full-text search in Postgres (which as a contrib module is already > more functional than MySQL's integrated search) could make a huge > difference in database choice. Why be left behind when everyone > else is providing this functionality to their users? Although I agree with your arguments for integration, I should point out that MySQL's full text indexing and it's transactions are mutually exclusive. You can't have both. Chris
> > Although I agree with your arguments for integration, I should point > out that MySQL's full text indexing and it's transactions are mutually > exclusive. You can't have both. E.g; if you are using full text with MySQL, you are using a bum backend destined to loose data. Joshua D. Drake > > Chris > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org
>> E.g; if you are using full text with MySQL, you are using a bum backend destined to loose data. >> Although I agree with your arguments for integration, I should point out that MySQL's full text indexing and it's transactions are mutually exclusive. You can't have both. All the more reason to trump it, no? I am curious though. Let's say that eventually we come to a conclusion regarding Postgres, what steps does one take to get something like this implemented? - Filipe P.S., if this were implemented, I would be behind an organized Postgresql adoption campaign. Once it has this, in my book, it has most of what it needs for most license-distributed web apps.
>> Although I agree with your arguments for integration, I should point >> out that MySQL's full text indexing and it's transactions are mutually >> exclusive. You can't have both. > > E.g; if you are using full text with MySQL, you are using a bum backend > destined to loose data. One of the new things going on in MySQL is this new demo db they're doing called 'Sakila' (I've ported it to PgSQL, just waiting for license details). In it they "solve" the problem by having - wonder of wonders - a trigger! It's meant as an example of how the new trigger/stored proc features of MySQL 5 can maintain a non-transactional full text index table, and have the main table transactional. I then asked what happens when the transaction that causes the trigger to trigger gets rolled back - in that case the FTI table is not rolled back, so the index is quite out of date. I thought that was funny :D Chris
On Mon, 5 Dec 2005, C. Filipe Medeiros wrote: > Hi there, > > All good points, however I want to point out this isn't my personal > issue, since I host my own websites. My only problem is that if I > license my software out to people, and they have the option of using > Postgres or MySQL, they're going to support MySQL because more hosts > support MySQL than those who support Postgres WITH a search module > enabled. I don't have a problem installing Tsearch2 myself, but I'm not > going to responsible for doing it for every single one of my licensees. This falls under "make a list of recommended hosting companies" for your clients ... there are alot of companies out there that offer PostgreSQL hosting ... I don't know how flexible they are to adding modules, but I know that we (http://www.hub.org) definitely do, and I believe that Joshua's company (http://www.commandprompt.com) does as well ... that was my big buy-into PostgreSQL back in '95, the fact that you could extend things through modules ... seems silly to provide PostgreSQL and not be willing to make use of that big feature ... ---- Marc G. Fournier Hub.Org Networking Services (http://www.hub.org) Email: scrappy@hub.org Yahoo!: yscrappy ICQ: 7615664
Filipe, integrating of tsearch2 into PostgreSQL core is one of our major task. We have rather big TODO for tsearch2 http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch2_TODO We hope to complete it for next release (8.2). It depends only from sponsoring. UTF8 compatibility is underway and some code was already submitted to CVS. Oleg On Mon, 5 Dec 2005, C. Filipe Medeiros wrote: > Hi, > > I've been using Postgres for 3 years now, and as far as I'm concerned, it > blows all other open source (and many commercial) database systems out of the > water. I'm kind of upset at its lack of support by web hosts, open source > projects, and the web developer community-at-large. > > I'm putting together an e-Commerce application framework specifically > designed for web programmers and developers (more flexibility/easier high-end > customization at the expense of being useful to non-professionals) and I > designed it originally with Postgres (though it does have a data abstraction > layer). I realized that the full-text search support for Postgres would be > sketchy and most users might opt for MySQL instead since (A) their host > doesn't support Tsearch2 or OpenFTS (i.e., the user needs special rights to > install custom functions on Pgsql that most hosts won't allow) or (B) they're > great developers... but not great Sysadmins or DbAdmins and messing with the > database server is out-of-their-league. > > Now, Tsearch2 (which I use primarily) beats MySQL's native search out of > the water (or at least the last version that I used, which didn't return any > results if the total number of results was below a certain threshold). It's > very powerful, customizeable, and relatively easy to use - or at least easy > enough for any developer familiar with SQL. > > I think a major step that the development community behind Pgsql could > take to advance the project would be to take Tsearch2 or OpenFTS and make one > of them (I prefer the Tsearch2 architecture, but then I haven't used OpenFTS > in a while) native to Postgres rather than a contrib. > > What do you think? > > C. Filipe Medeiros > http://www.nutritionalcenter.com/ > > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Felipe, Chris has the serious answer to your question: > I'm in favour, once tsearch2 supports all our backend encodings and > perhaps is an inverted index Basically there are technical deficiencies in Tsearch2 which prevent it from being integrated into the main backend. I think the answer to this is for someone to build an "ISP Postgres" distribution which includes optional packages that most ISPs want (like replication, pgcrypto and tsearch). -- --Josh Josh Berkus Aglio Database Solutions San Francisco
On 12/5/2005 7:05 PM, C. Filipe Medeiros wrote: > Hi there, > > All good points, however I want to point out this isn't my personal > issue, since I host my own websites. My only problem is that if I > license my software out to people, and they have the option of using > Postgres or MySQL, they're going to support MySQL because more hosts Point 1: Unless you license "your software" under GPL (or some other free open source license), there is a difference in price tag between MySQL and PostgreSQL. Point 2: As someone else already mentioned, InnoDB currently has not fulltext search support. And as we just a week ago learned, MySQL is about to switch to an alternative storage engine anyway in response to the Innobase purchase by Oracle. So I would not count on the proposed fulltext search support for InnoDB before those plans actually get a little more shape than the handwaving done so far. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
On Mon, 2005-12-05 at 18:24 -0800, Josh Berkus wrote: > Basically there are technical deficiencies in Tsearch2 which prevent it > from being integrated into the main backend. Such as? -Neil
> > Such as? > > Well previous to 8.0 it was really slow, prior to 8.1 it was not MVCC/WAL? safe. I believe there are some general limitations that are being resolved right now for 8.2 but I don't recall what they are. Joshua D. Drake > -Neil > > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >
On Wed, 7 Dec 2005, Joshua D. Drake wrote: >> >> Such as? >> >> > Well previous to 8.0 it was really slow, prior to 8.1 it was not MVCC/WAL? > safe. I believe there > are some general limitations that are being resolved right now for 8.2 but I > don't recall what they are. Main obstacle is an UTF8 support we're working right now. Parser code is already in CVS. We need to work on dictionaries. Also, we'd like to have inverted index support before integration to the core. > > Joshua D. Drake > >> -Neil >> >> >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 9: In versions below 8.0, the planner will ignore your desire to >> choose an index scan if your joining column's datatypes do not >> match >> > > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83