Thread: First Step to Major Use: Integrated Full-Text Search

First Step to Major Use: Integrated Full-Text Search

From
"C. Filipe Medeiros"
Date:
Hi,

    I've been using Postgres for 3 years now, and as far as I'm
concerned, it blows all other open source (and many commercial) database
systems out of the water. I'm kind of upset at its lack of support by
web hosts, open source projects, and the web developer community-at-large.

    I'm putting together an e-Commerce application framework
specifically designed for web programmers and developers (more
flexibility/easier high-end customization at the expense of being useful
to non-professionals) and I designed it originally with Postgres (though
it does have a data abstraction layer). I realized that the full-text
search support for Postgres would be sketchy and most users might opt
for MySQL instead since (A) their host doesn't support Tsearch2 or
OpenFTS (i.e., the user needs special rights to install custom functions
on Pgsql that most hosts won't allow) or (B) they're great developers...
but not great Sysadmins or DbAdmins and messing with the database server
is out-of-their-league.

    Now, Tsearch2 (which I use primarily) beats MySQL's native search
out of the water (or at least the last version that I used, which didn't
return any results if the total number of results was below a certain
threshold). It's very powerful, customizeable, and relatively easy to
use - or at least easy enough for any developer familiar with SQL.

    I think a major step that the development community behind Pgsql
could take to advance the project would be to take Tsearch2 or OpenFTS
and make one of them (I prefer the Tsearch2 architecture, but then I
haven't used OpenFTS in a while) native to Postgres rather than a contrib.

    What do you think?

       C. Filipe Medeiros
       http://www.nutritionalcenter.com/



Re: First Step to Major Use: Integrated Full-Text Search

From
Peter Eisentraut
Date:
Am Montag, 5. Dezember 2005 20:09 schrieb C. Filipe Medeiros:
>     I think a major step that the development community behind Pgsql
> could take to advance the project would be to take Tsearch2 or OpenFTS
> and make one of them (I prefer the Tsearch2 architecture, but then I
> haven't used OpenFTS in a while) native to Postgres rather than a contrib.

This is certainly a completely misguided solution to your problem.  Imagine
what software packages like Linux, Perl, or Apache would look like today if
the response to every "my hoster has not installed module X" had been "let's
put X in the core distribution".  One could certainly argue about the merits
of full-text search in particular, but that will only postpone your real
problem until next week.

Re: First Step to Major Use: Integrated Full-Text

From
"Joshua D. Drake"
Date:
Peter Eisentraut wrote:
> Am Montag, 5. Dezember 2005 20:09 schrieb C. Filipe Medeiros:
>
>>     I think a major step that the development community behind Pgsql
>> could take to advance the project would be to take Tsearch2 or OpenFTS
>> and make one of them (I prefer the Tsearch2 architecture, but then I
>> haven't used OpenFTS in a while) native to Postgres rather than a contrib.
>>
>
> This is certainly a completely misguided solution to your problem.  Imagine
> what software packages like Linux, Perl, or Apache would look like today if
> the response to every "my hoster has not installed module X" had been "let's
> put X in the core distribution".  One could certainly argue about the merits
> of full-text search in particular, but that will only postpone your real
> problem until next week.
>
Not to mention that Tsearch2 is part of PostgreSQL. If your hoster
doesn't realize the benifits, change
hosting companies ;)... Or ask them to install postgresql-contrib.

Joshua D. Drake



> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>


Re: First Step to Major Use: Integrated Full-Text

From
"C. Filipe Medeiros"
Date:
Hi there,

    All good points, however I want to point out this isn't my personal issue, since I host my own websites. My only problem is that if I license my software out to people, and they have the option of using Postgres or MySQL, they're going to support MySQL because more hosts support MySQL than those who support Postgres WITH a search module enabled. I don't have a problem installing Tsearch2 myself, but I'm not going to responsible for doing it for every single one of my licensees.

I'll break my argument FOR full-text integration in the following few points:
  1. Full-text search has widespread appeal, and in my use of Postgres (Postgres for websites) the need for full-text database search is practically universal. However, it's making less and less sense for me to support Postgres at all because so many of my licensees can't run it with their existing hosts (who run Postgres, but won't support a search module). Full text search in the native distribution would almost certainly bolster Postgres advocacy.

  2. MySQL supports full-text searching (not to mention commercial-brand dbs like SQL Server 2005). Now that MySQL supports transactions as well, there's less reason than ever for web developers to build their applications with Postgres support. Full-text search in Postgres (which as a contrib module is already more functional than MySQL's integrated search) could make a huge difference in database choice. Why be left behind when everyone else is providing this functionality to their users?

  3. Full-text search is just another, sophisticated way of querying a database. I don't see why this can't be integrated and made to be efficient like you would with any other common query.

  4. It may an issue of efficiency vs. bulk, but really it's at least partially an issue of efficiency vs. widespread appeal. It's true that you want to keep the core of any application as powerful as possible with as few unnecessary widgets as possible - but this is something that I think Postgres needs to be viable for consideration in larger licensed web applications.
I posted this first to the the advocacy list because first and foremost I think that's what this is - an advocacy issue. I think that full-text search integration is important to getting the marketspace that's being used by what I think is Postgres's main competitor for web apps - MySQL.

    Development-wise may be another issue entirely.

       - Filipe



Joshua D. Drake wrote:
Peter Eisentraut wrote:
Am Montag, 5. Dezember 2005 20:09 schrieb C. Filipe Medeiros:
 
    I think a major step that the development community behind Pgsql
could take to advance the project would be to take Tsearch2 or OpenFTS
and make one of them (I prefer the Tsearch2 architecture, but then I
haven't used OpenFTS in a while) native to Postgres rather than a contrib.
   

This is certainly a completely misguided solution to your problem.  Imagine what software packages like Linux, Perl, or Apache would look like today if the response to every "my hoster has not installed module X" had been "let's put X in the core distribution".  One could certainly argue about the merits of full-text search in particular, but that will only postpone your real problem until next week.
 
Not to mention that Tsearch2 is part of PostgreSQL. If your hoster doesn't realize the benifits, change
hosting companies ;)... Or ask them to install postgresql-contrib.

Joshua D. Drake



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to majordomo@postgresql.org so that your
       message can get through to the mailing list cleanly
 


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings


Re: First Step to Major Use: Integrated Full-Text

From
Christopher Kings-Lynne
Date:
>    I think a major step that the development community behind Pgsql
> could take to advance the project would be to take Tsearch2 or OpenFTS
> and make one of them (I prefer the Tsearch2 architecture, but then I
> haven't used OpenFTS in a while) native to Postgres rather than a contrib.

I'm in favour, once tsearch2 supports all our backend encodings and
perhaps is an inverted index :)

Chris


Re: First Step to Major Use: Integrated Full-Text

From
Christopher Kings-Lynne
Date:
>    2. MySQL supports full-text searching (not to mention
>       commercial-brand dbs like SQL Server 2005). Now that MySQL
>       supports transactions as well, there's less reason than ever for
>       web developers to build their applications with Postgres support.
>       Full-text search in Postgres (which as a contrib module is already
>       more functional than MySQL's integrated search) could make a huge
>       difference in database choice. Why be left behind when everyone
>       else is providing this functionality to their users?

Although I agree with your arguments for integration, I should point out
that MySQL's full text indexing and it's transactions are mutually
exclusive.  You can't have both.

Chris


Re: First Step to Major Use: Integrated Full-Text

From
"Joshua D. Drake"
Date:
>
> Although I agree with your arguments for integration, I should point
> out that MySQL's full text indexing and it's transactions are mutually
> exclusive.  You can't have both.
E.g; if you are using full text with MySQL, you are using a bum backend
destined to loose data.

Joshua D. Drake

>
> Chris
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org


Re: First Step to Major Use: Integrated Full-Text

From
"C. Filipe Medeiros"
Date:
 >> E.g; if you are using full text with MySQL, you are using a bum
backend destined to loose data.

 >> Although I agree with your arguments for integration, I should point
out that MySQL's full text indexing and it's transactions are mutually
exclusive.  You can't have both.

All the more reason to trump it, no? I am curious though. Let's say that
eventually we come to a conclusion regarding Postgres, what steps does
one take to get something like this implemented?

    - Filipe

P.S., if this were implemented, I would be behind an organized
Postgresql adoption campaign. Once it has this, in my book, it has most
of what it needs for most license-distributed web apps.

Re: First Step to Major Use: Integrated Full-Text

From
Christopher Kings-Lynne
Date:
>> Although I agree with your arguments for integration, I should point
>> out that MySQL's full text indexing and it's transactions are mutually
>> exclusive.  You can't have both.
>
> E.g; if you are using full text with MySQL, you are using a bum backend
> destined to loose data.

One of the new things going on in MySQL is this new demo db they're
doing called 'Sakila' (I've ported it to PgSQL, just waiting for license
details).

In it they "solve" the problem by having - wonder of wonders - a
trigger!  It's meant as an example of how the new trigger/stored proc
features of MySQL 5 can maintain a non-transactional full text index
table, and have the main table transactional.

I then asked what happens when the transaction that causes the trigger
to trigger gets rolled back - in that case the FTI table is not rolled
back, so the index is quite out of date.  I thought that was funny :D

Chris


Re: First Step to Major Use: Integrated Full-Text

From
"Marc G. Fournier"
Date:
On Mon, 5 Dec 2005, C. Filipe Medeiros wrote:

> Hi there,
>
>   All good points, however I want to point out this isn't my personal
> issue, since I host my own websites. My only problem is that if I
> license my software out to people, and they have the option of using
> Postgres or MySQL, they're going to support MySQL because more hosts
> support MySQL than those who support Postgres WITH a search module
> enabled. I don't have a problem installing Tsearch2 myself, but I'm not
> going to responsible for doing it for every single one of my licensees.

This falls under "make a list of recommended hosting companies" for your
clients ... there are alot of companies out there that offer PostgreSQL
hosting ... I don't know how flexible they are to adding modules, but I
know that we (http://www.hub.org) definitely do, and I believe that
Joshua's company (http://www.commandprompt.com) does as well ... that was
my big buy-into PostgreSQL back in '95, the fact that you could extend
things through modules ... seems silly to provide PostgreSQL and not be
willing to make use of that big feature ...

----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: scrappy@hub.org           Yahoo!: yscrappy              ICQ: 7615664

Re: First Step to Major Use: Integrated Full-Text

From
Oleg Bartunov
Date:
Filipe,

integrating of tsearch2 into PostgreSQL core is one of our major task.
We have rather big TODO for tsearch2
http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch2_TODO
We hope to complete it for next release (8.2). It depends only from
sponsoring. UTF8 compatibility is underway and some code was already submitted
to CVS.

     Oleg
On Mon, 5 Dec 2005, C. Filipe Medeiros wrote:

> Hi,
>
>   I've been using Postgres for 3 years now, and as far as I'm concerned, it
> blows all other open source (and many commercial) database systems out of the
> water. I'm kind of upset at its lack of support by web hosts, open source
> projects, and the web developer community-at-large.
>
>   I'm putting together an e-Commerce application framework specifically
> designed for web programmers and developers (more flexibility/easier high-end
> customization at the expense of being useful to non-professionals) and I
> designed it originally with Postgres (though it does have a data abstraction
> layer). I realized that the full-text search support for Postgres would be
> sketchy and most users might opt for MySQL instead since (A) their host
> doesn't support Tsearch2 or OpenFTS (i.e., the user needs special rights to
> install custom functions on Pgsql that most hosts won't allow) or (B) they're
> great developers... but not great Sysadmins or DbAdmins and messing with the
> database server is out-of-their-league.
>
>   Now, Tsearch2 (which I use primarily) beats MySQL's native search out of
> the water (or at least the last version that I used, which didn't return any
> results if the total number of results was below a certain threshold). It's
> very powerful, customizeable, and relatively easy to use - or at least easy
> enough for any developer familiar with SQL.
>
>   I think a major step that the development community behind Pgsql could
> take to advance the project would be to take Tsearch2 or OpenFTS and make one
> of them (I prefer the Tsearch2 architecture, but then I haven't used OpenFTS
> in a while) native to Postgres rather than a contrib.
>
>   What do you think?
>
>      C. Filipe Medeiros
>      http://www.nutritionalcenter.com/
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: First Step to Major Use: Integrated Full-Text

From
Josh Berkus
Date:
Felipe,

Chris has the serious answer to your question:
> I'm in favour, once tsearch2 supports all our backend encodings and
> perhaps is an inverted index

Basically there are technical deficiencies in Tsearch2 which prevent it
from being integrated into the main backend.

I think the answer to this is for someone to build an "ISP Postgres"
distribution which includes optional packages that most ISPs want (like
replication, pgcrypto and tsearch).

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

Re: First Step to Major Use: Integrated Full-Text

From
Jan Wieck
Date:
On 12/5/2005 7:05 PM, C. Filipe Medeiros wrote:

> Hi there,
>
>     All good points, however I want to point out this isn't my personal
> issue, since I host my own websites. My only problem is that if I
> license my software out to people, and they have the option of using
> Postgres or MySQL, they're going to support MySQL because more hosts

Point 1:

Unless you license "your software" under GPL (or some other free open
source license), there is a difference in price tag between MySQL and
PostgreSQL.

Point 2:

As someone else already mentioned, InnoDB currently has not fulltext
search support. And as we just a week ago learned, MySQL is about to
switch to an alternative storage engine anyway in response to the
Innobase purchase by Oracle. So I would not count on the proposed
fulltext search support for InnoDB before those plans actually get a
little more shape than the handwaving done so far.


Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: First Step to Major Use: Integrated Full-Text

From
Neil Conway
Date:
On Mon, 2005-12-05 at 18:24 -0800, Josh Berkus wrote:
> Basically there are technical deficiencies in Tsearch2 which prevent it
> from being integrated into the main backend.

Such as?

-Neil



Re: First Step to Major Use: Integrated Full-Text

From
"Joshua D. Drake"
Date:
>
> Such as?
>
>
Well previous to 8.0 it was really slow, prior to 8.1 it was not
MVCC/WAL? safe. I believe there
are some general limitations that are being resolved right now for 8.2
but I don't recall what they are.

Joshua D. Drake

> -Neil
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>


Re: First Step to Major Use: Integrated Full-Text

From
Oleg Bartunov
Date:
On Wed, 7 Dec 2005, Joshua D. Drake wrote:

>>
>> Such as?
>>
>>
> Well previous to 8.0 it was really slow, prior to 8.1 it was not MVCC/WAL?
> safe. I believe there
> are some general limitations that are being resolved right now for 8.2 but I
> don't recall what they are.

Main obstacle is an UTF8 support we're working right now. Parser code is
already in CVS. We need to work on dictionaries. Also, we'd like to have
inverted index support before integration to the core.

>
> Joshua D. Drake
>
>> -Neil
>>
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>        choose an index scan if your joining column's datatypes do not
>>        match
>>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>              http://www.postgresql.org/docs/faq
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83