Re: Please comment on the following OpenFTS/tsearch2 issues! - Mailing list pgsql-general

From Teodor Sigaev
Subject Re: Please comment on the following OpenFTS/tsearch2 issues!
Msg-id 444F1E86.3070008@sigaev.ru
In response to Please comment on the following OpenFTS/tsearch2 issues!  ("Don Walker" <don.walker@versaterm.com>)
List pgsql-general
> 1. While tsearch2 provides fairly complete boolean search expression support
> with AND - &, OR - |, NOT - !, and grouping - (), OpenFTS appears to only
> have support for ANDing search terms. Is there some reason it hasn't been
> extended to support full tsearch2 search expressions? Has anyone modified
> OpenFTS to do this?

Historical reasons and simplicity, nothing more.
We haven't modified OpenFTS. People often ask us about converting plain text ->
tsquery, so 8.2 will include plainto_tsquery(), which returns the same result as
the OpenFTS query parser.
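
To illustrate what an AND-only conversion from plain text to tsquery syntax looks like, here is a minimal Python sketch. The function name plain_to_and_query is hypothetical, and the word splitting is an assumption made for illustration: it only lowercases and splits on non-word characters, whereas a real query parser would also apply the configured dictionaries and drop stop words.

```python
import re


def plain_to_and_query(text: str) -> str:
    """Join the words of free text with '&', i.e. AND-only tsquery syntax.

    Hypothetical simplification: no stemming and no stop-word removal.
    """
    # Lowercase the input and keep only runs of word characters.
    words = re.findall(r"\w+", text.lower())
    # Every word must match, so all terms are joined with the AND operator.
    return " & ".join(words)


if __name__ == "__main__":
    print(plain_to_and_query("Fat cats, rats"))  # fat & cats & rats
```

The user never types &, | or !, which is why this style of input suits a simple search box.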



> 2. Neither OpenFTS or tsearch2 support exact phrase matching. I've seen the
> workaround to support matching a single exact phrase by modifying the WHERE
> clause with textcolumn ~* "exact phrase". Does this give reasonable
> performance? Has anyone implemented exact phrase matching in complex search
> expressions like ("exact phrase1" AND term1) OR (NOT "exact phrase2" AND
> "exact phrase3") ?

We don't plan to develop phrase search until we have a clean idea of how to
support complex queries and compound words; see the discussion at
http://www.pgsql.ru/db/mw/msg.html?mid=2111601
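
The workaround mentioned in the question (a pattern match on the text column) is usually combined with an ordinary word search, so that the expensive pattern check runs only on rows that already contain every word of the phrase. The Python sketch below models that two-phase idea on an in-memory dict. It is an assumption about how one might structure the workaround, not something tsearch2 or OpenFTS provides; phrase_search and the sample documents are hypothetical.

```python
import re


def phrase_search(docs, phrase):
    """Return ids of documents that contain the exact phrase (case-insensitive)."""
    phrase_lc = phrase.lower()
    phrase_words = set(re.findall(r"\w+", phrase_lc))
    hits = []
    for doc_id, text in docs.items():
        text_lc = text.lower()
        # Phase 1: cheap AND filter, standing in for the full-text index lookup.
        if not phrase_words <= set(re.findall(r"\w+", text_lc)):
            continue
        # Phase 2: exact recheck, standing in for the pattern match on the column.
        if phrase_lc in text_lc:
            hits.append(doc_id)
    return hits


if __name__ == "__main__":
    docs = {
        1: "The quick brown fox",
        2: "A brown and quick fox",   # has both words, but not as a phrase
        3: "Quick Brown bread",
    }
    print(phrase_search(docs, "quick brown"))  # [1, 3]
```

Document 2 passes the word filter and is rejected only by the recheck, which is the case the second phase exists for.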

>
> 3. The following summarizes what I've read about performance and scalability
> of OpenFTS and/or tsearch2:
>
> a) don't expect OpenFTS/tsearch2 to perform/scale as well as dedicated
> search engines like Lucene, http://lucene.apache.org/,
> http://archives.postgresql.org/pgsql-general/2002-05/msg01156.php.
Yes, a GiST index is good for online updates, but has problems with big sets.
We plan to add an inverted index to 8.2, with which tsearch2 will work at a
speed comparable to Lucene...

The first version has already been published, look for the announcement :)
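
For readers unfamiliar with the term, an inverted index maps each word to the set of documents that contain it. The toy Python sketch below shows the general idea only and makes no claim about how the 8.2 index is implemented; all function names and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return index


def search_and(index, words):
    """Documents containing every word: intersect the posting sets."""
    postings = [index.get(w, set()) for w in words]
    return set.intersection(*postings) if postings else set()


def search_or(index, words):
    """Documents containing at least one word: union the posting sets."""
    return set().union(*(index.get(w, set()) for w in words))


if __name__ == "__main__":
    docs = {
        1: "postgres full text search",
        2: "lucene text search",
        3: "postgres gist index",
    }
    index = build_inverted_index(docs)
    print(search_and(index, ["postgres", "search"]))  # {1}
    print(search_or(index, ["postgres", "search"]))   # {1, 2, 3}
```

A lookup touches only the posting sets of the query words, no matter how many documents are stored.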


> b) OR queries are slower than AND queries,
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/oscon_tsearch2/optimization.html.

Yes

> Do you agree with this summary? If you are using either OpenFTS or tsearch2
> in production, has the performance been acceptable? For my application I
> could be looking at several million documents averaging about 3 pages each
> (I only have ballpark figures at present).

We know of a tsearch2 installation working with 4 million docs.

>
> 4. If you are using either OpenFTS or tsearch2 in production why did you
> choose OpenFTS over tsearch2 or vice versa? One of the advantages of
> tsearch2 that I can see is that, once you have setup your database and
> indexed your documents, you can talk to the database directly from your
> application using SQL without needing to go through Perl first. This assumes
> that you're ok with tsearch2 search expression syntax so you can use
> functions like to_tsquery. It also assumes that you don't need sophisticated
> exact phrase matching.

OpenFTS can run on a different box than pgsql, and OpenFTS can index files
directly from the file system.

>
> 5. Are there any scripts, tools, add-ons, etc. that you can recommend?

We can tweak OpenFTS/tsearch2 for you.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/
