Please comment on the following OpenFTS/tsearch2 issues! - Mailing list pgsql-general

From Don Walker
Subject Please comment on the following OpenFTS/tsearch2 issues!
Msg-id 001101c668a0$c26d8010$dbd849c6@donxp
Responses Re: Please comment on the following OpenFTS/tsearch2 issues!
Re: Please comment on the following OpenFTS/tsearch2 issues!
List pgsql-general
This topic was originally posted to the OpenFTS-general list on April 24,
2006. There were no replies in about 22 hours so I'm reposting to this more
active list.

I'm investigating OpenFTS and tsearch2 to see if they provide enough
full-text searching features to be used in a new application. I've run into
a number of issues that I would appreciate feedback/comments/workarounds on.

1. While tsearch2 provides fairly complete boolean search expression support
with AND - &, OR - |, NOT - !, and grouping - (), OpenFTS appears to support
only ANDing of search terms. Is there some reason it hasn't been extended to
support full tsearch2 search expressions? Has anyone modified OpenFTS to do
this?
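To make item 1 concrete, here is a minimal sketch (not part of OpenFTS) of the kind of translation layer that would be needed. It rewrites a word-based query such as cat AND (dog OR NOT fish) into tsearch2's to_tsquery() operator syntax. The function name to_tsearch2_syntax is hypothetical, adjacent terms are assumed to mean AND, and no validation of parentheses or operator placement is attempted.

```python
import re

# Binary word operators and their tsearch2 to_tsquery() equivalents;
# NOT is handled separately below and maps to '!'.
_BINARY = {"AND": "&", "OR": "|"}


def to_tsearch2_syntax(expression: str) -> str:
    """Rewrite e.g. 'cat AND (dog OR NOT fish)' as 'cat & (dog | !fish)'.

    Adjacent operands with no operator between them are ANDed. Balanced
    parentheses and operator placement are not validated.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", expression)
    out = []
    prev_is_operand = False  # True right after a term or a ')'
    for tok in tokens:
        word = tok.upper()
        if word in _BINARY:
            out.append(f" {_BINARY[word]} ")
            prev_is_operand = False
            continue
        # Anything that follows an operand without an explicit operator
        # (other than a closing parenthesis) gets an implicit AND.
        if prev_is_operand and tok != ")":
            out.append(" & ")
        if word == "NOT":
            out.append("!")
            prev_is_operand = False
        elif tok == "(":
            out.append("(")
            prev_is_operand = False
        else:  # ')' or a search term
            out.append(tok)
            prev_is_operand = True
    return "".join(out)


print(to_tsearch2_syntax("cat AND (dog OR NOT fish)"))  # cat & (dog | !fish)
```

The resulting string would then be handed to to_tsquery() as a bound parameter rather than pasted into the SQL text.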

2. Neither OpenFTS nor tsearch2 supports exact phrase matching. I've seen the
workaround to support matching a single exact phrase by modifying the WHERE
clause with textcolumn ~* 'exact phrase'. Does this give reasonable
performance? Has anyone implemented exact phrase matching in complex search
expressions like ("exact phrase1" AND term1) OR (NOT "exact phrase2" AND
"exact phrase3")?
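A minimal sketch of the single-phrase workaround, assuming the common two-stage approach: AND the phrase's words together with to_tsquery() so the tsearch2 index can narrow the candidate rows, then apply ~* to the raw text so only rows where the words actually appear next to each other survive. The function phrase_filter and the column names fts_index and body are hypothetical, and the %s placeholders assume a DB-API driver such as psycopg2.

```python
import re
from typing import List, Tuple

# Characters with special meaning in PostgreSQL (and Python) regexes.
_REGEX_SPECIAL = re.compile(r"([.^$*+?()\[\]{}|\\])")


def phrase_filter(phrase: str, tsv_column: str = "fts_index",
                  text_column: str = "body") -> Tuple[str, List[str]]:
    """Return (WHERE fragment, params) for a two-stage exact-phrase match."""
    # Stage 1: AND the phrase's words so the tsearch2 index can be used.
    words = re.findall(r"\w+", phrase)
    if not words:
        raise ValueError("phrase contains no searchable words")
    tsquery = " & ".join(words)

    # Stage 2: escape each whitespace-separated token and allow any run of
    # whitespace (including line breaks) between consecutive tokens.
    tokens = [_REGEX_SPECIAL.sub(r"\\\1", t) for t in phrase.split()]
    pattern = r"\s+".join(tokens)

    # Column names are interpolated directly and must be trusted identifiers.
    sql = f"{tsv_column} @@ to_tsquery(%s) AND {text_column} ~* %s"
    return sql, [tsquery, pattern]


where, params = phrase_filter("exact phrase")
print(where)   # fts_index @@ to_tsquery(%s) AND body ~* %s
print(params)  # ['exact & phrase', 'exact\\s+phrase']
```

Assuming the planner uses the index for the @@ condition, the regex is only evaluated on rows that already contain every word, so a phrase made of common words leaves more rows for the regex to check than one made of rare words.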

3. The following summarizes what I've read about performance and scalability
of OpenFTS and/or tsearch2:

a) don't expect OpenFTS/tsearch2 to perform/scale as well as dedicated
search engines like Lucene, http://lucene.apache.org/,
http://archives.postgresql.org/pgsql-general/2002-05/msg01156.php.

b) OR queries are slower than AND queries,
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/oscon_tsearch2/optimization.html.

c) the design trade-offs favor online indexing over search
performance/scalability; see the "Full text search engine" section in
http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo

d) there are a number of things you can do to improve performance - see the
thread starting at
http://sourceforge.net/mailarchive/message.php?msg_id=11444008.

Do you agree with this summary? If you are using either OpenFTS or tsearch2
in production, has the performance been acceptable? For my application I
could be looking at several million documents averaging about 3 pages each
(I only have ballpark figures at present).

4. If you are using either OpenFTS or tsearch2 in production why did you
choose OpenFTS over tsearch2 or vice versa? One of the advantages of
tsearch2 that I can see is that, once you have set up your database and
indexed your documents, you can talk to the database directly from your
application using SQL without needing to go through Perl first. This assumes
that you're ok with tsearch2 search expression syntax so you can use
functions like to_tsquery. It also assumes that you don't need sophisticated
exact phrase matching.
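As an illustration of talking to tsearch2 directly from SQL, here is a minimal sketch that builds a ranked search query. The function ranked_search_sql, the table documents and the columns id and fts_index are hypothetical. rank() is the tsearch2 ranking function (renamed ts_rank() when full-text search moved into the PostgreSQL core in 8.3), and the FROM ..., to_tsquery(...) AS q form follows the examples in the tsearch2 documentation.

```python
from typing import List, Tuple, Union


def ranked_search_sql(tsquery: str, limit: int = 20,
                      table: str = "documents",
                      tsv_column: str = "fts_index"
                      ) -> Tuple[str, List[Union[str, int]]]:
    """Build a ranked tsearch2 query and its parameters.

    `table` and `tsv_column` are interpolated directly, so they must be
    trusted identifiers; the user's query and the limit are bound as
    parameters (%s placeholders, psycopg2 style).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    sql = (
        f"SELECT id, rank({tsv_column}, q) AS score "
        f"FROM {table}, to_tsquery(%s) AS q "
        f"WHERE {tsv_column} @@ q "
        "ORDER BY score DESC LIMIT %s"
    )
    return sql, [tsquery, limit]


sql, params = ranked_search_sql("cat & (dog | !fish)", limit=10)
print(sql)
print(params)
```

With psycopg2 the pair would be run as cur.execute(sql, params), so no Perl layer sits between the application and the index.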

5. Are there any scripts, tools, add-ons, etc. that you can recommend?

