Thread: Tsearch2 / Create rule on select

Tsearch2 / Create rule on select

From
Jean-Michel Pouré
Date:
Dear Friends,

I am very impressed by TSearch2 and would like to thank Oleg and the
team for their hard work.

I would like to migrate a phpBB forum with more that 200.000 messages to
TSeach. Full text searches have become a bottleneck. When there are more
than 200 users, the server slow dramatically. I need TSearch!

So ... I installed TSearch2, a French dictionnary, everything is okay.
My database is PostgreSQL 8.2.3, UTF-8.

My questions now:
1) Should I migrate phpBB to TSeach2 in PHP code, rewriting the queries,
which is quite obvious or easy.

OR

2) Is there a smart way to catch queries on the fly in PostgreSQL, for
example using

"CREATE RULE "_RETURN" AS
    ON SELECT TO t1
    DO INSTEAD "

or the like.

The idea would be that when I search on a text field, PostgreSQL rule system would rewrite the query to seach using
Tsearch2. 
Is there any tool in TSeach2 doing this query rewriting rule?
Obviously, it does not seem to exist, but I would like to be sure.

3) Gin indexes
Are Gin indexes recommended for large databases?
How to calculate the size in memory of an index?

Kind regards,
Jean-Michel


Re: Tsearch2 / Create rule on select

From
Magnus Hagander
Date:
On Thu, Mar 08, 2007 at 09:56:23PM +0100, Jean-Michel Pouré wrote:
> Dear Friends,
>
> I am very impressed by TSearch2 and would like to thank Oleg and the
> team for their hard work.
>
> I would like to migrate a phpBB forum with more that 200.000 messages to
> TSeach. Full text searches have become a bottleneck. When there are more
> than 200 users, the server slow dramatically. I need TSearch!
>
> So ... I installed TSearch2, a French dictionnary, everything is okay.
> My database is PostgreSQL 8.2.3, UTF-8.
>
> My questions now:
> 1) Should I migrate phpBB to TSeach2 in PHP code, rewriting the queries,
> which is quite obvious or easy.

Yes.


> OR
>
> 2) Is there a smart way to catch queries on the fly in PostgreSQL, for
> example using
>
> "CREATE RULE "_RETURN" AS
>     ON SELECT TO t1
>     DO INSTEAD "
>
> or the like.

I don't think you can do that on-the-fly. But even if you can, if it's
reasonably easy to rewrite the query in the PHP, you should do that
instead IMHO.

> 3) Gin indexes
> Are Gin indexes recommended for large databases?

Yes. At least if you have lots of search activity, they can be a *lot*
faster than GiST (see for example
http://people.planetpostgresql.org/mha/index.php?/archives/112-GIN-performance-postgresql.org-websearch-part-2.html)


> How to calculate the size in memory of an index?

No idea. Assuming you want to do it beforehand. otherwise, just create
the index and see how large it got?

//Magnus

Re: Tsearch2 / Create rule on select

From
Jean-Michel Pouré
Date:
Le vendredi 09 mars 2007 à 10:58 +0100, Magnus Hagander a écrit :
> No idea. Assuming you want to do it beforehand. otherwise, just create
> the index and see how large it got?

Thank you for your comments. I will add TSeach2 support to phpBB 3.x
soon.

I had incredible response time on simple queries on more than plain-text
200.000 row. About 1 millisecond! Incredible but true.

explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@
'jmp'::tsquery limit 100;

Limit  (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824
rows=100 loops=1)
   ->  Index Scan using idxfti_idx on phpbb_posts_text
(cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418
rows=100 loops=1)
         Index Cond: (idxfti @@ '''jmp'''::tsquery)
 [color=red]Total runtime: 1.068 ms

TSearch2 will be used by wikimedia shortly (it is supported in beta
version).

IMHO, it would be better if PostgreSQL parser was able to find Tseach
indexes alone, rewritting the query automatically. Maybe it is on the
radar list of hackers.

Kind regards,
Jean-Michel




Re: Tsearch2 / Create rule on select

From
Oleg Bartunov
Date:
On Fri, 9 Mar 2007, Jean-Michel Pour? wrote:

> Le vendredi 09 mars 2007 ЪЪ 10:58 +0100, Magnus Hagander a ЪЪcrit :
>> No idea. Assuming you want to do it beforehand. otherwise, just create
>> the index and see how large it got?
>
> Thank you for your comments. I will add TSeach2 support to phpBB 3.x
> soon.
>
> I had incredible response time on simple queries on more than plain-text
> 200.000 row. About 1 millisecond! Incredible but true.
>
> explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@
> 'jmp'::tsquery limit 100;
>
> Limit  (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824
> rows=100 loops=1)
>   ->  Index Scan using idxfti_idx on phpbb_posts_text
> (cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418
> rows=100 loops=1)
>         Index Cond: (idxfti @@ '''jmp'''::tsquery)
> [color=red]Total runtime: 1.068 ms
>
> TSearch2 will be used by wikimedia shortly (it is supported in beta
> version).

I've already seen tsearch2 in 1.9.X release.

>
> IMHO, it would be better if PostgreSQL parser was able to find Tseach
> indexes alone, rewritting the query automatically. Maybe it is on the
> radar list of hackers.

btw, we're working on new FTS feature of PostgreSQL, it's basically
tsearch integrated into the pg core. But it has some new features and
new SQL commands for configuring of FTS, so if the matter is not pressing I'd recommend
to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc


>
> Kind regards,
> Jean-Michel
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Tsearch2 / Create rule on select

From
cedric
Date:
Le samedi 10 mars 2007 06:32, Oleg Bartunov a écrit :
> On Fri, 9 Mar 2007, Jean-Michel Pour? wrote:
> > Le vendredi 09 mars 2007 ЪЪ 10:58 +0100, Magnus Hagander a ЪЪcrit :
> >> No idea. Assuming you want to do it beforehand. otherwise, just create
> >> the index and see how large it got?
> >
> > Thank you for your comments. I will add TSeach2 support to phpBB 3.x
> > soon.
> >
> > I had incredible response time on simple queries on more than plain-text
> > 200.000 row. About 1 millisecond! Incredible but true.
> >
> > explain analyse verbose SELECT * FROM phpbb_posts_text WHERE idxfti @@
> > 'jmp'::tsquery limit 100;
> >
> > Limit  (cost=0.00..444.47 rows=100 width=934) (actual time=0.046..0.824
> > rows=100 loops=1)
> >   ->  Index Scan using idxfti_idx on phpbb_posts_text
> > (cost=0.00..1053.38 rows=237 width=934) (actual time=0.040..0.418
> > rows=100 loops=1)
> >         Index Cond: (idxfti @@ '''jmp'''::tsquery)
> > [color=red]Total runtime: 1.068 ms
> >
> > TSearch2 will be used by wikimedia shortly (it is supported in beta
> > version).
>
> I've already seen tsearch2 in 1.9.X release
unfortunely, rev 20094 wich start to correctly handle tsearch2 operator in
query parser is not in the 1.9.3 mediawiki svn tag but in the branch and
trunk.
>
> > IMHO, it would be better if PostgreSQL parser was able to find Tseach
> > indexes alone, rewritting the query automatically. Maybe it is on the
> > radar list of hackers.
>
> btw, we're working on new FTS feature of PostgreSQL, it's basically
> tsearch integrated into the pg core. But it has some new features and
> new SQL commands for configuring of FTS, so if the matter is not pressing
> I'd recommend to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc
>
> > Kind regards,
> > Jean-Michel
> >
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 2: Don't 'kill -9' the postmaster
>
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq

Re: Tsearch2 / Create rule on select

From
Jean-Michel Pouré
Date:
Le samedi 10 mars 2007 à 08:32 +0300, Oleg Bartunov a écrit :
> btw, we're working on new FTS feature of PostgreSQL, it's basically
> tsearch integrated into the pg core. But it has some new features and
> new SQL commands for configuring of FTS, so if the matter is not
> pressing I'd recommend
> to check http://mira.sai.msu.su/~megera/pgsql/ftsdoc

Great. Thank you very much Oleg.
As suggested, I will play with tsearch_core-0.37.gz patch.

After testing first, I might be interested in using tseach_core on a
production database. It is a community forum with 200.000 messages.
tseach-core would only be used for searching messages. There is no money
at stake, the database is backed-up twice a day using cron jobs.

Whenever a bug happens, are there chances that PostgreSQL database gets
perverted? On only the parser? If there is no chance to destroy the
database, I may be interested in using Tsearch-core in production,
quickly.

When will tsearch-core become part of CVS-head?

Kind regards,
Jean-Michel


Re: Tsearch2 / Create rule on select

From
Magnus Hagander
Date:
Jean-Michel Pouré wrote:

> IMHO, it would be better if PostgreSQL parser was able to find Tseach
> indexes alone, rewritting the query automatically. Maybe it is on the
> radar list of hackers.

How would you "rewrite a query automatically"? PostgreSQL will find it
automatically when you attempt to do a match using the fulltext search
operator, which is the only one supported by tsearch. I'm not sure what
else it could do?

//Magnus